Samsung 980 PRO 2TB M.2 2280 NVMe Drive Failures



TLDR: Samsung is requiring customers who paid a premium for high-end storage devices, which are now failing due to a manufacturing defect into an unusable but still readable state, to send in their drives with sensitive data in the clear in order to have them replaced.

-=-=-=-=-=-=-

THE FAILURE AND TROUBLESHOOTING

A little over a year ago, when the Samsung 980 PRO 2TB m.2 drives went on sale on NewEgg, I bit the bullet and upgraded from my 1TB Intel 2280. These Samsung drives had great reviews and the performance was objectively superior to what was then my system drive.


That's a $420 (plus tax and shipping) drive, kids.


And for a time, it was great - truly high performance. 

Then just 13 months later, suddenly - as computer problems are wont to be - in the middle of my daily operations the drive just up and failed. My apps froze and I got a blue screen, something I hadn't seen in my home in literally years.

Ruh-roh.



It wouldn't boot. I still had the other Intel drive installed with a separate Win10 install, but couldn't get to that either. The BIOS alternated between seeing the drives and not seeing them at all. It almost seemed as if part of the PCI bus on my motherboard had stopped working with m.2 drives, which is really strange: they're basically piped right into the PCI bus (which is why they're so fast), yet my GPU, which is also obviously piped into the PCI bus, was still operational.

What are the chances that two drives fail simultaneously? Zero.

I pulled the Intel drive, as it was the most accessible - the Samsung being conveniently positioned underneath both the video card and the heat sink for the CPU - and ran it over to another machine. 

In hindsight I realized that this was a mistake, that I should have tested both, but I reasoned at the time that the Samsung was only a year old and since both drives were inaccessible in the BIOS, if I could test either of them and it worked elsewhere, then it was a problem local to that machine.

And, it mounted. Thus, strange as it was, something had apparently gone awry with how the main board handles the m.2 slots. Solar flares maybe, or quantum bit flipping.

Well, that sucks, because once you've gotten used to working on a system with NVMe storage, even going back to a plain old SATA SSD is painfully slow.

I wasn't too sad, as I keep frequent backups on a TrueNAS array I run in my basement; it was really more of an inconvenience to have to source a new motherboard, move everything over, re-install the OS and get everything set back up the way I like it. I run a lot of really huge software packages on my workstation, so even with gigabit fiber and high-performance computing equipment, it takes some time to install, and the devices are finicky and take some finessing to get talking to each other nicely.

Well, rather than replacing the motherboard on a 4-year-old system at a cost greater than that of a current-generation model, I decided to just stuff a plain-jane 2.5" SATA SSD into it, re-purpose it for a less demanding role, and build a whole new system for myself; the i9-9900K was getting a little long in the tooth anyway, and DDR5 prices are finally reasonable.

So that's what I did: I repurposed the old 9th gen as the security system and built myself a 13th gen system, the parts trickling in over the next few weeks.



What's better than a motherboard with two m.2 drives? How about FIVE! We're upgrading after all, let's do this up. So I chose a board with five m.2 slots, the prices having dropped on the format to the point that I had no intention of installing any SATA drives at all. I've got about 70TB of storage in ZFS on my TrueNAS server and a bunch more on my Dell R730, so why shouldn't I have the fastest storage possible on board and leave all the slow-poke bulk storage to the spinner array?

I also picked up an extra 2TB WD Black that was on sale for about $150, roughly a third of what I paid for the Samsung, with the intention of eventually grabbing some 4TB m.2s once they come down in price. Having eventually collected all the necessary parts, I got to work assembling my new beast.

I've built a lot of computers over the years. The first computer I built was a 486 DX-33, back when that was pretty much the fastest thing you could build. I enjoy it. It's strangely Zen to me.

So imagine my dismay when I powered it up, checked the BIOS to make sure the XMP profiles were loaded and the temps were stable (i.e. the CPU heatsink was properly seated and wasn't going to fry the chip), and goddamn Windows would not install.



Pardon me? Just go ahead and blow away the old partition and do a fresh install.



I've run into a lot of stop codes over the years. I can tell you exactly what is wrong for a dozen or so 0x800's. But what the heck is a 0x80042428? What do you mean you can't erase partitions? 

I AM CURSED WTF IS GOING ON HERE?

Back to troubleshooting drives I guess, but this time I tested them one at a time - all of them. And wouldn't you know, it installed just fine to the new WD Black drive when the Samsung wasn't installed. So I got it all set up and started adding drives back in. The Intel worked. And then... and then the Samsung worked?

Imagine my surprise when, after all that, the damn thing mounted. I could see my whole C drive from my old system, despite having repeatedly blown away the partitions.



EaseUS was reporting 98% health... and yet... "Bad". Why, though? DISKMGMT could see the drive and its partitions, but was reporting "READ ONLY".

I was conferring with a friend when he sent me this: Samsung Issues Fix for Dying 980 Pro SSDs 



Lo and behold, this isn't some random failure; it's a known issue with my exact drive running firmware 3B2QGXA7, the exact version that was running on mine.
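If you'd rather check your own drive from Linux, the kernel exposes each NVMe controller's model string and firmware revision in sysfs. Here is a minimal sketch that reads them and compares the firmware against the one revision named in this post, 3B2QGXA7. It assumes the drive is the first controller (`nvme0`), and the helper names `read_nvme_identity` and `is_affected` are just illustrative. It only knows about that single firmware version, so "not on the list" doesn't mean a drive is healthy. On Windows, tools like SSD-Z show the same model and firmware information.

```python
from __future__ import annotations

from pathlib import Path

# The firmware revision this post identifies as the faulty one.
KNOWN_BAD_FIRMWARE = {"3B2QGXA7"}


def read_nvme_identity(controller: str = "nvme0",
                       sysfs_root: Path = Path("/sys/class/nvme")) -> tuple[str, str]:
    """Read the model and firmware revision strings the kernel exposes in sysfs."""
    base = sysfs_root / controller
    model = (base / "model").read_text().strip()
    firmware = (base / "firmware_rev").read_text().strip()
    return model, firmware


def is_affected(model: str, firmware: str) -> bool:
    """True if the model looks like a 980 PRO running a known-bad firmware."""
    return "980 PRO" in model.upper() and firmware.strip().upper() in KNOWN_BAD_FIRMWARE


if __name__ == "__main__":
    try:
        model, fw = read_nvme_identity()
    except OSError:
        print("Could not read /sys/class/nvme/nvme0 (no NVMe controller there?)")
    else:
        verdict = "AFFECTED" if is_affected(model, fw) else "not on the known-bad list"
        print(f"{model}, firmware {fw}: {verdict}")
```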


I installed their Samsung Magician tool in a desperate attempt to avert the failure, but it was too little, too late. Since I had never been notified that this particular piece of internal hardware required a critical update to avoid failing during regular use, I had missed the boat, and the drive had flipped irreversibly into READ ONLY mode.

Well that explains it. 

The Samsung drive started to fail, as so many of this model have, and it did so in a way that confused the BIOS into not being able to deal with any of the SSDs installed. That led me to the (incorrect) conclusion that something was wrong with the PCI/m.2 bus on the i9 board, and to build a whole new computer on the premise that it wasn't worth replacing the mobo on an older computer just to have m.2 access.

Well, at least I got a spanky new computer. It was time to upgrade anyway; I use my workstation all day, every day, and everything I do is incredibly demanding, so it's a worthwhile investment for me.

I do love a conclusive explanation as to why something has stopped working, though. Usually I resort to "solar flares" or "quantum bit flipping", but knowing that this was actually a known problem with an entire production series gives me some sense of closure, like that time I had to RMA an iMac because the capacitors on the logic board were victims of the capacitor plague (both times, actually, as they replaced the board with another board whose caps popped a year and a bit later).

Let's be clear - this is a manufacturer defect

Just like the caps in that old iMac, the failure is a result of the manufacturer's practices and not misuse by the consumer. It could be a perfectly reasonable human or robot error, it could have been accidental contamination in the facility, it could be from trying to cheap out on components or process or sloppy coding in the firmware... hey I get it: shit happens. 

But regardless of what the root cause was, it follows from something that happened to the chip before it was sold to the customer.




SO IT REALLY BEGINS


Samsung advertises a 5-year warranty on these devices. Since the drive was only released in September of 2020, it had been only 14 months since the date of purchase, and the failure was part of a well-documented manufacturer fault, I was confident that I wasn't going to have to throw it in the trash and call it a loss.

First of all, contacting their support is hilariously difficult. Check this out: if you visit https://www.samsung.com/ca/support/contact/#contactinfo and click on "Chat Support" it is self-referential. That is to say, the link to connect to chat support points back at its own page that offers to connect to chat support.




Fine, let's try E-Mail support. You can play along with me if you like: https://www.samsung.com/ca/support/emailsupport/

In order to email support you have to select your product group, type and subtype. Thinking that "Storage" would be the appropriate option, I selected it but only found "Optical Disk Drive" as an option from the drop down. 



Well, fine. I poke around and find it under PC & Office > SSD > SSD... but then the Model Number field never populates with anything. If you go ahead and fill out the rest anyway, you'll see (as indicated by the red *) that they require you to select a model from a list that does not populate and cannot be filled in manually.



LOL. It's been like this at least since I've been trying to use it. Does their service department not notice that they get exactly ZERO chat or email requests and just think "well this job is easy"?

So I waited until the next day to call, as it was after call center hours, and then sat through the requisite "did you know you can talk to chat or email support any time, 24/7" while yelling at the IVR "NO YOU CAN NOT". Eventually I got to talk to a person who, after I explained the issue, transferred me to another person, who took all the details, said he was escalating it, and told me I would be contacted back in two business days.

Four business days later, having not heard from them, I thought I should try and see if I could just get a replacement through Newegg and avoid having to deal with Samsung altogether, and the automated order warranty system very quickly said "sure here's a mail label send it in to us"... 

... and that's when I realized where the problem with this was going to come in.

You see, the obvious problem now is that this drive was the C drive for my daily driver, and now it is in a "data preservation" mode which will prevent me from writing any data to the disk.

Ostensibly, this is designed to be a good thing: in ye olde days, when a storage medium failed it just died and turned into a doorstop, all your data evaporated into the ether and you cursed yourself for not keeping regular backups. There's a two year gap in my children's baby photo timeline because this exact thing happened to me. So if you've got some irreplaceable baby photos or the plans for your next invention or an art project you've spent days working on and not backed up anywhere, having it fault into a READONLY state rather than just bursting into flames is great because at least then you can get your data.

However, it also means that you are unable to erase any data, as that would entail writing changes to the disk, even just to mark those used sectors as available. Nor can you securely erase the data by overwriting it with random data, which is the prescribed best practice for storage media containing sensitive data.
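To make "overwriting with random data" concrete, here is a minimal Python sketch that overwrites one file in place with random bytes (`overwrite_file` is just an illustrative name). It assumes an ordinary, writable file and is a simplification: on an SSD, wear leveling means the controller may put the new bytes in different physical cells, so a file-level overwrite doesn't guarantee the old contents are gone, which is why drive-level commands like ATA Secure Erase or NVMe Sanitize exist. And on a drive that has locked itself read-only, the very first write fails, which is exactly the bind described here.

```python
import os
import secrets


def overwrite_file(path: str, passes: int = 1, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file's existing bytes in place with random data.

    Illustrative only: SSD wear leveling can leave the old data in other
    physical cells, so this is not a guaranteed erase on flash media.
    """
    size = os.path.getsize(path)
    # "r+b" opens for in-place writing without truncating. On a read-only
    # mount, open() typically raises OSError (EROFS); if only the device has
    # gone read-only, the write or fsync fails instead.
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(secrets.token_bytes(n))  # cryptographically random bytes
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push this pass to the device before the next
```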

Your shit is just sitting there, unencrypted, and there's no way to secure it.

It's like a CD-ROM that's had the session closed - Read Only Memory.

Oh sure, Windows will complain and say "you're not the owner" if you mount and try to access it, but that is very easily circumvented by accessing it through Linux, which won't care about silly things like NTFS file permissions. If someone were so inclined, they could just take a drive image and sort through it at their leisure.

So this poses a real problem when they say "just mail it to us and we'll replace it".

The drive which was your main OS drive. Your Desktop, Photos, Documents, Downloads. Your browser cache and cookies. Spreadsheets with confidential business information and healthcare documents you scanned to send to your provider and your bank statements and tax returns. 

All the things that you know you need to keep safe and not just literally send in the mail.

Six days after Samsung told me they'd call me in two days, I called back and said "Hey, why haven't I gotten a call back?" I think the support department is very small because it was the same guy I'd talked to the week before. He said he was confused and re-escalated it and would definitely call me back. A couple hours later, Newegg calls me to say "hey you did an RMA request but haven't sent it and now Samsung is bugging us to see why we won't take it and we will you just need to send it."

The gentleman I spoke to very quickly understood the issue - C drive, unencrypted, frozen in read only state, information security, yadda yadda. He was like "Ok cool, you call Samsung and deal with them because our hands are tied, you really have to send it to us to get a replacement and I can see why you don't want to do that."

So I called Samsung back and talked to the same fellow again and tried to explain to him and he soon transferred me to his manager. 

We went back and forth on this. I have provided them all the documentation I've displayed here: here is my invoice showing that I purchased the drive (now) 14 months ago (more on why this itself is bullshit later). Here is SSD-Z showing the serial number and firmware it's running. Here's a photo of the physical drive showing the same serial number. Here is your own firmware flashing tool stating that it is running the version of the firmware which is known to be faulty, as well as reporting that it has failed to write the updated firmware because the drive has entered a READONLY state.

"Sure, just send us the drive and we'll erase the drive and then you can send it to Newegg for your RMA replacement."

Again, how am I supposed to know that my data is going to be safe?

"I pinky promise that nobody's going to go through the drive." No matter how I explained to him what a serious security risk it is - and honestly, what a liability risk it is for Samsung to be requesting, nay, demanding and then taking receipt of such data - he just kept repeating that I had nothing to worry about, it wouldn't be going to any "third parties" (indicating an understanding that providing personal data to third parties is a security issue), and that nobody would go through the drive. 

Almost as if I were silly to be concerned about such a thing.

I tried to explain to him that, in fact, Samsung is a third party, but he insisted that they super duper promised that nobody would dig in. He even put it in writing for me:



Ok, let's take it as a given, just for the sake of argument, that nobody at Samsung has a voyeuristic or blackhat streak. (By the way, that is a legitimate attack vector: some blackhats take positions in places where they get access to drives like this, either in repair centers or recycling depots, specifically so they can dig through drives for information that could be advantageous. But I digress.) Let's just assume that Samsung has a really high standard of data custody... what happens if someone steals the drive along the way during delivery? Can Samsung promise that nobody at Canada Post swipes packages? That's just objectively untrue; it does happen.

Here's the thing: I get it. Sorta. They want to make sure that the sneaky consumer isn't trying to pull a fast one on them and scam them out of a free drive. 

Fair enough. 

I offered to take a video of the drive showing the serial number on its face and then me destroying it, so that they knew I wasn't making off with two drives for the price of one. "I'll send it to you after I drill a hole in it," I offered. However, they declined these offers, as that would constitute physical damage, which is not covered, and they will indeed need to connect to the drive to verify its identity and the issue (while definitely not looking around, pinky swear). So they continued to insist that I need not worry and should just send the drive in its current state, and they would send me a replacement.

After someone has plugged it in. 

But definitely not to look around.

I have a background (and indeed certifications) in information security and digital forensics, so besides being much more educated than the average bear on the sensitivity of this kind of data and the ease with which it can be harvested, I also work in a professional capacity in this field. Some of my clients are attorneys, and part of my duties for them is receiving disclosures from the prosecutor's office, which contain photographs of crime scenes, witness interviews, surveillance footage... basically everything that goes into a criminal case. Extremely sensitive data.

Normally I would take receipt of this evidence, which is sent encrypted, then decrypt it, present it to counsel, and store it securely for later retrieval.

As it happened, the workstation went down while I was in the middle of my work day, and some of those unencrypted disclosures are sitting on that drive. I would usually move the day's open files to an encrypted container kept on a server at the end of the day, but the irrecoverable fault occurred in the afternoon and left the files permanently written to the device.

Here's a sample of the notice that is sent with every single disclosure I receive:



Take special note of sections c) and e):

c) You must not copy or provide access to the material to any person or counsel who is not acting under your supervision..

e) You must not permit access to the material by or copy it for any person other than those referred to in the conditions b), c) and d) above, without prior written consent of the Crown or a court order.

If I were to send the drive as it is, regardless of any pinky promises made, I would in fact be permitting access to the material to anyone who happened to come into possession of the drive, and would therefore have committed an offense. A chargeable offense. It wouldn't even matter whether anyone actually opened the drive and poked around; it's the mere principle that I had provided access.

Data confidentiality is a major factor in information security - it's the "C" in the CIA Triad: Confidentiality, Integrity, Availability. 

Which is why I'm so astounded that Samsung, a manufacturer of storage media, holds the devout position that these drives - which fail in such a way as to freeze data in a read only state - be sent to them in the mail in order to honor their warranty agreement. 

I happen to work in legal matters, but what of the doctor who had patient records open at the time of failure? It would be a violation of HIPAA. What if it was an accountant who had been processing payments for a business? That would be a violation of PCI DSS. 

What if it was just a regular person who doesn't know anything about any of this but whose data is no less important to them? 

When I explained this to the manager, he finally seemed to understand and said that he would escalate it further. But why should it take being a doctor or a lawyer to (eventually) be offered an alternative to mailing in your data, just so Samsung, whose product has failed long before it rightfully ought to have, isn't scammed out of a warranty replacement?

Their entire process is incredibly backwards and unreasonably arduous. The self-referential chat support links and broken email support form. The need for customers to call and be transferred repeatedly as the issue gets escalated.

As I explained to the gentleman I was speaking to, this is a known issue.

Just Google it.

And then there's the fact that you have to provide your receipt to get warranty service. Know how you can tell I bought it? BECAUSE I HAVE IT. Here's a picture of it. Here's the serial number. It was obviously purchased because it exists here, and since the devices only came out 2.5 years ago, regardless of how anyone came by it, it is obviously within the 5-year warranty period. So why would they require proof of purchase for any reason other than to weed out people who don't keep good enough records and ultimately screw them out of a replacement they are rightfully owed?

Mine is not some outlier case; there are literally thousands of these drives deployed in the wild that have already failed or are going to fail. Why is Samsung support apparently unaware of the issue when it is so widely reported and clearly documented?

Why do they seem so... unprepared?

Either Samsung is so incompetent that it expects its customers to fumble through a nightmare of support channels, then waste the time of phone support personnel, only to be ordered to send their PII into the ether on a wing and a prayer, contrary to all known information security practices... or they're doing this strategically.

Perhaps their strategy is to make it as painful as possible, so that as few people as possible make it to the end zone and actually get the defective device replaced.

Seems like an extremely stupid gambit for a company to make. I for one know I won't buy another Samsung SSD, and stories like mine are all over the internet. A friend already told me that he'd avoided the Samsung SSDs specifically because he'd heard about this.

Why wouldn't they get out in front of this, make a statement: 

"We know there's a problem so here's the link to the software to try the fix and if the fix doesn't work then just fill out this form with your serial number and a screenshot of the failed update along with your shipping address and we'll send you a replacement"?

Alas, such honor among corporations exists only in my fantasies.

At this time I'm still waiting to hear what Samsung comes back with. I may just degauss the drive with a speaker magnet and send it back. It depends on what they say; I will update this when they do.

