Another Netgate with storage failure, 6 in total so far
-
@ablephri come on, post the video of the 1500W heat gun that was used, like you did on reddit.

note to others in this thread: I'm working with him off-thread to get him a replacement in exchange for his broken unit.
-
Does the external LED start orange then change to blue?
The diag LEDs look like a normal POST.
Did you try both console ports?
-
To generally address this thread:
A larger flash device is only part of the solution. Netgate moved the entire product line to a larger eMMC device running in pSLC mode much earlier this year, without a price increase.
The larger device isn't there to provide more flash sectors to write to so much as to compensate for running the eMMC in pSLC mode while yielding the same or larger effective capacity. So now you'll see (for example) the storage capacity on the 6100 Base listed as
Storage: 21.3 GB eMMC
This is really a 64GB eMMC running in pSLC mode.
Running the eMMC in pSLC mode provides a write endurance far beyond just the capacity increase of the new eMMC.
I'll try to write up something more ... technical to explain why soon.
Now I see a number of people desoldering the eMMC to try to recover. I'm not saying that isn't necessary, but I will say that it should not be necessary. I'm working with the vendor here to attempt to establish 'why' and issue a remedy.
I also want to work with the customer base for existing (pre-SLC eMMC) units to establish a path to getting a working, supported NVMe drive installed in their Netgate device prior to failure of the eMMC.
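
A rough sketch of the capacity arithmetic behind the "21.3 GB" figure quoted above. The thread does not state the cell type; the one-third ratio is consistent with TLC flash (3 bits per cell) being run at 1 bit per cell, so that is an assumption here, and the function name is only illustrative.

```python
def pslc_capacity_gb(raw_gb: float, native_bits_per_cell: int) -> float:
    """Usable capacity when each cell stores 1 bit instead of its native bit count."""
    if native_bits_per_cell < 1:
        raise ValueError("native_bits_per_cell must be at least 1")
    return raw_gb / native_bits_per_cell


# Assumed cell types; only the 64 GB raw size and the 21.3 GB result come from the thread.
for name, bits in (("MLC", 2), ("TLC", 3)):
    print(f"64 GB {name} in pSLC mode: {pslc_capacity_gb(64, bits):.1f} GB")
```

With a 64 GB raw device, only the 3-bits-per-cell case gives 21.3 GB, which matches the listed storage size.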
-
@jwt Hahaha yeah. I had to give it to a "professional" who used that heat gun.
I am delighted with Netgate's great customer service, and I really appreciate the patience Gonzo had with me while I was trying to fix my problems and get the unit replaced.
-
@stephenw10 Out of curiosity, is there any reason the Innodisk 3TE6 is shipped with the SG6100/8200? Is it Silicom’s shipping default or Netgate’s selection?
https://www.texim-europe.com/getfile.ashx?id=131537
Seems endurance is only 93TBW for the 128 GB vs something like the Transcend MTE452T (or MTE352T DRAMless), also an industrial SSD with DRAM cache, rated around 270TBW. https://cdn.transcend-info.com/products/images/modelpic/1164/Transcend-MTE452T2_202307.pdf
The Transcend model also seems cheaper, at ~$86 (when I bought the 256GB) vs $90-100+ for the Innodisk 128 GB, at least for consumers, along with lower power draw and faster overall performance (for 128/256). I take it there are steeper discounts in bulk on the Innodisk?
-
@aivxtla said in Another Netgate with storage failure, 6 in total so far:
Seems endurance is only 93TBW for the 128 GB
There’s no official TBW rating for the 128GB SSD version, but based on the type of flash used, I expect it to be similar to Transcend’s TBW.
-
@w0w it’s listed as 93TBW in the Innodisk data sheet I linked; that Texim link is to the official, more detailed data sheet. The figure isn’t listed in most places and is normally hard to find. In endurance (~100TBW) it’s not much different from the WD SN520, which is also cheaper, but it is definitely less than the Transcend MTE352/452. I think the NAND controller also has an impact on things like curtailing write amplification. Then again, some vendors also warranty the TBW at less than the drive’s maximum ability depending on sales tier, like the WD Blue/Red SATA SSDs with the same hardware.
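
To put the two datasheet figures quoted above (93 TBW and roughly 270 TBW) in perspective, here is a minimal sketch that converts a TBW rating into years of use at a steady daily write volume. The daily write volumes are made-up illustrative values, not measurements from any pfSense system, and the calculation ignores write amplification and any warranty terms.

```python
def years_to_reach_tbw(tbw_tb: float, gb_written_per_day: float) -> float:
    """Years until the rated TBW is reached at a constant host write rate.

    Uses decimal units (1 TB = 1000 GB), as drive datasheets normally do.
    """
    if gb_written_per_day <= 0:
        raise ValueError("gb_written_per_day must be positive")
    return (tbw_tb * 1000) / gb_written_per_day / 365


# Hypothetical daily write volumes in GB; adjust to what your own system reports.
for tbw in (93, 270):
    for daily_gb in (10, 50):
        years = years_to_reach_tbw(tbw, daily_gb)
        print(f"{tbw} TBW at {daily_gb} GB/day: about {years:.1f} years")
```

The point of the comparison is only that the rating scales linearly with the write rate; whether a drive actually stays healthy up to its rated TBW is a separate question.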
-
@aivxtla
Overall, this could simply be variability in how the manufacturer evaluates TBW. As far as I know, there is no strict standard defining how TBW should be tested or calculated. Some manufacturers focus on producing nice-looking numbers for the datasheet, while others prioritize long-term stability.
A real-world example: recently I ran into unexplained issues on a friend’s home PC. It took quite a while to track down the cause, which—unsurprisingly—turned out to be a 5-year-old SSD with about 88% wear reported. The total written data was still far below the rated TBW, and the drive was not at 100% wear, but the problems were severe enough to make the system practically unusable.
In the end, the datasheet even stated in a footnote that the specified TBW does not guarantee stable operation of the drive. Everything depends on the criteria chosen by the manufacturer. I don’t think TBW alone should be taken too seriously, although the controller may also play a role.
-
@w0w different technology and more than twice the difference in listed performance. That’s not something I would attribute to testing methodology without real data suggesting otherwise.
Imo a much more likely explanation is Netgate were not focusing on SSD wear as a major limit to device reliability and brand reputation at that time. They are now and I would be surprised if they made the same decision in the future.
-
I ended up on this thread after having THREE SG1100s all die of the same issue, ALL IN THE LAST 60 days! I've used and loved pfSense for years, but I wasn't aware how big of a problem this issue was until recently. Super disappointing to find out... I've opened a ticket, but all of these devices are older than 1 year, so I'm gonna take a wild guess and say Netgate is gonna tell me to go pound sand. Insane that they are VERY aware of this issue yet have done nothing to rectify it in years... I've deployed over 20 1100s, mostly for small businesses with basic needs and simple networks. They have been great given how feature-rich they are for the price tag. I'd happily pay an extra $50 for these devices if they would take the time to use better storage. Fingers crossed that Netgate will replace these failed devices, but it seems unlikely.
-
@TVTS https://docs.netgate.com/pfsense/en/latest/releases/25-07.html#:~:text=Changed:%20Reduce%20writes%20to%20disk
-
Yup changes to address that went in in 25.07: https://redmine.pfsense.org/issues/16210
-
Apologies @andrew_cb for not responding directly to your boot environments observations. I do agree with you that heavy use of ZFS snapshots worsens write amplification: it ultimately consumes more space on the storage and also increases fragmentation, which can affect how ZFS operates once it reaches a high enough level.
If the storage is over-spec'd, such as a 256 GB SSD, then I expect boot environments can be handled comfortably, but more care is perhaps needed on something small like 32 GB.
Luckily the txg delay increase should mitigate the majority of issues.
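
A back-of-the-envelope sketch of why a longer txg interval reduces writes, assuming "txg delay" here refers to the ZFS transaction group timeout (the `vfs.zfs.txg.timeout` sysctl on FreeBSD, which defaults to 5 seconds in OpenZFS). The longer values below are illustrative only and are not claimed to be what pfSense 25.07 uses. ZFS can also commit a txg early when enough dirty data accumulates, so this counts only the timer-driven commits on a system with a steady trickle of writes.

```python
SECONDS_PER_DAY = 86_400


def periodic_commits_per_day(txg_timeout_s: int) -> int:
    """Upper bound on timer-driven txg commits per day for a given timeout."""
    if txg_timeout_s <= 0:
        raise ValueError("txg_timeout_s must be positive")
    return SECONDS_PER_DAY // txg_timeout_s


# 5 s is the OpenZFS default; the other values are hypothetical examples.
for timeout_s in (5, 30, 60):
    print(f"timeout {timeout_s:>2} s: up to {periodic_commits_per_day(timeout_s)} commits/day")
```

Each commit rewrites some metadata even when little user data changed, so fewer commits generally means fewer small writes reaching the flash.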