It is fine under low load, but when downloading or otherwise stressing it under high usage it appears to momentarily drop out. BTRFS then reports checksum errors, and my test case for it, downloading from Steam (1 Gb internet), reports that files are corrupt.
Running Rampage4 extreme motherboard
Sabrent SB-ROCKET-2TB (no new firmware available)
GLOTRENDS M.2 PCIe NVMe/AHCI SSD to PCIe 3.0 x 4
The mobo BIOS has had the NVMe elements injected into it so it boots NVMe drives.
I have tried turning off all overclocks: no change, although multithreading seems to make things worse.
I have tried applying “nvme_core.default_ps_max_latency_us=0” to no improvement.
I have tried “iommu=soft” to no improvement
The drive operates normally under windows
The drive behaved badly under ext4 too initially so I do not think it’s a clash with BTRFS
Running the Xanmod custom kernel, but I have tried the normal, LTS and Zen kernels with the same results.
Tried running the BIOS PCI-E in gen2 mode rather than gen3, with the same results.
Pretty annoying, as I have this setup just how I like it now, so I don’t want to switch the desktop back to Windows.
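Since kernel parameters only help if they actually reach the running kernel, it may be worth confirming that first. The following is a minimal sketch, assuming the parameters quoted above and a simple whitespace-separated command line (quoted values containing spaces are not handled). The names `parse_cmdline`, `missing_params` and `WANTED` are made up for illustration.

```python
# Sketch: check which requested kernel parameters are absent from a kernel
# command line such as the contents of /proc/cmdline.

# Parameters tried in this thread; edit to match what you set in the bootloader.
WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "iommu": "soft",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(text, wanted):
    """Return the names from `wanted` that are absent or set to another value."""
    active = parse_cmdline(text)
    return [name for name, value in wanted.items() if active.get(name) != value]


# Example use on a running Linux system:
#   with open("/proc/cmdline") as f:
#       print(missing_params(f.read(), WANTED))
```

An empty list would suggest both parameters were picked up at boot; anything listed was probably not applied by the bootloader configuration.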
If all else fails I can try getting another NVME or Sata SSD I suppose.
edit: reported drive and system temps are all normal. It has a heatsink, and the errors occur rather quickly, which is not indicative of any “heat soak”.
Disagree on the cooling part, as it does have a small heatsink and this problem only ever happens within Linux.
I’m getting the same full speed in Linux as in Windows (apart from the drops).
I agree it’s a cheap drive and it’s highly possible they have done something janky which happens to work OK in Windows by chance. I bought it from Amazon, so if it hadn’t worked right I’d have sent it back when I got it.
I’m just hoping someone comes in with a little thing I can try, or maybe has experienced something similar. If I can’t fix it, I’ll have to decide on my exit strategy and what hardware to replace it with.
Linux is more sensitive to error detection on drives, as far as my experience goes, so it’s no wonder (sadly it doesn’t mean you don’t have corrupted data on Windoze as well).
Have you tested it with SMART?
Since I personally don’t know what else to advise for that particular case, I can advise on a replacement: Samsung Evo or Pro. Yes, it costs more, but for a good reason; in my view they’re the most reliable SSDs you can find.
What do you mean by “behaved badly under ext4”? btrfs is reporting checksum errors, but ext4 has no data checksums. What is going wrong with ext4?
And how do you know that Windows is all fine? Windows does not report an error but are you sure the data is not corrupt? Can you try downloading or copying a big ISO file for which you know the sha256 checksum and check if the data is corrupt or not?
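The check suggested above could be sketched roughly like this, using only Python's standard `hashlib`. The function names and the chunk size are arbitrary choices for illustration. Note that a file read back straight after writing may be served from the page cache rather than the drive, so re-checking after a reboot is probably more telling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-GB ISO does not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare against the checksum published alongside the download."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Running the same comparison on the same file under both Windows and Linux would show whether the copy on disk is actually intact in each case.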
Steam does verify checksums once files are downloaded, and no errors are ever present on Windows.
The reason I said bad behaviour was seen on ext4 was from when I tried to set up a virtual machine while distro hopping.
During the Windows update downloads of the VM’s initial setup it was repeatedly complaining of bad downloads and data errors. In hindsight these are the same symptoms: data corruption under the high load of updating or downloading files. At the time I didn’t give it enough thought to check the kernel logs at all.
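For anyone wanting to check the kernel logs after reproducing the problem, here is a small sketch of a filter that keeps only lines likely to be relevant. The keyword list is an assumption rather than an exhaustive set of what the kernel may print, and `suspicious_lines` is a made-up name.

```python
import re

# Assumed keyword list; adjust it to whatever your kernel actually logs.
SUSPECT = re.compile(r"nvme|btrfs|csum|i/o error|timeout", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that mention the drive, the filesystem or I/O errors."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


# Example use, feeding it the kernel log for the current boot:
#   journalctl -k -b | python3 -c "import sys, filt; print(*filt.suspicious_lines(sys.stdin), sep='\n')"
# (assuming this file is saved as filt.py in the current directory)
```

Comparing the timestamps of any matches with when the download stalled would help tell a controller drop-out apart from a pure filesystem-level complaint.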
I can see the power supply temps and they are running normally as are the fans.
It is a high-quality PSU, as I have been bitten in the past by bad power supplies, and since it is sized for a decent-sized graphics card it’s not under any real load, but it’s a good point.
The fact that it occurs more frequently with hyperthreading on is interesting and is probably related to the intermittent nature of the failure, maybe when threads are switching etc.
Well, I personally wouldn’t go by the temps on the power supply but by actual testing of the power supply unit itself, for current draw, output etc. Any chance of trying a new NVMe drive to see if it’s the culprit?