Read-only filesystem

Recently my SSD has started switching to read-only mode a few minutes after I log in. The drive is about one year old; it is a Samsung SSD 970 EVO Plus 1TB.

What’s going on?

I get this message:

[74.677955] BTRFS error (device nvme0n1p2): tree first key mismatch detected, bytenr=278864035840 parent_transid=171021 key expected=(4264120320,168,34359742464) has=(4264120320,168,4096)
[74.682523] BTRFS error (device nvme0n1p2): failed to run delayed ref for logical 4266262528 num_bytes 12288 type 184 action 1 ref_mod 1: -5
BTRFS error (device nvme0n1p2: state A): Transaction aborted (error -5)
BTRFS error (device nvme0n1p2: state A) in btrfs_run_delayed_refs: 2177: error=-5 IO failure 

It appears that the user is experiencing issues with their Samsung SSD 970 EVO Plus, where it turns into read-only mode and they’re receiving BTRFS errors. This can be concerning, but there are several potential steps to troubleshoot and resolve this problem. Here’s a comprehensive guide for the user:

  1. Backup Your Data:
    Before attempting any fixes, it’s essential to back up your data, as there could be issues with the filesystem or hardware. Use an external drive or cloud storage to ensure your important files are safe.

  2. Check the Health of the SSD:
    SSDs have a limited lifespan, and after one year of usage, it’s a good practice to check the health of the drive. You can use a utility like smartctl to assess the SSD’s status. Run the following command in the terminal:

    sudo smartctl -a /dev/nvme0n1
    

    Look for any signs of failing components or wear.

  3. Check for Filesystem Errors:
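    The BTRFS errors suggest potential filesystem issues. With the filesystem unmounted (for example, from a live USB), you can check it in read-only mode with the following command:

    sudo btrfs check --readonly /dev/nvme0n1p2

    This command only reports problems and changes nothing on disk. Avoid running btrfs check --repair as a first step: the btrfs documentation warns that it can make the damage worse, so use it only after backing up and on the advice of someone who knows the filesystem well.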

  4. Update Firmware:
    Check if there is a firmware update available for your Samsung SSD. Manufacturers often release updates that address issues and improve performance. You can visit the Samsung website to find the latest firmware for your SSD model.

  5. Check for Overheating:
    Overheating can sometimes lead to SSD issues. Ensure that your SSD is adequately cooled. Check if your computer’s fans are working correctly and consider cleaning any dust buildup that might be causing overheating.

  6. Check for Hardware Connection Issues:
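    Make sure the SSD is seated correctly in its M.2 slot on the motherboard and that the retaining screw is in place. The 970 EVO Plus is an M.2 NVMe drive, so there are no cables to check, but inspect the slot and the drive's contacts for dust or damage.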

  7. Monitor for Recurrence:
    After performing these steps, monitor your system for a while to see if the issue recurs. If it does, you may want to consider replacing the SSD if it’s still under warranty or seek professional help to diagnose the problem further.

  8. Consider Data Recovery:
    If you’re unable to resolve the issue and the SSD is showing signs of failure, you may need to consider data recovery services to retrieve your data. A professional data recovery service might be able to recover your files even if the drive is in a read-only state.

Remember that dealing with storage device issues can be complex, and there’s a risk of data loss. If you are unsure or uncomfortable performing these steps yourself, it’s advisable to seek assistance from a professional technician who can diagnose and fix the problem.

This reads like a ChatGPT generated answer…

edit:
I’m not quite sure how to judge the act of answering questions in a forum by generating stuff in ChatGPT… Doesn’t seem to be the first time the user has done this.

I sometimes use LLMs, like Bard, ChatGPT, etc., but only to summarize or improve my answer, or to add emojis. Nothing more. I try not to use or abuse them a lot.
If a policy prohibiting the use of LLMs is added (as on Stack Overflow), that will be OK with me; I won’t use them.

We cannot prevent the use of such apps; it is entirely up to the individual whether to use them. But, just like with auto-correct, check and review your response before posting and “humanize” it a bit.
The way your post is set up certainly looks like it came from a spamming bot, and you don’t want to be banned for being mistaken for one. :wink:

This log shows the difference: 34359742464 (expected) != 4096 (found)

It looks like the metadata is corrupted. This is probably caused by a RAM problem.
If I remember correctly, you are using zram with compression.
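
To make the comparison above concrete, here is a small sketch that XORs the two offsets from the "tree first key mismatch" line and counts how many bits differ. The variable names are made up for illustration. A difference of exactly one bit is what a memory bit flip would look like, although on its own it does not prove that RAM is the cause.

```shell
#!/usr/bin/env bash
# Offsets copied from the "tree first key mismatch" log line.
expected=34359742464   # offset the parent node expected
found=4096             # offset actually read from the child block

xor=$(( expected ^ found ))   # bits that differ between the two values

# Count the set bits in xor and remember the position of the highest one.
bits=0
pos=-1
i=0
d=$xor
while [ "$d" -gt 0 ]; do
  if [ $(( d & 1 )) -eq 1 ]; then
    bits=$(( bits + 1 ))
    pos=$i
  fi
  d=$(( d >> 1 ))
  i=$(( i + 1 ))
done

echo "xor=$xor differing_bits=$bits highest_bit=$pos"
# prints: xor=34359738368 differing_bits=1 highest_bit=35
```

The two values differ only in bit 35, which fits the RAM theory and is a good reason to run a memory tester such as MemTest86+ before trusting the machine again.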

How do I disable zram-generator?

This problem is getting worse. Now I cannot mount my home dir:

# systemctl status home.mount
Loaded: loaded (/etc/fstab; generated)
Active: failed (Result: exit-code) since .....
Where: home
What: /dev/disk/by-uuid/....
....
Mounting /home...
mount: /home: wrong fs type, bad option, bad superblock on /dev/nvme0n1p2, missing codepage...
home.mount: Failed with result 'exit-code'
Failed to mount /home

Right from boot, my filesystem is now in read-only mode :smiling_face_with_tear:

Back up your data and replace the drive asap! :smile: This is not a bot! :rofl:

If my home dir cannot be mounted, how can I access it from another disk or a bootable USB?

Stop using the drive, at the very least. Get another drive, install the OS of choice, and worry about accessing your data later. The more you use the failing drive, the quicker it will fail completely.
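
For the "accessing your data later" part, one possible sequence from a live USB is sketched below. It only prints the commands so they can be reviewed before anything touches the disk. The device name comes from the log earlier in the thread; the function name, the mount point and the target directory are hypothetical, and the rescue=usebackuproot mount option assumes a reasonably recent kernel.

```shell
#!/usr/bin/env bash
# Print (do not run) a cautious, read-only recovery sequence for review.
# Arguments: DEVICE MOUNTPOINT TARGET. The defaults for MOUNTPOINT and
# TARGET are placeholders, not paths taken from this thread.
print_recovery_plan() {
  local dev="${1:-/dev/nvme0n1p2}"
  local mnt="${2:-/mnt/broken}"
  local target="${3:-/mnt/backup/rescued}"
  cat <<EOF
# 1. Create the mount point and the target directory (on a healthy disk)
sudo mkdir -p $mnt $target
# 2. Mount the damaged filesystem read-only, falling back to an older tree root
sudo mount -o ro,rescue=usebackuproot $dev $mnt
# 3. Copy everything off while the mount holds
sudo cp -a $mnt/. $target/
# 4. If the mount fails, try pulling files straight from the unmounted device
sudo btrfs restore $dev $target
EOF
}

print_recovery_plan "$@"
```

Neither the read-only mount nor btrfs restore writes to the damaged filesystem, so they should be safe to attempt before any repair tool.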

I started getting the same errors on the same day you posted this. It only happened after I did a Windows 10 update on another drive. What’s weird is that the EOS/Linux drive is a new Samsung 990 Pro in perfect health, and it started doing it again after a full reinstall, but only once I had booted into Windows and back into Linux. I hope it’s not Windows doing something weird when it sees a btrfs drive it can’t mount. I’m going to try setting the drive as offline in Windows Disk Management, then do another EOS reinstallation to see if that changes anything.

Days after the events I reported, I was able to access my drive as a secondary disk, although the problem remained when it was used as the primary disk. At least this indicated that my disk was not dying and that the problem was logical. Its origin may be a failure of:

  • BTRFS file system, or
  • Zram-generator, or
  • SSD, or
  • a combination of the previous three.

With this in mind, I reinstalled the system with these changes:

  • EXT4 filesystem,
  • no SWAP system,
  • no Timeshift utility

In the four days I have been using my drive since then, apart from one unexpected reboot, I have not had any major anomalies to report. :hand_with_index_finger_and_thumb_crossed: :pray:

Be careful: with EXT4 you can keep your system running without realizing that data is corrupted.

During a backup, this corrupted data will overwrite the good copy. After a long time, you could end up with a lot of corrupted data in your backups or cloud storage.

Test your hardware (e.g. CPU cache, RAM, disk, cables, PCIe / bus on the mainboard).
Of course, every piece of hardware has a limited lifespan.
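
Since EXT4 does not checksum file data, one way to notice silent corruption before it reaches a backup is to keep your own checksum manifest. The sketch below uses sha256sum from GNU coreutils; the function names are hypothetical, and the manifest should be stored outside the directory it describes.

```shell
#!/usr/bin/env bash
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file in DIR.
make_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# verify_manifest DIR MANIFEST: list files whose content no longer matches
# and return non-zero if there are any. The manifest is read from stdin so
# that a relative MANIFEST path still works after the cd.
verify_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}
```

A possible routine is to run make_manifest after adding files and verify_manifest before each backup. Files you changed on purpose also show up as mismatches, so the manifest has to be regenerated after intentional edits.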

If this problem occurs only after visiting the Windows install, then make sure that

  • Secure Boot is disabled (in BIOS)
  • Fast Startup is disabled (in Windows)

I stopped dual-booting Windows two years ago; I’m a Linux fanboy these days.

Well, there are those who recommend EXT4 over other file systems for personal use, and BTRFS for use with NAS and RAID. This problem has at least made me start creating backup images of my system and data. :smiling_face_with_tear:

I have been using ext2|3|4 for over twenty years on spinning hard drives with no perceptible data loss due to corruption. I would give one vote for ext4 above any other “new and improved” filesystem. :rofl:

You’re lucky. Probably less than 5% of your total data has been corrupted, in files that you have never touched or have not used until now.

I do not trust the long-term stability of hardware, whether it is old or newly purchased.
Ext4 is fine. If I don’t care about “unnecessary” data (e.g. games, ISOs, Git repositories, public multimedia …), I always put it on Ext4. I can re-download it from the internet after data loss.

If data is important or critical, I prefer to store it on two different filesystems that both offer checksums and self-healing. Of course, I also use automatic incremental backups.

I don’t see the output of smartctl -a in this thread. If there are errors there, it’s definitely the SSD.
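
As a rough aid for reading that output, the sketch below filters the NVMe health section of smartctl -a down to the lines that usually matter most. The function name is hypothetical, and the labels are the ones smartctl normally prints for NVMe drives; adjust the pattern if your version words them differently.

```shell
#!/usr/bin/env bash
# nvme_health_summary: read `smartctl -a` output on stdin and keep only the
# main NVMe health indicators.
nvme_health_summary() {
  grep -Ei '^(Critical Warning|Temperature|Available Spare( Threshold)?|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):'
}

# Usage (needs root and the smartmontools package):
#   sudo smartctl -a /dev/nvme0n1 | nvme_health_summary
```

A non-zero Critical Warning or a growing Media and Data Integrity Errors count points at the drive. Clean values do not rule out RAM or another component.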