Constant System freeze [BTRFS related]

I installed EndeavourOS on LUKS-encrypted BTRFS. At first everything worked well, but after a couple of weeks the system started to freeze randomly. Now it freezes constantly after booting: it responds for a couple of seconds, then freezes for minutes. The freeze is odd, too: if a tab is already open I can interact with it, but new tabs and terminal shells freeze.
There are multiple BTRFS-related hung-task errors and stack traces in dmesg, like this:

[ 1598.507615] INFO: task glean.dispatche:4681 blocked for more than 122 seconds.
[ 1598.507616] Tainted: G W 6.16.4-arch1-1 #1
[ 1598.507617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1598.507619] task:glean.dispatche state:D stack:0 pid:4681 tgid:4632 ppid:1 task_flags:0x400040 flags:0x00004002
[ 1598.507622] Call Trace:
[ 1598.507623]
[ 1598.507625] __schedule+0x409/0x1330
[ 1598.507628] ? __reserve_bytes+0x33f/0x700
[ 1598.507632] schedule+0x27/0xd0
[ 1598.507635] wait_current_trans+0x107/0x170
[ 1598.507637] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1598.507641] start_transaction+0x441/0x840
[ 1598.507644] btrfs_create_common+0xaa/0x130
[ 1598.507649] path_openat+0x1024/0x12e0
[ 1598.507652] do_filp_open+0xd8/0x180
[ 1598.507656] ? alloc_fd+0x12e/0x190
[ 1598.507658] do_sys_openat2+0x88/0xe0
[ 1598.507661] ? task_tick_fair+0x5e/0x4c0
[ 1598.507666] __x64_sys_openat+0x61/0xa0
[ 1598.507670] do_syscall_64+0x81/0x970
[ 1598.507672] ? smp_call_function_single_async+0x22/0x50
[ 1598.507677] ? update_process_times+0xa4/0xd0
[ 1598.507681] ? tick_nohz_handler+0xb1/0x140
[ 1598.507686] ? timerqueue_add+0xae/0xd0
[ 1598.507691] ? __hrtimer_run_queues+0x164/0x2a0
[ 1598.507693] ? rcu_accelerate_cbs+0x27/0x90
[ 1598.507698] ? sched_clock+0x10/0x30
[ 1598.507704] ? sched_clock_cpu+0xf/0x200
[ 1598.507707] ? rcu_core+0x199/0x350
[ 1598.507710] ? flush_tlb_func+0x23f/0x2a0
[ 1598.507715] ? sched_clock+0x10/0x30
[ 1598.507719] ? sched_clock_cpu+0xf/0x200
[ 1598.507722] ? __flush_smp_call_function_queue+0xab/0x410
[ 1598.507724] ? sched_clock_cpu+0xf/0x200
[ 1598.507728] ? irqtime_account_irq+0x3c/0xc0
[ 1598.507731] ? __irq_exit_rcu+0x4c/0xf0
[ 1598.507736] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1598.507738] RIP: 0033:0x7fed0589f042
[ 1598.507743] RSP: 002b:00007fecf14fdf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1598.507745] RAX: ffffffffffffffda RBX: 00007fed05827990 RCX: 00007fed0589f042
[ 1598.507746] RDX: 0000000000080241 RSI: 00007fecf14fdfd8 RDI: ffffffffffffff9c
[ 1598.507748] RBP: 00007fecf14fdf30 R08: 0000000000000000 R09: 0000000000000000
[ 1598.507749] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
[ 1598.507750] R13: 00007fed0590d270 R14: 00000000000001b6 R15: 00007fecf14fdfd8
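
To see how many such reports the log holds and which tasks they hit, a small filter over the kernel log can help. This is only a sketch: `hung_tasks` is a made-up helper name, and it assumes the report lines look like the "blocked for more than" line above.

```shell
# hung_tasks: hypothetical helper, not part of any package.
# Reads kernel log text on stdin and prints how many hung-task
# reports each task name has, most frequent first.
hung_tasks() {
    grep -E 'INFO: task .+:[0-9]+ blocked for more than [0-9]+ seconds' |
        sed -E 's/.*INFO: task (.+):[0-9]+ blocked for more than.*/\1/' |
        sort | uniq -c | sort -rn
}

# Typical use (reading the kernel log usually needs root):
#   sudo dmesg | hung_tasks
#   journalctl -k -b | hung_tasks
```

If the same few tasks show up repeatedly and all traces pass through btrfs functions such as wait_current_trans, that points at the filesystem rather than at one misbehaving program.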

EDIT [1]: attaching system info

CPU: 14-core (4-mt/10-st) Intel Core Ultra 5 125H (-MST AMCP-)
speed/min/max: 400/400/4500:3600:2500 MHz Kernel: 6.16.4-arch1-1 x86_64
Up: 43m Mem: 3.52/15.05 GiB (23.4%) Storage: 476.94 GiB (67.4% used)
Procs: 379 Shell: Sudo inxi: 3.3.39
Drives:
Local Storage: total: 476.94 GiB used: 321.62 GiB (67.4%)
ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFS512GEJ4X112N
size: 476.94 GiB

EDIT [2]: added system info and btrfs logs
sysinfo: https://dpaste.com/2SNPJ3HQ9
btrfs logs: https://0x0.st/Kmgf.txt

Welcome to the forum!

Looking at your kernel version, I can see you are behind on updates: we are now at kernel version 6.16.8-arch3-1. Some people on this forum have solved their problems by updating the kernel, so I think you should update (and do so a bit more often, at least) and hope it solves your problem too.

Thanks! I have updated to the latest kernel, 6.16.8-arch3-1, as suggested. The constant freezing stopped, but there are still random 2-3 minute freezes.

I have exams next week that I will take on this laptop. Is there any other way I can investigate further?

Also, note that on my btrfs /, 60G is free out of 389G.

Things I did so far:
smartctl -a /ssd

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 3%
Data Units Read: 49,567,052 [25.3 TB]
Data Units Written: 34,795,843 [17.8 TB]
Host Read Commands: 817,156,633
Host Write Commands: 820,070,138
Controller Busy Time: 12,171
Power Cycles: 754
Power On Hours: 8,372
Unsafe Shutdowns: 254
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 35 Celsius
Temperature Sensor 2: 32 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged

sudo btrfs scrub status /

UUID: 02cf66e3-34b3-45b5-a397-5e04d873df62
Scrub started: Wed Aug 27 23:32:22 2025
Status: finished
Duration: 0:04:36
Total to scrub: 322.66GiB
Rate: 1.10GiB/s
Error summary: no errors found

Are there any other tools or commands I should try? Do I need to run btrfs check and smartctl -t long?

Any guidance you can offer would be greatly appreciated. Thanks in advance!

Well, I don’t use BTRFS myself, so I cannot advise you on that. However, have a look at this wiki; maybe someone who can help you further will chime in.

The only idea I have right now is that, since the update helped a bit, the problem could also be kernel related. I would try installing the LTS kernel. (It’s never wrong to have it, just in case.)

sudo pacman -Syu linux-lts linux-lts-headers

On my laptop, I could not log into Wayland after an update and only got a freeze and a black screen. I thought it was a Wayland or NVIDIA issue, but it is actually a kernel issue, and almost everything works on the LTS kernel. It’s weird, but maybe this helps you as well.

Are there many snapshots on the disk? Reduce their number.
Do you have btrfs quotas enabled? Disable them.
Those come to mind if the fault really is in the btrfs filesystem.
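
For reference, both points can be checked from a terminal. A rough sketch, assuming the filesystem is mounted at /; `count_snapshots` is just an illustrative helper that counts the entries `btrfs subvolume list -s` prints (one line per snapshot).

```shell
# count_snapshots: illustrative helper, not a btrfs-progs command.
# Reads the output of `btrfs subvolume list -s <mountpoint>` on stdin
# and prints the number of entries (lines that carry a " path " field).
count_snapshots() {
    awk '/ path / { n++ } END { print n + 0 }'
}

# Typical use (all of these need root):
#   sudo btrfs subvolume list -s / | count_snapshots
#   sudo btrfs qgroup show /       # errors out when quotas are not enabled
#   sudo btrfs quota disable /     # turns quotas off
```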

Edit:

sudo btrfs filesystem usage /
sudo btrfs device stats /

Which NVIDIA card are you using? I don’t experience any issues on btrfs.

Let’s see what LUKS says.

Can you please share the output of:

cryptsetup status /dev/mapper/<your-luks-device-name>

and

cryptsetup benchmark

I am sure that my issue and the OP’s are not really related (unless it turns out to be kernel related), so it is better not to use their topic to talk about my issue. I have an open thread for it anyway, which you can use if you want; you will also find all the information and the troubleshooting I have already done there.

Unless you accidentally responded to the wrong person, the OP has not said they have an NVIDIA card (unless I missed it).

Here are the logs and system info:
sysinfo: https://dpaste.com/2SNPJ3HQ9
btrfs logs: https://0x0.st/Kmgf.txt

```
❯ sudo btrfs filesystem usage /
Overall:
Device size: 388.17GiB
Device allocated: 387.15GiB
Device unallocated: 1.01GiB
Device missing: 0.00B
Device slack: 0.00B
Used: 295.14GiB
Free (estimated): 87.28GiB (min: 86.77GiB)
Free (statfs, df): 87.28GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no

Data,single: Size:361.14GiB, Used:274.87GiB (76.11%)
/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa 361.14GiB

Metadata,DUP: Size:13.00GiB, Used:10.13GiB (77.96%)
/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa 26.00GiB

System,DUP: Size:8.00MiB, Used:64.00KiB (0.78%)
/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa 16.00MiB

Unallocated:
/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa 1.01GiB
❯ sudo btrfs device stats /
[/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa].write_io_errs 0
[/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa].read_io_errs 0
[/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa].flush_io_errs 0
[/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa].corruption_errs 0
[/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa].generation_errs 0
```

```
❯ sudo cryptsetup status /dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa
/dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 [bits]
key location: keyring
device: /dev/nvme0n1p2
sector size: 512 [bytes]
offset: 32768 [512-byte units] (16777216 [bytes])
size: 814043136 [512-byte units] (416790085632 [bytes])
mode: read/write
❯ sudo cryptsetup benchmark

Tests are approximate using memory only (no storage IO).

PBKDF2-sha1 1989707 iterations per second for 256-bit key
PBKDF2-sha256 4040755 iterations per second for 256-bit key
PBKDF2-sha512 1165084 iterations per second for 256-bit key
PBKDF2-ripemd160 580606 iterations per second for 256-bit key
PBKDF2-whirlpool 452753 iterations per second for 256-bit key
argon2i 4 iterations, 865602 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 858304 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)

Algorithm Key Encryption Decryption
aes-cbc 128b 781.0 MiB/s 3260.2 MiB/s

serpent-cbc 128b 53.6 MiB/s 396.8 MiB/s
twofish-cbc 128b 193.8 MiB/s 567.2 MiB/s
aes-cbc 256b 1337.1 MiB/s 5878.6 MiB/s
serpent-cbc 256b 121.1 MiB/s 897.4 MiB/s
twofish-cbc 256b 262.8 MiB/s 580.6 MiB/s
aes-xts 256b 9271.4 MiB/s 6310.7 MiB/s
serpent-xts 256b 352.7 MiB/s 359.2 MiB/s
twofish-xts 256b 240.0 MiB/s 242.0 MiB/s
aes-xts 512b 5922.7 MiB/s 5206.8 MiB/s
serpent-xts 512b 636.8 MiB/s 807.1 MiB/s
twofish-xts 512b 540.3 MiB/s 544.0 MiB/s
```

After backing up, you could run a btrfs balance.
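
The usage output above hints at why a balance may help: 387.15GiB of 388.17GiB is allocated to chunks although only 295.14GiB is used, which leaves about 1GiB unallocated. Below is a sketch for checking that ratio and then compacting partly filled data chunks. `unallocated_percent` is a hypothetical helper, and the 50 in `-dusage=50` is an arbitrary starting value, not a recommendation taken from this thread.

```shell
# unallocated_percent: hypothetical helper, not a btrfs-progs command.
# Reads `btrfs filesystem usage -b <mountpoint>` (raw byte values) on
# stdin and prints the unallocated share of the device in percent.
unallocated_percent() {
    awk '
        /Device size:/        { size = $NF }
        /Device unallocated:/ { unalloc = $NF }
        END { if (size > 0) printf "%.1f\n", 100 * unalloc / size }
    '
}

# Typical use (needs root):
#   sudo btrfs filesystem usage -b / | unallocated_percent
#   sudo btrfs balance start -dusage=50 /   # only data chunks less than 50% full
#   sudo btrfs balance status /             # progress of a running balance
```

The `-d` filter restricts the balance to data chunks and leaves metadata alone.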

What is missing here, from my point of view, are LUKS flags.

This is how it looks in my case:

cryptsetup status /dev/mapper/Samsung-SSD-970-EVO-Plus-1TB
/dev/mapper/Samsung-SSD-970-EVO-Plus-1TB is active and is in use.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 [bits]
  key location: keyring
  device:  /dev/nvme2n1p3
  sector size:  512 [bytes]
  offset:  32768 [512-byte units] (16777216 [bytes])
  size:    1829560320 [512-byte units] (936734883840 [bytes])
  mode:    read/write
  flags:   discards no_read_workqueue no_write_workqueue

discards is important for any SSD or NVMe drive; otherwise you cannot trim the drive.
no_read_workqueue and no_write_workqueue are performance relevant.
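
A quick way to compare your own mapping against the output above is sketched below. `missing_luks_flags` is a made-up helper and `<name>` stands for your mapping name. The `cryptsetup refresh` line in the comments is one way to store the flags in a LUKS2 header; in /etc/crypttab the corresponding options are spelled discard, no-read-workqueue and no-write-workqueue.

```shell
# missing_luks_flags: made-up helper, not part of cryptsetup.
# Reads `cryptsetup status <name>` output on stdin and prints each of
# the three flags discussed above that is NOT listed on the flags: line.
missing_luks_flags() {
    flags=$(sed -n 's/^[[:space:]]*flags:[[:space:]]*//p')
    for want in discards no_read_workqueue no_write_workqueue; do
        case " $flags " in
            *" $want "*) ;;          # flag is active, nothing to report
            *) echo "$want" ;;       # flag is absent
        esac
    done
}

# Typical use (needs root):
#   sudo cryptsetup status <name> | missing_luks_flags
# Enabling the flags persistently on a LUKS2 device:
#   sudo cryptsetup --allow-discards --perf-no_read_workqueue \
#        --perf-no_write_workqueue --persistent refresh <name>
```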

Following the suggestions from the forum, I enabled the LUKS flags and performed a full balance. I also did some housecleaning: cleared the pacman/yay caches, ran docker system prune -f, and moved some stuff to backup.

❯ sudo fstrim -av
/efi: 482.7 MiB (506146816 bytes) trimmed on /dev/nvme0n1p1
/: 279.8 GiB (300471656448 bytes) trimmed on /dev/mapper/luks-3950f916-f8f7-4fc7-9d2c-2ca623da5cfa
❯ sudo btrfs filesystem usage /
Overall:
    Device size:                 388.17GiB
    Device allocated:            113.06GiB
    Device unallocated:          275.10GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        110.49GiB
    Free (estimated):            276.40GiB      (min: 138.85GiB)
    Free (statfs, df):           276.40GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              258.12MiB      (used: 0.00B)
    Multiple profiles:                  no

I haven’t experienced any freezing for the past 10-12 hours, which is a significant improvement. I will continue monitoring and provide another update in a couple of days.

Thanks again to everyone for the help!

I also noticed the following kernel parameter in the btrfs log: rd.driver.pre=btrfs

From what I have read, this parameter is generally not needed anymore and could be removed. I’m not saying it will help with your issue; it is just something I noticed.

Update: Issue Resolved – No More Freezing

It’s been a few days, and I’m happy to report that the freezing issue has been resolved. Just to summarize for anyone else who might come across this thread with a similar problem:

I’m using BTRFS with snapshots on a 512GB NVMe SSD. My disk usage had filled up to around 60-70%, and that’s when I started noticing system freezes. After checking with dmesg, I saw several messages about “hung tasks” along with BTRFS stack traces.

If you’re using BTRFS, make sure to:

  • Regularly run btrfs scrub and btrfs balance (depending on your use case).
  • Automate this process using tools like btrfs-assistant and btrfsmaintenance, as mentioned in a previous comment.
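
As a rough illustration of the first point, a scrub can be run by hand and its result checked from a script. `scrub_clean` is a hypothetical helper that only looks for the "no errors found" summary line shown earlier in this thread; the timer name assumes your btrfs-progs package ships the btrfs-scrub@.timer unit, as the Arch package does.

```shell
# scrub_clean: hypothetical helper, not a btrfs-progs command.
# Reads `btrfs scrub status <mountpoint>` output on stdin and succeeds
# only if the summary line reports that no errors were found.
scrub_clean() {
    grep -q 'Error summary:[[:space:]]*no errors found'
}

# Typical use (needs root):
#   sudo btrfs scrub start -B /      # -B waits until the scrub finishes
#   sudo btrfs scrub status / | scrub_clean && echo "scrub clean"
# Periodic scrub of / via the systemd timer (assumed to be installed):
#   sudo systemctl enable --now btrfs-scrub@-.timer
```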

If you’re also using LUKS on BTRFS (especially on an SSD), ensure that the discards no_read_workqueue no_write_workqueue flags are enabled in your /etc/crypttab file as mentioned in this comment.

This setup has worked for me, and the system is running smoothly now.

Big thanks to @mbod, @EOS, and everyone else for the helpful advice!

Maybe it’s best to balance only the data (and not the metadata), since the latter is risky/harmful.