Has anyone run into split_lock_detect?

I’ve been running into random system crashes lately…without warning the system will just crash & reboot…

I have tried the LTS kernel, the Zen kernel, and the mainline kernel, changed kernel boot parameters, and tried different settings in the BIOS (resetting several parameters: memory speed, etc.), all without any effect…

I was just looking at the logs & this came up: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks.

In most of the crashes (not all) I was using Firefox…nothing out of the normal…just browsing around.
Just grasping at straws I’ve set split_lock_detect=off to see if that helps…

The crashes “seem” to fit the split-lock symptoms as per this page: https://x86.lol/generic/2023/11/07/split-lock.html

The log gives only this as information:

Feb 11 08:23:39 ASUS-Z790 rtkit-daemon[1224]: Successfully made thread 125953 of process 125679 owned by '1001' RT at priority 10.
Feb 11 08:23:39 ASUS-Z790 rtkit-daemon[1224]: Supervising 10 threads of 6 processes of 1 users.
Feb 11 08:23:41 ASUS-Z790 rtkit-daemon[1224]: Supervising 9 threads of 5 processes of 1 users.
Feb 11 08:23:41 ASUS-Z790 rtkit-daemon[1224]: Supervising 9 threads of 5 processes of 1 users.
Feb 11 08:23:42 ASUS-Z790 rtkit-daemon[1224]: Supervising 9 threads of 5 processes of 1 users.
Feb 11 08:23:42 ASUS-Z790 rtkit-daemon[1224]: Supervising 9 threads of 5 processes of 1 users.
Feb 11 08:23:45 ASUS-Z790 NetworkManager[786]: <info>  [1707668625.1242] dhcp6 (enp10s0): state changed new lease, address=fddb:26c1:867f:10::f27 2605:59c8:42f:5610::f27
-- Boot e94ff9cca32e4a8492076146db8ae0ad --
Feb 11 08:24:26 ASUS-Z790 kernel: Linux version 6.6.16-1-lts (linux-lts@archlinux) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 21:20:21 +0000
Feb 11 08:24:26 ASUS-Z790 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=50127e7b-dbfe-44ad-a2b2-f1cc88c60656 rw loglevel=3
Feb 11 08:24:26 ASUS-Z790 kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
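
For anyone who wants to pull that message out of their own journal, something like this against the previous boot’s kernel messages should find it:

journalctl -k -b -1 | grep -i "split lock"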

I’m running fairly new hardware…just changed the motherboard/CPU/memory last month & this problem started shortly after…

Thoughts anyone???

A split lock is any atomic operation whose operand crosses two cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.

A bus lock is acquired through either split locked access to writeback (WB) memory or any locked access to non-WB memory. This is typically thousands of cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores and brings the whole system to its knees.
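
To make that concrete, here is a minimal sketch of the kind of access that trips the detector (my own illustration, not from the kernel docs). Misaligned atomics are undefined behavior in standard C, but on x86 with GCC this compiles to a lock-prefixed instruction on an address that crosses two cache lines, i.e. a split lock:

/* split_lock_demo.c: trigger a user-space split lock on purpose.
 * Build: gcc -O2 -o split_lock_demo split_lock_demo.c
 * Under split_lock_detect=warn the kernel logs a warning for this
 * process; under split_lock_detect=fatal it is killed with SIGBUS. */
#include <stdalign.h>
#include <stdio.h>

int main(void)
{
    /* 128 bytes, aligned to the start of a 64-byte cache line. */
    static alignas(64) unsigned char buf[128];

    /* A 4-byte int at offset 62 spans bytes 62..65, crossing the
     * boundary between the first and second cache line. */
    int *p = (int *)(buf + 62);

    /* Lock-prefixed read-modify-write across two cache lines:
     * the CPU has to take a bus lock to keep it atomic. */
    __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);

    printf("value after split-lock add: %d\n", *p);
    return 0;
}

Note that split_lock_detect=off doesn’t fix anything, it just stops the kernel from reacting: the CPU still takes the slow bus lock, the kernel simply no longer warns or kills anything over it.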

I would use the kernel parameter split_lock_detect=off.

https://www.kernel.org/doc/html/v5.18/x86/buslock.html#
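
Per that doc the parameter takes off, warn (the default), and fatal. If you boot with GRUB, adding it would look roughly like this (just a sketch assuming the usual /etc/default/grub workflow; keep whatever options are already on the line, and adjust for systemd-boot or whatever bootloader you actually use):

# In /etc/default/grub, append the parameter to the existing command line:
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 split_lock_detect=off"

# Then regenerate the config and reboot:
sudo grub-mkconfig -o /boot/grub/grub.cfg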

Yes, I have that set…just testing to see if that is the “cure” or not…

I’m tending to think that the “problem” is coming from BOINC…it’s the application that is always running in the background on my system…

Hmm, I don’t know. Just because an application is running in the background, if it’s one that is supposed to be running, I wouldn’t think it would cause this. :thinking: An application that’s not supposed to be running would make more sense. But I know little when it comes to the depths of this.

The reasoning behind that is that BOINC runs various applications that are very CPU/GPU/memory intensive, coded by the various teams that want the results…I’m guessing that they don’t tend to observe coding standards as much as “mainstream” applications do…mission-driven & using lots of resources.

I’m just going to carry on with split_lock_detect=off and see what happens. Interesting that this cropped up when I went to the 13th gen i7 & Z790…it was not happening with the 12th gen i7 & Z690…

I regularly see about 50% of my memory used by BOINC applications.

Maybe just to reassure you…slightly, I guess…the 2 machines with the i3-13100F are still OK. I keep all machines updated every day for Milkyway@home.

But as you know, I don’t do the GPU calculations…just the CPUs.
Still no problems with BOINC since April 2023 on EndeavourOS.

Hope you will find your fix :slight_smile: Crunchers are good people :wink:
:wave:

Yes…I’ve been crunching since the SETI@home days in the late ’90s…first time I’ve run into this…as I do more research into the topic, it looks like this affects hardware that is “new” enough to have the flag set…I think that I just “tripped” into that realm…

Interesting article on the topic: https://lwn.net/Articles/790464/

Oh! SETI was my first one when I came back to it, around 2014-ish I think. In the 2000s I had started with Einstein@home on my beloved Pentium D :wink:

Sending you positive energy for a quick fix :slight_smile: :pray:

Sure, if whatever is running is loading the CPU and it can’t handle the load, I can see that, because then the processes will start to overlap or stop and wait? There’s only so much each core and thread can handle.

More info on the topic: https://x86.lol/generic/2023/11/07/split-lock.html

And Thank You…crossing fingers that split_lock_detect=off is the fix…seems to be good so far. Back on the Zen kernel (the worst offender) & running BOINC at 70% on 8 of the 16 cores…current memory usage is at 33% with Numbers@home, World Community Grid & Einstein@home up.

That’s the real question…I ran just the same on my 12th gen i7 & Z690…I would “think” that the 13th gen & Z790 “should” be able to at least do the same, if not more…

All the “normal” stuff I have covered…the CPU cores are running at 50°C & all other temps are very low…I was “trying” to move forward with this new build (DDR5-6400 instead of DDR4-4133, 2 more cores, a better motherboard, etc.).

Just a bit frustrating…but I know that this is just “teething troubles”…I just change one thing at a time until the problem is gone…it is just taking more time than normal to “get the bugs out” :wink:

I was seeing this while booting, so I tried the suggested fix, and it worked like a charm.
Thanks!