Internet unstable after recent update

This thread seems to be more about the 6.16 kernel-level internet connection issue as opposed to the recent Arch DDoS issues.

Sure. I wouldn’t have time for kernel updates if I were being constantly attacked. But anyway, that viewpoint is too dystopic…

Sure, but this issue is critical and has a fix, as you pointed out. :face_with_diagonal_mouth:

In my opinion, the folks at netdev bear a significant part of the responsibility. I joined their mailing list a week ago, and the truth is that they are juggling a lot of work. But let’s go over the history a bit—it’s quite interesting. Here is the commit that introduced the bug, dated July 15.

They removed the line fi = NULL in __mkroute_output().

More than a month passed without anyone noticing the issue, until August 21–22, when kernel 6.16.2 was released. On that same day, Brett Sheffield spotted the regression (as did many of us—I remember experiencing severe internet problems on the 22nd and opening a post on Bugzilla) and he published a fix:

“Fix the regression introduced in 9e30ecf23b1b whereby IPv4 broadcast packets were having their Ethernet destination field mangled. This broke WOL magic packets and likely other IPv4 broadcasts.”

However, this initial, flawed fix was later included in the kernels of distributions such as CachyOS and Nobara, whose maintainers did not realize it contained a critical problem. The issue had already been reported by Jakub Kicinski on the 22nd, and Brett corrected it on the 23rd. The original problematic code was:

if (type == RTN_BROADCAST) {
	/* ensure MTU value for broadcast routes is retained */
	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
}

Jakub warned Brett: “You need to check if res->fi is actually set before using it”

In other words, the original patch could crash the kernel in certain scenarios because it unconditionally dereferenced res->fi->fib_metrics, even though fi itself had been set to NULL. If res->fi is NULL as well, this is a NULL pointer dereference in __mkroute_output(). Adding && res->fi to the condition prevents this scenario.

Brett corrected the patch on August 23rd:

if (type == RTN_BROADCAST && res->fi) {
	/* ensure MTU value for broadcast routes is retained */
	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
}

This ensures that the metrics are copied only when the pointer is non-NULL.
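To make the effect of the extra check concrete, here is a minimal user-space sketch in C. It is not kernel code: the struct names, the single mtu_metric field, the route-type constants, and copy_broadcast_metrics() are all hypothetical stand-ins, assumed only to illustrate the guard around the res->fi dereference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fib_info_sketch { int mtu_metric; };
struct fib_result_sketch { struct fib_info_sketch *fi; };
struct route_sketch { int mtu_metric; };

/* Hypothetical route types; the values are arbitrary for this sketch. */
enum { ROUTE_UNICAST_SKETCH = 1, ROUTE_BROADCAST_SKETCH = 2 };

/*
 * Copies the metric only for broadcast routes AND only when res->fi is
 * non-NULL, mirroring the "type == RTN_BROADCAST && res->fi" condition.
 * Returns true if the metric was copied.
 */
static bool copy_broadcast_metrics(struct route_sketch *rt,
                                   const struct fib_result_sketch *res,
                                   int type)
{
    if (type == ROUTE_BROADCAST_SKETCH && res->fi) {
        rt->mtu_metric = res->fi->mtu_metric;
        return true;
    }
    /* Without the res->fi test, a NULL res->fi would be dereferenced above. */
    return false;
}
```

Calling copy_broadcast_metrics() with a result whose fi is NULL simply returns false and leaves the route untouched, which is the behaviour the corrected condition is after.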

Even so, this fix does not fully preserve the original intent of the commit, so it was superseded on August 26 by Oscar Maes’ version, which appears to be the correct solution.

And here we are, still waiting for this critical bug, which has already propagated widely (including to the LTS branch), to finally be resolved.

Well, I got really tired of this internet connection issue with the current 6.16 kernel. Installed linux-lts and linux-lts-headers. Currently running linux-lts 6.12.44-1 without issues so far.

Hopefully, this issue gets addressed in the next kernel update.

I’m curious about that. The other day I tested LTS 6.12.43-1, and noticed it was dropping packets and showing similar traffic issues, so I figured the bug must have made its way into that branch as well. Did you test 6.12.44-1 thoroughly?

It’s only been about half an hour of my typical forum hopping, Linux news source hopping, YouTube, and so forth. So far, all seems well.

OK, when you get a chance, run a test. Open a browser and try loading 10–15 web pages in separate tabs while running a ping in a terminal. Then check if all the pages load properly or if some get stuck halfway or show connection errors. After that, see if any packets are being lost in the ping. Repeat the test two or three times, and that should give you a pretty good idea of whether it’s working fine or not.

@albersc2 Doing that now with 20 tabs open. Pinged google.com. How do I “see if any packets are being lost in the ping”?

Never mind. Found it…

--- google.com ping statistics ---
203 packets transmitted, 202 received, 0.492611% packet loss, time 202309ms
rtt min/avg/max/mdev = 7.104/12.522/15.889/1.647 ms
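For anyone else wondering where that percentage comes from: it is just (transmitted - received) / transmitted. Below is a small C sketch that recomputes it from the summary line, assuming the line has the same layout as the one above. ping_loss_percent() is a made-up helper name, not something ping provides.

```c
#include <stdio.h>

/*
 * Parses a summary line of the form
 *   "<N> packets transmitted, <M> received, ..."
 * and stores the packet loss percentage in *loss_out.
 * Returns 0 on success, -1 if the line does not match or the counts
 * are implausible.
 */
static int ping_loss_percent(const char *summary, double *loss_out)
{
    long sent = 0, received = 0;

    /* sscanf returns the number of fields it converted; both are needed. */
    if (sscanf(summary, "%ld packets transmitted, %ld received",
               &sent, &received) != 2)
        return -1;

    if (sent <= 0 || received < 0 || received > sent)
        return -1;

    *loss_out = 100.0 * (double)(sent - received) / (double)sent;
    return 0;
}
```

For the line above (203 transmitted, 202 received) this works out to 100 * 1 / 203, which is roughly 0.49%.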

Press Ctrl+C in the terminal to stop the ping.

As for the other thing, hold Ctrl while clicking the links, one after another without waiting too long. And open as many websites as possible, the more the better.

Running linux-lts 6.12.44-1 - 2nd test, 45 tabs open in Waterfox, 30 tabs open in Vivaldi…

--- google.com ping statistics ---
167 packets transmitted, 167 received, 0% packet loss, time 166185ms
rtt min/avg/max/mdev = 5.852/12.816/24.378/2.606 ms

Do all websites open properly? None of them have gone blank or displayed “unable to connect” or similar messages?

Every website loaded as it should. No “unable to connect” messages, no blank pages.

Well, that’s good news, I guess.

I would say so. That would seem to make this a 6.16 issue.

The LTS 6.12.43-1 release from August 20 was affected by the bug. In contrast, the 6.12.44-1 release from August 28 does not seem to be. This is good news, because it may mean that at least in that branch the issue has already been fixed :wink:

With kernel 6.16.4 I do get “unable to connect” messages.

Just dropping in to say I’m experiencing the same problem (6.16.4-arch1-1),

and to say thank you (now I can stop pulling my hair out trying to find the source of the problem :face_savoring_food:)

I even got myself a shiny new network switch, since I thought that was the issue, as I have multiple PCs running EOS :woman_facepalming: Well, that switch was due for an upgrade anyway. I downgraded to 6.16.1 and all is fine.