Internet unstable after recent update

In my opinion, the folks at netdev bear a significant part of the responsibility. I joined their mailing list a week ago, and the truth is that they are juggling a lot of work. But let’s go over the history a bit; it’s quite interesting. The commit that introduced the bug (9e30ecf23b1b, per the fix description quoted below) is dated July 15.

They removed the line fi = NULL in __mkroute_output().

More than a month passed without anyone noticing the issue, until August 21–22, when kernel 6.16.2 was released. On that same day, Brett Sheffield spotted the regression (as did many of us—I remember experiencing severe internet problems on the 22nd and opening a post on Bugzilla) and he published a fix:

“Fix the regression introduced in 9e30ecf23b1b whereby IPv4 broadcast packets were having their Ethernet destination field mangled. This broke WOL magic packets and likely other IPv4 broadcasts.”

However, distributions such as CachyOS and Nobara later shipped this initial, flawed fix in their kernels, apparently without realizing it contained a critical problem. Jakub Kicinski had already reported the issue on the 22nd, and Brett corrected it on the 23rd. The original problematic code was:

if (type == RTN_BROADCAST) {
	/* ensure MTU value for broadcast routes is retained */
	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
}

Jakub warned Brett: “You need to check if res->fi is actually set before using it”

In other words, the original patch could hypothetically crash the kernel in certain scenarios, because it dereferenced res->fi unconditionally to reach fib_metrics. The local variable fi had already been set to NULL, and if res->fi is NULL as well, that access is a NULL pointer dereference in __mkroute_output(). Adding && res->fi to the condition prevents this scenario.

Brett corrected the patch on August 23rd:

if (type == RTN_BROADCAST && res->fi) {
	/* ensure MTU value for broadcast routes is retained */
	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
}

This way, the metrics are copied only when res->fi is non-NULL.
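For anyone who wants to see the guard in isolation, here is a minimal user-space sketch of the same pattern. It is not kernel code: every type and name in it (mock_fib_info, mock_fib_result, mock_dst, init_broadcast_metrics, the 1500 default) is a hypothetical stand-in I made up for illustration, and the real __mkroute_output() does far more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; names and layout are
 * assumptions for illustration only, not the real definitions. */
enum { MOCK_RTN_UNICAST = 1, MOCK_RTN_BROADCAST = 3 };

struct mock_metrics    { unsigned int mtu; };
struct mock_fib_info   { const struct mock_metrics *fib_metrics; };
struct mock_fib_result { const struct mock_fib_info *fi; };
struct mock_dst        { const struct mock_metrics *metrics; };

/* Arbitrary fallback used when no route metrics are available. */
static const struct mock_metrics mock_default_metrics = { .mtu = 1500 };

/* Mirrors the shape of the corrected check: route metrics are used only
 * for broadcast routes AND only when res->fi is non-NULL. Without the
 * second condition, res->fi->fib_metrics would dereference NULL.
 * Returns true when the route's metrics were applied. */
static bool init_broadcast_metrics(struct mock_dst *dst, int type,
                                   const struct mock_fib_result *res)
{
    dst->metrics = &mock_default_metrics;

    if (type == MOCK_RTN_BROADCAST && res->fi) {
        dst->metrics = res->fi->fib_metrics;
        return true;
    }
    return false;
}
```

Calling init_broadcast_metrics() with a result whose fi is NULL simply leaves the fallback metrics in place. Drop the && res->fi part and the same call reads through a NULL pointer, which is the scenario Jakub pointed out.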

Even so, this fix does not fully preserve the original intent of the commit, which is why Oscar Maes’ version of August 26 later superseded it. That version appears to be the correct solution.

And here we are, still waiting for this critical bug, which has already propagated widely (including to the LTS branch), to finally be resolved.
