Disable a processor on boot in a persistent way?

Recently there have been some unexpected reboots, with MCE errors referring to Processor 2.

Am I correct to think I could disable Processor 2 with the following command, and that it would be a good workaround for now?*
[root ~]# echo "0" > /sys/devices/system/cpu/cpu2/online

If so, what’s the best way to make that persist between reboots?

I’m generally using the stock linux kernel with systemd-boot.

(*Hoping to buy some time before trying to redo the thermal paste or buying a new CPU/CPU+mobo.)
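Here is a minimal sketch of the same sysfs write wrapped in a shell function. It assumes a POSIX shell run as root. The name `set_cpu_online` is hypothetical and not part of any existing tool. So is its optional third argument, an alternate sysfs directory that exists only so the function can be tried without root.

```shell
# Hypothetical helper: set a CPU's online state through sysfs.
# Usage: set_cpu_online CPU STATE [SYSFS_CPU_DIR]
#   CPU            logical CPU number as the kernel numbers it (2 for cpu2)
#   STATE          0 = take offline, 1 = bring back online
#   SYSFS_CPU_DIR  defaults to /sys/devices/system/cpu (override only for testing)
set_cpu_online() {
    cpu=$1
    state=$2
    dir=${3:-/sys/devices/system/cpu}

    case $cpu in
        ''|*[!0-9]*) echo "CPU must be a number" >&2; return 2 ;;
    esac
    case $state in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac

    # Some CPUs (often cpu0) have no "online" file because they cannot be hot-unplugged.
    file="$dir/cpu$cpu/online"
    if [ ! -e "$file" ]; then
        echo "$file not found: cpu$cpu cannot be hotplugged (or does not exist)" >&2
        return 1
    fi

    # Plain ASCII quotes matter here: curly quotes would be written literally and rejected.
    echo "$state" > "$file"
}
```

There are two ways to run this at every boot, independent of the kernel version:

- Call the function from a oneshot systemd service.
- Skip the script and add a systemd-tmpfiles line such as `w /sys/devices/system/cpu/cpu2/online - - - - 0` to a file of your choosing under /etc/tmpfiles.d/. The `w` type writes its argument to an existing file during boot.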


Edit:

Not urgent for these reasons:

I only disabled cpu2 an hour ago and should give it more time (days) to see if it helps at all.

If this doesn’t help with the mystery reboots, there’s no point making a non-fix persistent or automatic.

And if it does help, it’s not a great hardship to run that command from a terminal after an intentional reboot.

It would be more helpful to provide logs, error output, and hardware info first when asking for help with a specific issue.

https://discovery.endeavouros.com/forum-log-tool-options/how-to-include-systemlogs-in-your-post/2021/03/


Thank you. I appreciate that link.

Let me generalize the question instead of discussing a specific error:

If I wanted the following command to run at every boot, what would be the best way to do that (allowing for the possibility that I might be booting to an updated or different kernel)?

[root ~]# echo "0" > /sys/devices/system/cpu/cpu2/online

Or is there a better way to disable a cpu that would persist between boots?

I would prefer to see the error messages first, so I have a clearer understanding of what the issue is and what you are attempting to do with the hardware at boot.

Examples include a hardware info log, the dmesg log, and specific boot logs.

Edit:
A machine check exception (MCE) is an error generated by the CPU when the CPU detects that a hardware error or failure has occurred.

Edit: MCE errors happen for different reasons. A kernel parameter may be needed, a UEFI/BIOS update may be required if the firmware is far out of date, or there may be some other cause.

Edit: Providing a hardware log, dmesg log, boot log, etc. would be more helpful.
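For the boot logs mentioned above, here is a sketch that pulls machine-check messages from the boot that ended in the unexpected reboot. It assumes the systemd journal is persistent, so the previous boot (`-b -1`) is still stored. `filter_mce` and `show_previous_boot_mce` are hypothetical helper names. The match pattern is a rough guess and may also catch unrelated lines.

```shell
# Hypothetical helper: keep only lines that look like machine-check reports.
# Reads kernel log text on stdin; matching is case-insensitive.
filter_mce() {
    grep -i -E 'mce|machine check|hardware error'
}

# Kernel messages (-k) from the previous boot (-b -1), printed without the pager.
# Needs a persistent journal and a user allowed to read the system journal (e.g. root).
show_previous_boot_mce() {
    journalctl -k -b -1 --no-pager | filter_mce
}
```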

Edit: Thank you for the offer to conduct some troubleshooting based on my log files. That is not a process I originally sought to engage in.

My preference would be to have my question answered as posed.

If you want to do it on boot, then you’re going to have to use kernel parameters. Take a look at this article, which describes a number of kernel parameters you can experiment with:

https://www.kernel.org/doc/html/latest/core-api/cpu_hotplug.html


Thank you. I’ll have a look at that tomorrow.

I know where to edit parameters for individual kernels, but I’m wondering if there’s a system file that can pass kernel parameters no matter which kernel is used for booting. Just thinking out loud – it’s something I can research on my own. Maybe such a thing would be a bad idea.

That would depend on the bootloader, I suppose. Perhaps the bootloader’s documentation is a good place to start looking for clues. Maybe there’s a configuration option that allows kernel parameters to be applied to all boot entries.
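For systemd-boot, one place to look is kernel-install. When loader entries are generated by kernel-install (as in EndeavourOS's default systemd-boot setup), the kernel command line for every installed kernel is taken from /etc/kernel/cmdline if that file exists. If the entries are hand-written instead, the parameter goes on each entry's `options` line.

The sketch below assumes the kernel-install setup and adds one parameter, only once. `add_kernel_param` is a hypothetical helper. The entries still need to be regenerated afterwards; on EndeavourOS this is usually done with `reinstall-kernels`.

```shell
# Hypothetical helper: append one kernel parameter to a kernel-install cmdline file,
# unless it is already present as a whole word.
# Usage: add_kernel_param PARAM [CMDLINE_FILE]   (default file: /etc/kernel/cmdline)
add_kernel_param() {
    param=$1
    file=${2:-/etc/kernel/cmdline}

    # Never create the file from scratch: it must already hold root= and the other
    # options, or the regenerated boot entries would lose them.
    if [ ! -s "$file" ]; then
        echo "$file is missing or empty; not creating it" >&2
        return 1
    fi

    # Treat the contents as one line and compare whole space-separated words.
    current=$(tr '\n' ' ' < "$file")
    case " $current " in
        *" $param "*) return 0 ;;   # already present, nothing to do
    esac

    # Drop trailing whitespace, then write the line back with the new parameter appended.
    trimmed=$(printf '%s' "$current" | sed 's/[[:space:]]*$//')
    printf '%s %s\n' "$trimmed" "$param" > "$file"
}
```

For example, `add_kernel_param isolcpus=2` (as root) would turn a line like `root=UUID=... rw quiet` into `root=UUID=... rw quiet isolcpus=2`. Running it a second time changes nothing.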


[sorry – meant to reply to the thread, not to anthony93]

Additional details:

The kernel boot option “isolcpus=2” * does what I want (i.e., reserving cpu 2 to have no tasks assigned to it), but that did not prevent a mouse and keyboard freeze after about half a day of uptime. However, there was no MCE error when I rebooted from that freeze.

Meanwhile, I have dual-booted into Windows 10 to use some specialized apps and will stay in Windows for a week or so.

If I’m not having a daily freeze or mystery reboot in Windows, that might indicate the hardware is still okay despite being six or seven years old, and that it would be worthwhile to proceed with uploading logs and other details for troubleshooting. (I’m already missing my dropdown terminal and yay.) Of course, if the hardware is failing and will be replaced soon anyway, there’s no point troubleshooting now.
__________
* Yes, I know isolcpus is deprecated, but I think that’s because it cannot be changed without a reboot – more of a concern for servers than for a desktop PC.
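isolcpus= and the echo command put a CPU into different states:

- isolcpus= removes a CPU from general scheduling and load balancing, but the CPU stays online. It is listed in both /sys/devices/system/cpu/online and /sys/devices/system/cpu/isolated.
- Writing 0 to cpu2/online takes the CPU offline and moves it to /sys/devices/system/cpu/offline.

Here is a sketch for checking which of those lists a CPU appears in after boot. `cpu_in_list` and `report_cpu_state` are hypothetical helpers. The parser assumes the plain comma-and-range format that sysfs uses (e.g. `0-1,3-7`).

```shell
# Hypothetical helper: succeed if CPU number $1 appears in the cpulist string $2,
# where the list is comma-separated numbers or ranges, e.g. "0-1,3-7".
cpu_in_list() {
    cpu=$1
    for item in $(printf '%s' "$2" | tr ',' ' '); do
        case $item in
            *-*) lo=${item%-*}; hi=${item#*-} ;;   # a range such as 3-7
            *)   lo=$item; hi=$item ;;             # a single CPU such as 2
        esac
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Print every sysfs CPU list (online, offline, isolated) that contains the given CPU.
report_cpu_state() {
    cpu=${1:-2}
    base=/sys/devices/system/cpu
    for kind in online offline isolated; do
        # A missing or empty file simply means "not listed".
        list=$(cat "$base/$kind" 2>/dev/null)
        if cpu_in_list "$cpu" "$list"; then
            echo "cpu$cpu: $kind"
        fi
    done
}
```

With isolcpus=2, `report_cpu_state 2` should report cpu2 as both online and isolated. After the echo command, it should report cpu2 as offline.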
