My PC is frequently rebooting itself

Log of random reboots:

$ journalctl | grep microcode | grep "Hardware Error"
Sep 06 18:39:40 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1662507574 SOCKET 0 APIC 1 microcode 8001138
Sep 21 13:03:33 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1663783406 SOCKET 0 APIC 8 microcode 8001138
Sep 23 10:12:41 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1663945955 SOCKET 0 APIC 9 microcode 8001138
Oct 01 18:35:42 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1664667335 SOCKET 0 APIC 0 microcode 8001138
Oct 15 10:29:42 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1665847776 SOCKET 0 APIC 1 microcode 8001138
Oct 19 21:58:41 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1666234714 SOCKET 0 APIC 8 microcode 8001138
Oct 27 00:16:28 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1666847781 SOCKET 0 APIC 0 microcode 8001138
Oct 29 08:40:19 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1667050812 SOCKET 0 APIC 0 microcode 8001138
Oct 29 18:13:16 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1667085189 SOCKET 0 APIC 8 microcode 8001138
Oct 29 18:13:16 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1667085189 SOCKET 0 APIC 9 microcode 8001138
Oct 31 12:12:12 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1667236326 SOCKET 0 APIC 9 microcode 8001138
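
To see whether the errors cluster on particular cores, the lines above can be tallied per APIC ID. This is only a minimal sketch that assumes the log format shown here; `mce_per_apic` is a made-up helper name, not an existing tool.

```shell
# Count machine-check events per APIC ID from kernel log lines on stdin.
# Assumes the format above: "... SOCKET 0 APIC <id> microcode <rev>".
# mce_per_apic is a hypothetical helper name used for illustration.
mce_per_apic() {
    awk '/\[Hardware Error\]/ {
        # Find the "APIC" token and count the field that follows it.
        for (i = 1; i < NF; i++)
            if ($i == "APIC") count[$(i + 1)]++
    }
    END { for (id in count) print id, count[id] }' | sort -n
}
```

It could be used as `journalctl | grep microcode | mce_per_apic`. For the eleven lines above, the tally is 3 events on APIC 0, 2 on APIC 1, 3 on APIC 8 and 3 on APIC 9.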

Verifying that microcode got updated on boot

$ sudo journalctl -k --grep=microcode                
Oct 31 12:12:12 antares kernel: microcode: CPU0: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU1: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU2: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU3: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU4: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU5: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU6: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU7: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU8: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU9: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU10: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU11: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU12: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU13: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU14: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: CPU15: patch_level=0x08001138
Oct 31 12:12:12 antares kernel: microcode: Microcode Update Driver: v2.2.
Oct 31 12:12:12 antares kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1667236326 SOCKET 0 APIC 9 microcode 8001138
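
A quick way to confirm that every core reports the same patch level is to collapse the output above into distinct values. This is a rough sketch under the assumption that the kernel prints `patch_level=0x...` as shown; `patch_levels` is a hypothetical helper name.

```shell
# Print each distinct patch_level followed by the number of CPUs reporting it.
# Reads kernel log lines on stdin; lines without patch_level= are ignored.
# patch_levels is a hypothetical helper name used for illustration.
patch_levels() {
    sed -n 's/.*patch_level=\(0x[0-9a-fA-F]*\).*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Something like `sudo journalctl -k --grep=microcode | patch_levels` should then print a single line if all cores agree. Note that this only shows which revision is loaded, not whether a newer one exists.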

Verifying microcode in grub

$ sudo cat /boot/grub/grub.cfg | grep initrd         
   initrd       /@/boot/amd-ucode.img /@/boot/initramfs-linux-zen.img
   initrd       /@/boot/amd-ucode.img /@/boot/initramfs-linux-zen-fallback.img
   initrd       /@/boot/amd-ucode.img /@/boot/initramfs-linux-lts.img
   initrd       /@/boot/amd-ucode.img /@/boot/initramfs-linux-lts-fallback.img
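
The microcode image has to be the first image on each initrd line, which is the case above. A small check for that, as a sketch only: it assumes GRUB's `initrd` lines look like the ones above and that the image is named `amd-ucode.img` (an Intel system would use a different file name). `check_ucode_first` is a hypothetical helper name.

```shell
# Report initrd lines whose first image is not the AMD microcode image.
# Reads a grub.cfg on stdin; exits non-zero if any such line is found.
# check_ucode_first is a hypothetical helper name used for illustration.
check_ucode_first() {
    awk '$1 == "initrd" {
        if ($2 !~ /amd-ucode\.img$/) {
            print "microcode not first: " $0
            bad = 1
        }
    }
    END { exit bad + 0 }'
}
```

Usage would be along the lines of `sudo cat /boot/grub/grub.cfg | check_ucode_first && echo ok`.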

System info:

OS: EndeavourOS Linux x86_64 
Kernel: 6.0.6-zen1-1-zen 
Shell: zsh 5.9 
Resolution: 3440x1440 
DE: Plasma 5.26.2 
WM: KWin 
WM Theme: Layan 
Theme: Breeze Light [Plasma], Layan [GTK2/3] 
Icons: Tela [Plasma], Tela [GTK2/3] 
Terminal: HyperTerm 
Terminal Font: JetBrainsMono Nerd Font 
CPU: AMD Ryzen 7 1700 (16) @ 3.000GHz 
GPU: NVIDIA GeForce GTX 1070 
Memory: 3380MiB / 15914MiB 

Any solution?

Do you have the amd-ucode installed?

pacman -Qi amd-ucode
[ricklinux@kde-plasma ~]$ pacman -Qi amd-ucode
Name            : amd-ucode
Version         : 20220913.f09bebf-1
Description     : Microcode update image for AMD CPUs
Architecture    : any
URL             : https://git.kernel.org/?p=linux/kernel/git/firmware/linux-firmware.git;a=summary
Licenses        : custom
Groups          : None
Provides        : None
Depends On      : None
Optional Deps   : None
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 53.67 KiB
Packager        : Laurent Carlier <lordheavym@archlinux.org>
Build Date      : Sat 24 Sep 2022 01:46:05 PM
Install Date    : Mon 26 Sep 2022 10:29:42 AM
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

[ricklinux@kde-plasma ~]$ 
$ sudo pacman -Qi amd-ucode                 
Name            : amd-ucode
Version         : 20220913.f09bebf-1
Description     : Microcode update image for AMD CPUs
Architecture    : any
URL             : https://git.kernel.org/?p=linux/kernel/git/firmware/linux-firmware.git;a=summary
Licenses        : custom
Groups          : None
Provides        : None
Depends On      : None
Optional Deps   : None
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 53,67 KiB
Packager        : Laurent Carlier <lordheavym@archlinux.org>
Build Date      : Sat 24 Sep 2022 12:46:05 PM -05
Install Date    : Mon 26 Sep 2022 09:33:34 AM -05
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

What are your temperatures like?

How do I get that info?

Have you tested the memory?

sudo pacman -S lm_sensors
sudo sensors-detect # Accept all the defaults.

Then:

sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +45.6°C

nvme-pci-0200
Adapter: PCI adapter
Composite:    +15.8°C  (low  = -273.1°C, high = +80.8°C)
                       (crit = +86.8°C)
Sensor 1:     +22.9°C  (low  = -273.1°C, high = +65261.8°C)

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:      843.00 mV
vddnb:       937.00 mV
edge:         +36.0°C
PPT:         1000.00 uW

nvme-pci-0500
Adapter: PCI adapter
Composite:    +25.9°C  (low  =  -0.1°C, high = +69.8°C)
                       (crit = +84.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.01 V

Maybe provide the following

sudo dmesg | eos-sendlog
journalctl -b -0 | eos-sendlog

Edit:

journalctl -k --grep=mce | eos-sendlog
$ sudo sensors-detect
# sensors-detect version 3.6.0+git
# Board: ASUSTeK COMPUTER INC. CROSSHAIR VI HERO
# Kernel: 6.0.6-zen1-1-zen x86_64
# Processor: AMD Ryzen 7 1700 Eight-Core Processor (23/1/1)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): 
Silicon Integrated Systems SIS5595...                       No
VIA VT82C686 Integrated Sensors...                          No
VIA VT8231 Integrated Sensors...                            No
AMD K8 thermal sensors...                                   No
AMD Family 10h thermal sensors...                           No
AMD Family 11h thermal sensors...                           No
AMD Family 12h and 14h thermal sensors...                   No
AMD Family 15h thermal sensors...                           No
AMD Family 16h thermal sensors...                           No
AMD Family 17h thermal sensors...                           Success!
    (driver `k10temp')
AMD Family 15h power sensors...                             No
AMD Family 16h power sensors...                             No
Hygon Family 18h thermal sensors...                         No
AMD Family 19h thermal sensors...                           No
Intel digital thermal sensor...                             No
Intel AMB FB-DIMM thermal sensor...                         No
Intel 5500/5520/X58 thermal sensor...                       No
VIA C7 thermal sensor...                                    No
VIA Nano thermal sensor...                                  No

Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): 
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               No
Trying family `ITE'...                                      Yes
Found `ITE IT8665E Super IO Sensors'                        Success!
    (address 0x290, driver `to-be-written')
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               No
Trying family `ITE'...                                      No

Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no): 
Probing for `IPMI BMC KCS' at 0xca0...                      No
Probing for `IPMI BMC SMIC' at 0xca8...                     No

Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (yes/NO): 

Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no): 
Using driver `i2c-piix4' for device 0000:00:14.0: AMD KERNCZ SMBus
Module i2c-dev loaded successfully.

Next adapter: NVIDIA i2c adapter 1 at b:00.0 (i2c-0)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 2 at b:00.0 (i2c-1)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 4 at b:00.0 (i2c-2)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 6 at b:00.0 (i2c-3)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 7 at b:00.0 (i2c-4)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 8 at b:00.0 (i2c-5)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: NVIDIA i2c adapter 9 at b:00.0 (i2c-6)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: SMBus PIIX4 adapter port 0 at 0b00 (i2c-7)
Do you want to scan it? (YES/no/selectively): 
Client found at address 0x52
Probing for `Analog Devices ADM1033'...                     No
Probing for `Analog Devices ADM1034'...                     No
Probing for `SPD EEPROM'...                                 Yes
    (confidence 8, not a hardware monitoring chip)
Client found at address 0x53
Probing for `Analog Devices ADM1033'...                     No
Probing for `Analog Devices ADM1034'...                     No
Probing for `SPD EEPROM'...                                 Yes
    (confidence 8, not a hardware monitoring chip)

Next adapter: SMBus PIIX4 adapter port 2 at 0b00 (i2c-8)
Do you want to scan it? (YES/no/selectively): 

Next adapter: SMBus PIIX4 adapter port 1 at 0b20 (i2c-9)
Do you want to scan it? (YES/no/selectively): 
Client found at address 0x4e
Probing for `National Semiconductor LM75'...                No
Probing for `National Semiconductor LM75A'...               No
Probing for `Dallas Semiconductor DS75'...                  No
Probing for `Analog Devices ADM1021'...                     No
Probing for `Analog Devices ADM1021A/ADM1023'...            No
Probing for `Maxim MAX1617'...                              No
Probing for `Maxim MAX1617A'...                             No
Probing for `Maxim MAX1668'...                              No
Probing for `Maxim MAX1805'...                              No
Probing for `Maxim MAX1989'...                              No
Probing for `Maxim MAX6642'...                              No
Probing for `Maxim MAX6655/MAX6656'...                      No
Probing for `TI THMC10'...                                  No
Probing for `National Semiconductor LM84'...                No
Probing for `Genesys Logic GL523SM'...                      No
Probing for `Onsemi MC1066'...                              No
Probing for `Maxim MAX1618'...                              No
Probing for `Maxim MAX1619'...                              No
Probing for `National Semiconductor LM82/LM83'...           No
Probing for `Maxim MAX6654'...                              No
Probing for `Maxim MAX6690'...                              No
Probing for `Maxim MAX6659'...                              No
Probing for `Maxim MAX6647'...                              No
Probing for `Maxim MAX6680/MAX6681'...                      No
Probing for `Maxim MAX6695/MAX6696'...                      No
Probing for `Texas Instruments TMP400'...                   No
Probing for `Texas Instruments TMP411C'...                  No
Probing for `Texas Instruments TMP421'...                   No
Probing for `Texas Instruments TMP422'...                   No
Probing for `Texas Instruments TMP435'...                   No
Probing for `Texas Instruments TMP441'...                   No
Probing for `Texas Instruments AMC6821'...                  No
Probing for `National Semiconductor LM95234'...             No
Probing for `National Semiconductor LM64'...                No
Probing for `National Semiconductor LM73'...                No
Probing for `Maxim MAX6633/MAX6634/MAX6635'...              No
Probing for `NXP/Philips SA56004'...                        No
Probing for `Fintek F75121R/F75122R/RG (VID+GPIO)'...       No
Probing for `Fintek F75111R/RG/N (GPIO)'...                 No
Probing for `ITE IT8201R/IT8203R/IT8206R/IT8266R'...        No


Now follows a summary of the probes I have just done.
Just press ENTER to continue: 

Driver `k10temp' (autoloaded):
  * Chip `AMD Family 17h thermal sensors' (confidence: 9)

Driver `to-be-written':
  * ISA bus, address 0x290
    Chip `ITE IT8665E Super IO Sensors' (confidence: 9)

Note: there is no driver for ITE IT8665E Super IO Sensors yet.
Check https://hwmon.wiki.kernel.org/device_support_status for updates.

No modules to load, skipping modules configuration.

Unloading i2c-dev... OK

Just type sensors.

$ sensors              
asus_wmi_sensors-virtual-0
Adapter: Virtual device
CPU Core Voltage:          1.09 V  
CPU SOC Voltage:           1.13 V  
DRAM Voltage:              1.35 V  
VDDP Voltage:            916.00 mV 
1.8V PLL Voltage:          1.83 V  
+12V Voltage:             11.97 V  
+5V Voltage:               4.99 V  
3VSB Voltage:              3.33 V  
VBAT Voltage:              3.18 V  
AVCC3 Voltage:             3.33 V  
SB 1.05V Voltage:          1.04 V  
CPU Fan:                 2360 RPM
Chassis Fan 1:           1080 RPM
Chassis Fan 2:              0 RPM
Chassis Fan 3:              0 RPM
AIO Pump:                   0 RPM
Water Pump:                 0 RPM
CPU OPT:                    0 RPM
Water Flow:                 0 RPM
CPU Temperature:          +32.0°C  
CPU Socket Temperature:   +33.0°C  
Motherboard Temperature:  +30.0°C  
Chipset Temperature:      +59.0°C  
Tsensor 1 Temperature:   +216.0°C  
CPU VRM Temperature:      +36.0°C  
Water In:                +216.0°C  
Water Out:               +216.0°C  
CPU VRM Output Current:    2.00 A  

nvme-pci-0100
Adapter: PCI adapter
Composite:    +38.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +43.9°C  (low  = -273.1°C, high = +65261.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +32.0°C  
$ sudo dmesg | eos-sendlog
https://0x0.st/oY3b.txt

$ journalctl -b -0 | eos-sendlog
https://0x0.st/oY3A.txt

$ journalctl -k --grep=mce | eos-sendlog
https://0x0.st/oY3a.txt

The zen kernel ain’t the most stable thing in the universe, let’s say… Try the LTS kernel to rule out the kernel.

It doesn’t appear to have picked up the CPU; there’s no Tctl reading.

I see lots of errors related to Docker and AppImage? :man_shrugging:

My Docker containers have been stopped for several weeks.

I’ve been using this distro for a couple of months. It comes with a firewall enabled; I haven’t configured anything, but it seems to have some problems with Docker. I’ve posted a question about it on Stack Overflow.

As for the AppImages, I don’t know what’s going on, but the contents of .gtkrc-2.0 are:

gtk-theme-name="Layan"
gtk-enable-animations=1
gtk-primary-button-warps-slider=0
gtk-toolbar-style=3
gtk-menu-images=1
gtk-button-images=1
gtk-cursor-theme-size=24
gtk-cursor-theme-name="Layan-white-cursors"
gtk-icon-theme-name="Tela"
gtk-font-name="Noto Sans,  10"

gtk-modules=appmenu-gtk-module

What is your hardware?

inxi -Faz | eos-sendlog

Edit: Oh, I see it in the log, so never mind.

How do I test my memory?