Issues with WoW in Linux

Broke this out into its own topic in the gaming forum, as it got flagged as off-topic in the original place (whatever, I guess, it was really just in passing, but I’ll expand on it here).

Anyhoo, I’ve been playing WoW on Linux for the past 18 months, but for some reason all of a sudden about 3 weeks ago it started throwing up this error:

It usually happens after about 10 minutes or so, and it now persists across various distros, game re-installations etc.

Thought it was faulty RAM, so bought another 16GB and swapped it out: same result.

Thought it might be the GPU: took it out and ran on the i7-7700K’s built-in Intel graphics. Still crashes.

Tried it on different SSD/HDDs, still crashing.

Tried various versions of Wine & DXVK, disabling DXVK/VKD3D altogether, even disabling all addons in WoW, but nothing works.

The only components that are still in common are the motherboard and CPU, and I’m tending to think it might be the CPU, because I can see microcode errors when starting any Arch-based Linux distro (EndeavourOS/Manjaro/Arch).

Guess I’ll have to stick with playing on Windows until I get a new PC, which is depressing. That said I can still use Linux for everything else (browsing/office etc), it’s just WoW constantly crashing…

I’ve had a look at temps, and when the system crashes temps are absolutely normal. Nothing anywhere near hot enough to crash like this, with both CPU and GPU in and around 40-50°C. Used a hardware monitor on my 2nd screen and the MangoHud overlay to monitor temps 2 different ways, just in case. I’ve also got my fans set to turbo mode now, keeping everything cool (I use water cooling for the CPU and it’s set to TPU2 in the BIOS).

Unfortunately the last BIOS update for my motherboard was in April 2018, about 6 months after I bought the parts. Whether I use the XMP settings or set everything manually, it still crashes. Interestingly, XMP actually underclocks my RAM, which is odd: I have (now) 32GB of DDR4-3200, which XMP sets to 2933. Still crashes. I’ve also set it to run at 2133 and it still crashes. But I know it’s not the memory per se, because I’ve run different configurations, from a single 8GB stick to swapping out the RAM entirely and running on the new 16GB alone. No luck.
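For a sense of what those memory speeds mean in practice, here is a rough Python sketch of the usual DDR arithmetic. It assumes a standard 64-bit channel in dual-channel mode and that CL stays at 16 at every speed (a board may well pick different timings at 2133 or 2933). The function names are made up for illustration.

```python
# Rough DDR4 arithmetic: theoretical peak bandwidth and CAS latency in ns.
# Assumptions: 64-bit (8-byte) channel, dual channel, CL fixed at 16.

def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return data_rate_mts * 1_000_000 * bus_bytes * channels / 1e9


def cas_latency_ns(data_rate_mts: int, cas_cycles: int) -> float:
    """CAS latency in nanoseconds.

    The memory clock runs at half the data rate (double data rate),
    so one clock cycle lasts 2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000 / data_rate_mts


if __name__ == "__main__":
    for rate in (3200, 2933, 2133):
        bw = peak_bandwidth_gbs(rate)
        cl = cas_latency_ns(rate, 16)
        print(f"DDR4-{rate}: {bw:.1f} GB/s peak, CL16 = {cl:.2f} ns")
```

The drop from 3200 to 2933 costs under 10% of theoretical bandwidth, so it is unlikely to be noticeable in game, and it says nothing by itself about stability.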

As mentioned, the only other issue is that whenever I boot an Arch-based distro (EndeavourOS/Manjaro/vanilla Arch), I get microcode errors during boot such as the following. They don’t appear when booting Ubuntu-based distros (Xubuntu/Pop!_OS etc.), but those may just hide the same messages during boot:

This is an i7-7700K running at 4.2GHz, turboing when needed to 4.5GHz. XMP would push that further to 4.8GHz, but I try to rein it in so it doesn’t fry itself.

The only other thing I should mention is the dreaded Windows 10 (Pro). It works fine. No issues whatsoever. WoW runs great, never crashes and I don’t see any microcode errors in the event/system logs.

Which motherboard?
And exact models of RAM parts please :slight_smile:

MCE errors are not a good sign for your motherboard
(post the output of inxi -Fxxza if possible)

I downgraded my BIOS to see if the most recent version was the issue. So far no crashes, but it’s early days yet.

My ram is:

4 sticks of Corsair 8GB DDR4-3200 RAM:
2 x CMK16GX4M2B3200C16
2 x CM4X8GD3200C16K4

Both sets are 16-18-18-36, and I’ve tried both sets independently, and 1 stick at a time of both in different DIMM slots.

inxi output:

System:    Kernel: 5.7.10-arch1-1 x86_64 bits: 64 compiler: gcc v: 10.1.0 
           parameters: BOOT_IMAGE=/boot/vmlinuz-linux 
           root=UUID=d30052f5-3119-450f-872e-e5b9c4cda8b3 rw quiet loglevel=3 nowatchdog 
           Desktop: Xfce 4.14.2 tk: Gtk 3.24.20 info: xfce4-panel wm: xfwm4 dm: LightDM 1.30.0 
           Distro: EndeavourOS 2020.07.15 
Machine:   Type: Desktop Mobo: ASUSTeK model: STRIX Z270E GAMING v: Rev 1.xx serial: <filter> 
           UEFI: American Megatrends v: 1203 date: 12/26/2017 
CPU:       Topology: Quad Core model: Intel Core i7-7700K bits: 64 type: MT MCP arch: Kaby Lake 
           family: 6 model-id: 9E (158) stepping: 9 microcode: 7C L2 cache: 8192 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 bogomips: 67224 
           Speed: 3000 MHz min/max: 800/4500 MHz Core speeds (MHz): 1: 3000 2: 3000 3: 3000 
           4: 3000 5: 3000 6: 3000 7: 3000 8: 3000 
           Vulnerabilities: Type: itlb_multihit status: KVM: Vulnerable 
           Type: l1tf mitigation: PTE Inversion 
           Type: mds 
           status: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: spec_store_bypass status: Vulnerable 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full generic retpoline, STIBP: disabled, RSB filling 
           Type: srbds status: Vulnerable: No microcode 
           Type: tsx_async_abort 
           status: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable 
Graphics:  Device-1: AMD Vega 10 XL/XT [Radeon RX Vega 56/64] vendor: Sapphire Limited 
           driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:687f 
           Display: x11 server: X.Org 1.20.8 driver: amdgpu,ati unloaded: fbdev,modesetting,vesa 
           display ID: :0.0 screens: 1 
           Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.0x11.2") 
           s-diag: 1055mm (41.5") 
           Monitor-1: DisplayPort-0 res: 1920x1080 hz: 60 dpi: 96 size: 510x287mm (20.1x11.3") 
           diag: 585mm (23") 
           Monitor-2: HDMI-A-0 res: 1920x1080 hz: 60 dpi: 82 size: 598x336mm (23.5x13.2") 
           diag: 686mm (27") 
           OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.37.0 5.7.10-arch1-1 LLVM 10.0.0) 
           v: 4.6 Mesa 20.1.4 direct render: Yes 
Audio:     Device-1: Intel 200 Series PCH HD Audio vendor: ASUSTeK driver: snd_hda_intel 
           v: kernel bus ID: 00:1f.3 chip ID: 8086:a2f0 
           Device-2: AMD Vega 10 HDMI Audio [Radeon Vega 56/64] driver: snd_hda_intel v: kernel 
           bus ID: 03:00.1 chip ID: 1002:aaf8 
           Device-3: Blue Microphones Yeti Stereo Microphone type: USB driver: snd-usb-audio 
           bus ID: 1-6:3 chip ID: b58e:9e84 
           Sound Server: ALSA v: k5.7.10-arch1-1 
Network:   Device-1: Intel Ethernet I219-V vendor: ASUSTeK driver: e1000e v: 3.2.6-k port: f000 
           bus ID: 00:1f.6 chip ID: 8086:15b8 
           IF: enp0s31f6 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter vendor: ASUSTeK 
           driver: ath10k_pci v: kernel port: e000 bus ID: 06:00.0 chip ID: 168c:003e 
           IF: wlan0 state: down mac: <filter> 
Drives:    Local Storage: total: 5.12 TiB used: 1.19 TiB (23.2%) 
           SMART Message: Unable to run smartctl. Root privileges required. 
           ID-1: /dev/nvme0n1 vendor: A-Data model: SX8200PNP size: 476.94 GiB block size: 
           physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 serial: <filter> 
           rev: 42B2S7JA scheme: MBR 
           ID-2: /dev/sda vendor: Western Digital model: WD20EZRX-00D8PB0 size: 1.82 TiB 
           block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s rotation: 5400 rpm 
           serial: <filter> rev: 0A80 scheme: GPT 
           ID-3: /dev/sdb vendor: Western Digital model: WD2003FZEX-00Z4SA0 size: 1.82 TiB 
           block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s rotation: 7200 rpm 
           serial: <filter> rev: 1A01 scheme: MBR 
           ID-4: /dev/sdc vendor: Crucial model: CT120BX500SSD1 size: 111.79 GiB block size: 
           physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter> rev: R013 scheme: GPT 
           ID-5: /dev/sdd vendor: Samsung model: SSD 860 EVO 1TB size: 931.51 GiB block size: 
           physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter> rev: 1B6Q scheme: GPT 
Partition: ID-1: / raw size: 111.49 GiB size: 109.24 GiB (97.98%) used: 14.73 GiB (13.5%) 
           fs: ext4 dev: /dev/sdc2 
Swap:      Alert: No Swap data was found. 
Sensors:   System Temperatures: cpu: 50.0 C mobo: N/A gpu: amdgpu temp: 56 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 1345 
Info:      Processes: 247 Uptime: 8m Memory: 31.30 GiB used: 4.01 GiB (12.8%) Init: systemd 
           v: 245 Compilers: gcc: 10.1.0 Packages: pacman: 972 lib: 350 Shell: Bash v: 5.0.17 
           running in: xfce4-terminal inxi: 3.1.05 
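The output above reports microcode 7C and several “no microcode” statuses in the Vulnerabilities block. A quick way to re-check both without inxi is to read the same information straight from the kernel. This is only a sketch: it assumes a Linux system that exposes /proc/cpuinfo and /sys/devices/system/cpu/vulnerabilities, and the function names are made up for illustration.

```python
from pathlib import Path

CPUINFO = Path("/proc/cpuinfo")
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def missing_microcode(statuses: dict) -> dict:
    """Keep only the entries whose status text mentions missing microcode."""
    return {
        name: status
        for name, status in statuses.items()
        if "no microcode" in status.lower()
    }


def read_vulnerabilities(directory: Path) -> dict:
    """Read every status file in the sysfs vulnerabilities directory."""
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    if CPUINFO.exists():
        print("microcode revision(s):", microcode_revisions(CPUINFO.read_text()))
    if VULN_DIR.is_dir():
        for name, status in missing_microcode(read_vulnerabilities(VULN_DIR)).items():
            print(f"{name}: {status}")
```

If the second loop prints anything, the kernel itself is saying that the loaded microcode is too old for that mitigation, which is worth comparing between the distros that show boot errors and the ones that don’t.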

Also running ras-mc-ctl --errors returns a shedload of errors like this:

1397 2020-07-30 11:46:51 +0100 error: Internal parity error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x00000c0a, status=0x9000004000010005, walltime=0x5f22970d, cpu=0x00000001, cpuid=0x000906e9, apicid=0x00000002
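The status and cpuid values in that line can be unpacked by hand. The sketch below follows the IA32_MCi_STATUS bit layout and the CPUID family/model encoding as I understand them from Intel’s Software Developer’s Manual; the function names and the small lookup table are my own, and the table only lists a few simple error codes rather than the full set.

```python
# Decode the raw fields printed by ras-mc-ctl for a machine check event.
# Bit positions are taken from Intel's documented IA32_MCi_STATUS layout;
# treat this as an illustration, not a complete decoder.

SIMPLE_MCA_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields."""
    return {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }


def decode_cpuid(eax: int) -> dict:
    """Decode family/model/stepping from the CPUID signature (leaf 1, EAX)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if family in (6, 15):
        model |= ext_model << 4
    if family == 15:
        family += ext_family
    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    fields = decode_mci_status(0x9000004000010005)
    for name, value in fields.items():
        print(f"{name}: {value}")
    print("meaning:", SIMPLE_MCA_CODES.get(fields["mca_code"], "not in this table"))
    print("cpu:", decode_cpuid(0x000906E9))
```

For the line above this gives a valid, enabled, corrected (not uncorrected) event with a corrected count of 1 and MCA code 0x0005, which is the same “Internal parity error” that ras-mc-ctl printed. The cpuid decodes to family 6, model 0x9E, stepping 9, matching the inxi output.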

I’m assuming my CPU is not long for this world by the looks of it…

@Sar
Ok, I wanna make sure you follow extreme RAM 101, which in my experience is:

  1. :white_check_mark: Make sure your motherboard supports at least 100 MHz more than your RAM’s XMP speed

  2. :white_check_mark: Make sure you have an i7 with K (i5/i3 usually have weaker memory controllers that can’t really handle extreme memory, compared to i7)

  3. :x: To be completely safe, you must always set voltages and the main set of timings (16-18-18-36) manually. XMP can’t be considered 100% stable (at least it wasn’t in DDR3 times; I’ve heard it’s better with DDR4, but I wouldn’t bet on it.)

  4. :x: A RAM pack (if more than one stick) must be bought as 1 package, otherwise it’s not guaranteed to work well at extreme frequencies / timings, for various reasons:

    • Modules in the same pack were tested together
    • Small differences in the manufacturing process can lead to additional overheating / instability under extreme conditions, even when you use modules of the same model with the same chips bought independently

Good thinking, new != always best in BIOS world. :wink:

This is an i7-7700K running at 4.2GHz, turboing when needed to 4.5GHz. XMP would push that further to 4.8GHz, but I try to rein it in so it doesn’t fry itself.

That is my concern here: it absolutely should NOT go over the default turbo of 4.5 with or without XMP…
For this CPU alone 4.8GHz is not a problem at all, and your temps are all good. But in conjunction with extreme DDR4 RAM (at 3200MHz your case is far from today’s standard of extreme, but by definition it still is) it might be a little hard for the CPU / motherboard to chew this turbo. Plus, personally I have no experience with ASUSTeK motherboards; maybe they lie in some specifications like a lot of Chinese companies do, but let’s assume that’s not the case here.

Both sets are 16-18-18-36, and I’ve tried both sets independently, and 1 stick at a time of both in different DIMM slots.

So, both RAM kits have exactly the same voltage and timings on their side stickers?
If so, there’s no need to set them independently. You can use one master voltage / set of timings, because the subtimings that could come into play with manufacturing differences are ones you wouldn’t guess anyway…

Not to bash or anything, but A-Data & Crucial are not most reliable SSDs out there, hopefully they’re not failing (which may lead to all kinds of random crazy stuff).

So just to exclude all possibilities, I would advise:

  1. Remove all unnecessary peripherals, including drives (to take them out of the equation)
  2. Reset your BIOS by removing the battery for 1 minute
  3. Put the battery back in place and load “Safe defaults” (always do that after flashing the BIOS btw, when there are problems like this)
  4. Disable TurboBoost
  5. Set up RAM according to point 3 above :point_up_2: , save, reboot
  6. Boot from a live USB and run a heavy stress test for ~1-2 hours to see if there are any errors

In this test use only 2 x CMK16GX4M2B3200C16.
If it passes fine, add 2 x CM4X8GD3200C16K4 and stress again for ~1-2 hours.
Then, if it’s fine, enable Turbo Boost (I would still advise not to, since a manual overclock is more precise and less stressful for the CPU and the system in general, but if you need it…)
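To illustrate what the memory part of such a stress test does at its core, here is a toy Python sketch that writes fixed byte patterns into a buffer and reads them back. It is not a substitute for a real tester such as memtest86+ or stress-ng: it runs in user space, cannot choose which physical addresses it touches, and covers only a small slice of RAM. The function name and default sizes are arbitrary.

```python
# Toy write-then-verify memory pattern check (illustration only).

PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits


def pattern_test(size_mib: int = 64, passes: int = 1) -> int:
    """Fill a buffer with each pattern and count bytes that read back wrong.

    Returns the total number of mismatching bytes across all passes.
    On healthy hardware this should always be 0.
    """
    size = size_mib * 1024 * 1024
    mismatches = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            buf = bytearray([pattern]) * size       # write the pattern
            mismatches += size - buf.count(pattern)  # read back and compare
    return mismatches


if __name__ == "__main__":
    bad = pattern_test(size_mib=64, passes=2)
    print(f"mismatching bytes: {bad}")
```

A result of 0 here proves very little, while any non-zero result would be a strong hint of a hardware problem. Real testers add moving patterns, address tests and long run times, which is why the 1-2 hour figure matters.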

If the tests are fine, then it’s the drives (SSDs to be precise). Start with the Crucial and try to exclude them one by one.

I don’t know, doubt it… Although I’m no expert on MCE errors, perhaps @Stephane will help you here.

I’m also not sure that this is the cause of the WoW crash here, but the system must be intact, so it’s good for you to check all these things :+1:

P.S. Oh, and start the tests with a single screen too, just in case. Keep the setup as minimal as you can :slight_smile:
Once upon a time, when I wasn’t experienced, I threw away a video card, just to find out that there was a freaking DVI CABLE problem :rofl:

Thank you very much for the detailed reply Key, I’ll have a run at that later today. Still running WoW atm to see if it remains stable for the time being.

Not overly concerned with the MCE errors, because they could be caused by something harmless, and they might’ve been happening all along. I’ve only noticed them recently because I’ve been looking for reasons for the WoW crashes.
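One way to check whether the errors really have been there all along is to group the ras-mc-ctl output by day and by CPU. The sketch below is built only around the single log line quoted earlier in the thread, so the regular expression is an assumption about the general format and may need adjusting for other event types. The names are made up for illustration.

```python
import re
from collections import Counter

# The one sample line available from the thread.
SAMPLE = (
    "1397 2020-07-30 11:46:51 +0100 error: Internal parity error, "
    "mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x00000c0a, "
    "status=0x9000004000010005, walltime=0x5f22970d, cpu=0x00000001, "
    "cpuid=0x000906e9, apicid=0x00000002"
)

# id, date, time, timezone, "error:", message up to the first comma, then cpu=...
LINE_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+\S+\s+\S+\s+"
    r"error:\s*(?P<msg>[^,]+),.*?\bcpu=(?P<cpu>0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count events per (date, message) and per CPU number.

    Lines that do not match the assumed format are skipped.
    """
    per_day = Counter()
    per_cpu = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        per_day[(match["date"], match["msg"].strip())] += 1
        per_cpu[int(match["cpu"], 16)] += 1
    return per_day, per_cpu


if __name__ == "__main__":
    per_day, per_cpu = summarize([SAMPLE])
    for (date, msg), count in sorted(per_day.items()):
        print(f"{date}  {msg}: {count}")
    for cpu, count in sorted(per_cpu.items()):
        print(f"cpu {cpu}: {count}")
```

To use it on real data, save the output of ras-mc-ctl --errors to a file and pass its lines to summarize(). If the counts start around the same date as the crashes, or cluster on one core, that says more than the raw list does.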

Definitely not a good sign anyway, you shouldn’t really get them

Agreed, but if Windows isn’t crashing or showing anything in the error logs because of it, then I’m guessing it’s a Linux-specific thing?

Either way, been in WoW for about 45 minutes so far and no crashes yet, which is a good sign, but I’ll run through your suggested tests after lunch and see how I get on :slight_smile:

Depends. Windows logs are… well, let’s just say that unless you run a stress test like OCCT with a large data set, you won’t see any errors, short of something really serious that leads to a BSOD. But errors that stay silent in Windows without a stress test can still lead to data loss and program crashes.

Linux is way more strict, which is good :slight_smile:

Linux makes me go wow, every single day.

come now, there must be some pfff days too

XMP can’t be considered 100% stable

This

Good thinking, new != always best in BIOS world.

This #2

it absolutely should NOT go over default turbo of 4.5 with or without XMP

ASUSTeK motherboards

ASUS absolutely LOVES to enable “Auto Overclocking” on their boards. I love their hardware, and their motherboards are exceptionally good at overclocking, but I hate how they always do that by default.

Oh, I thought ASUSTeK was some “more Chinese” ASUS imposter :laughing:

Yep…never a good idea from the go :upside_down_face:

Some memory modules won’t run at spec speeds unless XMP is enabled: the manufacturer expects users to enable XMP so they can get the performance they have paid for. Apparently, as has been in the news recently, Intel refuses to cover parts under warranty if overclocking has been used, and they count XMP as overclocking.

Really? There are modules which you can’t run manually?? :exploding_head:
Can you please give an example?

That was always the case for any warranty though…Well unless they can’t otherwise prove it of course :upside_down_face:

I meant you are forced to overclock, and they have an XMP profile set up to allow nominal performance. You can definitely do that manually. I didn’t realise you are against the automated part of the overclocking. But anyway, you could argue the manufacturer knows the tolerances of their products best, so XMP should be the safest.

The problem with that is the motherboard. For example, some early AMD X370 chipsets would not go to XMP settings. Some RAM will not reach XMP speed and timings on certain boards, and I don’t think I’ve ever had a client in my line of work who’s actually matched their RAM with the RAM tested on their motherboard manufacturer’s QVL.

The memory should be able to work at the XMP specified settings, that’s what I’m saying.
It’s true there are many issues with motherboards supporting those settings. For example, my machine won’t POST when I enable the XMP profile on the RAM, even though the profile sets the memory to a frequency both the RAM manufacturer and the laptop manufacturer support. But the thing is, it won’t POST even if I set the same settings manually, nor even when I set them to some more relaxed settings.
For this reason I can use the 2666MHz memory I bought at only 2400MHz.

Well, I mean… in a perfect world, maybe it’s the safest for newbies, but in reality, most of the time that’s just not the case :frowning:

Makes me extremely sad btw.

So technically speaking, at least for now and with rare exceptions, only XMP = off may be considered 100% safe (given that the motherboard and CPU at least support default specs, of course) :slight_smile:
