Amdgpu failure during boot process

Hi,

When I try to boot the latest live image, I never reach the actual OS. It gets stuck while trying to start the AMD GPU.


The graphics card is an AMD RX 540.

Some more background on this issue: I have used EndeavourOS in the past. Back then, the GPU was correctly initialized and worked as expected. However, after an update, it stopped working and the OS no longer recognized my GPU. I tried both running with DRI_PRIME=1, as I used to, and checking the output of vulkaninfo, but the GPU was just not recognized. I am sure an update broke this, because when I downgraded the packages to an older date the GPU was recognized again. I thought this was a temporary issue and removed EndeavourOS, hoping it would get fixed, but further updates have not helped. The Atlantis release boots to the OS, but still does not recognize the GPU.

To be clear, this is not an issue with only EndeavourOS, Arch Linux, or other Arch-based distros like Garuda or Manjaro. I have tried Fedora as well, and it does not recognize the GPU either. However, and this is the catch, Ubuntu-based distros seem to have no trouble recognizing the GPU. I have tried Linux Mint, Ubuntu, Kubuntu, and Pop!_OS, and all of them recognize it. Even after I manually installed the latest kernel on Ubuntu, the GPU works as expected.

I have tried comparing the sources of the linux and linux-firmware packages that Fedora and Ubuntu use to see if there was some difference there, but I couldn’t find any meaningful change, at least in relation to the amdgpu module that is causing the issue.

If anyone has any idea how to fix this issue, or anything I can try to get the GPU recognized, I would be grateful. Thanks a lot.

Have you checked the GRUB command line on Ubuntu-based distros to see what parameters they boot with?

Edit: Are there any BIOS updates for your hardware?

Thanks for taking the time to help me.

Ubuntu doesn’t seem to be using any special parameters to boot.

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.17.2-051702-generic root=UUID=0b879ce2-e363-484e-902f-3474257a13f3 ro quiet splash vt.handoff=7

The only odd one here is vt.handoff=7, but that is specific to Ubuntu and doesn’t seem to be something that affects the GPU.

I am on the latest BIOS available for the laptop on the manufacturer’s website. However, this issue was also seen with earlier BIOS versions.
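For anyone repeating this comparison, here is a minimal sketch of a helper that lists the parameters present in one saved kernel command line but missing from another. The function name and the idea of saving /proc/cmdline from each distro into a file are assumptions for illustration, not something used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print each parameter that appears in the first
# saved command line ($1) but not in the second ($2), one per line.
# Assumes each file holds one line copied from /proc/cmdline.
cmdline_only_in_first() {
    other=$(cat "$2")
    # Split the first command line into one parameter per line.
    tr -s ' ' '\n' < "$1" | while IFS= read -r p; do
        [ -n "$p" ] || continue
        # Pad with spaces so only whole parameters match.
        case " $other " in
            *" $p "*) ;;
            *) printf '%s\n' "$p" ;;
        esac
    done
}
```

For example, after running `cat /proc/cmdline > ubuntu.txt` on Ubuntu and `cat /proc/cmdline > fedora.txt` on Fedora, `cmdline_only_in_first ubuntu.txt fedora.txt` shows what only Ubuntu passes. Entries such as BOOT_IMAGE and root=UUID will naturally differ between installs.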

I wish I could be more helpful, but have you tried amdgpu.dc=0 as a kernel parameter in the default GRUB command line? You can add it even on the live ISO: when it boots, edit the GRUB menu entry by pressing e, add the parameter, and then continue booting (F10, I think, or Enter?). See if that makes any difference.

I have tried it previously with Manjaro, but it did not solve the problem. I have also tried the amdgpu.dpm=0 flag. With that, the GPU was recognized and I could use it, but performance was very bad. My question on the Manjaro forum where this was suggested - link

Edit - I tried it with the live image of EndeavourOS and it still got stuck at the above stage and didn’t boot to the OS.

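Have you tried these module options to see if it will boot? They are written in modprobe.d syntax (for a file under /etc/modprobe.d/); on the kernel command line the equivalent form is amdgpu.si_support=1 amdgpu.cik_support=1.

options amdgpu si_support=1
options amdgpu cik_support=1

Or

options radeon si_support=0
options radeon cik_support=0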

Sorry, I haven’t checked this thread in a while. Meanwhile, I tried a few other things and found one that works for me. Setting the amdgpu.aspm=0 kernel parameter allowed the GPU to be detected, and now the live ISO boots. This also works in Fedora, so I think this is an underlying problem rather than something distro-specific. ASPM (Active State Power Management) is a PCIe power-saving feature, and it probably puts the GPU into some lower power state from which I can never use it. Perhaps Ubuntu turns it off by default for the amdgpu kernel driver.

So, in conclusion, my problem seems to be fixed. I apologize again for taking up your time with an issue that seems to be pretty specific to my case. And thanks a lot again for taking the time to help me.

Edit - Link where I found this solution, though it seems to be for a completely different problem about the device hanging. I would also like to point out that there is a certain reduction in performance with this, so the power management is probably important and this is more of a workaround.
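To keep amdgpu.aspm=0 across reboots on an installed system that boots through GRUB, the parameter has to go into the GRUB defaults file. The following is a minimal sketch, assuming the usual /etc/default/grub layout with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; the function name is hypothetical, and systems using another bootloader (systemd-boot, for example) need a different approach.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file,
# unless the parameter is already there.
# $1 = parameter, $2 = file (assumed default: /etc/default/grub)
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote,
    # writing to a temporary file first so a failure leaves the
    # original untouched.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It would be run as root, for example `add_kernel_param amdgpu.aspm=0`, followed by regenerating the GRUB configuration (`grub-mkconfig -o /boot/grub/grub.cfg` on Arch-based systems). After a reboot, `cat /proc/cmdline` should show the parameter.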

This is similar but turns off active state power management. Glad you found what works.
