cross-posted from: https://lemmy.ml/post/37817953

Hi all, my GPU crashes when I run software with a high GPU load (in this case an AI model). It also happens with games, after a seemingly random amount of time (I can play for about 30 minutes before it crashes, or sometimes it doesn't crash at all).

Here is my journalctl log:

Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: Dumping IP State Completed
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 timeout, signaled seq=618, emitted seq=620
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu:  Process python pid 4571 thread python pid 5777
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: amdgpu: device lost from bus!
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: [drm] device wedged, but recovered through reset
Oct 20 12:57:18 Linux kernel: amdgpu 0000:10:00.0: [drm] *ERROR* [CRTC:61:crtc-0] flip_done timed out

I tried to check the path /sys/class/drm/card1/device/devcoredump/data after a reboot, but there isn't anything there (in fact, the devcoredump folder doesn't even exist).
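As far as I know, the devcoredump data lives in kernel memory, so it does not survive a reboot and is also discarded after a short timeout. That would explain the missing folder. Below is a minimal sketch for copying the dump right after a crash, while the system is still up. The function name `save_devcoredumps`, the output file naming, and the `gpu-dumps` directory are my own assumptions for illustration, and reading the file will probably need root.

```python
import shutil
from pathlib import Path


def save_devcoredumps(sys_drm="/sys/class/drm", dest_dir="."):
    """Copy any pending devcoredump data files into dest_dir.

    Hypothetical helper: returns the list of files it wrote
    (empty if no dump is currently present).
    """
    saved = []
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Each GPU shows up as cardN; a pending dump, if any, sits at
    # cardN/device/devcoredump/data (the path printed in the kernel log).
    for data in sorted(Path(sys_drm).glob("card*/device/devcoredump/data")):
        card = data.parts[-4]  # e.g. "card1"
        target = dest / f"{card}-devcoredump.bin"
        # Stream the contents instead of trusting the reported file size.
        with open(data, "rb") as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
        saved.append(target)
    return saved

# Example call (run soon after the crash, before rebooting):
#   save_devcoredumps(dest_dir="gpu-dumps")
```

If the returned list is empty, there was no dump left to copy at that moment.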

My specs:
Distro: Arch
Kernel: 6.17.3.arch2-1
Driver: Mesa 1:25.2.4-2
GPU: RX 580
CPU: R5 5500
PSU: EVGA 650 N1 650W
I am on the latest version of my BIOS.

Edit: my

Is there anything I can do to diagnose the issue? Any help is appreciated. Thank you!

Solved: my GPU is dead.

  • Mike Wooskey@lemmy.thewooskeys.com
    1 day ago

    When I experienced the same symptoms, I eventually found out it was because ROCm didn’t support having an AMD GPU as well as an AMD iGPU (an iGPU is an integrated GPU, typically built into the CPU). Once I disabled the iGPU, those symptoms stopped.

    I don’t remember how I disabled the iGPU. It might have been in the BIOS settings, or it might have been a kernel parameter in /etc/default/grub.

    If it doesn’t fix your issue, you can just re-enable the iGPU.

    • Kiuyn@lemmy.mlOP
      1 day ago

      Hi, ty for the comment. I don’t have an iGPU though, so I don’t think that is my issue.