PyTorch on AMD-GPU

Deleted user

Hello everyone,

Does anyone have experience using PyTorch on an AMD GPU? I'm trying to get it running, but at the moment I'm facing some issues. Any help is appreciated! Below I describe what I have already tried.

To be clear: I don't necessarily need to solve the errors I got, I just need to be able to use PyTorch on my AMD GPU. If somebody knows an easy and straightforward way to achieve this, that would be very much appreciated. :D

Also, I'm new to Linux and only switched from Windows because Linux seems to be the main supported OS for using PyTorch on an AMD GPU. I'm aware of pytorch-directml, which makes it possible to use AMD GPUs on Windows. However, when I run the Python code I'm using for my master's thesis with it, I get an error saying that a certain function is not implemented. I think this is because pytorch-directml is still a pre-release.

- As far as I know, it is necessary to install ROCm in order to use PyTorch with AMD. Is this correct or are there any other/easier ways?

- If I want to use pip to install PyTorch for AMD, it seems like I'm limited to ROCm 4.2 (stable) and ROCm 4.3 (preview). In terms of GPU support: according to the AMD docs (https://docs.amd.com/bundle/Deprecations/page/Deprecations_and_Warnings.html) I can't go higher than ROCm 4.5, since it is the last release that supports my GPU (Vega 10 chip). So it seems to me that these are the ROCm versions that can work for me.
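
To make the version reasoning in the second point explicit, here is a minimal sketch in Python. The version numbers are just the ones I mentioned above; the helper names `parse_version` and `usable_versions` are made up for illustration and are not part of any ROCm or PyTorch tooling.

```python
def parse_version(text):
    """Turn a version string such as "4.2" into a comparable tuple (4, 2)."""
    return tuple(int(part) for part in text.split("."))


def usable_versions(available, newest_supported):
    """Keep only the versions that are not newer than the last supported one."""
    limit = parse_version(newest_supported)
    return [v for v in available if parse_version(v) <= limit]


# ROCm versions I found pip builds of PyTorch for (stable and preview).
pip_rocm_versions = ["4.2", "4.3"]
# Last ROCm release that, according to the AMD docs, still supports my GPU.
last_supported_for_gpu = "4.5"

candidates = usable_versions(pip_rocm_versions, last_supported_for_gpu)
print(candidates)
```

Both pip-installable versions fall below the 4.5 limit, which is why I only looked at ROCm installations that could match them.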

I tried the following two approaches:

1. I tried using ROCm 4.2, which requires the 5.8.0-48 kernel for Ubuntu 20.04.2. So far, the only way I found to get that kernel is to install the latest Ubuntu release and then downgrade the kernel by downloading the specific version and installing it via the terminal. However, this seems to break some dependencies: my second screen is suddenly no longer supported and the whole UI is very slow and laggy (various websites also warned me that this would very probably happen - and it did :D), and I was not able to fix it. Also, the ROCm 4.2 installation did not succeed due to dependency errors which I could not resolve. I followed the steps in the AMD ROCm manual: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

(Installing ROCm 4.2 on the 5.13 Kernel, which was the default one I had after installing Ubuntu, also gave an error.)

2. I tried using ROCm 4.5, which requires Ubuntu 20.04.3 LTS with the 5.11 HWE kernel. My Ubuntu installation lets me select the 5.11 kernel directly from the bootloader, without any manual kernel installation like in the previous approach. However, the ROCm installation script also gives an error:

dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10

[...]

Errors were encountered while processing:
 amdgpu-dkms

And again, the second screen suddenly stops working and the whole UI is very slow and laggy. Also, the amdgpu driver is no longer listed when running "sudo lshw -class display". Re-installing the amdgpu driver does not work either, as it uninstalls the ROCm stuff again and the UI is still broken. :D
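
In case it helps with reading longer logs, here is a small Python sketch that pulls the failing package names out of output like the one above. It assumes the failing packages are listed as indented lines right after the "Errors were encountered while processing:" line, as in my log; the function name `failed_packages` is just something I made up.

```python
def failed_packages(log_text):
    """Return the package names listed after dpkg's error summary line."""
    packages = []
    in_summary = False
    for line in log_text.splitlines():
        if line.startswith("Errors were encountered while processing:"):
            in_summary = True
            continue
        if in_summary:
            # Package names are indented; stop at the first line that is not.
            if not line.startswith(" "):
                break
            packages.append(line.strip())
    return packages


# The excerpt from my installation attempt, including the part I cut out.
sample_log = """dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10

[...]

Errors were encountered while processing:
 amdgpu-dkms
"""

print(failed_packages(sample_log))
```

For my log this only reports amdgpu-dkms, so the kernel module build seems to be the single point of failure.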


I think running PyTorch for AMD on Windows would be the easiest solution for me since it is my main OS, but getting it running some other way (= on Linux) is also totally fine.
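
Whichever route ends up working, this is roughly how I would check that PyTorch actually sees the GPU. It is only a sketch: as far as I understand, ROCm builds of PyTorch expose the GPU through the usual `torch.cuda` functions and set `torch.version.hip`, and the function name `describe_torch_backend` is my own. It does not apply to pytorch-directml, which works differently.

```python
def describe_torch_backend():
    """Report which GPU backend, if any, the installed PyTorch build sees."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"installed": False}

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # A version string on ROCm builds, None on other builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds report the AMD GPU through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_backend())
```

If `hip_version` is set and `gpu_available` is True, the installation would be good enough for my purposes.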

I hope someone can help, and thanks in advance!