This project has been a rollercoaster and one of the hardest things I have attempted. My goal when I set out was simple: run local Ollama models on a cheap £60 K80 GPU. My first attempt ended very quickly when I found out that the GPU I got was faulty; it spewed out smoke within a second of first boot.
After I confirmed the rest of my server was unharmed, I got a full refund and purchased another K80. This one did not fail on first boot, so I was able to start setting up my GPU. I spent the next month trying on and off, failing every time.
I tried so many things that I am not going to list them all here. Instead, I will provide instructions to recreate what finally worked:
- This video from Techno Tim walks through the host setup: https://www.youtube.com/watch?v=_hOBAGKLQkI&t=616s. You also need to ensure "Above 4G Decoding" is enabled in your BIOS.
- Set up a new VM with no ballooning, host CPU, the q35 machine type, and the K80 as a PCIe passthrough device.
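The VM settings above can be applied from the Proxmox host shell with `qm set`. A sketch, where the VM ID (110) and the K80's PCI address (0000:03:00) are placeholders for your own values:

```shell
# Hypothetical VM ID and PCI address; find yours with `qm list`
# and `lspci -nn | grep -i nvidia` on the Proxmox host.
qm set 110 --machine q35 --cpu host --balloon 0
# Passing 0000:03:00 with no function suffix forwards both GPUs on the K80 board.
qm set 110 --hostpci0 0000:03:00,pcie=1
```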
- Install Ubuntu 20.04
- sudo apt install -y nvidia-driver-470
- Install a newer CMake from the Kitware repository (it provides a more recent CMake than Ubuntu 20.04 ships):
- sudo apt install -y software-properties-common lsb-release
- wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
- sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
- sudo apt update
- sudo apt install -y cmake
- sudo apt install -y gcc-10 (CUDA 11.4 does not support GCC versions newer than 10)
- sudo apt install -y golang
- wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
- sudo sh cuda_11.4.0_470.42.01_linux.run --silent --toolkit --samples
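With the toolkit installed, it helps to put it on the PATH before building. A sketch, assuming the default /usr/local/cuda symlink the installer creates:

```shell
# Make the CUDA 11.4 toolkit visible to the build:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Sanity checks once the exports are in place:
#   nvcc --version   (should report CUDA 11.4)
#   nvidia-smi       (should list both K80 GPUs)
```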
- git clone https://github.com/idream3000/ollama37.git
- cd ollama37
- nano CMakeLists.txt
- change "native" on line 73 to "37" (the K80's CUDA compute capability is 3.7)
- save and exit
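The nano edit can also be scripted with sed. A sketch, demonstrated on a stand-in file so it can run anywhere; the variable name used here is an assumption for illustration, and in the real repo the value to change is the "native" on line 73 of CMakeLists.txt:

```shell
# One-liner equivalent of the edit above, run inside the ollama37 checkout:
#   sed -i '73s/native/37/' CMakeLists.txt
# Demonstrated on a stand-in file
# (CMAKE_CUDA_ARCHITECTURES is an assumed variable name for illustration):
printf 'set(CMAKE_CUDA_ARCHITECTURES "native")\n' > /tmp/CMakeLists.txt
sed -i 's/native/37/' /tmp/CMakeLists.txt
cat /tmp/CMakeLists.txt
```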
- cmake -B build
- cmake --build build
- Add these Environment lines to the Ollama systemd service (or export the equivalents in your shell before running):
- Environment="OLLAMA_HOST=0.0.0.0:11434"
- Environment="CUDA_VISIBLE_DEVICES=0,1"
- Environment="OLLAMA_USE_CUDA=1"
- Environment="PATH=/usr/local/cuda/bin:/usr/bin:/bin"
- Environment="LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib:/lib"
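The Environment= lines above use systemd unit syntax. When launching by hand with `go run . serve`, the same settings can be exported in the shell first; a sketch with the same values:

```shell
# Shell equivalents of the systemd Environment= lines above:
export OLLAMA_HOST=0.0.0.0:11434            # listen on all interfaces
export CUDA_VISIBLE_DEVICES=0,1             # both GPUs on the K80 board
export OLLAMA_USE_CUDA=1
export PATH=/usr/local/cuda/bin:/usr/bin:/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib:/lib
```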
- go run . serve
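Once the server is up, listing the installed models is a quick smoke test; 11434 matches the OLLAMA_HOST port set above, and SERVER_IP is a placeholder for the VM's address:

```shell
# Replace SERVER_IP with the VM's address (or use localhost on the VM itself):
curl http://SERVER_IP:11434/api/tags
```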