Ollama-cuda not using GPU acceleration

Actually, I figured out what was causing this. ollama-cuda loads as much of the model as it can into VRAM and runs that part on the GPU; the rest of the model, which didn't fit, runs on the CPU. This makes it look like the GPU isn't being used, but it is: the layers offloaded to the GPU process very quickly, while the part left on the CPU is slow. The model simply can't fit entirely in VRAM. You can see the split with `ollama ps`, whose PROCESSOR column shows something like `41%/59% CPU/GPU` for a partially offloaded model.

You can confirm this by trying a smaller model, one that fits entirely in your VRAM, and checking whether it runs fast end to end.
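To estimate whether a model will fit before downloading it, here's a rough back-of-envelope sketch. The numbers are assumptions for illustration (4-bit quantization, a guessed 1.2x overhead factor for the KV cache and runtime buffers, an assumed 8 GB of VRAM); real footprints vary by quantization format and context length.

```python
def model_size_gb(n_params_billion: float,
                  bits_per_weight: float = 4.0,
                  overhead: float = 1.2) -> float:
    """Rough memory footprint in GB for a quantized model.

    overhead is a guessed multiplier for KV cache and runtime
    buffers; the real number depends on context length and backend.
    """
    return n_params_billion * (bits_per_weight / 8) * overhead

vram_gb = 8.0  # assumed GPU VRAM; replace with your card's actual VRAM

for params in (3, 7, 13, 33):
    size = model_size_gb(params)
    verdict = "fits in VRAM" if size <= vram_gb else "spills to CPU"
    print(f"{params}B model ~= {size:.1f} GB -> {verdict}")
```

Any model that lands on the "spills to CPU" side of this estimate is a candidate for the slow, partially offloaded behavior described above.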