I've tried running Llamafile on my Lenovo Legion Pro 5 laptop with 8GB VRAM, but it has a dashboard that shows the GPU and CPU utilisation in real time, and almost all the processing is done on the CPU. Is there a way to shift the processing to the GPU like using gpt-fast?
https://pytorch.org/blog/accelerating-generative-ai-2/