I've been running it with llama-server from llama.cpp (compiled for CUDA backend...

mistercheph · 2026-01-19T18:36:40 1768847800

I think the recently introduced -fit option, which is on by default, means it's no longer necessary to pass -ngl. You can probably also drop -c: it defaults to 0, which makes llama.cpp read the model's advertised context size from the GGUF metadata.
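
If that holds, the invocation should shrink to roughly the following. This is only a sketch: the model path and the -ngl/-c values in the "before" line are placeholders, not the flags from the parent comment, and the script prints the commands rather than launching the server.

```shell
# Hypothetical model path; substitute your own GGUF file.
MODEL=./model.gguf

# Before: explicit GPU offload and context size (values are illustrative).
OLD_ARGS="-m $MODEL -ngl 99 -c 0"

# After: rely on the defaults described above
# (-fit on by default, -c 0 meaning "read context size from the GGUF").
NEW_ARGS="-m $MODEL"

echo "before: llama-server $OLD_ARGS"
echo "after:  llama-server $NEW_ARGS"
```

Worth checking `llama-server --help` on your build first, since these defaults depend on how recent it is.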

johndough · 2026-01-19T21:00:20 1768856420

I had already removed three parameters which were no longer needed, but I hadn't yet heard that the other two had also become superfluous. Thank you for the update! llama.cpp sure develops quickly.