I read the opposite, that you don't have to be locked-in Ollama's registry if you don't want to.
Could you share a bit more of what you do with llama.cpp? I'd rather use llama-serve but it seems to require a good amount of fiddling with the parameters to have good performance.
Recently llama.cpp made a few common parameters default (-ngl 999, -fa on) so it got simpler: --model and --context-size and --jinja generally does it to start.
We end up fiddling with other parameters because it provides better performance for a particular setup so it's well worth it. One example is the recent --n-cpu-moe switch to offload experts to CPU while filling all available VRAM that can give a 50% boost on models like gpt-oss-120b.
Could you share a bit more of what you do with llama.cpp? I'd rather use llama-serve but it seems to require a good amount of fiddling with the parameters to have good performance.