Keep in mind that GGUF/llama.cpp, while highly performant and portable, is not the fastest way to run certain models if you have a GPU (even though llama.cpp does support GPU acceleration).
ExLlamaV2 with exl2 quantization, and maybe TensorRT-LLM, are the contenders for top performance.
Most of you probably already have Python. Download exllamav2 and exui from GitHub and run a few terminal commands. This lets me run 120B-parameter models, which won't fit in VRAM if I use llama.cpp.
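To see why exl2's flexible bitrates matter, here's a rough weights-only back-of-envelope (the function name and the ~3 bpw figure are my own illustration; KV cache and activation memory add more on top of this):

```python
def weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB: params * bits / 8 bytes."""
    return n_params_billion * bits_per_weight / 8

# 120B at ~3 bpw (a typical low exl2 target) vs plain fp16:
print(f"3.0 bpw: {weight_gb(120, 3.0):.0f} GB")  # 45 GB -> squeezes onto 2x24 GB cards
print(f"fp16:    {weight_gb(120, 16):.0f} GB")   # 240 GB -> nowhere near consumer VRAM
```

Because exl2 lets you pick an arbitrary bits-per-weight, you can dial the quant to exactly what your cards hold, instead of being limited to the fixed GGUF quant sizes.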
Panchovix/goliath-120b-exl2 (there's a different branch for each size)
Some of them I've had to make myself, e.g. I wanted a Q2 GGUF of Falcon 180B.
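If you want to roll your own, the llama.cpp repo ships the tooling; a sketch of the two steps (script and binary names assume a current llama.cpp checkout and a local HF model directory, and older trees used `convert.py` and `./quantize` instead):

```shell
# 1. Convert the Hugging Face checkpoint to an fp16 GGUF
python convert_hf_to_gguf.py /models/falcon-180b --outfile falcon-180b-f16.gguf

# 2. Requantize that down to Q2_K
./llama-quantize falcon-180b-f16.gguf falcon-180b-Q2_K.gguf Q2_K
```

For a 180B model expect the fp16 intermediate alone to be several hundred GB on disk, so check free space before starting.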
There's a guy on Hugging Face called "TheBloke" who does GGUF, AWQ, and GPTQ quants for most models. For exl2, you can usually just search for "exl2" and find them.