Zephyr 7B – Mistral Finetune that responds like ChatGPT (huggingface.co)
37 points by Flux159 on Oct 15, 2023 | 12 comments


IMO the title is a bit misleading:

> Zephyr-7B-α has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus was used to train the base model (mistralai/Mistral-7B-v0.1), however it is likely to have included a mix of Web data and technical sources like books and code.

It's very deliberately unlike ChatGPT, which is what makes its performance interesting.


A machine that follows instructions. Refreshing.

So sad that this is where we are, and that some companies and their doomsday-cultist partners are pushing to enshrine this sad state into law.


Online chat interface here: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

You should be able to run this locally on CPU or GPU as long as you have 16GB of RAM, or less if you use a quantized model like https://huggingface.co/TheBloke/zephyr-7B-alpha-GGUF with llama.cpp.
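A minimal sketch with llama-cpp-python, if that helps anyone get started (the GGUF filename is an assumption; point it at whichever quant you actually downloaded from that repo):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Path is an assumption -- use the GGUF file you downloaded.
    llm = Llama(model_path="zephyr-7b-alpha.Q4_K_M.gguf", n_ctx=2048)

    # Zephyr's chat template uses <|system|>/<|user|>/<|assistant|>
    # tags with </s> separators.
    prompt = (
        "<|system|>\nYou are a helpful assistant.</s>\n"
        "<|user|>\nExplain what a GGUF file is in one sentence.</s>\n"
        "<|assistant|>\n"
    )
    out = llm(prompt, max_tokens=128, stop=["</s>"])
    print(out["choices"][0]["text"])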


And this will fit in 6GB of VRAM with very little quality loss:

https://huggingface.co/LoneStriker/zephyr-7b-alpha-4.0bpw-h6...
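Rough math on why a 4.0bpw quant fits in 6GB (ballpark only; KV cache and activations add overhead on top of the weights):

    # Back-of-the-envelope VRAM estimate for Mistral 7B at 4.0 bits/weight.
    params = 7.24e9                      # Mistral 7B parameter count
    weights_gb = params * 4.0 / 8 / 1e9  # bits -> bytes -> GB
    print(f"~{weights_gb:.1f} GB of weights")  # ~3.6 GB, leaving headroom
                                               # for KV cache within 6 GB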


I tried it out and I am super impressed by something so small. It hallucinated a lot, but formulated its incorrect answers wonderfully.

It's certainly in the realm of being able to voice an NPC in an RPG by now, since occasional hallucinations don't really matter there.


These are not even the best 7B models for an RPG. The RP Llama/Mistral finetunes should do much better.


> It hallucinated a lot

How do we (the "LLM community") solve this/overcome this?


This is kind of an inherent property of the Llama architecture.

Maybe new architectures will be better, but the big companies will likely keep making the best foundation models.

In the meantime, you can hook it up to a vector DB and ask the LLM to check itself at temperature zero, but that only helps so much.
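A toy sketch of that retrieve-then-constrain pattern (the embedding function below is a deterministic stand-in, not a real model; a real setup would use an actual embedding model and send the resulting prompt to the LLM at temperature 0):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in: deterministic pseudo-embedding, NOT a real model.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.standard_normal(384)
        return v / np.linalg.norm(v)

    def grounded_prompt(question: str, docs: list[str]) -> str:
        # Retrieve the doc most similar to the question, then restrict
        # the model to answering from that context alone.
        q = embed(question)
        best = max(docs, key=lambda d: float(q @ embed(d)))
        return (
            f"Context:\n{best}\n\n"
            "Answer using only the context above. If the answer is not "
            f"in the context, say you don't know.\n\n"
            f"Question: {question}\nAnswer:"
        )

    # Feed grounded_prompt(...) to the LLM with temperature=0.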


I suggest that we make a universal set of truths of everything, which the LLM can compare its answers to. Perhaps a relational database? Or just one huge JSON?


million-dollar startup idea.


*looks at collection of local LLMs* Eh, one more couldn't hurt.


I wonder if I should delete the old ones. I bought a 4TB external SSD specifically for LLMs.



