Zephyr 7B – Mistral Finetune that responds like ChatGPT (huggingface.co)
37 points by Flux159 on Oct 15, 2023 | 12 comments


IMO the title is a bit misleading:

> Zephyr-7B-α has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus was used to train the base model (mistralai/Mistral-7B-v0.1), however it is likely to have included a mix of Web data and technical sources like books and code.

It's very deliberately unlike ChatGPT, which is what makes its performance interesting.


A machine that follows instructions. Refreshing.

So sad that this is where we are, and that some companies and their doomsday-cultist partners are pushing to enshrine this sad state into law.


Online chat interface here: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

You should be able to run this locally on CPU or GPU as long as you have 16GB of RAM, or less if you use a quantized model like https://huggingface.co/TheBloke/zephyr-7B-alpha-GGUF with llama.cpp.
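A minimal sketch with llama-cpp-python, if that helps anyone get started (the GGUF filename is an assumption; point it at whichever quant you actually downloaded from that repo):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Path is an assumption -- use the GGUF file you downloaded.
    llm = Llama(model_path="zephyr-7b-alpha.Q4_K_M.gguf", n_ctx=2048)

    # Zephyr's chat template uses <|system|>/<|user|>/<|assistant|>
    # tags with </s> separators.
    prompt = (
        "<|system|>\nYou are a helpful assistant.</s>\n"
        "<|user|>\nExplain what a GGUF file is in one sentence.</s>\n"
        "<|assistant|>\n"
    )
    out = llm(prompt, max_tokens=128, stop=["</s>"])
    print(out["choices"][0]["text"])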


And this will fit in 6GB of VRAM with very little quality loss:

https://huggingface.co/LoneStriker/zephyr-7b-alpha-4.0bpw-h6...
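Rough math on why a 4.0bpw quant fits in 6GB (ballpark only; KV cache and activations add overhead on top of the weights):

    # Back-of-the-envelope VRAM estimate for Mistral 7B at 4.0 bits/weight.
    params = 7.24e9                      # Mistral 7B parameter count
    weights_gb = params * 4.0 / 8 / 1e9  # bits -> bytes -> GB
    print(f"~{weights_gb:.1f} GB of weights")  # ~3.6 GB, leaving headroom
                                               # for KV cache within 6 GB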


I tried it out and I am super impressed by something so small. It hallucinated a lot, but formulated its incorrect answers wonderfully.

It's certainly in the realm of being able to voice an NPC in an RPG by now, since occasional hallucinations don't really matter there.


These are not even the best 7B models for an RPG. The RP Llama/Mistral finetunes should do much better.


> It hallucinated a lot

How do we (the "LLM community") solve this/overcome this?


This is kind of an inherent property of the Llama architecture.

Maybe new architectures will be better, but the big companies will likely keep making the best foundation models.

In the meantime, you can hook it up to a vector DB and ask the LLM to check itself at temperature zero, but that only helps so much.
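A toy sketch of that retrieve-then-constrain pattern (the embedding function below is a deterministic stand-in, not a real model; a real setup would use an actual embedding model and send the resulting prompt to the LLM at temperature 0):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in: deterministic pseudo-embedding, NOT a real model.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.standard_normal(384)
        return v / np.linalg.norm(v)

    def grounded_prompt(question: str, docs: list[str]) -> str:
        # Retrieve the doc most similar to the question, then restrict
        # the model to answering from that context alone.
        q = embed(question)
        best = max(docs, key=lambda d: float(q @ embed(d)))
        return (
            f"Context:\n{best}\n\n"
            "Answer using only the context above. If the answer is not "
            f"in the context, say you don't know.\n\n"
            f"Question: {question}\nAnswer:"
        )

    # Feed grounded_prompt(...) to the LLM with temperature=0.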


I suggest that we make a universal set of truths of everything, which the LLM can compare its answers to. Perhaps a relational database? Or just one huge JSON?


million-dollar startup idea.


*looks at collection of local LLMs* Eh, one more couldn't hurt.


I wonder if I should delete the old ones. I bought a 4TB external SSD specifically for LLMs.



