
> I think it was swapping every now and then causing pauses.

What you're seeing is probably "context swapping", not swapping memory to disk. The model can't keep the entire history of its output in context at all times, so whenever the context window fills up, llama.cpp resets it and re-prompts the model with a portion of its recent output. Re-evaluating those tokens takes a moment, which would explain the occasional pauses.

https://github.com/ggerganov/llama.cpp/blob/f4c55d3bd7e124b1...
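Roughly, the idea looks like the sketch below. This is my own simplified illustration in Python, not the actual llama.cpp code: the function name `swap_context` and the parameters `n_ctx` (context window size) and `n_keep` (how many leading prompt tokens to always retain) are assumptions chosen for the example, and keeping half of the remaining tokens is just one plausible policy.

```python
def swap_context(tokens, n_ctx, n_keep):
    """Return the tokens to re-evaluate once the context window is full.

    tokens: every token seen so far (prompt + generated output)
    n_ctx:  context window size (hypothetical parameter name)
    n_keep: leading prompt tokens that are always retained (hypothetical)
    Assumes n_keep < n_ctx.
    """
    if len(tokens) < n_ctx:
        return tokens  # still fits in the window, nothing to do

    # Everything after the pinned prefix is eligible to be dropped.
    n_left = len(tokens) - n_keep

    # Keep only the most recent half of that tail; the older half is forgotten.
    recent = tokens[len(tokens) - n_left // 2:]

    # The model then has to re-evaluate this shorter sequence from scratch,
    # which is where the pause comes from.
    return tokens[:n_keep] + recent
```

With a 10-token window and 2 pinned tokens, a full context of tokens 0 through 9 would shrink to the 2 pinned tokens plus the 4 most recent ones, leaving room to keep generating.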


