llamafile is designed with the hope of being a permanently working artifact, where upgrades are optional. You can upgrade to new llamafile releases in two ways. The first is to redownload the full weights, which I re-upload to Hugging Face with each release. However, if your Internet connection is slow, you don't have to re-download the whole thing to upgrade.

Instead, first take a peek inside the llamafile using:

    unzip -vl wizardcoder-python-13b-main.llamafile
    [...]
           0  Stored        0   0% 03-17-2022 07:00 00000000  .cosmo
          47  Stored       47   0% 11-15-2023 22:13 89c98199  .args
    7865963424  Stored 7865963424   0% 11-15-2023 22:13 fba83acf  wizardcoder-python-13b-v1.0.Q4_K_M.gguf
    12339200  Stored 12339200   0% 11-15-2023 22:13 02996644  ggml-cuda.dll
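
If you'd rather not copy the long GGUF file name out of that listing by hand, a small helper along these lines could pick it out for you. It's a sketch that assumes Info-ZIP `unzip`, whose `-Z1` (zipinfo) mode prints one bare member name per line; `gguf_members` is a hypothetical name, not something llamafile ships.

```shell
# Hypothetical helper: print only the .gguf members of a llamafile.
# Assumes Info-ZIP unzip; -Z1 lists member names only, one per line.
gguf_members() {
  unzip -Z1 "$1" | grep '\.gguf$'
}

# Usage:
#   gguf_members wizardcoder-python-13b-main.llamafile
```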

Then you can extract the original GGUF weights and our special `.args` file as follows:

    unzip wizardcoder-python-13b-main.llamafile wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args

Next, grab the latest llamafile release binary, along with our zipalign program, from https://github.com/Mozilla-Ocho/llamafile/releases/ and use zipalign to insert the weights and `.args` file into the new binary:

    zipalign -j0 llamafile-0.4.1 wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args
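
If you have several llamafiles to upgrade, the steps above could be wrapped in one shell function. This is a sketch, assuming `unzip` and `zipalign` are on your `PATH` and that the old llamafile holds exactly one `.gguf` member plus an `.args` file. `upgrade_llamafile` is a hypothetical name, and copying the release binary before running zipalign is a choice made for this sketch, so the downloaded binary stays untouched.

```shell
# Hypothetical helper wrapping the extract + zipalign steps.
# Assumes unzip and zipalign are on PATH, and that the old llamafile
# contains exactly one .gguf member plus an .args file.
set -eu

upgrade_llamafile() {
  old=$1      # existing llamafile, e.g. wizardcoder-python-13b-main.llamafile
  release=$2  # new release binary, e.g. llamafile-0.4.1
  out=$3      # file name for the rebuilt llamafile

  # Find the GGUF weights by listing member names only (zipinfo mode).
  gguf=$(unzip -Z1 "$old" | grep '\.gguf$') || {
    echo "no .gguf member found in $old" >&2
    return 1
  }

  # Extract the weights and .args, overwriting stale copies without prompting.
  unzip -o "$old" "$gguf" .args

  # Insert them into a copy of the release binary and make it executable.
  cp "$release" "$out"
  zipalign -j0 "$out" "$gguf" .args
  chmod +x "$out"
}

# Usage:
#   upgrade_llamafile wizardcoder-python-13b-main.llamafile llamafile-0.4.1 \
#     wizardcoder-python-13b-main-new.llamafile
```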

Congratulations, you've just created your first llamafile! It's also worth mentioning that you don't have to combine everything into one giant file. It's fine to just run:

    llamafile -m wizardcoder-python-13b-v1.0.Q4_K_M.gguf -p 'write some code'

You can do that with just about any GGUF weights you find on Hugging Face, in case you want to try out other models.
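
To try another model this way, something like the following could fetch a GGUF file and run it. It's a sketch under a few assumptions: `try_gguf` is a hypothetical helper, the repository and file names are placeholders you'd copy from the model's Hugging Face page, and it relies on Hugging Face's `https://huggingface.co/<repo>/resolve/main/<file>` direct-download URLs. Only the `-m` and `-p` flags shown above are passed to llamafile.

```shell
# Hypothetical helper: download a GGUF file from a Hugging Face repo
# (if not already present) and run it with llamafile.
set -eu

try_gguf() {
  repo=$1    # placeholder, e.g. some-user/some-model-GGUF
  file=$2    # placeholder, e.g. some-model.Q4_K_M.gguf
  prompt=$3

  # Skip the download when the weights are already on disk.
  # -f: fail on HTTP errors, -L: follow redirects.
  if [ ! -f "$file" ]; then
    curl -fL -o "$file" "https://huggingface.co/$repo/resolve/main/$file"
  fi

  llamafile -m "$file" -p "$prompt"
}

# Usage:
#   try_gguf some-user/some-model-GGUF some-model.Q4_K_M.gguf 'write some code'
```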

Enjoy!