Phi-4-mini + Bug Fixes Details

Hey guys! Once again, like Phi-4, Phi-4-mini was released with bugs. We uploaded fixed versions of Phi-4-mini, including GGUF + 4-bit + 16-bit versions, to Hugging Face!

We’ve fixed 4 bugs in the model, mainly related to the tokenizer and chat template, which affected inference and finetuning workloads. If you were experiencing poor results, we recommend trying our fixed GGUF uploads.

Bug fixes:

  1. Padding and EOS tokens were the same - we fixed this.
  2. The chat template had an extra EOS token - we removed it. Otherwise you will see <|end|> during inference.
  3. The EOS token should be <|end|>, not <|endoftext|>. Otherwise generation will only terminate at <|endoftext|>.
  4. Changed the unk_token from the EOS token to �.
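
A quick way to sanity-check that these fixes are present is to load the tokenizer and inspect its special tokens. A minimal sketch with transformers (the repo name here is assumed to be our fixed upload):

    from transformers import AutoTokenizer

    # Repo name assumed to be our fixed upload
    tok = AutoTokenizer.from_pretrained("unsloth/Phi-4-mini-instruct")

    print(tok.eos_token)  # should be <|end|>, not <|endoftext|>
    print(tok.pad_token)  # should differ from the EOS token
    print(tok.unk_token)  # should be �, not the EOS token

    # Render the chat template to check there's no stray trailing EOS
    print(tok.apply_chat_template(
        [{"role": "user", "content": "Hello!"}],
        tokenize=False,
    ))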

View all Phi-4 versions with our bug fixes: Collection

Do the Bug Fixes + Dynamic Quants Work?

  • Yes! Our fixed Phi-4 uploads show clear performance gains, with even better scores than Microsoft's original uploads on the Open LLM Leaderboard.

https://preview.redd.it/7ea4dpzd8zle1.png?width=2084&format=png&auto=webp&s=586790c8d56dd818fab60f4d74858646a9a180f7

  • Microsoft officially merged our bug fixes into the Phi-4 model a few weeks ago.
  • Our dynamic 4-bit model scored nearly as high as our 16-bit version—and well above standard Bnb 4-bit (with our bug fixes) and Microsoft's official 16-bit model, especially for MMLU.

Phi-4 Uploads (with our bug fixes):

  • GGUFs including 2, 3, 4, 5, 6, 8, 16-bit
  • Unsloth Dynamic 4-bit
  • 4-bit Bnb
  • Original 16-bit
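
If you want to try the dynamic 4-bit upload for finetuning, here's a rough sketch with Unsloth (the exact repo name and max_seq_length are assumptions - check the collection above for the real names):

    from unsloth import FastLanguageModel

    # Repo name and max_seq_length are assumptions - see the collection above
    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
        max_seq_length = 2048,
        load_in_4bit = True,
    )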

We also uploaded Q2_K_L quants, which work well too - they are Q2_K quants, but leave the embeddings at Q4 and the lm_head at Q6, which should increase accuracy a bit!
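
If you're curious, you can inspect the per-tensor quant types yourself. A sketch using the gguf Python package (the file path assumes a locally downloaded Q2_K_L file):

    from gguf import GGUFReader

    # Assumed local path to a downloaded Q2_K_L file
    reader = GGUFReader("phi-4-mini-instruct-Q2_K_L.gguf")

    # Print each tensor's quantization type; the embedding should show a
    # Q4 type and the output head a Q6 type, with Q2_K for most of the rest
    for tensor in reader.tensors:
        print(tensor.name, tensor.tensor_type)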

To use Phi-4-mini in llama.cpp, do:

./llama.cpp/llama-cli \
    --model unsloth/phi-4-mini-instruct-GGUF/phi-4-mini-instruct-Q2_K_L.gguf \
    --prompt '<|user|>Provide all combinations of a 5 bit binary number.<|end|><|assistant|>' \
    --threads 16
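
Note that --model expects a local file path. One way to fetch the GGUF first is via huggingface_hub (a sketch using the repo and file names from the command above):

    from huggingface_hub import hf_hub_download

    # Downloads into the local HF cache and returns the file path for --model
    path = hf_hub_download(
        repo_id = "unsloth/phi-4-mini-instruct-GGUF",
        filename = "phi-4-mini-instruct-Q2_K_L.gguf",
    )
    print(path)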

And that's it. Hopefully we don't encounter bugs again in future model releases....