You can now train your own reasoning model like o3-mini on your local device!
Hey guys! I run the open-source project Unsloth with my brother, who worked at NVIDIA, so optimizations are our thing! Today, we're excited to announce that you can now train your own reasoning model like o3-mini locally with just 5GB of VRAM!
- o3-mini was trained with an algorithm called 'PPO', and DeepSeek-R1 was trained with a more optimized version called 'GRPO'. We made GRPO use 90% less memory.
- We're not trying to replicate the entire o3-mini model, as that's unlikely (unless you're super rich). We're trying to recreate o3-mini's chain-of-thought/reasoning/thinking process.
- We want the model to learn by itself, without us providing any reasoning for how it derives its answers. GRPO lets the model figure out the reasoning autonomously. This is called the "aha" moment.
- GRPO can improve accuracy for tasks in medicine, law, math, coding + more.
- You can transform Llama 3.1 (8B), Phi-4 (14B) or any open model into a reasoning model. You'll need a minimum of 5GB of VRAM to do it! (There's a rough sketch of what the code looks like right after this list.)
- In a test example below, even after just one hour of GRPO training on Phi-4 (Microsoft's open-source model), the new model developed a clear thinking process and produced correct answers—unlike the original model.
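For those curious what the setup roughly looks like in code, here's a minimal sketch using Unsloth + TRL's GRPOTrainer. The model name, LoRA rank, reward function, dataset and hyperparameters below are placeholders just to show the shape of it; the notebook and guide linked further down have the exact, tested setup:

```python
# Rough sketch of GRPO fine-tuning with Unsloth + TRL's GRPOTrainer.
# Everything here (model, LoRA rank, dataset, steps) is a placeholder --
# follow the notebook/guide for the real setup.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# Load any open model in 4-bit to keep VRAM low (Phi-4, Llama 3.1 8B, ...).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a tiny fraction of the weights gets trained.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tiny placeholder dataset; extra columns (like "answer") are forwarded to
# the reward functions by GRPOTrainer.
dataset = Dataset.from_list([
    {"prompt": "What is 13 * 7? Think step by step, then give the answer.",
     "answer": "91"},
])

def correctness_reward(prompts, completions, answer, **kwargs):
    # Toy verifier: +2 if the known answer appears in the completion, else 0.
    return [2.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=GRPOConfig(
        per_device_train_batch_size=4,
        num_generations=4,        # GRPO samples and scores a group of completions per prompt
        max_completion_length=256,
        max_steps=250,
        output_dir="outputs",
    ),
    train_dataset=dataset,
)
trainer.train()
```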
Highly recommend reading our really informative blog + guide on this: https://unsloth.ai/blog/grpo
- Also, we spent a lot of time on our Guide (with pics) covering everything on GRPO + reward functions/verifiers, so I'd highly recommend you guys read it: docs.unsloth.ai/basics/reasoning (there's a toy reward function example at the end of this post)
- I also know some of you guys don't have GPUs, but worry not, as you can do it for free on Google Colab/Kaggle with the free 15GB GPUs they provide. Our notebook to train Phi-4 (14B) with GRPO for free: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_(14B)-GRPO.ipynb
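And to give a flavour of what the reward functions/verifiers in the guide look like: they're just plain Python functions that score each sampled completion, and GRPO pushes the model towards outputs that score higher. A toy sketch (the <think>/<answer> tag format and scoring values here are made up for illustration; the guide has the ones we actually use):

```python
import re

# Toy GRPO reward functions/verifiers. Each receives the sampled completions
# (plus any extra dataset columns, e.g. `answer`) and returns one score per
# completion. The <think>/<answer> tag format here is purely illustrative.

def format_reward(completions, **kwargs):
    # Reward completions that show their reasoning before giving an answer.
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def correctness_reward(completions, answer, **kwargs):
    # Reward completions whose final <answer> matches the known answer exactly.
    scores = []
    for c, a in zip(completions, answer):
        m = re.search(r"<answer>(.*?)</answer>", c, re.DOTALL)
        scores.append(2.0 if m and m.group(1).strip() == str(a).strip() else 0.0)
    return scores
```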
Have a lovely weekend! :)