You can now train your own o3-mini-like reasoning model on your local device!

Hey guys! I run the open-source project Unsloth with my brother, who worked at NVIDIA, so optimizations are our thing! Today, we're excited to announce that you can now train your own reasoning model like o3-mini locally with just 5GB of VRAM!

  1. o3-mini was trained with an algorithm called 'PPO', and DeepSeek-R1 was trained with a more optimized version called 'GRPO'. We made the algorithm use 90% less memory (there's a sketch of the core idea right after this list).
  2. We're not trying to replicate the entire o3-mini model, as that's unrealistic (unless you're super rich). We're trying to recreate o3-mini's chain-of-thought/reasoning/thinking process.
  3. We want the model to learn by itself, without us providing any explanation of how it derives its answers. GRPO lets the model figure out the reasoning autonomously. This is called the "aha" moment.
  4. GRPO can improve accuracy for tasks in medicine, law, math, coding, and more.
  5. You can transform Llama 3.1 (8B), Phi-4 (14B), or any open model into a reasoning model. You'll need a minimum of 5GB of VRAM to do it (there's a rough training sketch below the example image)!
  6. In a test example below, even after just one hour of GRPO training on Phi-4 (Microsoft's open-source model), the new model developed a clear thinking process and produced correct answers—unlike the original model.
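If you're wondering where the memory savings come from: PPO needs a separate value (critic) model to estimate advantages, while GRPO just samples a group of completions for the same prompt and scores each one against the group. Here's a minimal sketch of that group-relative advantage trick (the function name is ours for illustration, not Unsloth's API):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # GRPO compares each completion's reward against the other completions
    # sampled for the SAME prompt: standardize within the group. No separate
    # value/critic model is needed, which is a big chunk of PPO's memory bill.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-4)

# Example: 4 completions for one prompt, scored 1.0 if the answer was correct.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # correct ones get positive advantage
```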

https://preview.redd.it/r4g8juxomrhe1.png?width=3812&format=png&auto=webp&s=95fd3ba3a3389a48e43d61df11a7c8475b067a36
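To give a feel for what a run looks like, here's a minimal sketch using TRL's GRPOTrainer, one common way to run GRPO. The dataset, reward function, and model name below are stand-in examples rather than our exact notebook setup, and to actually hit the 5GB figure you'd load the model in 4-bit with LoRA through Unsloth instead of passing a plain model id:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Stand-in dataset: GRPOTrainer expects a "prompt" column; extra columns
# (like "answer") are forwarded to the reward function.
dataset = Dataset.from_dict({
    "prompt": ["What is 13 * 7? Think step by step, then give the answer."] * 64,
    "answer": ["91"] * 64,
})

# A verifiable reward: 1.0 if the correct answer appears in the completion.
# The model has to discover the reasoning on its own to earn the reward.
def correctness_reward(completions, answer, **kwargs):
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # stand-in; any open model works
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```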

We highly recommend reading our blog + guide, which goes into way more detail: https://unsloth.ai/blog/grpo

Have a lovely weekend! :)