We GRPO-ed a 1.5B model to test LLM spatial reasoning by having it solve mazes