ParallelReasoning is a reasoning model built to test how far chain-of-thought reasoning can be pushed. It combines multiple instances of Google's Gemma 3 (27B parameters) to generate its reasoning, and in my informal testing its performance comes close to that of o3 or o4-mini.
Unlike other reasoning models, ParallelReasoning runs multiple instances of Gemma 3 in parallel, each independently attempting to reason through the same prompt. Through aggregation and refinement, it produces a more complete and stable response than normal chain-of-thought models.
ParallelReasoning is organized into three distinct layers:
(In the "mini" version of the model, Layers 2 and 3 are combined into a single step for faster reasoning. Additionally, the "mini" model uses 6 parallel instances in Layer 1, compared to 16 in the full version.)
By generating multiple chains-of-thought independently, ParallelReasoning addresses a weakness in traditional LLMs: early reasoning bias. In a traditional chain-of-thought model, a small mistake early in the reasoning often leads to compounding errors. Here, each model begins its chain without influence from the others, allowing different reasoning paths to explore different directions. The refinement/aggregation layer then selects the most promising trajectories.
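To make the idea concrete, here is a minimal sketch of independent parallel chains followed by an aggregation step. It is not the actual implementation: `generate_chain` is a hypothetical stub standing in for one Gemma 3 instance, and majority voting is an assumed, simplified stand-in for the refinement/aggregation layer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def generate_chain(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one model instance.

    A real system would call a Gemma 3 instance here. This stub makes
    every fourth chain go astray, mimicking an early reasoning mistake.
    """
    if seed % 4 == 0:
        return f"wrong-{seed}"
    return "42"


def parallel_reason(prompt: str, n_instances: int = 16) -> str:
    """Run independent chains in parallel, then aggregate their answers."""
    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        # Each chain sees only the prompt, never another chain's output.
        answers = list(
            pool.map(lambda seed: generate_chain(prompt, seed), range(n_instances))
        )
    # Assumed aggregation rule: keep the most common final answer.
    best, _count = Counter(answers).most_common(1)[0]
    return best


print(parallel_reason("What is 6 * 7?", n_instances=16))  # full version: 16 chains
print(parallel_reason("What is 6 * 7?", n_instances=6))   # "mini" version: 6 chains
```

Because the chains that go wrong do so independently, their errors do not agree with each other, while the correct chains do. That is the intuition behind letting an aggregation step pick the most promising trajectories.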
This massively parallel approach generates between 20,000 and 80,000 tokens during the reasoning phase, far more than other reasoning models like DeepSeek R1 typically produce (around 4,000 reasoning tokens). Thanks to parallelization, this deeper thinking does not lead to proportionally higher latency.
In my informal testing, ParallelReasoning showed reasoning quality surprisingly close to that of top reasoning models such as o3 and o4-mini, despite starting from a small base model (Gemma 3 at 27B parameters).
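A rough back-of-the-envelope check on the latency point, under loud assumptions: all reasoning tokens are counted as coming from the 16 Layer 1 instances, they are split evenly, and the instances truly run concurrently. The later layers are ignored, so treat this as an illustration rather than a measurement.

```python
def per_instance_tokens(total_tokens: int, n_instances: int) -> float:
    """Tokens each instance generates if the total is split evenly (assumption)."""
    return total_tokens / n_instances


for total in (20_000, 80_000):
    # 16 is the number of Layer 1 instances in the full version.
    print(total, per_instance_tokens(total, 16))
# prints:
# 20000 1250.0
# 80000 5000.0
```

Under these assumptions, each instance decodes roughly 1,250 to 5,000 tokens sequentially, which is the same order of magnitude as a single chain of about 4,000 tokens.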
While traditional reasoning models are constrained by latency and context length available for reasoning generation, ParallelReasoning achieves broader reasoning by generating a significantly higher volume of tokens.
Fun fact: I "vibe coded" the entire website where the model is deployed (a ChatGPT-like interface), exclusively using this model. ;)
The demonstration website offers two models:
ParallelReasoning is currently available for public demo on this site.
It began as a pet project and led to interesting results, so I believe it is worth sharing ;) If you have any questions or suggestions, feel free to reach out to me! rodolfo43393@gmail.com
Try the Model