

Mercury 2 is a reasoning language model designed specifically for production AI applications where speed and responsiveness are critical. It represents a fundamental shift from traditional autoregressive models by employing diffusion-based parallel refinement technology.
Key features include tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output. The model reaches 1,009 tokens per second on NVIDIA Blackwell GPUs while maintaining quality competitive with leading speed-optimized models.
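To make "schema-aligned JSON output" concrete, here is a minimal sketch of a schema-constrained request body in the OpenAI chat-completions style. The model identifier and prompt are illustrative assumptions, not documented values; only the request shape follows the published OpenAI convention.

```python
import json

# Hypothetical request body asking the model to answer as JSON that
# conforms to a supplied schema (OpenAI-style "json_schema" format).
request_body = {
    "model": "mercury-2",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Extract the invoice total."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
            },
        },
    },
}

payload = json.dumps(request_body)
```

Because the output is constrained to the schema, downstream code can parse the response without defensive retries.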
Unlike traditional autoregressive models that decode one token at a time, Mercury 2 generates responses through parallel refinement: many tokens are produced simultaneously and converge over a small number of steps. This diffusion-based approach yields a fundamentally different speed profile, more than 5x faster than conventional sequential decoding.
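The intuition behind parallel refinement can be shown with a deliberately simple toy, not Mercury 2's actual algorithm: every masked position is (pretend-)predicted on each pass, the most confident fraction is committed, and the whole sequence converges in a handful of passes rather than one pass per token. The target string stands in for what a real denoiser would predict, and the "confidence" rule is invented for reproducibility.

```python
import math

MASK = "_"

def parallel_refine(target, steps):
    """Toy diffusion-style decoding: commit the most 'confident' half of
    the remaining masked positions each step, so a length-n sequence
    converges in `steps` passes instead of n sequential passes."""
    n = len(target)
    out = [MASK] * n
    history = []
    for step in range(steps):
        masked = [i for i in range(n) if out[i] == MASK]
        if step < steps - 1:
            # Invented confidence rule: earlier positions are "easier".
            confident = masked[: math.ceil(len(masked) * 0.5)]
        else:
            confident = masked  # final step commits everything left
        for i in confident:
            out[i] = target[i]
        history.append("".join(out))
    return history

hist = parallel_refine(list("parallel decoding"), steps=4)
```

Here a 17-character output is fully resolved in 4 refinement passes, whereas token-by-token decoding would take 17; that gap is the source of the different speed curve.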
The model excels in latency-sensitive applications where user experience is non-negotiable, including coding and editing workflows, agentic loops, real-time voice interactions, and search/RAG pipelines. It enables reasoning-grade quality within real-time latency budgets that traditional models cannot achieve.
Mercury 2 targets developers and enterprises building production AI systems that require high-speed, low-latency inference: agentic workflows, coding assistants, voice interfaces, and search applications where responsiveness drives the user experience. It serves organizations that need reasoning-grade quality within real-time constraints, particularly those deploying AI at enterprise scale across customer support, compliance, analytics, and e-commerce. The model is OpenAI API compatible and can be integrated into existing stacks without rewrites.
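OpenAI API compatibility means switching an existing stack is mostly a base-URL change. The sketch below builds (without sending) an OpenAI-style chat-completions request using only the standard library; the endpoint URL, API key, and model name are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical endpoint
API_KEY = "sk-placeholder"               # hypothetical key

def build_chat_request(prompt):
    """Construct an OpenAI-compatible chat-completions POST request."""
    body = json.dumps({
        "model": "mercury-2",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize this PR diff.")
# urllib.request.urlopen(req) would send it; omitted here.
```

Any OpenAI-style SDK would work the same way by overriding its base URL, which is why no application rewrite is needed.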
Updated 2026-02-26