Nvidia has released Nemotron 3 Super, a new open AI model built for faster inference and very long prompts. Nvidia is aiming it at developers building AI agents, where costs rise quickly when models need to reason through many steps.
Good to Know
Nemotron 3 Super does not use all of its parameters every time it answers. Instead, it uses a Mixture of Experts (MoE) design, in which only part of the model activates for each task. Nvidia says this lowers inference costs and makes the model better suited to AI agents, which often burn through large numbers of tokens.
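The general sparse-MoE idea can be sketched in a few lines. This is a generic top-k gating illustration, not Nvidia's actual implementation; the function and parameter names are made up for the example:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route one token through only top_k of the experts.

    x: (d,) token vector; experts: list of (d, d) weight matrices
    (stand-ins for expert FFNs); gate_w: (d, n_experts) router weights.
    """
    logits = x @ gate_w                      # router score for each expert
    top = np.argsort(logits)[-top_k:]        # keep only the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen k
    # only the selected experts run, so most parameters stay idle per token
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))
```

Because only `top_k` experts execute per token, compute per token stays close to that of a much smaller dense model even as total parameter count grows.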
The model interleaves Mamba and Transformer layers across an 88-layer stack. In simple terms, the Mamba layers let it handle very long inputs more efficiently, while the Transformer attention layers help it stay accurate. Nvidia says this setup gives the model a native context window of up to 1 million tokens.
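A hybrid stack like this is usually described as a layout pattern. The sketch below shows the general idea of interleaving; the actual ratio and placement of attention layers in Nemotron 3 Super are not specified here, so `attention_every` is an illustrative assumption:

```python
def hybrid_layout(n_layers=88, attention_every=8):
    """Hypothetical interleaving: mostly Mamba-style sequence-mixing
    layers, with a full-attention Transformer layer inserted periodically.
    The real model's layout may differ."""
    return ["attention" if (i + 1) % attention_every == 0 else "mamba"
            for i in range(n_layers)]
```

The appeal of the mix is cost: Mamba layers scale roughly linearly with sequence length, so keeping attention layers sparse in the stack is what makes a 1-million-token context practical.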
Nvidia also added a routing system called LatentMoE. It sends each task to a smaller group of experts inside the model rather than consulting the full set. According to Nvidia, this allows more specialization without inflating inference costs the way conventional MoE scaling does.
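One plausible reading of "routing to a smaller group of experts" is a two-stage scheme: first pick a group, then route within it. This is purely an illustration of hierarchical routing in general; Nvidia has not published LatentMoE in the detail this sketch implies, and all names here are invented:

```python
import numpy as np

def grouped_route(x, group_w, expert_w, experts_per_group=4):
    """Two-stage routing sketch (hypothetical): choose one expert group,
    then score only the experts inside that group.

    group_w: (d, n_groups) group-router weights
    expert_w: (d, n_groups * experts_per_group) per-expert scores
    """
    g = int((x @ group_w).argmax())                  # stage 1: pick a group
    start = g * experts_per_group
    local = x @ expert_w[:, start:start + experts_per_group]
    e = start + int(local.argmax())                  # stage 2: expert in group
    return g, e
```

The point of such a hierarchy is that the router only ever scores a small slice of the expert pool per task, which is how specialization can grow without a matching growth in routing and activation cost.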
The company says Nemotron 3 Super delivers 2.2x the throughput of GPT OSS 120B and 7.5x the throughput of Qwen3.5 122B A10B in the stated test setup. Nvidia also says it offers more than 5x the throughput and up to 2x the accuracy of the earlier Nemotron Super version.
Training was done on 25 trillion tokens, followed by an additional phase on 51 billion tokens to extend the context length to 1 million tokens. Nvidia then applied supervised fine-tuning and reinforcement learning to improve performance.
Benchmark results were also strong. Nvidia reports scores of 83.73 on MMLU Pro, 90.21 on AIME25, 60.47 on SWE-Bench with OpenHands, 85.6 on PinchBench, and 91.64 on RULER 1M. The model also powers Nvidia AI-Q, a research agent that reached the top of the DeepResearch Bench leaderboard.
Nvidia trained the model in NVFP4, a 4-bit floating-point format built for its Blackwell GPUs. On B200 hardware, Nvidia says inference can run up to 4x faster than FP8 on H100, with no reported loss in accuracy.
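To make the 4-bit idea concrete, here is a simplified sketch of block-scaled FP4 quantization: each block of values shares one scale, and each value snaps to the nearest representable E2M1 magnitude (0, 0.5, 1, 1.5, 2, 3, 4, 6). NVFP4's exact block size and scale encoding are Nvidia-specific details not reproduced here:

```python
import numpy as np

# Signed 4-bit E2M1 value set: eight magnitudes, mirrored for sign
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[::-1], E2M1])

def quantize_fp4_block(block):
    """Quantize one block to FP4 with a shared scale (simplified sketch).

    Returns the dequantized values and the scale, so you can see the
    rounding error introduced by the 4-bit grid.
    """
    scale = np.abs(block).max() / 6.0 or 1.0   # map the largest value to +/-6
    idx = np.abs(GRID - (block / scale)[:, None]).argmin(axis=1)
    return GRID[idx] * scale, scale
```

The shared per-block scale is what keeps accuracy tolerable at 4 bits: outliers set the scale for only their own small block instead of the whole tensor.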
Nemotron 3 Super is available under the Nvidia Nemotron Open Model License. Developers can get checkpoints in BF16, FP8, and NVFP4 on Hugging Face. Nvidia also supports inference through Nvidia NIM, build.nvidia.com, Perplexity, OpenRouter, Together AI, Google Cloud, AWS, Azure, CoreWeave, Dell Enterprise Hub, and HPE. More guides and recipes are available through NeMo.