
arXiv: 2508.14444

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

20 August 2025
NVIDIA
Aarti Basant
Abhijit Khairnar
Abhijit Paithankar
Abhinav Khattar
Adithya Renduchintala
Aditya Malte
Akhiad Bercovich
Akshay Hazare
Alejandra Rico
Aleksander Ficek
Alex Kondratenko
Alex Shaposhnikov
Alexander Bukharin
Ali Taghibakhshi
Amelia Barton
Ameya Mahabaleshwarkar
Amy Shen
Andrew Tao
Ann Guan
Anna Shors
Anubhav Mandarwal
Arham Mehta
Arun Venkatesan
Ashton Sharabiani
Ashwath Aithal
Ashwin Poojary
Ayush Dattagupta
Balaram Buddharaju
Banghua Zhu
Barnaby Simkin
Bilal Kartal
Bita Darvish Rouhani
Bobby Chen
Boris Ginsburg
Brandon Norick
Brian Yu
Bryan Catanzaro
Charles Wang
Charlie Truong
Chetan Mungekar
Chintan Patel
Chris Alexiuk
Christian Munley
Christopher Parisien
Dan Su
Daniel Afrimi
Daniel Korzekwa
Daniel Rohrer
Daria Gitman
David Mosallanezhad
Deepak Narayanan
Dima Rekesh
Dina Yared
Dmytro Pykhtar
Dong Ahn
Duncan Riach
E. Long
Elliott Ning
Eric S. Chung
Erick Galinkin
Evelina Bakhturina
Gargi Prasad
Gerald Shen
Haifeng Qian
Haim Elisha
Harsh Sharma
Hayley Ross
Helen Ngo
Herman Sahota
Hexin Wang
Hoo-Chang Shin
Hua Huang
Iain Cunningham
Igor Gitman
Ivan Moshkov
Jaehun Jung
Jan Kautz
Jane Polak Scowcroft
Jared Casper
Jian Zhang
Jiaqi Zeng
Jimmy Zhang
Jinze Xue
Jocelyn Huang
Joey Conway
John Kamalu
Jonathan Cohen
Joseph Jennings
Julien Veron Vialard
Junkeun Yi
Jupinder Parmar
Kari Briski
Katherine Cheung
Katherine Luna
Keith Wyss
Keshav Santhanam
Kezhi Kong
Krzysztof Pawelec
Kumar Anik
Communities: LRM
Links: arXiv (abs) · PDF · HTML · Hugging Face (31 upvotes) · GitHub (1,132★)
Length: 27 pages (main) + 2 pages (bibliography) + 14 pages (appendix); 4 figures, 9 tables
Abstract

We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.
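The abstract notes that the Nemotron-Nano-9B-v2 checkpoints are released on Hugging Face. The sketch below shows one plausible way to load and run such a checkpoint with the transformers library; the repository id, the trust_remote_code flag, and the generation settings are illustrative assumptions, not details confirmed on this page.

# Minimal sketch: running a released Nemotron-Nano-9B-v2 checkpoint via Hugging Face
# transformers. The repo id below is an assumption; check the official release listing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # abstract targets bfloat16 inference on a single A10G (22 GiB)
    device_map="auto",
    trust_remote_code=True,       # hybrid Mamba-2/attention blocks may ship custom modeling code
)

prompt = "Explain why replacing most self-attention layers with Mamba-2 layers speeds up long generations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))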
