Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

23 December 2025
NVIDIA
Aaron Blakeman
Aaron Grattafiori
Aarti Basant
Abhibha Gupta
Abhinav Khattar
Adi Renduchintala
Aditya Vavre
Akanksha Shukla
Akhiad Bercovich
Aleksander Ficek
Aleksandr Shaposhnikov
Alex Kondratenko
Alexander Bukharin
Alexandre Milesi
Ali Taghibakhshi
Alisa Liu
Amelia Barton
Ameya Sunil Mahabaleshwarkar
Amir Klein
Amit Zuker
Amnon Geifman
Amy Shen
Anahita Bhiwandiwalla
Andrew Tao
Ann Guan
Anubhav Mandarwal
Arham Mehta
Ashwath Aithal
Ashwin Poojary
Asif Ahamed
Asma Kuriparambil Thekkumpate
Ayush Dattagupta
Banghua Zhu
Bardiya Sadeghi
Barnaby Simkin
Ben Lanir
Benedikt Schifferer
Besmira Nushi
Bilal Kartal
Bita Darvish Rouhani
Boris Ginsburg
Brandon Norick
Brandon Soubasis
Branislav Kisacanin
Brian Yu
Bryan Catanzaro
Carlo del Mundo
Chantal Hwang
Charles Wang
Cheng-Ping Hsieh
Chenghao Zhang
Chenhan Yu
Chetan Mungekar
Chintan Patel
Chris Alexiuk
Christopher Parisien
Collin Neale
Damon Mosk-Aoyama
Dan Su
Dane Corneil
Daniel Afrimi
Daniel Rohrer
Daniel Serebrenik
Daria Gitman
Daria Levy
Darko Stosic
David Mosallanezhad
Deepak Narayanan
Dhruv Nathawani
Dima Rekesh
Dina Yared
Divyanshu Kakwani
Dong Ahn
Duncan Riach
Dusan Stosic
Edgar Minasyan
Edward Lin
Eileen Long
Eileen Peters Long
Elena Lantz
Ellie Evans
Elliott Ning
Eric Chung
Eric Harper
Eric Tramel
Erick Galinkin
Erik Pounds
Evan Briones
Evelina Bakhturina
Faisal Ladhak
Fay Wang
Fei Jia
Felipe Soares
Feng Chen
Ferenc Galko
Frankie Siino
Gal Hubara Agam
Ganesh Ajjanagadde
Gantavya Bhatt
Tags: MoE · LRM
Links: arXiv:2512.20848 (abs) · PDF · HTML · Hugging Face (28 upvotes) · GitHub (245★)
Main: 28 pages · Bibliography: 7 pages · Appendix: 5 pages · 10 figures · 8 tables
Abstract

We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion unique tokens that are new relative to Nemotron 2, followed by supervised fine-tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous-generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. It achieves up to 3.3x higher inference throughput than similarly sized open models such as GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.
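The "30B-A3B" suffix encodes the sparsity the abstract describes: roughly 30B total parameters with about 3B activated per token, which is what top-k Mixture-of-Experts routing provides. Below is a toy-scale sketch of that routing mechanism only; the expert count, top-k value, and layer sizes are illustrative assumptions, not the paper's configuration, and the hybrid Mamba/attention layers are omitted.

```python
# Toy top-k Mixture-of-Experts layer: each token runs through only k of the
# E expert MLPs, so "active" parameters per token are far fewer than total.
# All sizes below are illustrative, NOT Nemotron 3 Nano's actual config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                 # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():                            # only routed experts execute
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
total = sum(p.numel() for p in moe.parameters())
per_expert = sum(p.numel() for p in moe.experts[0].parameters())
active = total - (len(moe.experts) - moe.top_k) * per_expert
print(f"total: {total:,} params, active per token: {active:,}")
```

Scaled up, the same total-versus-active gap is why a sparse model can match or beat dense peers of similar size while doing far less compute per forward pass. The released Base and post-trained checkpoints should be loadable with standard Hugging Face tooling (e.g. `transformers.AutoModelForCausalLM.from_pretrained`), though the exact repository ids are not given on this page.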
