Nemotron

Creator
Seonglae Cho
Created
2023 Nov 21 11:06
Edited
2025 Dec 18 18:30
Refs

Multimodal

cosmos-nemotron-34b Model by NVIDIA | NVIDIA NIM
Multimodal vision-language model that understands text, images, and video and generates informative responses
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
Nemotron-4 340B, a family of models optimized for NVIDIA NeMo and NVIDIA TensorRT-LLM, includes cutting-edge instruct and reward models, and a dataset for generative AI training.

Mistral Minitron

Mistral-NeMo-Minitron 8B Foundation Model Delivers Unparalleled Accuracy | NVIDIA Technical Blog
Last month, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a leading state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms similarly sized models on a wide range of…
Hybrid
Hybrid Mamba–Transformer language model with only about 3.2B of its 31.6B total parameters activated, for high efficiency
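The roughly 10% activation ratio points to MoE-style sparse expert routing (cf. the MoE Model backlink): each token is routed to only a few experts, so per-token compute tracks the active parameter count rather than the total. A minimal top-k router sketch in PyTorch; all names and dimensions here are illustrative assumptions, not Nemotron's actual implementation.

```python
# Illustrative sketch only (assumed MoE-style routing, not NVIDIA's code).
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)    # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: [tokens, d_model]
        scores = self.router(x)                        # [tokens, n_experts]
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                     # explicit loop for clarity, not speed
            for slot in range(self.top_k):
                e = idx[t, slot].item()
                out[t] += weights[t, slot] * self.experts[e](x[t])
        return out

moe = TinyMoE()
print(moe(torch.randn(5, 64)).shape)                   # torch.Size([5, 64])
```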
Attention becomes computationally and memory-intensive as sequence length grows (especially the KV cache), whereas Mamba-based State Space Models are structurally designed to scale more efficiently on long sequences. Using Mamba for most layers and keeping only a few attention layers (with Grouped-query Attention) therefore makes it easier to gain throughput and memory advantages, as sketched below.
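To make the long-context argument concrete, here is a rough back-of-the-envelope comparison of KV-cache size for a full-attention stack versus a hybrid stack that keeps only a few Grouped-query Attention layers. Every number below (layer counts, heads, context length) is an assumed example, not Nemotron's actual configuration.

```python
# Illustrative sketch (not NVIDIA's code): rough KV-cache memory for a full-attention
# stack vs. a hybrid stack that keeps only a few Grouped-query Attention layers.

def kv_cache_bytes(n_attn_layers, seq_len, n_kv_heads, head_dim, batch=1, bytes_per_elem=2):
    # K and V caches: 2 tensors of shape [batch, seq_len, n_kv_heads, head_dim] per attention layer
    return 2 * n_attn_layers * batch * seq_len * n_kv_heads * head_dim * bytes_per_elem

seq_len = 128_000          # long-context request
head_dim = 128
full_attention = kv_cache_bytes(n_attn_layers=48, seq_len=seq_len, n_kv_heads=8, head_dim=head_dim)
hybrid         = kv_cache_bytes(n_attn_layers=4,  seq_len=seq_len, n_kv_heads=8, head_dim=head_dim)

print(f"full attention KV cache : {full_attention / 2**30:.1f} GiB")
print(f"hybrid (4 GQA layers)   : {hybrid / 2**30:.1f} GiB")
# Mamba/SSM layers keep a fixed-size recurrent state instead, so their memory
# does not grow with seq_len; only the few remaining attention layers pay for a KV cache.
```

With Grouped-query Attention the n_kv_heads term is already smaller than the number of query heads; replacing most attention layers with Mamba additionally shrinks the n_attn_layers term, which is where the hybrid design gets most of its long-context memory savings.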
research.nvidia.com

Backlinks

MoE Model
