GPT-NeoX-20B: An Open-Source Autoregressive Language Model

14 April 2022


Papers citing "GPT-NeoX-20B: An Open-Source Autoregressive Language Model"

50 / 603 papers shown

One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer

24 Nov 2025

Selective Rotary Position Embedding

307

21 Nov 2025

Diffusion Language Models are Super Data Learners

140

05 Nov 2025

SCALE: Upscaled Continual Learning of Large Language Models

...

509

05 Nov 2025

From Prompts to Power: Measuring the Energy Footprint of LLM Inference

Francisco Caravaca

Ángel Cuevas

R. Cuevas

116

05 Nov 2025

The Structure of Relation Decoding Linear Operators in Large Language Models

142

30 Oct 2025

MossNet: Mixture of State-Space Experts is a Multi-Head Attention

273

30 Oct 2025

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models

125

20 Oct 2025

Every Language Model Has a Forgery-Resistant Signature

Matthew Finlayson

Xiang Ren

Swabha Swayamdipta

110

15 Oct 2025

High-Power Training Data Identification with Provable Statistical Guarantees

168

10 Oct 2025

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

IEEE Access, 2025

277

08 Oct 2025

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

124

08 Oct 2025

Membership Inference Attacks on Tokenizers of Large Language Models

415

07 Oct 2025

Distributed Low-Communication Training with Decoupled Momentum Optimization

102

03 Oct 2025

xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity

141

02 Oct 2025

Uncovering the Computational Ingredients of Human-Like Representations in LLMs

162

01 Oct 2025

VietBinoculars: A Zero-Shot Approach for Detecting Vietnamese LLM-Generated Text

Trieu Hai Nguyen

Sivaswamy Akilesh

138

30 Sep 2025

Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models

162

29 Sep 2025

Pretraining with hierarchical memories: separating long-tail and common knowledge

249

29 Sep 2025

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

...

Aleksandra Krasnodębska

239

29 Sep 2025

Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection

137

27 Sep 2025

Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM

26 Sep 2025

Etude: Piano Cover Generation with a Three-Stage Approach -- Extract, strucTUralize, and DEcode

Tse-Yang Che

Yuh-Jzer Joung

20 Sep 2025

Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison

179

10 Sep 2025

The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion

120

22 Aug 2025

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

...

241

22 Aug 2025

Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training

Woojin Chung

Jeonghoon Kim

201

21 Aug 2025

Can Transformers Break Encryption Schemes via In-Context Learning?

13 Aug 2025

Matrix-Driven Instant Review: Confident Detection and Reconstruction of LLM Plagiarism on PC

Ruichong Zhang

206

08 Aug 2025

Trainable Dynamic Mask Sparse Attention

356

04 Aug 2025

FMimic: Foundation Models are Fine-grained Action Learners from Human Videos

The International Journal of Robotics Research (IJRR), 2025

...

158

28 Jul 2025

IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs

138

27 Jul 2025

Supernova: Achieving More with Less in Transformer Architectures

Andrei-Valentin Tanase

Elena Pelican

151

21 Jul 2025

Opus: A Prompt Intention Framework for Complex Workflow Generation

108

15 Jul 2025

Understanding and Improving Length Generalization in Recurrent Models

Ricardo Buitrago Ruiz

Albert Gu

253

03 Jul 2025

MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models

Geewook Kim

Minjoon Seo

243

16 Jun 2025

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling

195

14 Jun 2025

Exploring Cultural Variations in Moral Judgments with Large Language Models

Hadi Mohammadi

Efthymia Papadopoulou

202

14 Jun 2025

Long-Short Alignment for Effective Long-Context Modeling in LLMs

191

13 Jun 2025

Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly

Yi-Chien Lin

William Schuler

139

12 Jun 2025

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

325

11 Jun 2025

Beyond Text Compression: Evaluating Tokenizers Across Scales

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

278

03 Jun 2025

IF-GUIDE: Influence Function-Guided Detoxification of LLMs

455

02 Jun 2025

G2S: A General-to-Specific Learning Framework for Temporal Knowledge Graph Forecasting with Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

132

31 May 2025

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

238

30 May 2025

Mamba Knockout for Unraveling Factual Information Flow

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

152

30 May 2025

The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

Maged S. Al-Shaibani

Moataz Ahmed

DeLMO

219

29 May 2025

Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition

299

28 May 2025

Learning in Compact Spaces with Approximately Normalized Transformer

Katharina Eggensperger

Michael Hefenbrock

266

28 May 2025

In Search of Adam's Secret Sauce

Antonio Orvieto

Robert Gower

370

27 May 2025