GPT-NeoX-20B: An Open-Source Autoregressive Language Model

14 April 2022
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
Laurence Golding
Horace He
Connor Leahy
Kyle McDonell
Jason Phang
Michael Pieler
USVSN Sai Prashanth
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach

Papers citing "GPT-NeoX-20B: An Open-Source Autoregressive Language Model"

50 / 602 papers shown
Title
One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer
Haoyu Wu
Jingyi Xu
Qiaomu Miao
Dimitris Samaras
H. Le
44
0
0
24 Nov 2025
Selective Rotary Position Embedding
Sajad Movahedi
Timur Carstensen
Arshia Afzal
Frank Hutter
Antonio Orvieto
Volkan Cevher
125
0
0
21 Nov 2025
SCALE: Upscaled Continual Learning of Large Language Models
Jin-woo Lee
Junhwa Choi
Bongkyu Hwang
Jinho Choo
Bogun Kim
...
Joonseok Lee
DongYoung Jung
Jaeseon Park
Kyoungwon Park
Suk-hoon Jung
CLL
LRM
314
0
0
05 Nov 2025
Diffusion Language Models are Super Data Learners
Jinjie Ni
Qian Liu
Longxu Dou
Chao Du
Zili Wang
Hang Yan
Tianyu Pang
Michael Shieh
AI4CE
100
8
0
05 Nov 2025
From Prompts to Power: Measuring the Energy Footprint of LLM Inference
Francisco Caravaca
Ángel Cuevas
R. Cuevas
64
0
0
05 Nov 2025
MossNet: Mixture of State-Space Experts is a Multi-Head Attention
Shikhar Tuli
James Smith
Haris Jeelani
Chi-Heng Lin
Abhishek Patel
Vasili Ramanishka
Yen-Chang Hsu
Hongxia Jin
MoE
235
0
0
30 Oct 2025
The Structure of Relation Decoding Linear Operators in Large Language Models
Miranda Anna Christ
Adrián Csiszárik
Gergely Becsó
D. Varga
72
0
0
30 Oct 2025
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng
Xiang Hu
Junxiong Wang
Jianguo Li
Wei Wu
Yucheng Lu
72
1
0
20 Oct 2025
Every Language Model Has a Forgery-Resistant Signature
Matthew Finlayson
Xiang Ren
Swabha Swayamdipta
64
0
0
15 Oct 2025
High-Power Training Data Identification with Provable Statistical Guarantees
Zhenlong Liu
Hao Zeng
Weiran Huang
Hongxin Wei
129
0
0
10 Oct 2025
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
177
17
0
08 Oct 2025
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Soyeong Jeong
Taehee Jung
Sung Ju Hwang
Joo-Kyung Kim
Luan Tuyen Chau
LLMAG
LRM
60
0
0
08 Oct 2025
Membership Inference Attacks on Tokenizers of Large Language Models
Meng Tong
Yuntao Du
Kejiang Chen
Weiming Zhang
Ninghui Li
MIALM
219
0
0
07 Oct 2025
Distributed Low-Communication Training with Decoupled Momentum Optimization
S. Nedelkoski
Alexander Acker
O. Kao
Soeren Becker
Dominik Scheinert
75
0
0
03 Oct 2025
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
Maximilian Beck
Kajetan Schweighofer
Sebastian Böck
Sebastian Lehner
Sepp Hochreiter
105
0
1
02 Oct 2025
Uncovering the Computational Ingredients of Human-Like Representations in LLMs
Zach Studdiford
Timothy T. Rogers
Kushin Mukherjee
Siddharth Suresh
120
0
0
01 Oct 2025
VietBinoculars: A Zero-Shot Approach for Detecting Vietnamese LLM-Generated Text
Trieu Hai Nguyen
Sivaswamy Akilesh
81
0
0
30 Sep 2025
Pretraining with hierarchical memories: separating long-tail and common knowledge
Hadi Pouransari
David Grangier
C Thomas
Michael Kirchhof
Oncel Tuzel
RALM
KELM
175
1
0
29 Sep 2025
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
Wenjie Fu
Huandong Wang
Junyao Gao
Guoan Wan
Tao Jiang
AAML
KELM
MU
72
0
0
29 Sep 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen
Victor May
Harsh Raj
Marianna Nezhurina
Yishan Wang
...
Aleksandra Krasnodębska
Christoph Schuhmann
Mats Leon Richter
Xuan-Son
J. Jitsev
71
1
0
29 Sep 2025
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Manjiang Yu
Priyanka Singh
Xue Li
Yang Cao
AAML
64
0
0
27 Sep 2025
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
Biyao Zhang
Mingkai Zheng
Debargha Ganguly
Xuecen Zhang
Vikash Singh
Vipin Chaudhary
Zhao Zhang
62
0
0
26 Sep 2025
Etude: Piano Cover Generation with a Three-Stage Approach -- Extract, strucTUralize, and DEcode
Tse-Yang Che
Yuh-Jzer Joung
56
0
0
20 Sep 2025
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Ivan Rodkin
Daniil Orel
Konstantin Smirnov
Arman Bolatov
Bilal Elbouardi
...
Aydar Bulatov
Preslav Nakov
Timothy Baldwin
Artem Shelmanov
Mikhail Burtsev
LRM
153
0
0
22 Aug 2025
The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion
Zoe Kotti
Konstantina Dritsa
D. Spinellis
Panos Louridas
68
0
0
22 Aug 2025
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Woojin Chung
Jeonghoon Kim
140
0
0
21 Aug 2025
Can Transformers Break Encryption Schemes via In-Context Learning?
Jathin Korrapati
Patrick Mendoza
Aditya Tomar
Abein Abraham
36
0
0
13 Aug 2025
Matrix-Driven Instant Review: Confident Detection and Reconstruction of LLM Plagiarism on PC
Ruichong Zhang
144
2
0
08 Aug 2025
Trainable Dynamic Mask Sparse Attention
Jingze Shi
Yifan Wu
Yiran Peng
Liangdong Wang
Guang Liu
Yuyu Luo
240
2
0
04 Aug 2025
FMimic: Foundation Models are Fine-grained Action Learners from Human Videos
The International Journal of Robotics Research (IJRR), 2025
Guangyan Chen
Meiling Wang
Te Cui
Yao Mu
Haoyang Lu
...
Mengxiao Hu
Tianxing Zhou
M. Fu
Yi Yang
Yufeng Yue
LM&Ro
VLM
97
4
0
28 Jul 2025
IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs
Aviya Maimon
Amir D. N. Cohen
Gal Vishne
Shauli Ravfogel
Reut Tsarfaty
80
0
0
27 Jul 2025
Supernova: Achieving More with Less in Transformer Architectures
Andrei-Valentin Tanase
Elena Pelican
93
0
0
21 Jul 2025
Opus: A Prompt Intention Framework for Complex Workflow Generation
Théo Fagnoni
Mahsun Altin
Chia En Chung
Phillip Kingston
Alan Tuning
Dana O. Mohamed
Inès Adnani
83
1
0
15 Jul 2025
Understanding and Improving Length Generalization in Recurrent Models
Ricardo Buitrago Ruiz
Albert Gu
162
4
0
03 Jul 2025
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models
Geewook Kim
Minjoon Seo
167
1
0
16 Jun 2025
Exploring Cultural Variations in Moral Judgments with Large Language Models
Hadi Mohammadi
Efthymia Papadopoulou
Yasmeen F.S.S. Meijer
Ayoub Bagheri
158
1
0
14 Jun 2025
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
Teodora Srećković
Jonas Geiping
Antonio Orvieto
MoE
151
5
0
14 Jun 2025
Long-Short Alignment for Effective Long-Context Modeling in LLMs
Tianqi Du
Haotian Huang
Yifei Wang
Yisen Wang
131
1
0
13 Jun 2025
Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly
Yi-Chien Lin
William Schuler
105
1
0
12 Jun 2025
TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding
Yiran Peng
Jingze Shi
Yifan Wu
Nan Tang
Yuyu Luo
282
3
0
11 Jun 2025
Beyond Text Compression: Evaluating Tokenizers Across Scales
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jonas F. Lotz
António V. Lopes
Stephan Peitz
Hendra Setiawan
Leonardo Emili
239
2
0
03 Jun 2025
IF-GUIDE: Influence Function-Guided Detoxification of LLMs
Zachary Coalson
Juhan Bae
Nicholas Carlini
Sanghyun Hong
TDI
333
1
0
02 Jun 2025
G2S: A General-to-Specific Learning Framework for Temporal Knowledge Graph Forecasting with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Long Bai
Zixuan Li
Xiaolong Jin
Jiafeng Guo
Xueqi Cheng
Tat-Seng Chua
AI4TS
89
1
0
31 May 2025
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He
Rishabh Anand
Hiren Madhu
Ali Maatouk
Smita Krishnaswamy
Leandros Tassiulas
Menglin Yang
Rex Ying
176
7
0
30 May 2025
Mamba Knockout for Unraveling Factual Information Flow
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Nir Endy
Idan Daniel Grosbard
Yuval Ran-Milo
Yonatan Slutzky
Itay Tshuva
Raja Giryes
113
0
0
30 May 2025
The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text
Maged S. Al-Shaibani
Moataz Ahmed
DeLMO
159
5
0
29 May 2025
Learning in Compact Spaces with Approximately Normalized Transformer
Jörg Franke
Urs Spiegelhalter
Marianna Nezhurina
J. Jitsev
Katharina Eggensperger
Michael Hefenbrock
201
1
0
28 May 2025
Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
Yuan Tseng
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
Sourav Bhattacharya
234
1
0
28 May 2025
Explaining Large Language Models with gSMILE
Zeinab Dehghani
Mohammed Naveed Akram
Adil Khan
Y. Papadopoulos
MILM
LRM
448
0
0
27 May 2025
In Search of Adam's Secret Sauce
Antonio Orvieto
Robert Gower
235
10
0
27 May 2025