GPT-NeoX-20B: An Open-Source Autoregressive Language Model

14 April 2022

Papers citing "GPT-NeoX-20B: An Open-Source Autoregressive Language Model"

50 / 603 papers shown

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

...

489

477

18 Dec 2024

Optimizing AI-Assisted Code Generation

Simon Torka

Sahin Albayrak

306

14 Dec 2024

Code LLMs: A Taxonomy-based Survey

BigData Congress [Services Society] (BSS), 2024

Nishat Raihan

Christian D. Newman

Marcos Zampieri

387

11 Dec 2024

LA4SR: illuminating the dark proteome with generative AI

Kourosh Salehi-Ashtiani

183

11 Nov 2024

Towards Low-Resource Harmful Meme Detection with LMM Agents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

243

08 Nov 2024

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

J.K. Liu

...

538

103

07 Nov 2024

Photon: Federated LLM Pre-Training

...

340

05 Nov 2024

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Neural Information Processing Systems (NeurIPS), 2024

384

01 Nov 2024

GigaCheck: Detecting LLM-generated Content

332

31 Oct 2024

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

International Conference on Learning Representations (ICLR), 2024

Haiyang Wang

Yue Fan

Muhammad Ferjad Naeem

440

30 Oct 2024

SVIP: Towards Verifiable Inference of Open-source Large Language Models

349

29 Oct 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Shukai Liu

...

Zekun Wang

223

28 Oct 2024

DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning

Neural Information Processing Systems (NeurIPS), 2024

334

28 Oct 2024

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

International Middleware Conference (Middleware), 2024

194

26 Oct 2024

Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

305

25 Oct 2024

Self-Explained Keywords Empower Large Language Models for Code Generation

Lishui Fan

Mouxiang Chen

Zhongxin Liu

316

21 Oct 2024

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Jesse Dodge

171

21 Oct 2024

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Shahrad Mohammadzadeh

437

20 Oct 2024

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

International Conference on Learning Representations (ICLR), 2024

Yiding Jiang

286

15 Oct 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

...

Jianfeng Gao

301

194

14 Oct 2024

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

333

14 Oct 2024

LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection

Jing Sun

303

12 Oct 2024

Enterprise Benchmarks for Large Language Model Evaluation

266

11 Oct 2024

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

...

Shuji Suzuki

262

10 Oct 2024

LecPrompt: A Prompt-based Approach for Logical Error Correction with CodeBERT

Zhenyu Xu

Victor S. Sheng

233

10 Oct 2024

Detecting Training Data of Large Language Models via Expectation Maximization

Gyuwan Kim

Yang Li

Evangelia Spiliopoulou

Jie Ma

Miguel Ballesteros

802

10 Oct 2024

FreqMark: Frequency-Based Watermark for Sentence-Level Detection of LLM-Generated Text

210

09 Oct 2024

Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

274

09 Oct 2024

Fine-tuning can Help Detect Pretraining Data from Large Language Models

International Conference on Learning Representations (ICLR), 2024

517

09 Oct 2024

Round and Round We Go! What makes Rotary Positional Encodings useful?

International Conference on Learning Representations (ICLR), 2024

Federico Barbero

Alex Vitvitskyi

Christos Perivolaropoulos

Razvan Pascanu

Petar Velickovic

521

08 Oct 2024

DEPT: Decoupled Embeddings for Pre-training Language Models

International Conference on Learning Representations (ICLR), 2024

William F. Shen

Dongqi Cai

Nicholas D. Lane

1.4K

07 Oct 2024

LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

Małgorzata Łazuka

Andreea Anghel

Thomas Parnell

282

03 Oct 2024

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

International Conference on Learning Representations (ICLR), 2024

Ulyana Piterbarg

Lerrel Pinto

Rob Fergus

520

03 Oct 2024

Creative and Context-Aware Translation of East Asian Idioms with GPT-4

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

350

01 Oct 2024

Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Shixuan Ma

Quan Wang

302

25 Sep 2024

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Jiafeng Guo

549

23 Sep 2024

Expanding Expressivity in Transformer Models with MöbiusAttention

Anna-Maria Halacheva

M. Nayyeri

Steffen Staab

266

08 Sep 2024

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

International Conference on Computational Linguistics (COLING), 2024

Cheng Wang

Yiwei Wang

Bryan Hooi

Yujun Cai

Nanyun Peng

Kai-Wei Chang

464

05 Sep 2024

The AdEMAMix Optimizer: Better, Faster, Older

International Conference on Learning Representations (ICLR), 2024

Matteo Pagliardini

Pierre Ablin

David Grangier

358

05 Sep 2024

Comparing Discrete and Continuous Space LLMs for Speech Recognition

Interspeech, 2024

Yaoxun Xu

Shi-Xiong Zhang

Jianwei Yu

Zhiyong Wu

Dong Yu

311

01 Sep 2024

A Survey of Large Language Models for European Languages

Wazir Ali

S. Pyysalo

434

27 Aug 2024

Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering

Haowei Du

Dongyan Zhao

194

23 Aug 2024

ONSEP: A Novel Online Neural-Symbolic Framework for Event Prediction Based on Large Language Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Jie Tan

277

14 Aug 2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Yejin Choi

390

23 Jul 2024

Consent in Crisis: The Rapid Decline of the AI Data Commons

...

421

20 Jul 2024

The 2024 Foundation Model Transparency Index

345

17 Jul 2024

A Survey on Symbolic Knowledge Distillation of Large Language Models

327

12 Jul 2024

AutoBencher: Towards Declarative Benchmark Construction

Percy Liang

Tatsunori Hashimoto

212

11 Jul 2024

A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training

Michał Perełkiewicz

Rafał Poświata

231

10 Jul 2024

Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models

Zara Siddique

Liam D. Turner

Luis Espinosa-Anke

250

09 Jul 2024