The Cost of Training NLP Models: A Concise Overview

19 April 2020

Papers citing "The Cost of Training NLP Models: A Concise Overview"

50 / 106 papers shown

Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs

117

23 Sep 2025

SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version

205

21 Aug 2025

Spark Transformer: Reactivating Sparsity in FFN and Attention

...

294

07 Jun 2025

QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines

361

12 May 2025

Harnessing uncertainty when learning through Equilibrium Propagation in neural networks

Jonathan Peters

Philippe Talatchian

258

28 Mar 2025

Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset

International Conference on Learning Representations (ICLR), 2025

...

304

26 Feb 2025

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Computer Vision and Pattern Recognition (CVPR), 2024

439

20 Nov 2024

Understanding Adam Requires Better Rotation Dependent Assumptions

Tianyue H. Zhang

Lucas Maes

Alexia Jolicoeur-Martineau

Damien Scieur

Simon Lacoste-Julien

Charles Guille-Escuret

346

25 Oct 2024

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

International Conference on Learning Representations (ICLR), 2024

Yiding Jiang

307

15 Oct 2024

Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud

Marcin Chrapek

Anjo Vahldiek-Oberwagner

Marcin Spoczynski

Scott Constable

Mona Vij

Torsten Hoefler

375

08 Oct 2024

Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

...

Shuai Xu

495

06 Oct 2024

On Tables with Numbers, with Numbers

Konstantinos Kogkalidis

S. Chatzikyriakidis

LMTD

496

12 Aug 2024

AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning

ACM Multimedia (MM), 2024

Zhineng Chen

361

04 Aug 2024

DeepCodeProbe: Towards Understanding What Models Trained on Code Learn

Vahid Majdinasab

Amin Nikanjam

Foutse Khomh

318

11 Jul 2024

Optimizing Language Model's Reasoning Abilities with Weak Supervision

423

07 May 2024

Analyzing the Role of Semantic Representations in the Era of Large Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

280

02 May 2024

LLeMpower: Understanding Disparities in the Control and Access of Large Language Models

252

14 Apr 2024

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

Elliot Bolton

Abhinav Venigalla

Michihiro Yasunaga

David Leo Wright Hall

...

Christopher D. Manning

LM&MA MedIm

357

124

27 Mar 2024

Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams

International Conference on Pattern Recognition (ICPR), 2024

261

18 Mar 2024

GenOL: Generating Diverse Examples for Name-only Online Learning

Seon Joo Kim

Jonghyun Choi

SyDa

447

16 Mar 2024

Knowledge Conflicts for LLMs: A Survey

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Yue Zhang

1.3K

240

13 Mar 2024

Copyleft for Alleviating AIGC Copyright Dilemma: What-if Analysis, Public Perception and Implications

216

19 Feb 2024

A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

Agustinus Kristiadi

Felix Strieth-Kalthoff

430

07 Feb 2024

Efficient Prompt Caching via Embedding Similarity

240

02 Feb 2024

The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?

320

04 Jan 2024

Train 'n Trade: Foundations of Parameter Markets

Neural Information Processing Systems (NeurIPS), 2023

233

07 Dec 2023

Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions

270

06 Dec 2023

ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU

Proceedings of the VLDB Endowment (PVLDB), 2023

...

361

05 Dec 2023

Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

European Conference on Computer Vision (ECCV), 2023

Ming Ding

295

01 Dec 2023

Generalisable Agents for Neural Network Optimisation

Benjamin Rosman

241

30 Nov 2023

Efficient Transformer Knowledge Distillation: A Performance Review

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

184

22 Nov 2023

Show Your Work with Confidence: Confidence Bands for Tuning Curves

Nicholas Lourie

Kyunghyun Cho

He He

227

16 Nov 2023

Exploring Dataset-Scale Indicators of Data Quality

Ben Feuer

Chinmay Hegde

241

07 Nov 2023

KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training

Truong Thao Nguyen

Balazs Gerofi

Edgar Josafat Martinez-Noriega

François Trahay

Mohamed Wahib

287

16 Oct 2023

"A Nova Eletricidade: Aplicações, Riscos e Tendências da IA Moderna -- "The New Electricity": Applications, Risks, and Trends in Current AI

Mariana Recamonde Mendoza

T. L. T. D. Silveira

V. P. Moreira

216

08 Oct 2023

Beyond Labeling Oracles: What does it mean to steal ML models?

423

03 Oct 2023

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

265

16 Aug 2023

Using Artificial Populations to Study Psychological Phenomena in Neural Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

303

15 Aug 2023

RAI Guidelines: Method for Generating Responsible AI Guidelines Grounded in Regulations and Usable by (Non-)Technical Roles

Marios Constantinides

Edyta Bogucka

Daniele Quercia

Susanna Kallio

Mohammad Tahaei

320

27 Jul 2023

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

218

06 Jul 2023

Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023

Qi-Dong Ding

Pengfei Zheng

Shreyas Kudari

Shivaram Venkataraman

Zhao-jie Zhang

VLM OffRL

234

25 Jun 2023

Document Image Cleaning using Budget-Aware Black-Box Approximation

185

22 Jun 2023

Lost in Translation: Large Language Models in Non-English Content Analysis

Gabriel Nicholas

Aliya Bhatia

ELM

301

12 Jun 2023

Evaluating the Social Impact of Generative AI Systems in Systems and Society

...

563

161

09 Jun 2023

On Optimal Caching and Model Multiplexing for Large Model Inference

410

03 Jun 2023

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

Neural Information Processing Systems (NeurIPS), 2023

Ahmed Khaled

Konstantin Mishchenko

Chi Jin

ODL

479

25 May 2023

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training

IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023

Xiaoge Deng

Dongsheng Li

KaiCheng Lu

207

25 May 2023

Annotation Imputation to Individualize Predictions: Initial Studies on Distribution Dynamics and Model Predictions

303

24 May 2023

MoMo: Momentum Models for Adaptive Learning Rates

International Conference on Machine Learning (ICML), 2023

407

12 May 2023

INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

H. S. V. N. S. K. Renduchintala

Krishnateja Killamsetty

Ganesh Ramakrishnan

166

11 May 2023