Evaluating Large Language Models Trained on Code

7 July 2021

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,509 papers shown

MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark

279

05 Jun 2025

Sensory-Motor Control with Large Language Models via Iterative Policy Refinement

J. Carvalho

S. Nolfi

LM&Ro

367

05 Jun 2025

Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

408

04 Jun 2025

From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models

286

04 Jun 2025

Seed-Coder: Let the Code Model Curate Data for Itself

...

342

04 Jun 2025

APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

249

04 Jun 2025

The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective

356

04 Jun 2025

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

318

04 Jun 2025

CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking

238

04 Jun 2025

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems

Sven Kirchner

Alois Knoll

176

04 Jun 2025

Understanding Gender Bias in AI-Generated Product Descriptions

Conference on Fairness, Accountability and Transparency (FAccT), 2025

229

03 Jun 2025

Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings

302

03 Jun 2025

Rethinking the effects of data contamination in Code Intelligence

292

03 Jun 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

...

308

03 Jun 2025

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

...

229

03 Jun 2025

MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching

289

03 Jun 2025

FroM: Frobenius Norm-Based Data-Free Adaptive Model Merging

353

03 Jun 2025

Simplifying Root Cause Analysis in Kubernetes with StateGraph and LLM

186

03 Jun 2025

FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Yan Gao

Massimo Roberto Scamarcia

Javier Fernandez-Marques

...

406

03 Jun 2025

EALG: Evolutionary Adversarial Generation of Language Model-Guided Generators for Combinatorial Optimization

291

03 Jun 2025

AI Scientists Fail Without Strong Implementation Capability

351

02 Jun 2025

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

384

02 Jun 2025

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

342

02 Jun 2025

TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

213

02 Jun 2025

Improving LLM-Generated Code Quality with GRPO

Maxime Robeyns

Laurence Aitchison

ALM

171

02 Jun 2025

Earley-Driven Dynamic Pruning for Efficient Structured Decoding

139

01 Jun 2025

Legal Compliance Evaluation of Smart Contracts Generated By Large Language Models

International Conference on Blockchain (ICB), 2025

152

01 Jun 2025

Mamba Drafters for Speculative Decoding

...

294

01 Jun 2025

Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation

Djaber Rouabhia

Ismail Hadjadj

187

01 Jun 2025

ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation

...

140

31 May 2025

CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

264

31 May 2025

FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts

213

31 May 2025

SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation

210

30 May 2025

Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

204

30 May 2025

Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

164

30 May 2025

RAST: Reasoning Activation in LLMs via Small-model Transfer

256

30 May 2025

Control-R: Towards controllable test-time scaling

...

198

30 May 2025

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

379

30 May 2025

Structure-Aware Fill-in-the-Middle Pretraining for Code

149

30 May 2025

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

257

30 May 2025

An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring

Sana Ebrahimi

Mohsen Dehghankar

Abolfazl Asudeh

204

30 May 2025

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

...

517

30 May 2025

Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models

Junyi Li

Hwee Tou Ng

OffRL HILM LRM

561

30 May 2025

HardTests: Synthesizing High-Quality Test Cases for LLM Coding

322

30 May 2025

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

...

413

30 May 2025

Can LLMs Reason Structurally? An Evaluation via the Lens of Data Structures

234

29 May 2025

Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration

...

316

29 May 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

...

349

29 May 2025

VERINA: Benchmarking Verifiable Code Generation

234

29 May 2025

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

462

29 May 2025