arXiv:2505.24183 (v4)

CodeV-R1: Reasoning-Enhanced Verilog Generation

30 May 2025
Y. Zhu
Di Huang
Hanqi Lyu
X. Zhang
Chongxiao Li
Wenxuan Shi
Yutong Wu
Jianan Mu
Jinghua Wang
Yang Zhao
Pengwei Jin
Shuyao Cheng
Shengwen Liang
Xishan Zhang
Rui Zhang
Zidong Du
Qi Guo
Xing Hu
Yihao Chen
Topics: OffRL, LRM
Main: 10 pages · Bibliography: 4 pages · Appendix: 14 pages · 11 figures · 10 tables
Abstract

Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. However, extending RLVR to electronic design automation (EDA), especially the automatic generation of hardware description languages (HDLs) such as Verilog from natural-language (NL) specifications, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computation cost of RLVR. To address these, we introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage "distill-then-RL" training pipeline: distillation to cold-start reasoning abilities, followed by adaptive DAPO, a novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing the prior state of the art by 12-20%, while matching or even exceeding the performance of the 671B DeepSeek-R1. We will release our model, training pipeline, and dataset to facilitate research in the EDA and LLM communities.
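The round-trip synthesis step described in the abstract is, at its core, a consistency filter over (code, NL, code) triples. Below is a minimal Python sketch of that idea; the hook names (describe_fn, generate_fn, equivalent_fn) are hypothetical placeholders for the paper's LLM describer, LLM generator, and testbench-based equivalence checker, not the authors' actual API.

```python
# Hedged sketch of the round-trip consistency filter. All hooks here are
# hypothetical stand-ins: describe_fn turns Verilog into an NL spec,
# generate_fn turns a spec back into Verilog, and equivalent_fn runs a
# generated testbench to compare two modules against each other.

from typing import Callable, Iterable


def round_trip_filter(
    snippets: Iterable[str],
    describe_fn: Callable[[str], str],          # Verilog -> NL description (LLM)
    generate_fn: Callable[[str], str],          # NL description -> Verilog (LLM)
    equivalent_fn: Callable[[str, str], bool],  # testbench equivalence check
) -> list[tuple[str, str]]:
    """Keep only (NL, code) pairs whose round trip passes equivalence checking."""
    dataset: list[tuple[str, str]] = []
    for golden in snippets:
        spec = describe_fn(golden)          # code -> NL
        regenerated = generate_fn(spec)     # NL -> code
        # Only pairs where the regenerated module matches the golden
        # reference under the generated testbench survive the filter.
        if equivalent_fn(golden, regenerated):
            dataset.append((spec, golden))
    return dataset
```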
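Similarly, the cost-saving idea behind adaptive DAPO can be pictured as an early-stopping rollout loop: keep sampling until a prompt yields both passing and failing completions (the mixed-outcome case that produces a useful policy-gradient signal), rather than always spending the full sample budget. The stopping rule and all names below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of adaptive rollout sampling for an RLVR update. The
# early-stopping criterion is an assumption for illustration, not the
# published adaptive DAPO procedure.

from typing import Callable


def adaptive_rollouts(
    prompt: str,
    sample_fn: Callable[[str], str],         # one model rollout (LLM)
    reward_fn: Callable[[str, str], float],  # verifiable reward, e.g. 0 or 1
    min_samples: int = 4,
    max_samples: int = 16,
) -> list[tuple[str, float]]:
    """Sample completions, stopping early once rewards are mixed."""
    rollouts: list[tuple[str, float]] = []
    while len(rollouts) < max_samples:
        completion = sample_fn(prompt)
        rollouts.append((completion, reward_fn(prompt, completion)))
        distinct_rewards = {r for _, r in rollouts}
        # Once we have both successes and failures (a nonzero advantage
        # signal) and the minimum budget is met, stop sampling early.
        if len(rollouts) >= min_samples and len(distinct_rewards) > 1:
            break
    return rollouts
```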

View on arXiv