arXiv:2108.11590
AVATAR: A Parallel Corpus for Java-Python Program Translation
26 August 2021
W. Ahmad
Md Golam Rahman Tushar
Saikat Chakraborty
Kai-Wei Chang
ArXiv
Papers citing "AVATAR: A Parallel Corpus for Java-Python Program Translation"
35 papers
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Kang Yang
Xinjun Mao
Shangwen Wang
Y. Wang
Tanghaoran Zhang
Bo Lin
Yihao Qin
Zhang Zhang
Yao Lu
Kamal Al-Sabahi
ALM
40
1
0
28 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
29
0
0
05 Apr 2025
Collaboration is all you need: LLM Assisted Safe Code Translation
Rabimba Karanjai
Sam Blackshear
Lei Xu
W. Shi
37
0
0
14 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation
Xing Zhang
Jiaheng Wen
Fangkai Yang
Pu Zhao
Yu Kang
...
Qingwei Lin
Yingnong Dang
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
49
2
0
28 Jan 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
S. Cheung
ALM
66
1
0
18 Jan 2025
TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation
Zhiqiang Yuan
Weitong Chen
Hanlin Wang
Kai Yu
Xin Peng
Yiling Lou
LLMAG
15
8
0
30 Sep 2024
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution
Ruiyang Xu
Jialun Cao
Y. Lu
Hongyu Lin
Xianpei Han
Ben He
S. Cheung
Le Sun
ELM
LRM
27
0
0
23 Aug 2024
VersiCode: Towards Version-controllable Code Generation
Tongtong Wu
Weigang Wu
Xingyu Wang
Kang Xu
Suyu Ma
Bo Jiang
Ping Yang
Zhenchang Xing
Yuan-Fang Li
Gholamreza Haffari
24
4
0
11 Jun 2024
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
Jingchang Chen
Hongxuan Tang
Zheng Chu
Qianglong Chen
Zekun Wang
Ming Liu
Bing Qin
37
0
0
30 May 2024
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation
Jianbo Dai
Jianqiao Lu
Yunlong Feng
Rongju Ruan
Ming Cheng
Haochen Tan
Zhijiang Guo
ELM
LRM
31
11
0
19 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Yikang Shen
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Rameswar Panda
AI4TS
38
51
0
07 May 2024
CodeMind: A Framework to Challenge Large Language Models for Code Reasoning
Changshu Liu
Shizhuo Dylan Zhang
Ali Reza Ibrahimzada
Reyhaneh Jabbarvand
ELM
ReCod
LRM
22
0
0
15 Feb 2024
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Dong Huang
Yuhao Qing
Weiyi Shang
Heming Cui
Jie M. Zhang
68
10
0
03 Feb 2024
Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models
Mayank Agarwal
Yikang Shen
Bailin Wang
Yoon Kim
Jie Chen
29
0
0
19 Jan 2024
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Yao Wan
Yang He
Zhangqian Bi
Jianguo Zhang
Hongyu Zhang
Yulei Sui
Guandong Xu
Hai Jin
Philip S. Yu
12
20
0
30 Dec 2023
Data Augmentation for Code Translation with Comparable Corpora and Multiple References
Yiqing Xie
Atharva Naik
Daniel Fried
Carolyn Rose
34
4
0
01 Nov 2023
SUT: Active Defects Probing for Transcompiler Models
Mengnan Qi
Yufan Huang
Maoquan Wang
Yongqiang Yao
Zihan Liu
Bin Gu
Colin B. Clement
Neel Sundaresan
17
0
0
22 Oct 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan
Yuchen Tian
Yunzhe Li
Qian Chen
Wen Wang
10
15
0
08 Oct 2023
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
23
20
0
03 Sep 2023
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Nadezhda Chirkova
Sergey Troshin
8
6
0
01 Aug 2023
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
Prithwish Jana
Piyush Jha
Haoyang Ju
Gautham Kishore
Aryan Mahajan
Vijay Ganesh
14
12
0
11 Jun 2023
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
Jordi Armengol-Estapé
Jackson Woodruff
Chris Cummins
Michael F. P. O'Boyle
25
15
0
21 May 2023
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
Mohammad Abdullah Matin Khan
M Saiful Bari
Xuan Long Do
Weishi Wang
Md. Rizwan Parvez
Shafiq R. Joty
ALM
ELM
19
0
0
06 Mar 2023
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
96
117
0
26 Oct 2022
XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence
Ming Zhu
Aneesh Jain
Karthik Suresh
Roshan Ravindran
Sindhu Tipirneni
Chandan K. Reddy
16
43
0
16 Jun 2022
FixEval: Execution-based Evaluation of Program Fixes for Programming Problems
Md. Mahim Anjum Haque
W. Ahmad
Ismini Lourentzou
Chris Brown
13
15
0
15 Jun 2022
NatGen: Generative pre-training by "Naturalizing" source code
Saikat Chakraborty
Toufique Ahmed
Yangruibo Ding
Prem Devanbu
Baishakhi Ray
AI4CE
26
116
0
15 Jun 2022
Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages
Wasi Uddin Ahmad
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
24
27
0
23 May 2022
Probing Pretrained Models of Source Code
Sergey Troshin
Nadezhda Chirkova
ELM
17
28
0
16 Feb 2022
Better Together? An Evaluation of AI-Supported Code Translation
Justin D. Weisz
Michael J. Muller
Steven I. Ross
Fernando Martinez
Stephanie Houde
Mayank Agarwal
Kartik Talamadupula
John T. Richards
22
48
0
15 Feb 2022
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
30
0
0
22 Nov 2021
Using Document Similarity Methods to create Parallel Datasets for Code Translation
Mayank Agarwal
Kartik Talamadupula
Fernando Martinez
Stephanie Houde
Michael J. Muller
John T. Richards
Steven I. Ross
Justin D. Weisz
SyDa
15
6
0
11 Oct 2021
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
196
1,451
0
02 Sep 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
183
1,098
0
09 Feb 2021