The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

9 May 2023
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui

Papers citing "The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation"

15 / 15 papers shown
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Minh V.T. Pham
Huy N. Phan
Hoang N. Phan
Cuong Le Chi
T. Nguyen
Nghi D. Q. Bui
SyDa
24
0
0
20 Apr 2025
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings
Daniil Orel
Dilshod Azizov
Preslav Nakov
DeLMO
50
0
0
17 Mar 2025
Dopamin: Transformer-based Comment Classifiers through Domain Post-Training and Multi-level Layer Aggregation
Nam Le Hai
Nghi D. Q. Bui
26
1
0
06 Aug 2024
XMainframe: A Large Language Model for Mainframe Modernization
Anh T. V. Dau
Hieu Trung Dao
Anh Tuan Nguyen
Hieu Trung Tran
Phong X. Nguyen
Nghi D. Q. Bui
21
1
0
05 Aug 2024
Building a Large Japanese Web Corpus for Large Language Models
Naoaki Okazaki
Kakeru Hattori
Hirai Shota
Hiroki Iida
Masanari Ohi
Kazuki Fujii
Taishi Nakamura
Mengsay Loem
Rio Yokota
Sakae Mizuki
47
6
0
27 Apr 2024
Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals
Khanh N. Nghiem
Anh Minh Nguyen
Nghi D. Q. Bui
21
1
0
21 Mar 2024
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
Yuling Shi
Hongyu Zhang
Chengcheng Wan
Xiaodong Gu
DeLMO
11
3
0
12 Jan 2024
Large Language Models for Software Engineering: A Systematic Literature Review
Xinying Hou
Yanjie Zhao
Yue Liu
Zhou Yang
Kailong Wang
Li Li
Xiapu Luo
David Lo
John C. Grundy
Haoyu Wang
25
320
0
21 Aug 2023
Exploring Distributional Shifts in Large Language Models for Code Analysis
Shushan Arakelyan
Rocktim Jyoti Das
Yi Mao
Xiang Ren
ALM
11
18
0
16 Mar 2023
Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora
An Dau
Thang Nguyen-Duc
Hoang Thanh-Tung
Nghi D. Q. Bui
TDI
11
4
0
25 May 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
201
1,451
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
192
614
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
190
853
0
09 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
242
1,508
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020