On the Importance of Building High-quality Training Datasets for Neural Code Search (arXiv:2202.06649)

14 February 2022
Zhensu Sun
Li Li
Yong-Jin Liu
Xiaoning Du
Li Li

Papers citing "On the Importance of Building High-quality Training Datasets for Neural Code Search"

12 / 12 papers shown
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Kang Yang
Xinjun Mao
Shangwen Wang
Yunhong Wang
Tanghaoran Zhang
Bo Lin
Yihao Qin
Zhang Zhang
Yao Lu
Kamal Al-Sabahi
ALM
28 Apr 2025

Secure On-Device Video OOD Detection Without Backpropagation
Li Li
Peilin Cai
Yuxiao Zhou
Zhiyu Ni
Renjie Liang
You Qin
Yi Nian
Z. Tu
Xiyang Hu
Yue Zhao
OODD
FedML
08 Mar 2025

LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
20 Jan 2025

Multi-Robot Motion Planning with Diffusion Models
Yorai Shaoul
Itamar Mishani
Shivam Vats
Jiaoyang Li
Maxim Likhachev
DiffM
04 Oct 2024

Large Language Models for Cyber Security: A Systematic Literature Review
HanXiang Xu
Shenao Wang
Ningke Li
Kaidi Wang
Yanjie Zhao
Kai Chen
Ting Yu
Yang Liu
Haoyu Wang
08 May 2024

VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games
He Zhang
Xinyang Li
Yuanxi Sun
Xinyi Fu
Christine Qiu
John M. Carroll
22 Jan 2024

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Yao Wan
Yang He
Zhangqian Bi
Jianguo Zhang
Hongyu Zhang
Yulei Sui
Guandong Xu
Hai Jin
Philip S. Yu
30 Dec 2023

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset
Le Chen
Arijit Bhattacharjee
Nesreen K. Ahmed
N. Hasabnis
Gal Oren
Bin Lei
Ali Jannesari
LRM
11 Nov 2023

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui
09 May 2023

MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
Zeming Dong
Qiang Hu
Yuejun Guo
Maxime Cordy
Mike Papadakis
Zhenya Zhang
Yves Le Traon
Jianjun Zhao
06 Oct 2022

Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems
Zhensu Sun
Xiaoning Du
Fu Song
Shangwen Wang
Mingze Ni
Li Li
13 Sep 2022

Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora
An Dau
Thang Nguyen-Duc
Hoang Thanh-Tung
Nghi D. Q. Bui
TDI
25 May 2022