ResearchTrend.AI


arXiv: 2507.04952
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation

7 July 2025
Chenchen Zhang
Yuhang Li
Can Xu
Jiaheng Liu
Ao Liu
Changzhi Zhou
K. Deng
Dengpeng Wu
Guanhua Huang
K. Li
Qi Yi
Ruibin Xiong
Shihui Hu
Yue Zhang
Yuhao Jiang
Zenan Xu
Yuanxing Zhang
Wiggin Zhou
Chayse Zhou
Fengzong Lian

Papers citing "ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation"

6 / 6 papers shown
EWE: An Agentic Framework for Extreme Weather Analysis
Zhe Jiang
Jiong Wang
Xiaoyu Yue
Zijie Guo
Wenlong Zhang
Fenghua Ling
Wanli Ouyang
L. Bai
164
1
0
26 Nov 2025
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
Jingxuan Xu
K. Deng
W. Li
Songwei Yu
Huaixi Tang
...
Zhaoxiang Zhang
Yuqun Zhang
H. Zhang
Bin Chen
Jiaheng Liu
ELM
351
1
0
07 Nov 2025
VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning
Xuanle Zhao
Deyang Jiang
Zhixiong Zeng
Lei Chen
Haibo Qiu
Jing Huang
Yufeng Zhong
Liming Zheng
Yilin Cao
Lin Ma
140
2
0
01 Nov 2025
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li
Chenchen Zhang
Ruilin Lv
Ao Liu
K. Deng
Yuanxing Zhang
Jiaheng Liu
Wiggin Zhou
B. Zhou
LRM
103
3
0
13 Oct 2025
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
Qiaosheng Chen
Y. Liu
Lei Li
Kai Chen
Q. Guo
Gong Cheng
Fei Yuan
ELM
153
1
0
10 Oct 2025
AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager
Xuhua Zhao
Yuxuan Xie
Caihua Chen
Yuxiang Sun
65
0
0
15 Aug 2025