arXiv 2410.03859
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
4 October 2024
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
Xindi Wu
Ori Press
Niklas Muennighoff
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
Papers citing
"SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?"
5 / 5 papers shown
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Y. Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
H. Li
LLMAG
30
0
0
06 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
28
1
0
07 Apr 2025
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Samuel Miserendino
M. Wang
Tejal Patwardhan
Johannes Heidecke
36
17
0
17 Feb 2025
Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
Talor Abramovich
Meet Udeshi
Minghao Shao
K. Lieret
Haoran Xi
...
Brendan Dolan-Gavitt
Muhammad Shafique
Karthik Narasimhan
Ramesh Karri
Ofir Press
LLMAG
22
5
0
24 Sep 2024