ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.01511
  4. Cited By
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model
  Agents

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

1 July 2024
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
Xiang Yao
Zhiqiang Xie
Yongchao Chen
Shilong Liu
Bochen Qian
Philip H. S. Torr
Bernard Ghanem
G. Li
ArXivPDFHTML

Papers citing "CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents"

19 / 19 papers shown
Title
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
Xiangxi Zheng
Linjie Li
Z. Yang
Ping Yu
Alex Jinpeng Wang
Rui Yan
Yuan Yao
Lijuan Wang
LRM
16
0
0
08 Apr 2025
Attacking Multimodal OS Agents with Malicious Image Patches
Lukas Aichberger
Alasdair Paren
Y. Gal
Philip H. S. Torr
Adel Bibi
AAML
51
2
0
13 Mar 2025
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
H. A. Alyahya
Haidar Khan
Yazeed Alnumay
M Saiful Bari
B. Yener
LRM
52
0
0
10 Mar 2025
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
Junting Chen
Checheng Yu
Xunzhe Zhou
Tianqi Xu
Yao Mu
Mengkang Hu
Wenqi Shao
Y. Wang
G. Li
Lin Shao
60
4
0
30 Oct 2024
OSCAR: Operating System Control via State-Aware Reasoning and
  Re-Planning
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang
Bang Liu
LLMAG
LM&Ro
LRM
31
6
0
24 Oct 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
40
18
0
24 Oct 2024
Agent Skill Acquisition for Large Language Models via CycleQD
Agent Skill Acquisition for Large Language Models via CycleQD
So Kuroki
Taishi Nakamura
Takuya Akiba
Yujin Tang
MoMe
24
0
0
16 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
57
32
0
14 Oct 2024
COMMA: A Communicative Multimodal Multi-Agent Benchmark
COMMA: A Communicative Multimodal Multi-Agent Benchmark
Timothy Ossowski
Jixuan Chen
Danyal Maqbool
Zefan Cai
Tyler J. Bradshaw
Junjie Hu
VLM
29
2
0
10 Oct 2024
Steering Large Language Models between Code Execution and Textual Reasoning
Steering Large Language Models between Code Execution and Textual Reasoning
Yongchao Chen
Harsh Jhamtani
Srinagesh Sharma
Chuchu Fan
Chi Wang
LLMAG
LRM
26
6
0
04 Oct 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
34
8
0
21 Sep 2024
A Survey on Evaluation of Multimodal Large Language Models
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
43
20
0
28 Aug 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist
  Autonomous Agents for Desktop and Web
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
51
44
0
27 Feb 2024
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Zhiyong Wu
Chengcheng Han
Zichen Ding
Zhenmin Weng
Zhoumianze Liu
Shunyu Yao
Tao Yu
Lingpeng Kong
LLMAG
LM&Ro
107
83
0
12 Feb 2024
ScreenAgent: A Vision Language Model-driven Computer Control Agent
ScreenAgent: A Vision Language Model-driven Computer Control Agent
Runliang Niu
Jindong Li
Shiqi Wang
Yali Fu
Xiyu Hu
Xueyuan Leng
He Kong
Yi Chang
Qi Wang
LLMAG
MLLM
LM&Ro
55
37
0
09 Feb 2024
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
Fangru Lin
Emanuele La Malfa
Valentin Hofmann
Elle Michelle Yang
Anthony Cohn
J. Pierrehumbert
LRM
48
16
0
05 Feb 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
162
137
0
17 Jan 2024
CogAgent: A Visual Language Model for GUI Agents
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
132
310
0
14 Dec 2023
Generative Agents: Interactive Simulacra of Human Behavior
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
206
1,701
0
07 Apr 2023
1