ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.16158
  4. Cited By
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual
  Perception

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

29 January 2024
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
ArXivPDFHTML

Papers citing "Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception"

50 / 79 papers shown
Title
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
60
0
0
08 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
48
0
0
01 May 2025
NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence
NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence
Zhicong Li
Hangyu Mao
Jiangjin Yin
Mingzhe Xing
Zhiwei Xu
Yuanxing Zhang
Yang Xiao
29
0
0
30 Apr 2025
AndroidGen: Building an Android Language Agent under Data Scarcity
AndroidGen: Building an Android Language Agent under Data Scarcity
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Y. Xu
S. Zhang
Yuxiao Dong
Jie Tang
LLMAG
72
0
0
27 Apr 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
86
0
0
26 Apr 2025
V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
V2^22R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
43
0
0
23 Apr 2025
On the Robustness of GUI Grounding Models Against Image Attacks
On the Robustness of GUI Grounding Models Against Image Attacks
Haoren Zhao
Tianyi Chen
Zhen Wang
AAML
31
0
0
07 Apr 2025
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
Zhengxi Lu
Yuxiang Chai
Yaxuan Guo
Xi Yin
Liang Liu
Hao Wang
Han Xiao
Shuai Ren
Guanjing Xiong
H. Li
LLMAG
LRM
74
9
0
27 Mar 2025
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Gaole Dai
Shiqi Jiang
Ting Cao
Yuanchun Li
Y. Yang
Rui Tan
Mo Li
Lili Qiu
46
0
0
20 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
H. Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
59
1
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Y. Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
T. Tan
73
2
0
18 Mar 2025
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen
Pi Bu
Yingyao Wang
Xinyi Wang
Ziming Wang
...
Qi Zhu
Jun Song
Siran Yang
Jiamang Wang
Bo Zheng
70
2
0
12 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
49
0
0
08 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Y. Li
LM&Ro
LM&MA
82
0
0
07 Mar 2025
SpiritSight Agent: Advanced GUI Agent with One Look
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang
Ziming Cheng
Junting Pan
Zhaohui Hou
Mingjie Zhan
LLMAG
85
2
0
05 Mar 2025
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Wenjia Jiang
Yangyang Zhuang
Chenxi Song
Xu Yang
Chi Zhang
Chi Zhang
LLMAG
79
1
0
04 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Pandeng Li
Yun Zheng
Liwei Wang
ObjD
VLM
54
0
0
03 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
80
5
0
28 Feb 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
48
1
0
24 Feb 2025
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
34
1
0
21 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
98
8
0
18 Feb 2025
AppVLM: A Lightweight Vision Language Model for Online App Control
AppVLM: A Lightweight Vision Language Model for Online App Control
Georgios Papoudakis
Thomas Coste
Zhihao Wu
Jianye Hao
J. Wang
Kun Shao
46
1
0
10 Feb 2025
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Zhenhailong Wang
Haiyang Xu
Junyang Wang
Xi Zhang
Ming Yan
J. Zhang
Fei Huang
Heng Ji
41
9
0
20 Jan 2025
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large
  Language Models without Fine-Tuning
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning
Hai-Ming Xu
Qi Chen
Lei Wang
Lingqiao Liu
62
1
0
14 Dec 2024
o1-Coder: an o1 Replication for Coding
Yuxiang Zhang
Shangxi Wu
Yuqi Yang
Jiangming Shu
Jinlin Xiao
Chao Kong
Jitao Sang
LRM
59
3
0
29 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General
  Video Generation and Editing
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
92
3
0
28 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration
  on Resource-Constrained Edge Devices
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
70
0
0
20 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
GUI Agents with Foundation Models: A Comprehensive Survey
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
61
12
0
07 Nov 2024
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma
Jiongxiao Wang
Fei-Yue Wang
Siyuan Ma
Jiazhao Li
...
B. Li
Yejin Choi
M. Chen
Chaowei Xiao
Chaowei Xiao
MU
45
6
0
05 Nov 2024
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu
Yanda Li
Meng Fang
Zirui Song
Zhiwei Zhang
Yunchao Wei
L. Chen
LM&Ro
LLMAG
OffRL
AI4TS
39
3
0
04 Nov 2024
EDGE: Enhanced Grounded GUI Understanding with Enriched
  Multi-Granularity Synthetic Data
EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
Xuetian Chen
Hangcheng Li
Jiaqing Liang
Sihang Jiang
Deqing Yang
LLMAG
46
2
0
25 Oct 2024
OSCAR: Operating System Control via State-Aware Reasoning and
  Re-Planning
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang
Bang Liu
LLMAG
LM&Ro
LRM
31
6
0
24 Oct 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
42
18
0
24 Oct 2024
Lightweight Neural App Control
Lightweight Neural App Control
Filippos Christianos
Georgios Papoudakis
Thomas Coste
Jianye Hao
Jun Wang
Kun Shao
LM&Ro
44
4
0
23 Oct 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
Jingxuan Chen
Derek Yuen
Bin Xie
Y. Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Kun Shao
LLMAG
31
5
0
19 Oct 2024
AppBench: Planning of Multiple APIs from Various APPs for Complex User
  Instruction
AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
Hongru Wang
Rui Wang
Boyang Xue
Heming Xia
Jingtao Cao
Zeming Liu
Jeff Z. Pan
Kam-Fai Wong
ALM
25
8
0
10 Oct 2024
TinyClick: Single-Turn Agent for Empowering GUI Automation
TinyClick: Single-Turn Agent for Empowering GUI Automation
Pawel Pawlowski
Krystian Zawistowski
Wojciech Lapacz
Marcin Skorupa
Adam Wiacek
Sebastien Postansque
Jakub Hoscilowicz
MLLM
LLMAG
LRM
35
6
0
09 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
63
48
0
07 Oct 2024
AssistantX: An LLM-Powered Proactive Assistant in Collaborative
  Human-Populated Environment
AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
Nan Sun
Bo Mao
Yongchang Li
Lumeng Ma
Di Guo
Huaping Liu
21
1
0
26 Sep 2024
Turn Every Application into an Agent: Towards Efficient
  Human-Agent-Computer Interaction with API-First LLM-Based Agents
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Junting Lu
Zhiyang Zhang
Fangkai Yang
Jue Zhang
Lu Wang
Chao Du
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
26
1
0
25 Sep 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
34
9
0
21 Sep 2024
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal
  Models
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
34
0
0
03 Sep 2024
A Survey on Evaluation of Multimodal Large Language Models
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
43
20
0
28 Aug 2024
Anim-Director: A Large Multimodal Model Powered Agent for Controllable
  Animation Video Generation
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Yunxin Li
Haoyuan Shi
Baotian Hu
Longyue Wang
Jiashun Zhu
Jinyi Xu
Zhen Zhao
Min Zhang
VGen
34
5
0
19 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG
LRM
23
19
0
13 Aug 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation
  Agents
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao-Yang Liu
Tianjie Zhang
Yu Gu
Iat Long Iong
Yifan Xu
...
Zhengxiao Du
Chan Hee Song
Yu Su
Yuxiao Dong
Jie Tang
VLM
LLMAG
31
22
0
12 Aug 2024
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li
Chi Zhang
Wanqi Yang
Bin-Bin Fu
Pei Cheng
Xin Chen
Ling Chen
Yunchao Wei
LLMAG
LM&Ro
25
9
0
05 Aug 2024
The Emerged Security and Privacy of LLM Agent: A Survey with Case
  Studies
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He
Tianqing Zhu
Dayong Ye
Bo Liu
Wanlei Zhou
Philip S. Yu
PILM
LLMAG
ELM
68
22
0
28 Jul 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science
  and Engineering Workflows?
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao
Fangyu Lei
Haoyuan Wu
Jixuan Chen
Yeqiao Fu
...
Qian Liu
Victor Zhong
Lu Chen
Kai Yu
Tao Yu
25
2
0
15 Jul 2024
12
Next