ResearchTrend.AI
arXiv: 2312.07472

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

12 December 2023
Yiran Qin
Enshen Zhou
Qichang Liu
Zhen-fei Yin
Lu Sheng
Ruimao Zhang
Yu Qiao
Jing Shao
    LM&Ro

Papers citing "MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception"

31 / 31 papers shown
Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence
Jinhao Jiang
Changlin Chen
Shile Feng
Wanru Geng
Zesheng Zhou
Ni Wang
Shuai Li
Feng-Qi Cui
Erbao Dong
AI4CE
21
0
0
11 May 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
86
0
0
26 Apr 2025
MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind
Z. Zhang
Nuoqian Xiao
Qi Chai
Deheng Ye
Hao Wang
LLMAG
LRM
95
0
0
25 Apr 2025
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Siyu Zhou
Tianyi Zhou
Yijun Yang
Guodong Long
Deheng Ye
Jing Jiang
Chengqi Zhang
LM&Ro
27
0
0
22 Apr 2025
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang
Zonghao Ying
Tianyuan Zhang
Siyuan Liang
Shengshan Hu
Mingchuan Zhang
A. Liu
Xianglong Liu
AAML
31
1
0
19 Apr 2025
Position: Interactive Generative Video as Next-Generation Game Engine
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xintao Wang
Pengfei Wan
Di Zhang
Xihui Liu
VGen
45
1
0
21 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
79
0
0
13 Mar 2025
Uncertainty in Action: Confidence Elicitation in Embodied Agents
Tianjiao Yu
Vedant Shah
Muntasir Wahed
Kiet A. Nguyen
Adheesh Sunil Juvekar
Tal August
Ismini Lourentzou
40
0
0
13 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng
Zihao Wang
Shaofei Cai
Anji Liu
Yitao Liang
41
1
0
11 Mar 2025
Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review
Di Wu
Xian Wei
Guang Chen
Hao Shen
Xiangfeng Wang
Wenhao Li
Bo Jin
47
2
0
17 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
86
11
0
06 Jan 2025
STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft
Nicholas Lenzen
Amogh Raut
Andrew Melnik
VGen
66
0
0
01 Dec 2024
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin
Zhelun Shi
Jiwen Yu
Xijun Wang
Enshen Zhou
...
Lu Sheng
Jing Shao
Lei Bai
Wanli Ouyang
Ruimao Zhang
EGVM
VGen
122
374
0
23 Oct 2024
Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models
Yuzhou Huang
Yiran Qin
Shunlin Lu
Xintao Wang
Rui Huang
Ying Shan
Ruimao Zhang
VGen
32
1
0
21 Aug 2024
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Zaijing Li
Yuquan Xie
Rui Shao
Gongwei Chen
Dongmei Jiang
Liqiang Nie
49
18
0
07 Aug 2024
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li
Chi Zhang
Wanqi Yang
Bin-Bin Fu
Pei Cheng
Xin Chen
Ling Chen
Yunchao Wei
LLMAG
LM&Ro
31
9
0
05 Aug 2024
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Binxu Li
Tiankai Yan
Yuanting Pan
Zhe Xu
Jie Luo
Ruiyang Ji
Shilong Liu
Haoyu Dong
Zihao Lin
Yixin Wang
LM&MA
36
24
0
02 Jul 2024
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
Zihao Wang
Shaofei Cai
Zhancun Mu
Haowei Lin
Ceyao Zhang
Xuejie Liu
Qing Li
Anji Liu
Xiaojian Ma
Yitao Liang
LM&Ro
30
11
0
27 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
75
23
0
17 Jun 2024
AD-H: Autonomous Driving with Hierarchical Agents
Zaibin Zhang
Shiyu Tang
Yuanhang Zhang
Talas Fu
Yifan Wang
Yang Liu
Dong Wang
Jing Shao
Lijun Wang
H. Lu
42
3
0
05 Jun 2024
MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
Xianhao Yu
Jiaqi Fu
Renjia Deng
Wenjuan Han
26
5
0
28 Mar 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen
Zhelun Shi
Xiaoya Lu
Lehan He
Sucheng Qian
...
Zhen-fei Yin
Jing Shao
Cewu Lu
33
5
0
28 Mar 2024
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
Jingkun An
Yinghao Zhu
Zongjian Li
Haoran Feng
Bohua Chen
Yemin Shi
Chengwei Pan
24
2
0
20 Mar 2024
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Enshen Zhou
Yiran Qin
Zhen-fei Yin
Yuzhou Huang
Ruimao Zhang
Lu Sheng
Yu Qiao
Jing Shao
LM&Ro
AI4CE
37
32
0
18 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Zaibin Zhang
Yongting Zhang
Lijun Li
Hongzhi Gao
Lijun Wang
Huchuan Lu
Feng Zhao
Yu Qiao
Jing Shao
LLMAG
12
29
0
22 Jan 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang
Liangbin Xie
Xintao Wang
Ziyang Yuan
Xiaodong Cun
...
Jiantao Zhou
Chao Dong
Rui Huang
Ruimao Zhang
Ying Shan
DiffM
18
58
0
11 Dec 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
206
899
0
27 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022