
Simple but Effective: CLIP Embeddings for Embodied AI

18 November 2021


Papers citing "Simple but Effective: CLIP Embeddings for Embodied AI"

50 / 190 papers shown

Title
Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search Zijian Song Xiaoxin Lin Tao Pu Zhenlong Yuan Guangrun Wang Liang Lin 105 0 0 24 Nov 2025
AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems Rajat Bhattacharjya Sing-Yao Wu Hyunwoo Oh Chaewon Nam Suyeon Koo Mohsen Imani Elaheh Bozorgzadeh N. Dutt VLM 70 0 0 22 Nov 2025
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models Shihab Aaqil Ahamed Udaya S.K.P. Miriya Thanthrige Ranga Rodrigo Muhammad Haris Khan VLM 146 0 0 30 Oct 2025
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Ming-Ming Yu Fei Zhu Wenzhuo Liu Y. Yang Qunbo Wang Wenjun Wu Jing Liu 130 1 0 23 Oct 2025
Exploring Conditions for Diffusion models in Robotic Control Heeseong Shin Byeongho Heo Dongyoon Han Seungryong Kim Taekyung Kim 140 0 0 17 Oct 2025
What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework Hongze Wang Boyang Sun Jiaxu Xing Fan Yang Marco Hutter Dhruv Shah Davide Scaramuzza Marc Pollefeys 48 0 0 02 Oct 2025
LAGEA: Language Guided Embodied Agents for Robotic Manipulation Abdul Monaf Chowdhury Akm Moshiur Rahman Mazumder Rabeya Akter S. Arib LM&Ro 80 0 0 27 Sep 2025
Revealing Multimodal Causality with Large Language Models Jin Li Shoujin Wang Qi Zhang Feng Liu Tongliang Liu LongBing Cao Shui Yu F. Chen 116 0 0 22 Sep 2025
Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories Yifan Lin Sophie Ziyu Liu Ran Qi George Z. Xue Xinping Song Chao Qin Hugh H. T. Liu VGen 93 0 0 19 Sep 2025
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review. Information Fusion (Inf. Fusion), 2025 Ranjan Sapkota Manoj Karkee ObjD VLM 239 9 0 25 Aug 2025
Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation Yue Hu Junzhe Wu Ruihan Xu Hang Liu Avery Xi Henry X. Liu Ram Vasudevan Maani Ghaffari LM&Ro 92 2 0 09 Aug 2025
MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding Weifan Zhang Tingguang Li Yuzhen Liu LM&Ro 64 1 0 07 Aug 2025
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention. International Conference on Learning Representations (ICLR), 2025 Xiaochen Zhao Hongyi Xu Guoxian Song You Xie Chenxu Zhang Xiu Li Linjie Luo J. Suo Yebin Liu VGen 128 10 0 30 Jul 2025
Efficient and Generalizable Environmental Understanding for Visual Navigation Ruoyu Wang Xinshu Li Chen Wang Lina Yao CML 184 0 0 18 Jun 2025
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation. IEEE International Conference on Robotics and Automation (ICRA), 2025 Yihe Tang Wenlong Huang Yingke Wang Chengshu Li Roy Yuan Ruohan Zhang Jiajun Wu Li Fei-Fei 212 0 0 10 Jun 2025
MapBERT: Bitwise Masked Modeling for Real-Time Semantic Mapping Generation Yijie Deng Shuaihang Yuan Congcong Wen Niraj Pudasaini Anthony Tzes Geeta Chandra Raju Bethala Yi Fang 111 0 0 09 Jun 2025
RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Junjie Li Nan Zhang Xiaoyang Qu Kai Lu Guokuan Li Jiguang Wan Jianzong Wang 213 1 0 03 Jun 2025
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation Tianjun Gu Linfeng Li Xuhong Wang Chenghua Gong Jingyu Gong Zhizhong Zhang Yuan Xie Lizhuang Ma Xin Tan LM&Ro 392 0 0 28 May 2025
SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes Dicong Qiu Jiadi You Zeying Gong Ronghe Qiu Hui Xiong Junwei Liang 136 0 0 24 May 2025
Building spatial world models from sparse transitional episodic memories Zizhan He Maxime Daigle Pouya Bashivan KELM 156 0 0 19 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI Lik Hang Kenny Wong Xueyang Kang Kaixin Bai Jianwei Zhang 298 9 0 01 May 2025
Multimodal Perception for Goal-oriented Navigation: A Survey I-Tak Ieong Hao Tang LM&Ro LRM 261 0 0 22 Apr 2025
CL-CoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models Yuxin Cai Xiangkun He Maonan Wang Hongliang Guo W. Yau Chen Lv LM&Ro LRM 287 6 0 11 Apr 2025
FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation Xianqi Zhang Hongliang Wei Wenrui Wang Xingtao Wang Xiaopeng Fan Debin Zhao 179 1 0 28 Mar 2025
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification. Computer Vision and Pattern Recognition (CVPR), 2025 Dongseob Kim Hyunjung Shim VLM 271 0 0 21 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations Jingwen Deng Zihao Wang Shaofei Cai Hoang Trung-Dung Yitao Liang 167 3 0 11 Mar 2025
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation Dujun Nie Xianda Guo Yiqun Duan Ruijun Zhang Long Chen LM&Ro 581 18 0 04 Mar 2025
CuriousBot: Interactive Mobile Exploration via Actionable 3D Relational Object Graph Yixuan Wang Leonor Fermoselle Tarik Kelestemur Jiuguang Wang Yunzhu Li 189 4 0 23 Jan 2025
Visual Semantic Navigation with Real Robots Carlos Gutiérrez-Álvarez Pablo Ríos-Navarro Rafael Flor-Rodríguez Francisco Javier Acevedo-Rodríguez Roberto J. López-Sastre 354 4 0 10 Jan 2025
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents. Neural Information Processing Systems (NeurIPS), 2024 Wonje Choi Woo Kyung Kim SeungHyun Kim Honguk Woo 271 12 0 16 Dec 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation Yueru Jia Jiaming Liu Sixiang Chen Chenyang Gu Zihan Wang ... Lily Lee Pengwei Wang Zhongyuan Wang Renrui Zhang Shanghang Zhang 325 38 0 27 Nov 2024
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Jiajun Xi Yinong He Jianing Yang Yinpei Dai Joyce Chai LM&Ro 257 9 0 31 Oct 2024
Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation. International Conference on Pattern Recognition (ICPR), 2024 Halil Utku Unlu Shuaihang Yuan Congcong Wen Niraj Pudasaini Anthony Tzes Yi Fang 142 1 0 29 Oct 2024
Zero-shot Object Navigation with Vision-Language Models Reasoning. International Conference on Pattern Recognition (ICPR), 2024 Congcong Wen Yisiyuan Huang Niraj Pudasaini Yanjia Huang Shuaihang Yuan Yu Hao Hui Lin Yu-Shen Liu Yi Fang LM&Ro 188 20 0 24 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination. International Conference on Learning Representations (ICLR), 2024 Xinxin Zhao Wenzhe Cai Likun Tang Teng Wang LM&Ro 182 19 0 13 Oct 2024
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation. Neural Information Processing Systems (NeurIPS), 2024 Hang Yin Xiuwei Xu Zhenyu Wu Jie Zhou Jiwen Lu 191 64 0 10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training. International Journal of Computer Vision (IJCV), 2024 Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 207 8 0 09 Oct 2024
PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories Stephane Aroca-Ouellette Natalie Mackraz B. Theobald Katherine Metcalf 137 0 0 08 Oct 2024
The Wallpaper is Ugly: Indoor Localization using Vision and Language. IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2023 Seth Pate Lawson L. S. Wong 163 4 0 04 Oct 2024
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI Ahmad Elawady Gunjan Chhablani Ram Ramrakhya Karmesh Yadav Dhruv Batra Z. Kira Andrew Szot OffRL 268 2 0 03 Oct 2024
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes Zhaowei Wang Hongming Zhang Tianqing Fang Ye Tian Yue Yang Kaixin Ma Xiaoman Pan Yangqiu Song Dong Yu LM&Ro 327 4 0 03 Oct 2024
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning. IEEE International Conference on Robotics and Automation (ICRA), 2024 Jianxiong Li Zhihao Wang Jinliang Zheng Xiaoai Zhou Guanming Wang ... Yu Liu Jingjing Liu Ya-Qin Zhang Junzhi Yu Xianyuan Zhan 187 4 0 02 Oct 2024
Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies. IEEE International Conference on Robotics and Automation (ICRA), 2024 Ruiyu Wang Zheyu Zhuang Shutong Jin Nils Ingelhag Danica Kragic Florian T. Pokorny 282 0 0 30 Sep 2024
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning. IEEE International Conference on Robotics and Automation (ICRA), 2024 Jiaheng Hu Rose Hendrix Ali Farhadi Aniruddha Kembhavi Roberto Martín-Martín Peter Stone Kuo-Hao Zeng Kiana Ehsani 274 38 0 25 Sep 2024
HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024 Naoki Yokoyama Ram Ramrakhya Abhishek Das Dhruv Batra Sehoon Ha 183 38 0 22 Sep 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects. IEEE Access (IEEE Access), 2024 Awal Ahmed Fime Saifuddin Mahmud Arpita Das Md. Sunzidul Islam Hong-Hoon Kim VGen 3DV 183 2 0 14 Sep 2024
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation Alberto Bacchin Davide Allegro Stefano Ghidoni Emanuele Menegatti 170 1 0 02 Sep 2024
VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024 Senthil Hariharan Arul Dhruva Kumar Vivek Sugirtharaj Richard Kim Xuewei Qi R. Madhivanan Arnie Sen Dinesh Manocha 55 2 0 15 Aug 2024
Visual Grounding for Object-Level Generalization in Reinforcement Learning. European Conference on Computer Vision (ECCV), 2024 Haobin Jiang Zongqing Lu LM&Ro 169 3 0 04 Aug 2024
NOLO: Navigate Only Look Once Mengyu Bu Shuhao Gu Yang Feng EgoV 283 1 0 02 Aug 2024
