Simple but Effective: CLIP Embeddings for Embodied AI
18 November 2021
Apoorv Khandelwal
Luca Weihs
Roozbeh Mottaghi
Aniruddha Kembhavi
VLM, LM&Ro
arXiv:2111.09888 (abs) | PDF | HTML | GitHub (126★)

Papers citing "Simple but Effective: CLIP Embeddings for Embodied AI"

50 / 190 papers shown
Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search
Zijian Song
Xiaoxin Lin
Tao Pu
Zhenlong Yuan
Guangrun Wang
Liang Lin
216
0
0
24 Nov 2025
AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems
Rajat Bhattacharjya
Sing-Yao Wu
Hyunwoo Oh
Chaewon Nam
Suyeon Koo
Mohsen Imani
Elaheh Bozorgzadeh
N. Dutt
VLM
106
1
0
22 Nov 2025
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Shihab Aaqil Ahamed
Udaya S.K.P. Miriya Thanthrige
Ranga Rodrigo
Muhammad Haris Khan
VLM
198
0
0
30 Oct 2025
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World
Ming-Ming Yu
Fei Zhu
Wenzhuo Liu
Y. Yang
Qunbo Wang
Wenjun Wu
Jing Liu
226
1
0
23 Oct 2025
Exploring Conditions for Diffusion models in Robotic Control
Heeseong Shin
Byeongho Heo
Dongyoon Han
Seungryong Kim
Taekyung Kim
200
0
0
17 Oct 2025
What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework
Hongze Wang
Boyang Sun
Jiaxu Xing
Fan Yang
Marco Hutter
Dhruv Shah
Davide Scaramuzza
Marc Pollefeys
100
0
0
02 Oct 2025
LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Abdul Monaf Chowdhury
Akm Moshiur Rahman Mazumder
Rabeya Akter
S. Arib
LM&Ro
109
0
0
27 Sep 2025
Revealing Multimodal Causality with Large Language Models
Jin Li
Shoujin Wang
Qi Zhang
Feng Liu
Tongliang Liu
LongBing Cao
Shui Yu
F. Chen
184
0
0
22 Sep 2025
Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories
Yifan Lin
Sophie Ziyu Liu
Ran Qi
George Z. Xue
Xinping Song
Chao Qin
Hugh H. T. Liu
VGen
141
0
0
19 Sep 2025
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
Information Fusion (Inf. Fusion), 2025
Ranjan Sapkota
Manoj Karkee
ObjD, VLM
290
15
0
25 Aug 2025
Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation
Yue Hu
Junzhe Wu
Ruihan Xu
Hang Liu
Avery Xi
Henry X. Liu
Ram Vasudevan
Maani Ghaffari
LM&Ro
132
2
0
09 Aug 2025
MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding
Weifan Zhang
Tingguang Li
Yuzhen Liu
LM&Ro
104
1
0
07 Aug 2025
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
International Conference on Learning Representations (ICLR), 2025
Xiaochen Zhao
Hongyi Xu
Guoxian Song
You Xie
Chenxu Zhang
Xiu Li
Linjie Luo
J. Suo
Yebin Liu
VGen
168
17
0
30 Jul 2025
Efficient and Generalizable Environmental Understanding for Visual Navigation
Ruoyu Wang
Xinshu Li
Chen Wang
Lina Yao
CML
238
0
0
18 Jun 2025
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
IEEE International Conference on Robotics and Automation (ICRA), 2025
Yihe Tang
Wenlong Huang
Yingke Wang
Chengshu Li
Roy Yuan
Ruohan Zhang
Jiajun Wu
Li Fei-Fei
313
0
0
10 Jun 2025
MapBERT: Bitwise Masked Modeling for Real-Time Semantic Mapping Generation
Yijie Deng
Shuaihang Yuan
Congcong Wen
Niraj Pudasaini
Anthony Tzes
Geeta Chandra Raju Bethala
Yi Fang
135
0
0
09 Jun 2025
RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junjie Li
Nan Zhang
Xiaoyang Qu
Kai Lu
Guokuan Li
Jiguang Wan
Jianzong Wang
277
2
0
03 Jun 2025
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
Tianjun Gu
Linfeng Li
Xuhong Wang
Chenghua Gong
Jingyu Gong
Zhizhong Zhang
Yuan Xie
Lizhuang Ma
Xin Tan
LM&Ro
499
1
0
28 May 2025
SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes
Dicong Qiu
Jiadi You
Zeying Gong
Ronghe Qiu
Hui Xiong
Junwei Liang
176
0
0
24 May 2025
Building spatial world models from sparse transitional episodic memories
Zizhan He
Maxime Daigle
Pouya Bashivan
KELM
236
0
0
19 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
392
11
0
01 May 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro, LRM
321
1
0
22 Apr 2025
CL-CoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models
Yuxin Cai
Xiangkun He
Maonan Wang
Hongliang Guo
W. Yau
Chen Lv
LM&Ro, LRM
367
6
0
11 Apr 2025
FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation
Xianqi Zhang
Hongliang Wei
Wenrui Wang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
224
1
0
28 Mar 2025
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Computer Vision and Pattern Recognition (CVPR), 2025
Dongseob Kim
Hyunjung Shim
VLM
327
0
0
21 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng
Zihao Wang
Shaofei Cai
Hoang Trung-Dung
Yitao Liang
232
3
0
11 Mar 2025
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
Dujun Nie
Xianda Guo
Yiqun Duan
Ruijun Zhang
Long Chen
LM&Ro
685
19
0
04 Mar 2025
CuriousBot: Interactive Mobile Exploration via Actionable 3D Relational Object Graph
Yixuan Wang
Leonor Fermoselle
Tarik Kelestemur
Jiuguang Wang
Yunzhu Li
245
5
0
23 Jan 2025
Visual Semantic Navigation with Real Robots
Carlos Gutiérrez-Álvarez
Pablo Ríos-Navarro
Rafael Flor-Rodríguez
Francisco Javier Acevedo-Rodríguez
Roberto J. López-Sastre
442
4
0
10 Jan 2025
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Neural Information Processing Systems (NeurIPS), 2024
Wonje Choi
Woo Kyung Kim
SeungHyun Kim
Honguk Woo
319
12
0
16 Dec 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Zihan Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
407
41
0
27 Nov 2024
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jiajun Xi
Yinong He
Jianing Yang
Yinpei Dai
Joyce Chai
LM&Ro
293
9
0
31 Oct 2024
Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation
International Conference on Pattern Recognition (ICPR), 2024
Halil Utku Unlu
Shuaihang Yuan
Congcong Wen
Niraj Pudasaini
Anthony Tzes
Yi Fang
178
1
0
29 Oct 2024
Zero-shot Object Navigation with Vision-Language Models Reasoning
International Conference on Pattern Recognition (ICPR), 2024
Congcong Wen
Yisiyuan Huang
Niraj Pudasaini
Yanjia Huang
Shuaihang Yuan
Yu Hao
Hui Lin
Yu-Shen Liu
Yi Fang
LM&Ro
256
21
0
24 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
International Conference on Learning Representations (ICLR), 2024
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
234
20
0
13 Oct 2024
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
Neural Information Processing Systems (NeurIPS), 2024
Hang Yin
Xiuwei Xu
Zhenyu Wu
Jie Zhou
Jiwen Lu
227
70
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
287
11
0
09 Oct 2024
PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories
Stephane Aroca-Ouellette
Natalie Mackraz
B. Theobald
Katherine Metcalf
169
0
0
08 Oct 2024
The Wallpaper is Ugly: Indoor Localization using Vision and Language
IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2023
Seth Pate
Lawson L. S. Wong
215
4
0
04 Oct 2024
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
Ahmad Elawady
Gunjan Chhablani
Ram Ramrakhya
Karmesh Yadav
Dhruv Batra
Z. Kira
Andrew Szot
OffRL
337
2
0
03 Oct 2024
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
Zhaowei Wang
Hongming Zhang
Tianqing Fang
Ye Tian
Yue Yang
Kaixin Ma
Xiaoman Pan
Yangqiu Song
Dong Yu
LM&Ro
415
4
0
03 Oct 2024
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
IEEE International Conference on Robotics and Automation (ICRA), 2024
Jianxiong Li
Zhihao Wang
Jinliang Zheng
Xiaoai Zhou
Guanming Wang
...
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Junzhi Yu
Xianyuan Zhan
244
4
0
02 Oct 2024
Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies
IEEE International Conference on Robotics and Automation (ICRA), 2024
Ruiyu Wang
Zheyu Zhuang
Shutong Jin
Nils Ingelhag
Danica Kragic
Florian T. Pokorny
366
0
0
30 Sep 2024
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
IEEE International Conference on Robotics and Automation (ICRA), 2024
Jiaheng Hu
Rose Hendrix
Ali Farhadi
Aniruddha Kembhavi
Roberto Martín-Martín
Peter Stone
Kuo-Hao Zeng
Kiana Ehsani
318
44
0
25 Sep 2024
HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Naoki Yokoyama
Ram Ramrakhya
Abhishek Das
Dhruv Batra
Sehoon Ha
243
42
0
22 Sep 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
IEEE Access (IEEE Access), 2024
Awal Ahmed Fime
Saifuddin Mahmud
Arpita Das
Md. Sunzidul Islam
Hong-Hoon Kim
VGen, 3DV
271
2
0
14 Sep 2024
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Alberto Bacchin
Davide Allegro
Stefano Ghidoni
Emanuele Menegatti
213
1
0
02 Sep 2024
VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Senthil Hariharan Arul
Dhruva Kumar
Vivek Sugirtharaj
Richard Kim
Xuewei Qi
R. Madhivanan
Arnie Sen
Dinesh Manocha
85
2
0
15 Aug 2024
Visual Grounding for Object-Level Generalization in Reinforcement Learning
European Conference on Computer Vision (ECCV), 2024
Haobin Jiang
Zongqing Lu
LM&Ro
229
3
0
04 Aug 2024
NOLO: Navigate Only Look Once
Mengyu Bu
Shuhao Gu
Yang Feng
EgoV
322
2
0
02 Aug 2024