Simple but Effective: CLIP Embeddings for Embodied AI

18 November 2021

Papers citing "Simple but Effective: CLIP Embeddings for Embodied AI"

50 / 190 papers shown

Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search

216

24 Nov 2025

AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems

106

22 Nov 2025

A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models

Shihab Aaqil Ahamed

Udaya S.K.P. Miriya Thanthrige

Ranga Rodrigo

Muhammad Haris Khan

VLM

198

30 Oct 2025

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

226

23 Oct 2025

Exploring Conditions for Diffusion models in Robotic Control

200

17 Oct 2025

What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework

100

02 Oct 2025

LAGEA: Language Guided Embodied Agents for Robotic Manipulation

Abdul Monaf Chowdhury

Akm Moshiur Rahman Mazumder

Rabeya Akter

S. Arib

LM&Ro

109

27 Sep 2025

Revealing Multimodal Causality with Large Language Models

184

22 Sep 2025

Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories

141

19 Sep 2025

Object Detection with Multimodal Large Vision-Language Models: An In-depth Review

Information Fusion (Inf. Fusion), 2025

Ranjan Sapkota

Manoj Karkee

ObjD VLM

290

25 Aug 2025

Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation

132

09 Aug 2025

MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding

104

07 Aug 2025

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

International Conference on Learning Representations (ICLR), 2025

168

30 Jul 2025

Efficient and Generalizable Environmental Understanding for Visual Navigation

238

18 Jun 2025

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

IEEE International Conference on Robotics and Automation (ICRA), 2025

313

10 Jun 2025

MapBERT: Bitwise Masked Modeling for Real-Time Semantic Mapping Generation

Geeta Chandra Raju Bethala

Yi Fang

135

09 Jun 2025

RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

277

03 Jun 2025

DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

499

28 May 2025

SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes

176

24 May 2025

Building spatial world models from sparse transitional episodic memories

236

19 May 2025

A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI

392

01 May 2025

Multimodal Perception for Goal-oriented Navigation: A Survey

I-Tak Ieong

Hao Tang

LM&Ro LRM

321

22 Apr 2025

CL-CoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models

367

11 Apr 2025

FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation

224

28 Mar 2025

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification

Computer Vision and Pattern Recognition (CVPR), 2025

Dongseob Kim

Hyunjung Shim

VLM

327

21 Mar 2025

Open-World Skill Discovery from Unsegmented Demonstrations

232

11 Mar 2025

WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation

685

04 Mar 2025

CuriousBot: Interactive Mobile Exploration via Actionable 3D Relational Object Graph

245

23 Jan 2025

Visual Semantic Navigation with Real Robots

Carlos Gutiérrez-Álvarez

Pablo Ríos-Navarro

Rafael Flor-Rodríguez

Francisco Javier Acevedo-Rodríguez

Roberto J. López-Sastre

442

10 Jan 2025

Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

Neural Information Processing Systems (NeurIPS), 2024

319

16 Dec 2024

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

...

407

27 Nov 2024

Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

293

31 Oct 2024

Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation

International Conference on Pattern Recognition (ICPR), 2024

178

29 Oct 2024

Zero-shot Object Navigation with Vision-Language Models Reasoning

International Conference on Pattern Recognition (ICPR), 2024

Yu-Shen Liu

256

24 Oct 2024

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

International Conference on Learning Representations (ICLR), 2024

234

13 Oct 2024

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation

Neural Information Processing Systems (NeurIPS), 2024

227

10 Oct 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

International Journal of Computer Vision (IJCV), 2024

287

09 Oct 2024

PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories

Stephane Aroca-Ouellette

Natalie Mackraz

B. Theobald

Katherine Metcalf

169

08 Oct 2024

The Wallpaper is Ugly: Indoor Localization using Vision and Language

IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2023

Seth Pate

Lawson L. S. Wong

215

04 Oct 2024

ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI

337

03 Oct 2024

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes

415

03 Oct 2024

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

IEEE International Conference on Robotics and Automation (ICRA), 2024

Jianxiong Li

Zhihao Wang

Jinliang Zheng

Xiaoai Zhou

Guanming Wang

...

Yu Liu

Jingjing Liu

Ya-Qin Zhang

Junzhi Yu

Xianyuan Zhan

244

02 Oct 2024

Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies

IEEE International Conference on Robotics and Automation (ICRA), 2024

366

30 Sep 2024

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

IEEE International Conference on Robotics and Automation (ICRA), 2024

Roberto Martín-Martín

Peter Stone

Kuo-Hao Zeng

Kiana Ehsani

318

25 Sep 2024

HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Sehoon Ha

243

22 Sep 2024

Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects

IEEE Access, 2024

271

14 Sep 2024

SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

213

02 Sep 2024

VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Senthil Hariharan Arul

Xuewei

Dinesh Manocha

15 Aug 2024

Visual Grounding for Object-Level Generalization in Reinforcement Learning

European Conference on Computer Vision (ECCV), 2024

Haobin Jiang

Zongqing Lu

LM&Ro

229

04 Aug 2024

NOLO: Navigate Only Look Once

322

02 Aug 2024