© 2025 ResearchTrend.AI, All rights reserved.

Simple but Effective: CLIP Embeddings for Embodied AI

18 November 2021
Apoorv Khandelwal
Luca Weihs
Roozbeh Mottaghi
Aniruddha Kembhavi
    VLM
    LM&Ro

Papers citing "Simple but Effective: CLIP Embeddings for Embodied AI"

50 / 173 papers shown
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
52
0
0
01 May 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
29
0
0
22 Apr 2025
CL-CoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models
Yuxin Cai
Xiangkun He
Maonan Wang
Hongliang Guo
W. Yau
Chen Lv
LM&Ro
LRM
34
0
0
11 Apr 2025
FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation
Xianqi Zhang
Hongliang Wei
Wenrui Wang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
34
0
0
28 Mar 2025
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim
Hyunjung Shim
VLM
44
0
0
21 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng
Zihao Wang
Shaofei Cai
Anji Liu
Yitao Liang
41
1
0
11 Mar 2025
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
Dujun Nie
Xianda Guo
Yiqun Duan
Ruijun Zhang
Long Chen
LM&Ro
124
2
0
04 Mar 2025
Visual Semantic Navigation with Real Robots
Carlos Gutiérrez-Álvarez
Pablo Ríos-Navarro
Rafael Flor-Rodríguez
Francisco Javier Acevedo-Rodríguez
Roberto J. López-Sastre
47
2
0
10 Jan 2025
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Wonje Choi
Woo Kyung Kim
SeungHyun Kim
Honguk Woo
72
8
0
16 Dec 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Z. Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
87
11
0
27 Nov 2024
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
Jiajun Xi
Yinong He
Jianing Yang
Yinpei Dai
Joyce Chai
LM&Ro
24
2
0
31 Oct 2024
Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation
Halil Utku Unlu
Shuaihang Yuan
Congcong Wen
Hao Huang
Anthony Tzes
Yi Fang
27
1
0
29 Oct 2024
Zero-shot Object Navigation with Vision-Language Models Reasoning
Congcong Wen
Yisiyuan Huang
Hao Huang
Yanjia Huang
Shuaihang Yuan
Yu Hao
Hui Lin
Yu-Shen Liu
Yi Fang
LM&Ro
40
7
0
24 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
32
3
0
13 Oct 2024
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
Hang Yin
Xiuwei Xu
Zhenyu Wu
Jie Zhou
Jiwen Lu
29
13
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
28
3
0
09 Oct 2024
PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories
Stephane Aroca-Ouellette
Natalie Mackraz
B. Theobald
Katherine Metcalf
28
0
0
08 Oct 2024
The Wallpaper is Ugly: Indoor Localization using Vision and Language
Seth Pate
Lawson L. S. Wong
31
0
0
04 Oct 2024
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
Ahmad Elawady
Gunjan Chhablani
Ram Ramrakhya
Karmesh Yadav
Dhruv Batra
Z. Kira
Andrew Szot
OffRL
28
0
0
03 Oct 2024
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects
Zhaowei Wang
Hongming Zhang
Tianqing Fang
Ye Tian
Yue Yang
Kaixin Ma
Xiaoman Pan
Yangqiu Song
Dong Yu
LM&Ro
33
3
0
03 Oct 2024
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Jianxiong Li
Zhihao Wang
Jinliang Zheng
Xiaoai Zhou
Guanming Wang
...
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Junzhi Yu
Xianyuan Zhan
38
2
0
02 Oct 2024
Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies
Ruiyu Wang
Zheyu Zhuang
Shutong Jin
Nils Ingelhag
Danica Kragic
Florian T. Pokorny
22
0
0
30 Sep 2024
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
Jiaheng Hu
Rose Hendrix
Ali Farhadi
Aniruddha Kembhavi
Roberto Martin-Martin
Peter Stone
Kuo-Hao Zeng
Kiana Ehsani
31
7
0
25 Sep 2024
HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
Naoki Yokoyama
Ram Ramrakhya
Abhishek Das
Dhruv Batra
Sehoon Ha
21
9
0
22 Sep 2024
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Alberto Bacchin
Davide Allegro
Stefano Ghidoni
Emanuele Menegatti
42
1
0
02 Sep 2024
VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps
Senthil Hariharan Arul
Dhruva Kumar
Vivek Sugirtharaj
Richard Kim
Xuewei Qi
R. Madhivanan
Arnie Sen
Dinesh Manocha
18
1
0
15 Aug 2024
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang
Zongqing Lu
LM&Ro
25
2
0
04 Aug 2024
NOLO: Navigate Only Look Once
Mengyu Bu
Shuhao Gu
Yang Feng
EgoV
36
1
0
02 Aug 2024
OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
Meng Wei
Tai Wang
Yilun Chen
Hanqing Wang
Jiangmiao Pang
Xihui Liu
VLM
47
3
0
12 Jul 2024
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Kuo-Hao Zeng
Zichen Zhang
Kiana Ehsani
Rose Hendrix
Jordi Salvador
Alvaro Herrasti
Ross Girshick
Aniruddha Kembhavi
Luca Weihs
LM&Ro
OffRL
34
17
0
28 Jun 2024
ET tu, CLIP? Addressing Common Object Errors for Unseen Environments
Ye Won Byun
Cathy Jiao
Shahriar Noroozizadeh
Jimin Sun
Rosa Vitiello
VLM
27
1
0
25 Jun 2024
Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning
Dingwen Zhang
Yan Li
De-Chun Cheng
N. Wang
J. Han
CLL
34
0
0
13 Jun 2024
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
40
0
0
10 Jun 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
48
20
0
01 Jun 2024
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
Junjie Zhang
Chenjia Bai
Haoran He
Wenke Xia
Zhigang Wang
Bin Zhao
Xiu Li
Xuelong Li
35
12
0
30 May 2024
Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation
Yanwei Zheng
Changrui Li
Chuanlin Lan
Yaling Li
Xiao Zhang
Yifei Zou
Dongxiao Yu
Zhipeng Cai
25
0
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
67
41
0
23 May 2024
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
Gunshi Gupta
Karmesh Yadav
Y. Gal
Dhruv Batra
Z. Kira
Cong Lu
Tim G. J. Rudner
39
7
0
09 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
64
13
0
28 Apr 2024
Unified Scene Representation and Reconstruction for 3D Large Language Models
Tao Chu
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Qiong Liu
Jiaqi Wang
24
1
0
19 Apr 2024
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong
Kui Wu
Hai Ci
Churan Wang
Hao Chen
OffRL
34
2
0
15 Apr 2024
TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability
Shiwei Lian
Feitian Zhang
27
3
0
12 Apr 2024
Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics
Masashi Osada
G. A. G. Ricardez
Yosuke Suzuki
Tadahiro Taniguchi
21
2
0
11 Apr 2024
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
Mukul Khanna
Ram Ramrakhya
Gunjan Chhablani
Sriram Yenamandra
Théophile Gervet
Matthew Chang
Z. Kira
Devendra Singh Chaplot
Dhruv Batra
Roozbeh Mottaghi
LM&Ro
51
22
0
09 Apr 2024
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
37
14
0
01 Apr 2024
Online Embedding Multi-Scale CLIP Features into 3D Maps
Shun Taguchi
Hideki Deguchi
22
0
0
27 Mar 2024
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
Hee Suk Yoon
Eunseop Yoon
Joshua Tian Jin Tee
M. Hasegawa-Johnson
Yingzhen Li
C. Yoo
VLM
55
23
0
21 Mar 2024
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Nuo Xu
Wen Wang
Rong Yang
Mengjie Qin
Zheyuan Lin
Wei Song
Chunlong Zhang
J. Gu
Chao Li
27
7
0
29 Feb 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Xianyuan Zhan
LM&Ro
OffRL
35
8
0
28 Feb 2024
RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang
Binghao Huang
Ruihai Wu
Zhuoran Li
Shubham Garg
H. Nayyeri
Shenlong Wang
Yunzhu Li
29
17
0
23 Feb 2024