Simple but Effective: CLIP Embeddings for Embodied AI

18 November 2021
Apoorv Khandelwal
Luca Weihs
Roozbeh Mottaghi
Aniruddha Kembhavi
VLM, LM&Ro

Papers citing "Simple but Effective: CLIP Embeddings for Embodied AI"

40 / 190 papers shown
Objaverse: A Universe of Annotated 3D Objects
Computer Vision and Pattern Recognition (CVPR), 2022
Matt Deitke
Dustin Schwenk
Jordi Salvador
Luca Weihs
Oscar Michel
Eli VanderBilt
Ludwig Schmidt
Kiana Ehsani
Aniruddha Kembhavi
Ali Farhadi
480
1,326
0
15 Dec 2022
Self-Supervised Object Goal Navigation with In-Situ Finetuning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
So Yeon Min
Yifan Hao
Wei Ding
Ali Farhadi
Ruslan Salakhutdinov
Yonatan Bisk
Jian Zhang
SSL
234
8
0
09 Dec 2022
Phone2Proc: Bringing Robust Robots Into Our Chaotic World
Computer Vision and Pattern Recognition (CVPR), 2022
Matt Deitke
Rose Hendrix
Luca Weihs
Ali Farhadi
Kiana Ehsani
Aniruddha Kembhavi
LM&Ro
200
26
0
08 Dec 2022
Task Bias in Vision-Language Models
Sachit Menon
I. Chandratreya
Carl Vondrick
VLM, SSL
123
7
0
08 Dec 2022
PEANUT: Predicting and Navigating to Unseen Targets
IEEE International Conference on Computer Vision (ICCV), 2022
Albert J. Zhai
Shenlong Wang
193
38
0
05 Dec 2022
Navigating to Objects in the Real World
Science Robotics (Sci. Robot.), 2022
Théophile Gervet
Soumith Chintala
Dhruv Batra
Jitendra Malik
Devendra Singh Chaplot
213
180
0
02 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
204
4
0
01 Dec 2022
Time-Aware Datasets are Adaptive Knowledgebases for the New Normal
Abhijit Suprem
Sanjyot Vaidya
J. Ferreira
C. Pu
159
2
0
22 Nov 2022
Last-Mile Embodied Visual Navigation
Conference on Robot Learning (CoRL), 2022
Justin Wasserman
Karmesh Yadav
Girish Chowdhary
Abhi Gupta
Unnat Jain
166
41
0
21 Nov 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Neural Information Processing Systems (NeurIPS), 2022
Kunal Pratap Singh
Luca Weihs
Alvaro Herrasti
Jonghyun Choi
Aniruddha Kembhavi
Roozbeh Mottaghi
208
27
0
18 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
IEEE International Conference on Computer Vision (ICCV), 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
247
34
0
17 Nov 2022
Foundation Models for Semantic Novelty in Reinforcement Learning
Tarun Gupta
Peter Karkus
Tong Che
Danfei Xu
Marco Pavone
VLM, OffRL, LRM
110
9
0
09 Nov 2022
Text-Only Training for Image Captioning using Noise-Injected CLIP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
David Nukrai
Ron Mokady
Amir Globerson
VLM, CLIP
255
123
0
01 Nov 2022
Instruction-Following Agents with Multimodal Transformer
Hao Liu
Lisa Lee
Kimin Lee
Pieter Abbeel
LM&Ro
226
15
0
24 Oct 2022
A Visual Tour Of Current Challenges In Multimodal Language Models
Shashank Sonkar
Naiming Liu
Richard G. Baraniuk
DiffM
90
2
0
22 Oct 2022
DANLI: Deliberative Agent for Following Natural Language Instructions
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yichi Zhang
Jianing Yang
Jiayi Pan
Shane Storks
N. Devraj
Ziqiao Ma
Keunwoo Peter Yu
Yuwei Bao
J. Chai
LM&Ro
312
19
0
22 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
732
4,467
0
16 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
325
53
0
13 Oct 2022
NeRF2Real: Sim2real Transfer of Vision-guided Bipedal Motion Skills using Neural Radiance Fields
IEEE International Conference on Robotics and Automation (ICRA), 2022
Arunkumar Byravan
Jan Humplik
Leonard Hasenclever
Arthur Brussee
F. Nori
...
Ben Moran
Steven Bohez
Fereshteh Sadeghi
Bojan Vujatovic
N. Heess
278
69
0
10 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
319
463
0
06 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
258
106
0
29 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Neural Information Processing Systems (NeurIPS), 2022
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
Amit K. Roy-Chowdhury
AAML
275
24
0
20 Sep 2022
Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants
IEEE International Conference on Robotics and Automation (ICRA), 2022
Jeongeun Park
Taerim Yoon
Jejoon Hong
Youngjae Yu
Matthew K. X. J. Pan
Sungjoon Choi
291
16
0
19 Sep 2022
Instruction-driven history-aware policies for robotic manipulations
Conference on Robot Learning (CoRL), 2022
Pierre-Louis Guhur
Shizhe Chen
Ricardo Garcia Pinel
Makarand Tapaswi
Ivan Laptev
Cordelia Schmid
LM&Ro
409
139
0
11 Sep 2022
Robots Enact Malignant Stereotypes
Conference on Fairness, Accountability and Transparency (FAccT), 2022
Rumaisa Azeem
William Agnew
V. Zeng
Severin Kacianka
Matthew C. Gombolay
LM&Ro
170
56
0
23 Jul 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Conference on Robot Learning (CoRL), 2022
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG, LM&Ro, LRM
368
1,140
0
12 Jul 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Conference on Robot Learning (CoRL), 2022
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
491
599
0
10 Jul 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
Neural Information Processing Systems (NeurIPS), 2022
Arjun Majumdar
Gunjan Aggarwal
Bhavika Devnani
Judy Hoffman
Dhruv Batra
LM&Ro
383
233
0
24 Jun 2022
AnyMorph: Learning Transferable Polices By Inferring Agent Morphology
International Conference on Machine Learning (ICML), 2022
Brandon Trabucco
Mariano Phielipp
Glen Berseth
150
35
0
17 Jun 2022
Zero-shot object goal visual navigation
IEEE International Conference on Robotics and Automation (ICRA), 2022
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
199
57
0
15 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
298
361
0
14 Jun 2022
Offline Visual Representation Learning for Embodied Navigation
Karmesh Yadav
Ram Ramrakhya
Arjun Majumdar
Vincent-Pierre Berges
Sachit Kuhar
Dhruv Batra
Alexei Baevski
Oleksandr Maksymets
OffRL, SSL
237
94
0
27 Apr 2022
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Conference on Learning for Dynamics & Control (L4DC), 2022
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
166
89
0
23 Apr 2022
Semantic Exploration from Language Abstractions and Pretrained Representations
Neural Information Processing Systems (NeurIPS), 2022
Allison C. Tam
Neil C. Rabinowitz
Andrew Kyle Lampinen
Nicholas A. Roy
Stephanie C. Y. Chan
D. Strouse
Jane X. Wang
Andrea Banino
Felix Hill
LM&Ro
300
76
0
08 Apr 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Ram Ramrakhya
Eric Undersander
Dhruv Batra
Abhishek Das
LM&Ro
323
130
0
07 Apr 2022
R3M: A Universal Visual Representation for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Suraj Nair
Aravind Rajeswaran
Vikash Kumar
Chelsea Finn
Abhi Gupta
LM&Ro
315
747
0
23 Mar 2022
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
S. Gadre
Mitchell Wortsman
Gabriel Ilharco
Ludwig Schmidt
Shuran Song
CLIP, LM&Ro
317
228
0
20 Mar 2022
Stubborn: A Strong Baseline for Indoor Object Navigation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Haokuan Luo
Albert Yue
Zhang-Wei Hong
Pulkit Agrawal
259
55
0
14 Mar 2022
Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui
Lichen Zhao
Feng Liang
Yangguang Li
Jing Shao
UQCV, VLM, CLIP
276
50
0
11 Mar 2022
The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
International Conference on Machine Learning (ICML), 2022
Simone Parisi
Aravind Rajeswaran
Senthil Purushwalkam
Abhinav Gupta
LM&Ro
250
219
0
07 Mar 2022