Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.20179
Cited By
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
29 July 2024
Jinghuan Shang
Karl Schmeckpeper
Brandon B. May
M. Minniti
Tarik Kelestemur
David Watkins
Laura Herlant
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Theia: Distilling Diverse Vision Foundation Models for Robot Learning"
13 / 13 papers shown
Title
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Pietro Michiardi
60
0
0
18 Mar 2025
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
34
1
0
18 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
43
5
0
10 Oct 2024
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation
Mike Ranzinger
Jon Barker
Greg Heinrich
Pavlo Molchanov
Bryan Catanzaro
Andrew Tao
17
4
0
02 Oct 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
49
38
0
23 May 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
Habitat-Matterport 3D Semantics Dataset
Karmesh Yadav
Ram Ramrakhya
Santhosh Kumar Ramakrishnan
Theo Gervet
John Turner
...
Angel X. Chang
Dhruv Batra
Manolis Savva
Alexander William Clegg
Devendra Singh Chaplot
3DV
MDE
73
81
0
11 Oct 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
212
682
0
13 Oct 2021
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
F. Ebert
Yanlai Yang
Karl Schmeckpeper
Bernadette Bucher
G. Georgakis
Kostas Daniilidis
Chelsea Finn
Sergey Levine
147
212
0
27 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
206
698
0
28 Apr 2021
1