Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.01147
Cited By
Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning
2 August 2024
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Jianye Hao
Irwin King
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning"
10 / 10 papers shown
Title
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
61
0
0
24 Apr 2025
Effective Tuning Strategies for Generalist Robot Manipulation Policies
Wenbo Zhang
Yang Li
Yanyuan Qiao
Siyuan Huang
Jiajun Liu
Feras Dayoub
Xiao Ma
Lingqiao Liu
21
0
0
02 Oct 2024
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization
Jingjing Chen
Hongjie Fang
Hao-Shu Fang
Cewu Lu
34
2
0
30 Sep 2024
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Yevgen Chebotar
Q. Vuong
A. Irpan
Karol Hausman
F. Xia
...
Brianna Zitkovich
Tomas Jackson
Kanishka Rao
Chelsea Finn
Sergey Levine
OffRL
121
81
0
18 Sep 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
142
144
0
02 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
259
4,223
0
30 Jan 2023
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
146
239
0
06 Oct 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
211
0
18 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
1