ResearchTrend.AI
FAST: Efficient Action Tokenization for Vision-Language-Action Models

17 January 2025
Karl Pertsch
Kyle Stachowicz
Brian Ichter
Danny Driess
Suraj Nair
Q. Vuong
Oier Mees
Chelsea Finn
Sergey Levine

Papers citing "FAST: Efficient Action Tokenization for Vision-Language-Action Models"

50 / 52 papers shown
Diffusion Models for Robotic Manipulation: A Survey
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
127
2
0
01 Jul 2025
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Zewei Zhou
Tianhui Cai
Seth Z. Zhao
Yun Zhang
Zhiyu Huang
Bolei Zhou
Jiaqi Ma
LRM, VLM
25
0
0
16 Jun 2025
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Yuxin Huang
Han Zhao
Donglin Wang
Haoang Li
28
0
0
16 Jun 2025
mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity
Elvis Nava
Victoriano Montesinos
Erik Bauer
Benedek Forrai
Jonas Pai
...
Stephan-Daniel Gravert
Philipp Wand
Stephan Polinski
Benjamin Grewe
Robert K. Katzschmann
27
0
0
13 Jun 2025
Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop
Justin Kerr
Kush Hari
Ethan Weber
Chung Min Kim
Brent Yi
Tyler Bonnen
Ken Goldberg
Angjoo Kanazawa
121
0
0
12 Jun 2025
From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
Irving Fang
Juexiao Zhang
Shengbang Tong
Chen Feng
LM&Ro
63
1
0
11 Jun 2025
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Qiao Gu
Yuanliang Ju
Shengxiang Sun
Igor Gilitschenski
Haruki Nishimura
Masha Itkina
Florian Shkurti
55
0
0
11 Jun 2025
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Jaewoo Song
Harshvardhan Sikka
41
0
0
10 Jun 2025
Scaling Laws of Motion Forecasting and Planning -- A Technical Report
Mustafa Baniodeh
Kratarth Goel
Scott Ettinger
Carlos Fuertes
Ari Seff
...
Vinutha Kallem
Sergio Casas
Rami Al-Rfou
Benjamin Sapp
Dragomir Anguelov
33
0
0
09 Jun 2025
Real-Time Execution of Action Chunking Flow Policies
Kevin Black
Manuel Y. Galliker
Sergey Levine
OffRL
28
0
0
09 Jun 2025
Robotic Policy Learning via Human-assisted Action Preference Optimization
Wenke Xia
Yichu Yang
Hongtao Wu
Xiao Ma
Tao Kong
Di Hu
35
0
0
08 Jun 2025
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Hongyi Zhou
Weiran Liao
Xi Huang
Yucheng Tang
Fabian Otto
...
Qian Wang
Ömer Erdinç Yagmurlu
Nils Blank
Moritz Reuss
Rudolf Lioutikov
65
0
0
06 Jun 2025
DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration
Lingxiao Guo
Zhengrong Xue
Zijing Xu
Huazhe Xu
168
0
0
05 Jun 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
Hao Li
Qi Lv
Rui Shao
Xiang Deng
Yinchuan Li
Jianye Hao
Liqiang Nie
141
1
0
04 Jun 2025
FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Yiming Zhong
Yumeng Liu
Chuyang Xiao
Zemin Yang
Youzhuo Wang
Yufei Zhu
Ye-ling Shi
Yujing Sun
X. Zhu
Yuexin Ma
61
0
0
02 Jun 2025
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess
Jost Tobias Springenberg
Brian Ichter
Lili Yu
Adrian Li-Bell
...
Allen Z. Ren
Homer Walke
Quan Vuong
Lucy Xiaoyang Shi
Sergey Levine
119
2
0
29 May 2025
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
Zhongyi Zhou
Yichen Zhu
Junjie Wen
Chaomin Shen
Yi Xu
LM&Ro, LRM, VLM
98
1
0
28 May 2025
SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Yu Zhang
Yuqi Xie
Huihan Liu
Rutav Shah
Michael Wan
Linxi Fan
Yuke Zhu
58
0
0
28 May 2025
WorldEval: World Model as Real-World Robot Policies Evaluator
Yaxuan Li
Yichen Zhu
Junjie Wen
Chaomin Shen
Yi Xu
OffRL, VGen
31
0
0
25 May 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
118
0
0
24 May 2025
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Boce Hu
Dian Wang
David Klee
Heng Tian
Xupeng Zhu
Haojie Huang
Robert Platt
Robin Walters
101
0
0
22 May 2025
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifang Wang
Ronghe Qiu
Kun-Yu Lin
Zhilin Zhao
Junwei Liang
130
2
0
21 May 2025
From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems
Xiuchao Sui
Daiying Tian
Qi Sun
Ruirui Chen
Dongkyu Choi
Kenneth Kwok
Soujanya Poria
LM&Ro
113
0
0
21 May 2025
Policy Contrastive Decoding for Robotic Foundation Models
Shihan Wu
Ji Zhang
Xu Luo
Junlin Xie
Jingkuan Song
Heng Tao Shen
Lianli Gao
OffRL
271
0
0
19 May 2025
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Fanqi Lin
Ruiqian Nai
Yingdong Hu
Jiacheng You
Junming Zhao
Yang Gao
LRM
101
0
0
17 May 2025
Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra
Gaurav Sukhatme
79
0
0
16 May 2025
Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance
Lianhao Yin
O. Meireles
Guy Rosman
Daniela Rus
35
0
0
16 May 2025
Conditioning Matters: Training Diffusion Policies is Faster Than You Think
Zibin Dong
Yicheng Liu
Yinchuan Li
Hang Zhao
Haifeng Zhang
128
0
0
16 May 2025
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Enyu Zhao
Vedant Raval
Hejia Zhang
Jiageng Mao
Zeyu Shangguan
Stefanos Nikolaidis
Yun Wang
Daniel Seita
LM&Ro, CoGe
98
0
0
14 May 2025
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
Justin Yu
Letian Fu
Huang Huang
Karim El-Refai
Rares Andrei Ambrus
Richard Cheng
Muhammad Zubair Irshad
Ken Goldberg
81
1
0
14 May 2025
Training Strategies for Efficient Embodied Reasoning
William Chen
Suneel Belkhale
Suvir Mirchandani
Oier Mees
Danny Driess
Karl Pertsch
Sergey Levine
OffRL, LRM
99
0
0
13 May 2025
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Harshvardhan Sikka
Paul Pu Liang
LM&Ro, VLM
455
1
0
08 May 2025
D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
Isabella Liu
Jason Chen
Gaurav Sukhatme
Daniel Seita
131
0
0
08 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I. Roumeliotis
Manoj Karkee
LM&Ro
407
2
0
07 May 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung
Qi Sun
Pengfei Hong
Amir Zadeh
Chuan Li
U-Xuan Tan
Navonil Majumder
Soujanya Poria
LM&Ro
120
4
0
28 Apr 2025
STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation
YiFan Duan
Heng Li
Yilong Wu
Wenhao Yu
Xinran Zhang
Yedong Shen
Jianmin Ji
Yanzhe Zhang
139
0
0
26 Apr 2025
$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro, VLM
137
51
0
22 Apr 2025
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
Alan Dao
Dinh Bach Vu
Bui Quang Huy
188
0
0
24 Mar 2025
Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation
Tongxuan Tian
Haoyang Li
Bo Ai
Xiaodi Yuan
Zhiao Huang
H. Su
DiffM, AI4CE
123
3
0
15 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yongqian Li
Xinggang Wang
LM&Ro
128
1
0
13 Mar 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
Shanghang Zhang
196
20
0
13 Mar 2025
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng
Zihao Wang
Shaofei Cai
Hoang Trung-Dung
Yitao Liang
67
1
0
11 Mar 2025
PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM
Alan Dao
Dinh Bach Vu
Tuan Le Duc Anh
Bui Quang Huy
104
0
0
10 Mar 2025
PointVLA: Injecting the 3D World into Vision-Language-Action Models
Chengmeng Li
Junjie Wen
Yan Peng
Chaomin Shen
Feifei Feng
Yinlin Zhu
3DPC
162
9
0
10 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro, VLM
184
10
0
05 Mar 2025
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Borong Zhang
Yuhao Zhang
Yalan Qin
Yingshan Lei
Josef Dai
Yuanpei Chen
Yaodong Yang
128
0
0
05 Mar 2025
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Han Zhao
Wei Zhao
Zhide Zhong
Zongyuan Ge
Jun Ma
Haoang Li
114
7
0
04 Mar 2025
Action Tokenizer Matters in In-Context Imitation Learning
An Vuong
M. Vu
Dong An
Ian Reid
121
1
0
03 Mar 2025
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
Minjie Zhu
Yinlin Zhu
Jinming Li
Zhongyi Zhou
Junjie Wen
Xiaoyu Liu
Yaxin Peng
Chaomin Shen
Feifei Feng
LM&Ro
153
6
0
26 Feb 2025
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi
Brian Ichter
Michael Equi
Liyiming Ke
Karl Pertsch
...
Adrian Li-Bell
Danny Driess
Lachy Groom
Sergey Levine
Chelsea Finn
LM&Ro, LRM
149
23
0
26 Feb 2025