arXiv: 2407.06189
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy
8 July 2024. Tags: VLM, LRM.
Papers citing "Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision" (8 of 8 papers shown)
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni, Pooyan Fazli. 01 Dec 2024. Tags: VLM.
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Shiyi Zhang, Wen-Dao Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang. 07 Apr 2024.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh. 12 Feb 2024. Tags: MLLM, VLM.
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, ..., Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Narain Sohl-Dickstein, Noah Fiedel. 11 Dec 2023. Tags: ALM, LRM, ReLM, SyDa.
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li-ming Yuan. 16 Nov 2023. Tags: VLM, MLLM.
GPT-4V(ision) Unsuitable for Clinical Care and Education: A Clinician-Evaluated Assessment
Senthujan Senkaiahliyan, Augustin Toma, Jun Ma, An-Wen Chan, Andrew Ha, Kevin R. An, Hrishikesh Suresh, Barry Rubin, Bo Wang. 14 Nov 2023. Tags: LM&MA, ELM.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi. 30 Jan 2023. Tags: VLM, MLLM.
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi. 28 Jan 2022. Tags: MLLM, BDL, VLM, CLIP.