2401.08541
Cited By
Scalable Pre-training of Large Autoregressive Image Models
16 January 2024
Alaaeldin El-Nouby
Michal Klein
Shuangfei Zhai
Miguel Angel Bautista
Alexander Toshev
Vaishaal Shankar
J. Susskind
Armand Joulin
VLM
Papers citing "Scalable Pre-training of Large Autoregressive Image Models"
20 / 20 papers shown
Title
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
82
0
0
28 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
27
0
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
100
0
0
30 Dec 2024
Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection
Laurent Colbois
Sébastien Marcel
AAML
22
0
0
22 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Yongxin Zhu
B. Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
21
9
0
16 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
Generalizable autoregressive modeling of time series through functional narratives
Ran Liu
Wenrui Ma
Ellen L. Zippi
Hadi Pouransari
Jingyun Xiao
...
Behrooz Mahasseni
Juri Minxha
Erdrin Azemi
Eva L. Dyer
Ali Moin
AI4TS
25
0
0
10 Oct 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
44
79
0
11 Jun 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
49
4
0
24 Apr 2024
Denoising Autoregressive Representation Learning
Yazhe Li
J. Bornschein
Ting Chen
DiffM
27
3
0
08 Mar 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
21
10
0
07 Feb 2024
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
22
5
0
09 Jan 2024
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
105
63
0
23 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
225
2,543
0
25 Jan 2016