ResearchTrend.AI
arXiv: 2302.05442
Scaling Vision Transformers to 22 Billion Parameters


10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim M. Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Xiao Wang
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
F. I. F. Richard Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
    MLLM

Papers citing "Scaling Vision Transformers to 22 Billion Parameters"

50 / 416 papers shown
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
77
3
0
19 Dec 2024
Slicing Vision Transformer for Flexible Inference
Yitian Zhang
Huseyin Coskun
Xu Ma
Huan Wang
Ke Ma
Xi Chen
Derek Hao Hu
Y. Fu
ViT
74
0
0
06 Dec 2024
Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning
Z. Wang
C. J. Li
QiXiang Ye
Tong Zhang
MoE
67
1
0
03 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViT
VLM
64
0
0
01 Dec 2024
A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
Zheshu Song
Z. Ma
Yifan Yang
Jianheng Zhuo
Xie Chen
64
2
0
01 Dec 2024
Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
Ryan Lucas
Rahul Mazumder
69
0
0
27 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
118
3
0
20 Nov 2024
GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation
Yushi Lan
Shangchen Zhou
Zhaoyang Lyu
Fangzhou Hong
Shuai Yang
Bo Dai
Xingang Pan
Chen Change Loy
3DGS
53
0
0
12 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
26
3
0
08 Nov 2024
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
Anil Kag
Huseyin Coskun
Jierun Chen
Junli Cao
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
Jian Ren
46
3
0
07 Nov 2024
Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models
Adrián Morales-Pastor
Raquel Vázquez-Reza
Miłosz Wieczór
Clàudia Valverde
Manel Gil-Sorribes
Bertran Miquel-Oliver
Álvaro Ciudad
Alexis Molina
AI4CE
56
0
0
05 Nov 2024
Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
Junchen Fu
Xuri Ge
Xin Xin
Alexandros Karatzoglou
Ioannis Arapakis
Kaiwen Zheng
Yongxin Ni
J. Jose
23
0
0
05 Nov 2024
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean
Zitong Jerry Wang
John Urbanik
Konstantin Donhauser
Jason Hartford
...
Safiye Celik
Marta Fay
Juan Sebastian Rodriguez Vera
I. Haque
Oren Z. Kraus
MedIm
24
4
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
74
12
0
04 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen
DiffM
52
28
1
01 Nov 2024
Enhancing Brain Tumor Classification Using TrAdaBoost and Multi-Classifier Deep Learning Approaches
Mahin Mohammadi
Saman Jamshidi
27
1
0
31 Oct 2024
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Marioriyad
Parham Rezaei
M. Baghshah
M. Rohban
CoGe
79
0
0
30 Oct 2024
ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation
Majdi Hassan
Nikhil Shenoy
Jungyoon Lee
Hannes Stärk
Stephan Thaler
Dominique Beaini
22
6
0
29 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
64
8
0
29 Oct 2024
OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery
P. Dias
A. Tsaris
Jordan Bowman
Abhishek Potnis
Jacob Arndt
H. Yang
D. Lunga
19
5
0
25 Oct 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
VLM
23
3
0
25 Oct 2024
Rethinking Positive Pairs in Contrastive Learning
Jiantao Wu
Shentong Mo
Zhenhua Feng
Sara Atito
Josef Kittler
Muhammad Awais
SSL
VLM
33
3
0
23 Oct 2024
Methods of improving LLM training stability
Oleg Rybakov
Mike Chrzanowski
Peter Dykas
Jinze Xue
Ben Lanir
21
1
0
22 Oct 2024
Towards Optimal Adapter Placement for Efficient Transfer Learning
Aleksandra I. Nowak
Otniel-Bogdan Mercea
Anurag Arnab
Jonas Pfeiffer
Yann N. Dauphin
Utku Evci
23
0
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
30
3
0
21 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
35
40
0
17 Oct 2024
Exploring the Design Space of Visual Context Representation in Video MLLMs
Yifan Du
Yuqi Huo
K. Zhou
Zijia Zhao
Haoyu Lu
Han Huang
Wayne Xin Zhao
B. Wang
Weipeng Chen
Ji-Rong Wen
31
2
0
17 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
69
3
0
17 Oct 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao
Yangzhou Liu
Zhe Chen
Guangchen Shi
Wenhai Wang
Danhuai Zhao
Tong Lu
41
5
0
15 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
C. L. P. Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
29
4
0
14 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
50
0
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
Understanding Robustness of Parameter-Efficient Tuning for Image Classification
Jiacheng Ruan
Xian Gao
Suncheng Xiang
Mingye Xie
Ting Liu
Yuzhuo Fu
AAML
VLM
21
0
0
13 Oct 2024
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Hyunseung Kim
Jun Jet Tai
K. Subramanian
Peter R. Wurman
Jaegul Choo
Peter Stone
Takuma Seno
OffRL
62
6
0
13 Oct 2024
Scaling Laws For Diffusion Transformers
Zhengyang Liang
Hao He
Ceyuan Yang
Bo Dai
27
8
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
62
25
0
10 Oct 2024
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Fabian Paischer
Lukas Hauzenberger
Thomas Schmied
Benedikt Alkin
Marc Peter Deisenroth
Sepp Hochreiter
29
4
0
09 Oct 2024
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao
Hangjie Yuan
Yujie Wei
Shiwei Zhang
Yuchao Gu
...
Xiang Wang
Zhangjie Wu
Junhao Zhang
Yingya Zhang
Mike Zheng Shou
DiffM
VLM
53
4
0
09 Oct 2024
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo
Maksim Velikanov
Dhia Eddine Rhaiem
Ilyas Chahed
Younes Belkada
Guillaume Kunsch
Hakim Hacid
ALM
52
14
0
07 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
67
0
0
07 Oct 2024
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
49
8
0
03 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
30
3
0
03 Oct 2024
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Changdae Oh
Yixuan Li
Kyungwoo Song
Sangdoo Yun
Dongyoon Han
OOD
MoMe
36
4
0
03 Oct 2024
Positional Attention: Expressivity and Learnability of Algorithmic Computation
Artur Back de Luca
George Giapitzakis
Shenghao Yang
Petar Veličković
K. Fountoulakis
37
0
0
02 Oct 2024
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
46
4
0
30 Sep 2024
Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation
Minjie Zhu
Yichen Zhu
Jinming Li
Junjie Wen
Zhiyuan Xu
...
Ran Cheng
Chaomin Shen
Yaxin Peng
Feifei Feng
Jian Tang
28
13
0
22 Sep 2024
Formula-Supervised Visual-Geometric Pre-training
Ryosuke Yamada
Kensho Hara
Hirokatsu Kataoka
Koshi Makihara
Nakamasa Inoue
Rio Yokota
Y. Satoh
21
1
0
20 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
50
0
17 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
59
23
0
17 Sep 2024