Vision Transformers Need Registers


28 September 2023
Timothée Darcet
Maxime Oquab
Julien Mairal
Piotr Bojanowski
    ViT

Papers citing "Vision Transformers Need Registers"

50 / 239 papers shown
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
32
5
0
22 Jul 2024
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
Renshan Zhang
Yibo Lyu
Rui Shao
Gongwei Chen
Weili Guan
Liqiang Nie
20
1
0
19 Jul 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Litong Feng
Wayne Zhang
VLM
24
23
0
17 Jul 2024
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc P.J. Strater
Mohammadreza Salehi
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
25
7
0
17 Jul 2024
DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training
Guillermo Jiménez-Pérez
Pedro Osório
Josef Cersovsky
Javier Montalt-Tordera
Jens Hooge
Steffen Vogler
Sadegh Mohammadi
MedIm
24
2
0
16 Jul 2024
Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
Rining Wu
Feixiang Zhou
Ziwei Yin
Jian K. Liu
27
0
0
15 Jul 2024
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao
Zhuotao Tian
Hang Zhao
Jingyong Su
VLM
21
14
0
11 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
29
3
0
10 Jul 2024
Multi-Label Plant Species Classification with Self-Supervised Vision Transformers
Murilo Gustineli
Anthony Miyaguchi
Ian Stalter
18
1
0
08 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
23
2
0
08 Jul 2024
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
Ananthu Aniraj
C. Dantas
Dino Ienco
Diego Marcos
21
0
0
05 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
48
21
0
27 Jun 2024
AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Huzheng Yang
James Gee
Jianbo Shi
VOS
24
1
0
26 Jun 2024
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori
Alexa R. Tartaglini
Wai Keen Vong
Thomas Serre
Brenden Lake
Ellie Pavlick
16
1
0
22 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
23
1
0
19 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
16
4
0
16 Jun 2024
Enhancing Anomaly Detection Generalization through Knowledge Exposure: The Dual Effects of Augmentation
Mohammad Akhavan Anvari
Rojina Kashefi
Vahid Reza Khazaie
Mohammad Khalooei
Mohammad Sabokrou
17
0
0
15 Jun 2024
ProtoS-ViT: Visual foundation models for sparse self-explainable classifications
Hugues Turbé
Mina Bjelogrlic
G. Mengaldo
Christian Lovis
ViT
24
6
0
14 Jun 2024
Depth Anything V2
Lihe Yang
Bingyi Kang
Zilong Huang
Zhen Zhao
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
DiffM
VLM
MDE
55
314
0
13 Jun 2024
Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis
Weiyi Wu
Chongyang Gao
Xinwen Xu
Siting Li
Jiang Gui
24
0
0
13 Jun 2024
Let Go of Your Labels with Unsupervised Transfer
Artyom Gadetsky
Yulun Jiang
Maria Brbić
VLM
19
5
0
11 Jun 2024
Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology
Huahui Yi
Xiaofei Wang
Kang Li
Chao Li
22
0
0
11 Jun 2024
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou
Xiaodan Du
Greg Shakhnarovich
Karen Livescu
SLR
16
3
0
11 Jun 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
37
3
0
11 Jun 2024
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton
Andrew Zisserman
John R. Hershey
William T. Freeman
VLM
19
5
0
09 Jun 2024
Nomic Embed Vision: Expanding the Latent Space
Zach Nussbaum
Brandon Duderstadt
Andriy Mulyar
VLM
25
5
0
06 Jun 2024
Analyzing the Feature Extractor Networks for Face Image Synthesis
Erdi Sarıtaş
H. K. Ekenel
CVBM
EGVM
32
1
0
04 Jun 2024
Learning to Play Atari in a World of Tokens
Pranav Agarwal
Sheldon Andrews
Samira Ebrahimi Kahou
OffRL
21
0
0
03 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
23
17
0
03 Jun 2024
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar
Alberto Bietti
Mariel Pettee
Michael Eickenberg
M. Cranmer
...
Ruben Ohana
Liam Parker
Bruno Régaldo-Saint Blancard
Kyunghyun Cho
Shirley Ho
34
1
0
30 May 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
41
2
0
30 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
36
3
0
28 May 2024
Memorize What Matters: Emergent Scene Decomposition from Multitraverse
Yiming Li
Zehong Wang
Yue Wang
Zhiding Yu
Zan Gojcic
Marco Pavone
Chen Feng
Jose M. Alvarez
3DGS
43
1
0
27 May 2024
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Huy V. Vo
Vasil Khalidov
Timothée Darcet
Théo Moutakanni
Nikita Smetanin
...
Maxime Oquab
Armand Joulin
Hervé Jégou
Patrick Labatut
Piotr Bojanowski
SSL
40
18
0
24 May 2024
Mamba-R: Vision Mamba ALSO Needs Registers
Feng Wang
Jiahao Wang
Sucheng Ren
Guoyizhe Wei
Jieru Mei
Wei Shao
Yuyin Zhou
Alan L. Yuille
Cihang Xie
Mamba
18
19
0
23 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
14
0
0
23 May 2024
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo
Shuai Lu
Weihang Zhang
Huiqi Li
Hongen Liao
ViT
47
7
0
23 May 2024
Register assisted aggregation for Visual Place Recognition
Xuan Yu
Zhenyong Fu
22
0
0
19 May 2024
Topicwise Separable Sentence Retrieval for Medical Report Generation
Junting Zhao
Yang Zhou
Zhihao Chen
Huazhu Fu
Liang Wan
MedIm
25
1
0
07 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
30
155
0
03 May 2024
Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
H. Nguyen
Junichi Yamagishi
Isao Echizen
23
6
0
01 May 2024
Training a high-performance retinal foundation model with half-the-data and 400 times less compute
Justin Engelmann
Miguel O. Bernabeu
MedIm
OOD
24
0
0
30 Apr 2024
When Medical Imaging Met Self-Attention: A Love Story That Didn't Quite Work Out
Tristan Piater
Niklas Penzel
Gideon Stein
Joachim Denzler
24
2
0
18 Apr 2024
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
Zhongrui Gui
Shuyang Sun
Runjia Li
Jianhao Yuan
Zhaochong An
Karsten Roth
Ameya Prabhu
Philip H. S. Torr
VLM
CLL
19
6
0
15 Apr 2024
Human-in-the-Loop Segmentation of Multi-species Coral Imagery
Scarlett Raine
Ross Marchant
Brano Kusy
Frederic Maire
Niko Suenderhauf
Tobias Fischer
28
3
0
15 Apr 2024
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani
Amit Raj
Kevis-Kokitsi Maninis
Abhishek Kar
Yuanzhen Li
Michael Rubinstein
Deqing Sun
Leonidas J. Guibas
Justin Johnson
Varun Jampani
20
79
0
12 Apr 2024
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping
Anas Gouda
Max Schwarz
Christopher Reining
Sven Behnke
Alice Kirchheim
VLM
20
0
0
09 Apr 2024
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham
Angie Boggust
Sofian Chaybouti
Hendrik Strobelt
Hilde Kuehne
83
10
0
04 Apr 2024
Masked Completion via Structured Diffusion with White-Box Transformers
Druv Pai
Ziyang Wu
Sam Buchanan
Yaodong Yu
Yi-An Ma
19
12
0
03 Apr 2024
Situation Awareness for Driver-Centric Driving Style Adaptation
Johann Haselberger
Bonifaz Stuhr
Bernhard Schick
Steffen Müller
21
1
0
28 Mar 2024