Vision Transformers for Dense Prediction

IEEE International Conference on Computer Vision (ICCV), 2021
24 March 2021
René Ranftl
Alexey Bochkovskiy
V. Koltun
    ViT, MDE

Papers citing "Vision Transformers for Dense Prediction"

50 / 1,221 papers shown
Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
Doyeon Kim
Woonghyun Ka
Pyunghwan Ahn
Donggyu Joo
S. Chun
Junmo Kim
MDE
270
151
0
19 Jan 2022
SwinUNet3D -- A Hierarchical Architecture for Deep Traffic Prediction using Shifted Window Transformers
Alabi Bojesomo
Hasan Al Marzouqi
P. Liatsis
ViT
124
6
0
17 Jan 2022
Domain Adaptation via Bidirectional Cross-Attention Transformer
Xiyu Wang
Pengxin Guo
Yu Zhang
ViT
124
26
0
15 Jan 2022
A Survey on RGB-D Datasets
Computer Vision and Image Understanding (CVIU), 2022
Alexandre Lopes
Roberto Souza
Hélio Pedrini
3DV, MDE
321
40
0
15 Jan 2022
Language-driven Semantic Segmentation
International Conference on Learning Representations (ICLR), 2022
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
315
774
0
10 Jan 2022
QuadTree Attention for Vision Transformers
International Conference on Learning Representations (ICLR), 2022
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
443
184
0
08 Jan 2022
THE Benchmark: Transferable Representation Learning for Monocular Height Estimation
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021
Zhitong Xiong
Wei Huang
Jingtao Hu
Xiao Xiang Zhu
183
24
0
30 Dec 2021
Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction
Neural Information Processing Systems (NeurIPS), 2021
Jing Zhang
Jianwen Xie
Nick Barnes
Ping Li
ViT
229
107
0
27 Dec 2021
Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry
Gwangbin Bae
Ignas Budvytis
R. Cipolla
3DV
196
73
0
15 Dec 2021
E-CRF: Embedded Conditional Random Field for Boundary-caused Class Weights Confusion in Semantic Segmentation
Jie Zhu
Huabin Huang
Banghuai Li
Leye Wang
177
15
0
14 Dec 2021
Stereoscopic Universal Perturbations across Different Architectures and Datasets
Z. Berger
Parth T. Agrawal
Tianlin Liu
Stefano Soatto
A. Wong
AAML
261
24
0
12 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
168
24
0
09 Dec 2021
Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey
G. Csurka
Riccardo Volpi
Boris Chidlovskii
OOD, VLM, 3DV
230
43
0
06 Dec 2021
GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation
Weixuan Sun
Jing Zhang
Zheyuan Liu
Yiran Zhong
Nick Barnes
ViT
219
15
0
06 Dec 2021
PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation
European Conference on Computer Vision (ECCV), 2021
Haobo Yuan
Xiangtai Li
Jianlong Wu
Guangliang Cheng
Jing Zhang
Yunhai Tong
Lefei Zhang
Dacheng Tao
MDE
295
48
0
05 Dec 2021
Toward Practical Monocular Indoor Depth Estimation
Computer Vision and Pattern Recognition (CVPR), 2021
Cho-Ying Wu
Jialiang Wang
Michael Hall
Ulrich Neumann
Shuochen Su
3DV, MDE
241
84
0
04 Dec 2021
Machine Learning Subsystem for Autonomous Collision Avoidance on a small UAS with Embedded GPU
Nicholas Polosky
Tyler Gwin
Sean Furman
Parth Barhanpurkar
Jithin Jagannath
127
8
0
03 Dec 2021
Object-aware Monocular Depth Prediction with Instance Convolutions
Enis Simsar
Evin Pınar Örnek
Fabian Manhardt
Helisa Dhamo
Nassir Navab
F. Tombari
3DH, MDE
198
2
0
02 Dec 2021
3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image
Fangzhou Mu
Jian Wang
Yichen Wu
Yin Li
DiffM, 3DH
229
56
0
30 Nov 2021
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
M. Rey-Area
Mingze Yuan
Christian Richardt
MDE
388
94
0
30 Nov 2021
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu-Gang Jiang
Ser-Nam Lim
ViT
219
292
0
30 Nov 2021
PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction
Qingyun Wang
Baojian Ma
Wei Liu
Ming Lou
Mingchuan Zhou
Huanyu Jiang
Y. Ying
3DV
187
0
0
30 Nov 2021
Pyramid Adversarial Training Improves ViT Performance
Charles Herrmann
Kyle Sargent
Lu Jiang
Ramin Zabih
Huiwen Chang
Ce Liu
Dilip Krishnan
Deqing Sun
ViT
257
63
0
30 Nov 2021
TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Yikang Ding
Wentao Yuan
Qingtian Zhu
Haotian Zhang
Xiangyue Liu
Yuanjiang Wang
Xiao Liu
ViT
180
247
0
29 Nov 2021
The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement
Computer Vision and Pattern Recognition (CVPR), 2021
Ilya Chugunov
Yuxuan Zhang
Zhihao Xia
Xuaner (Cecilia) Zhang
Jiawen Chen
Felix Heide
3DH, MDE
249
15
0
26 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Kumara Kahatapitiya
Michael S. Ryoo
228
7
0
26 Nov 2021
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
Computer Vision and Pattern Recognition (CVPR), 2021
Mehdi S. M. Sajjadi
H. Meyer
Etienne Pot
Urs M. Bergmann
Klaus Greff
...
Daniel Duckworth
Alexey Dosovitskiy
Jakob Uszkoreit
Thomas Funkhouser
Andrea Tagliasacchi
ViT
349
229
0
25 Nov 2021
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
Xiaoxue Chen
Tianyu Liu
Hao Zhao
Guyue Zhou
Ya Zhang
280
26
0
24 Nov 2021
Distortion Reduction for Off-Center Perspective Projection of Panoramas
Chi-Han Peng
Jiayao Zhang
MDE
90
0
0
23 Nov 2021
Monocular Road Planar Parallax Estimation
IEEE Transactions on Image Processing (TIP), 2021
Haobo Yuan
Teng Chen
Wei Sui
Jiafeng Xie
Lefei Zhang
Yuan Li
Qian Zhang
148
5
0
22 Nov 2021
Topological Regularization for Dense Prediction
International Conference on Machine Learning and Applications (ICMLA), 2021
Deqing Fu
Bradley J. Nelson
MDE
111
0
0
22 Nov 2021
Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are Better Than One
Shuwei Shao
Ran Li
Z. Pei
Zhong Liu
Weihai Chen
Wentao Zhu
Xingming Wu
Baochang Zhang
ViT, MDE
117
18
0
16 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
180
27
0
15 Nov 2021
Online Mutual Adaptation of Deep Depth Prediction and Visual SLAM
S. Loo
M. Shakeri
S. Tang
S. Mashohor
Hong Zhang
MDE
148
6
0
07 Nov 2021
Body Size and Depth Disambiguation in Multi-Person Reconstruction from Single Images
International Conference on 3D Vision (3DV), 2021
Nicolas Ugrinovic
Adria Ruiz
Antonio Agudo
Alberto Sanfeliu
Francesc Moreno-Noguer
3DH
191
11
0
02 Nov 2021
Transformers for prompt-level EMA non-response prediction
Supriya Nagesh
Alexander Moreno
Stephanie M Carpenter
Jamie Yap
Soujanya Chatterjee
...
Santosh Kumar
Cho Lam
D. Wetter
Inbal Nahum-Shani
James M. Rehg
93
1
0
01 Nov 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
277
296
0
18 Oct 2021
Learning multiplane images from single views with self-supervision
Gustavo Sutter P. Carvalho
D. Luvizon
Antonio Joia Neto
André G. C. Pacheco
O. A. B. Penatti
SSL
194
1
0
18 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
884
1,445
0
13 Oct 2021
Dense Uncertainty Estimation
Jing Zhang
Yuchao Dai
Mochu Xiang
Deng-Ping Fan
Peyman Moghadam
Mingyi He
Christian J. Walder
Kaihao Zhang
Mehrtash Harandi
Nick Barnes
UQCV, BDL
263
12
0
13 Oct 2021
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
IEEE International Conference on Computer Vision (ICCV), 2021
Ainaz Eftekhar
Alexander Sax
Roman Bachmann
Jitendra Malik
Amir Zamir
MedIm
358
381
0
11 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
584
1,827
0
05 Oct 2021
Vision Transformer Hashing for Image Retrieval
IEEE International Conference on Multimedia and Expo (ICME), 2021
S. Dubey
S. Singh
Wei Chu
ViT
196
67
0
26 Sep 2021
Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-supervised Learning
AAAI Conference on Artificial Intelligence (AAAI), 2021
I. Yun
Hyuk-Jae Lee
Chae-Eun Rhee
ViT, MDE
239
33
0
22 Sep 2021
TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators
Tianrui Guan
Zhenpeng He
Ruitao Song
Tianyi Zhou
Liangjun Zhang
318
48
0
13 Sep 2021
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
Tongkun Xu
Weihua Chen
Pichao Wang
Fan Wang
Hao Li
Rong Jin
ViT
526
273
0
13 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2021
Pichao Wang
Qingsong Wen
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
Rong Jin
220
51
0
08 Sep 2021
Multi-Task Self-Training for Learning General Representations
IEEE International Conference on Computer Vision (ICCV), 2021
Golnaz Ghiasi
Barret Zoph
E. D. Cubuk
Quoc V. Le
Nayeon Lee
SSL
180
111
0
25 Aug 2021
Lightweight Monocular Depth with a Novel Neural Architecture Search Method
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Lam Huynh
Phong H. Nguyen
Jiří Matas
Esa Rahtu
J. Heikkilä
168
11
0
25 Aug 2021
Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss
International Conference on 3D Vision (3DV), 2021
Lam Huynh
Matteo Pedone
Phong H. Nguyen
Jiří Matas
Esa Rahtu
J. Heikkilä
MDE, 3DPC
170
4
0
25 Aug 2021