Vision Transformers for Dense Prediction

IEEE International Conference on Computer Vision (ICCV), 2021
24 March 2021
René Ranftl
Alexey Bochkovskiy
V. Koltun
ViT, MDE
ArXiv (abs) | PDF | HTML | HuggingFace (1 upvote) | GitHub (2138★)

Papers citing "Vision Transformers for Dense Prediction"

50 / 1,223 papers shown
Title
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
Xianfeng Wu
Yajing Bai
Minghan Li
Xianzu Wu
Xueqi Zhao
Zhongyuan Lai
Wenyu Liu
Xinggang Wang
3DGS
182
0
0
04 Dec 2025
MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction
Guole Shen
Tianchen Deng
Xingrui Qin
Nailin Wang
Jianyu Wang
Yanbo Wang
Yongtao Chen
Hesheng Wang
Jingchuan Wang
ViT
136
0
0
03 Dec 2025
Unique Lives, Shared World: Learning from Single-Life Videos
Tengda Han
Sayna Ebrahimi
Dilara Gokay
Li Yang Ku
M. Ovsjanikov
...
Daniel Zoran
Viorica Patraucean
João Carreira
Andrew Zisserman
Dima Damen
156
0
0
03 Dec 2025
ReasonX: MLLM-Guided Intrinsic Image Decomposition
Alara Dirik
Tuanfeng Y. Wang
Duygu Ceylan
Stefanos Zafeiriou
Anna Frühstück
16
0
0
03 Dec 2025
Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features
Yuzhen Hu
Biplab Banerjee
Saurabh Prasad
44
1
0
03 Dec 2025
AVGGT: Rethinking Global Attention for Accelerating VGGT
Xianbing Sun
Zhikai Zhu
Zhengyu Lou
Bo Yang
Jinyang Tang
Liqing Zhang
He Wang
Jianfu Zhang
ViT, LRM
148
0
0
02 Dec 2025
FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention
Zipeng Wang
Dan Xu
ViT
88
0
0
01 Dec 2025
Learning What Helps: Task-Aligned Context Selection for Vision Tasks
Jingyu Guo
Emir Konuk
Fredrik Strand
Christos Matsoukas
Kevin Smith
56
0
0
29 Nov 2025
Shoe Style-Invariant and Ground-Aware Learning for Dense Foot Contact Estimation
Daniel Sungho Jung
Kyoung Mu Lee
88
0
0
27 Nov 2025
Controllable 3D Object Generation with Single Image Prompt
International Conference on Pattern Recognition (ICPR), 2025
Jaeseok Lee
Jaekoo Lee
DiffM
85
1
0
27 Nov 2025
ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Zhiyi Jiang
Yifu Wang
Xuelian Cheng
Zongyuan Ge
56
0
0
27 Nov 2025
AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend
Hengyi Wang
Lourdes Agapito
85
0
0
25 Nov 2025
Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion
Samuele Dell'Erba
Andrew D. Bagdanov
156
0
0
25 Nov 2025
Vision-Language Enhanced Foundation Model for Semi-supervised Medical Image Segmentation
Jiaqi Guo
Mingzhen Li
Hanyu Su
Santiago López
Lexiaozi Fan
Daniel Kim
Aggelos K. Katsaggelos
VLM
224
0
0
24 Nov 2025
4D-VGGT: A General Foundation Model with SpatioTemporal Awareness for Dynamic Scene Geometry Estimation
Haonan Wang
Hanyu Zhou
Haoyue Liu
Luxin Yan
93
1
0
23 Nov 2025
Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization
Youngsik Yun
Dongjun Gu
Youngjung Uh
141
0
0
22 Nov 2025
NoPe-NeRF++: Local-to-Global Optimization of NeRF with No Pose Prior
D. Shi
Shen Cao
Bojian Wu
Jinhui Guo
Lubin Fan
Renjie Chen
Ligang Liu
Jieping Ye
124
0
0
21 Nov 2025
MuM: Multi-View Masked Image Modeling for 3D Vision
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
188
0
0
21 Nov 2025
LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM
Sibaek Lee
Seongbo Ha
Kyeongsu Kang
Joonyeol Choi
Seungjun Tak
Hyeonwoo Yu
3DGS
123
0
0
20 Nov 2025
NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses
Jing Wen
Alexander Schwing
Shenlong Wang
106
0
0
20 Nov 2025
CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation
Samer Abualhanud
Christian Grannemann
Max Mehltretter
132
0
0
20 Nov 2025
Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
Minseok Seo
Mark Hamilton
Changick Kim
228
0
0
20 Nov 2025
CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis
Zijian Wu
Mingfeng Jiang
Zidian Lin
Ying Song
Hanjie Ma
Qun Wu
Dongping Zhang
Guiyang Pu
3DGS
419
0
0
20 Nov 2025
RoMa v2: Harder Better Faster Denser Feature Matching
Johan Edstedt
David Nordström
Yushan Zhang
Georg Bökman
Jonathan Astermark
Viktor Larsson
Anders Heyden
Fredrik Kahl
Mårten Wadenbäck
Michael Felsberg
3DV, 3DH
427
0
0
19 Nov 2025
EGSA-PT: Edge-Guided Spatial Attention with Progressive Training for Monocular Depth Estimation and Segmentation of Transparent Objects
Gbenga Omotara
Ramy M. A. Farag
Seyed Mohamad Ali Tousi
G.N. DeSouza
MDE
276
0
0
18 Nov 2025
PAVE: An End-to-End Dataset for Production Autonomous Vehicle Evaluation
Xiangyu Li
C. Wang
Yumao Liu
Dengbo He
J. Zhang
Ke Ma
71
0
0
18 Nov 2025
Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo under Limited Multi-Illumination Cues
King-Man Tam
Satoshi Ikehata
Yuta Asano
Zhaoyi An
Rei Kawakami
80
0
0
17 Nov 2025
Depth Anything 3: Recovering the Visual Space from Any Views
Haotong Lin
Sili Chen
Junhao Liew
Donny Y. Chen
Z. Li
Guang Shi
Jiashi Feng
Bingyi Kang
3DV, VLM, MDE
656
8
0
13 Nov 2025
Navigating the Wild: Pareto-Optimal Visual Decision-Making in Image Space
Durgakant Pushp
Weizhe (Wesley) Chen
Zheng Chen
Chaomin Luo
Jason M. Gregory
Lantao Liu
84
0
0
11 Nov 2025
Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction
Changyue Shi
Chuxiao Yang
Xinyuan Hu
Minghao Chen
Wenwen Pan
Yan Yang
Jiajun Ding
Zhou Yu
Jun Yu
3DGS
108
1
0
10 Nov 2025
FlowFeat: Pixel-Dense Embedding of Motion Profiles
Nikita Araslanov
Anna Sonnweber
Daniel Cremers
MDE
355
1
0
10 Nov 2025
MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification
Zijiang Yang
Hanqing Chao
Bokai Zhao
Yelin Yang
Yunshuo Zhang
...
K. Yan
Dakai Jin
Minfeng Xu
Yun Bian
Hui Jiang
293
0
0
07 Nov 2025
GraspView: Active Perception Scoring and Best-View Optimization for Robotic Grasping in Cluttered Environments
Shenglin Wang
Mingtong Dai
Jingxuan Su
Lingbo Liu
C. Chen
X. Wu
Guanbin Li
109
0
0
06 Nov 2025
Diffusion-Guided Mask-Consistent Paired Mixing for Endoscopic Image Segmentation
Pengyu Jie
Wanquan Liu
Rui He
Yihui Wen
Deyu Meng
Chenqiang Gao
DiffM, MedIm
216
0
0
05 Nov 2025
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
Dmitrii Pozdeev
Alexey Artemov
A. Bhattarai
Artem Sevastopolsky
3DH
242
0
0
04 Nov 2025
Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation
Seongkyu Choi
Jhonghyun An
85
0
0
03 Nov 2025
SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments
Hongjie Zhang
Gideon Billings
Stefan Williams
MDE
180
0
0
29 Oct 2025
More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models
Hongkai Lin
Dingkang Liang
Mingyang Du
Xin Zhou
X. Bai
MoMe, MDE, VLM
499
0
0
27 Oct 2025
SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications
Edouard Lansiaux
Antoine Simonet
Eric Wiel
125
0
0
27 Oct 2025
WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing
Vittorio Bernuzzi
Leonardo Rossi
Tomaso Fontanini
Massimo Bertozzi
Andrea Prati
98
0
0
26 Oct 2025
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Hao Li
Zhengyu Zou
Fangfu Liu
Xuanyang Zhang
Fangzhou Hong
...
Yushi Lan
Manyuan Zhang
Gang Yu
Dingwen Zhang
Ziwei Liu
ViT, 3DV
460
0
0
26 Oct 2025
Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities
N. Xu
R. Qin
DiffM
155
0
0
26 Oct 2025
EndoSfM3D: Learning to 3D Reconstruct Any Endoscopic Surgery Scene using Self-supervised Foundation Model
Changhao Zhang
Matthew J. Clarkson
Mobarak I. Hoque
112
0
0
25 Oct 2025
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
Orest Kupyn
Hirokatsu Kataoka
Christian Rupprecht
120
1
0
24 Oct 2025
Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks
Jieyuan Zhang
Xiaolong Zhou
Shuai Wang
Wenjie Wei
Hanwen Liu
Qian Sun
Malu Zhang
Yang Yang
Haizhou Li
140
0
0
24 Oct 2025
PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting
Changkun Liu
Bin Tan
Zeran Ke
Shangzhan Zhang
Jiachen Liu
Ming Qian
Nan Xue
Yujun Shen
Tristan Braud
3DV
180
0
0
21 Oct 2025
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
Z. Chen
M. Zhang
Xinlei Yu
Xufang Luo
Mingze Sun
Zihao Pan
Yan Feng
Peng Pei
Xunliang Cai
Ruqi Huang
VGen, LRM
160
7
0
21 Oct 2025
M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception
U.V.B.L Udugama
G. Vosselman
F. Nex
123
0
0
20 Oct 2025
Mapping Hidden Heritage: Self-supervised Pre-training for Archaeological Stone Wall Mapping in Historic Landscapes Using High-Resolution DEM Derivatives
Zexian Huang
Mashnoon Islam
Brian Armstrong
Billy Bell
K. Khoshelham
Martin Tomko
144
0
0
20 Oct 2025
Beyond RGB: Leveraging Vision Transformers for Thermal Weapon Segmentation
Akhila Kambhatla
Ahmed R Khaled
ViT
64
0
0
19 Oct 2025