Vision Transformers for Dense Prediction

IEEE International Conference on Computer Vision (ICCV), 2021

24 March 2021

Alexey Bochkovskiy

arXiv: 2103.13413

Papers citing "Vision Transformers for Dense Prediction"

50 / 1,223 papers shown

JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting

238

1

0

04 Jun 2025

HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers

254

3

0

03 Jun 2025

Generative Perception of Shape and Material from Differential Motion

Xinran Nicole Han

373

0

0

03 Jun 2025

Towards In-the-wild 3D Plane Reconstruction from a Single Image

Computer Vision and Pattern Recognition (CVPR), 2025

Sharon X. Huang

215

5

0

03 Jun 2025

SAB3R: Semantic-Augmented Backbone in 3D Reconstruction

314

2

0

02 Jun 2025

Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction

Prajwal Chidananda

Yasutaka Furukawa

245

5

0

02 Jun 2025

Flying Co-Stereo: Enabling Long-Range Aerial Dense Mapping via Collaborative Stereo Vision of Dynamic-Baseline

153

0

0

31 May 2025

UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation

215

4

0

30 May 2025

MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking

198

2

0

29 May 2025

SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images

278

6

0

29 May 2025

Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation

243

0

0

29 May 2025

Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss

168

0

0

28 May 2025

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination

189

5

0

28 May 2025

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation

503

0

0

28 May 2025

Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles

238

2

0

27 May 2025

DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization

IEEE Signal Processing Letters (IEEE SPL), 2025

Alexander Dvorkovich

250

4

0

26 May 2025

OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

259

0

0

24 May 2025

Semantic segmentation with reward

506

0

0

23 May 2025

EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy

136

0

0

23 May 2025

MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models

Computer Vision and Pattern Recognition (CVPR), 2025

357

8

0

21 May 2025

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching

373

3

0

20 May 2025

Intra-class Patch Swap for Self-Distillation

283

0

0

20 May 2025

3D Visual Illusion Depth Estimation

633

1

0

19 May 2025

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

381

41

0

18 May 2025

Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather

International Joint Conference on Artificial Intelligence (IJCAI), 2025

285

2

0

18 May 2025

FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

288

8

0

15 May 2025

FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving via Point-Level Dynamic-Static Decoupling

IEEE International Conference on Robotics and Automation (ICRA), 2025

350

3

0

14 May 2025

Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

Konrad Schindler

378

31

0

14 May 2025

MELLM: A Flow-Guided Large Language Model for Micro-Expression Understanding

388

3

0

11 May 2025

Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles

Anupkumar Bochare

105

0

0

09 May 2025

DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion

Computer Vision and Pattern Recognition (CVPR), 2025

Shubham Tulsiani

402

7

0

08 May 2025

VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery

531

0

0

05 May 2025

Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction

Simon Giebenhain

Tobias Kirschstein

Lourdes Agapito

Matthias Nießner

376

7

0

01 May 2025

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

652

4

0

01 May 2025

Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining

415

1

0

29 Apr 2025

Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video

Computer Vision and Pattern Recognition (CVPR), 2025

Hoang Chuong Nguyen

Jose M. Alvarez

234

0

0

28 Apr 2025

Category-Level and Open-Set Object Pose Estimation for Robotics

Matthias Hirschmanner

200

0

0

28 Apr 2025

Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection

Athul M. Mathew

Arshad Ali Khan

400

2

0

27 Apr 2025

Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

Patrick Müller

Alexander Braun

278

2

0

25 Apr 2025

The Fourth Monocular Depth Estimation Challenge

Ripudaman Singh Arora

...

Minh-Quang Nguyen

Muhammad Shahzad

971

4

0

24 Apr 2025

Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections

Alexander C. Jenke

Fiona Kolbinger

Oliver Saldanha

Jakob N. Kather

Stefanie Speidel

369

4

0

23 Apr 2025

SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos

316

2

0

22 Apr 2025

Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection

IEEE Transactions on Medical Imaging (IEEE TMI), 2025

346

3

0

21 Apr 2025

VistaDepth: Improving far-range Depth Estimation with Spectral Modulation and Adaptive Reweighting

574

0

0

21 Apr 2025

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction

Nikita Araslanov

333

10

0

20 Apr 2025

PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling

Tuanfeng Y. Wang

Stefanos Zafeiriou

Anna Frühstück

242

5

0

19 Apr 2025

Visual Consensus Prompting for Co-Salient Object Detection

Computer Vision and Pattern Recognition (CVPR), 2025

219

2

0

19 Apr 2025

Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction

352

3

0

18 Apr 2025

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Volodymyr Havrylov

215

9

0

18 Apr 2025

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

Michael J. Black

Angjoo Kanazawa

328

38

0

17 Apr 2025
