ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.02229
  4. Cited By
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

5 January 2023
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
ArXivPDFHTML

Papers citing "All in Tokens: Unifying Output Space of Visual Tasks via Soft Token"

9 / 9 papers shown
Title
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
42
0
0
05 May 2025
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Alessio Quercia
Erenus Yildiz
Zhuo Cao
Kai Krajsek
Abigail Morrison
Ira Assent
Hanno Scharr
48
0
0
22 Jan 2025
PolyMaX: General Dense Prediction with Mask Transformer
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
21
14
0
09 Nov 2023
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth
  Estimation
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation
Lingdong Kong
Yaru Niu
Shaoyuan Xie
Hanjiang Hu
Lai Xing Ng
...
Zhenyu Li
Runze Chen
Haiyong Luo
Fang Zhao
Jing Yu
19
13
0
27 Jul 2023
Revealing the Dark Secrets of Masked Image Modeling
Revealing the Dark Secrets of Masked Image Modeling
Zhenda Xie
Zigang Geng
Jingcheng Hu
Zheng-Wei Zhang
Han Hu
Yue Cao
VLM
186
105
0
26 May 2022
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
Deep Ordinal Regression Network for Monocular Depth Estimation
Deep Ordinal Regression Network for Monocular Depth Estimation
Huan Fu
Mingming Gong
Chaohui Wang
Kayhan Batmanghelich
Dacheng Tao
MDE
180
1,687
0
06 Jun 2018
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
246
1,817
0
18 Aug 2016
1