ResearchTrend.AI
Text-image Alignment for Diffusion-based Perception

29 September 2023
Neehar Kondapaneni
Markus Marks
Manuel Knott
Rogério Guimarães
Pietro Perona
    VLM
    DiffM

Papers citing "Text-image Alignment for Diffusion-based Perception"

11 / 11 papers shown
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
42
0
0
05 May 2025
Segment Any-Quality Images with Generative Latent Space Enhancement
Guangqian Guo
Yoong Guo
Xuehui Yu
Wenbo Li
Yaoxing Wang
Shan Gao
VLM
72
0
0
16 Mar 2025
MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications
Yongrui Yu
Yannian Gu
S. Zhang
Xiaofan Zhang
MedIm
36
2
0
20 Oct 2024
EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation
Nischal Khanal
Shivanand Venkanna Sheshappanavar
MDE
29
0
0
10 Sep 2024
iSeg: An Iterative Refinement-based Framework for Training-free Segmentation
Lin Sun
Jiale Cao
J. Xie
F. Khan
Yanwei Pang
DiffM
30
1
0
05 Sep 2024
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
Hanrong Ye
Dan Xu
ViT
16
10
0
08 Jun 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
158
213
0
03 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
247
4,223
0
30 Jan 2023
Decoupled Adaptation for Cross-Domain Object Detection
Junguang Jiang
Baixu Chen
Jianmin Wang
Mingsheng Long
ObjD
49
42
0
06 Oct 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unbiased Mean Teacher for Cross-domain Object Detection
Jinhong Deng
Wen Li
Yuhua Chen
Lixin Duan
73
285
0
02 Mar 2020