arXiv: 2108.08217
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
18 August 2021
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
Papers citing "X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics"
12 / 12 papers shown
SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
25
3
0
27 Apr 2024
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
LM&MA
VLM
20
64
0
18 Sep 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
13
71
0
05 Apr 2023
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
22
62
0
06 Dec 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
S. Hoi
VLM
113
51
0
15 Sep 2022
Sports Video Analysis on Large-Scale Data
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
21
13
0
09 Aug 2022
Distinctive Image Captioning via CLIP Guided Group Optimization
Youyuan Zhang
Jiuniu Wang
Hao Wu
Wenjia Xu
VLM
24
8
0
08 Aug 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
16
11
0
08 Aug 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
H. M. Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
ViT
18
10
0
12 Jul 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
16
10
0
22 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
13
87
0
14 Jun 2022
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
927
0
24 Sep 2019