Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.07912
Cited By
YORO -- Lightweight End to End Visual Grounding
15 November 2022
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"YORO -- Lightweight End to End Visual Grounding"
22 / 22 papers shown
Title
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
46
0
0
24 Feb 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
57
2
0
03 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
44
3
0
31 Dec 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
16
5
0
10 Oct 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
18
8
0
20 Apr 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
60
11
0
05 Mar 2024
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
44
5
0
15 Nov 2023
Single-Stage Visual Relationship Learning using Conditional Queries
Alakh Desai
Tz-Ying Wu
Subarna Tripathi
Nuno Vasconcelos
19
5
0
09 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
19
32
0
06 Jun 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
16
39
0
02 Jun 2023
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang
A. Li
He Zhu
Shenmin Zhang
Chung-Wei Hang
...
William Wang
Zhiguo Wang
Vittorio Castelli
Bing Xiang
Patrick K. L. Ng
VLM
27
8
0
27 May 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
29
28
0
15 May 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
8
2
0
09 Mar 2023
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
202
331
0
05 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
222
2,404
0
04 Jan 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
1,982
0
28 Jul 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
156
282
0
19 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
166
68
0
07 Mar 2020
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao
Si Liu
Guanbin Li
Fei-Yue Wang
Yanjie Chen
Chao Qian
Bo-wen Li
ObjD
62
174
0
16 Sep 2019
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
948
20,214
0
17 Apr 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
141
1,458
0
06 Jun 2016
1