ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.04870
  4. Cited By
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models
v1v2v3v4 (latest)

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
ArXiv (abs)PDFHTML

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
Box-based Refinement for Weakly Supervised and Unsupervised Localization
  Tasks
Box-based Refinement for Weakly Supervised and Unsupervised Localization TasksIEEE International Conference on Computer Vision (ICCV), 2023
Eyal Gomel
Tal Shaharabany
Lior Wolf
ObjD
351
6
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex
  Visually-Grounded Referencing using Determiners
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using DeterminersIEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
198
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
325
2
0
06 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical
  Learning
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical LearningComputer Vision and Pattern Recognition (CVPR), 2023
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
204
11
0
05 Sep 2023
MultiWay-Adapater: Adapting large-scale multi-modal models for scalable
  image-text retrieval
MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrievalIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zijun Long
George Killick
R. McCreadie
Gerardo Aragon Camarasa
247
2
0
04 Sep 2023
Parameter and Computation Efficient Transfer Learning for
  Vision-Language Pre-trained Models
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained ModelsNeural Information Processing Systems (NeurIPS), 2023
Qiong Wu
Wei Yu
Weihao Ye
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
261
11
0
04 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Contrastive Feature Masking Open-Vocabulary Vision TransformerIEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjDVLM
339
38
0
02 Sep 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual
  Augmentation
ViLTA: Enhancing Vision-Language Pre-training through Textual AugmentationIEEE International Conference on Computer Vision (ICCV), 2023
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
289
9
0
31 Aug 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes
  in Product Images for e-commerce Vision-Language Applications
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
63
2
0
30 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
CoVR: Learning Composed Video Retrieval from Web Video CaptionsIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
438
21
0
28 Aug 2023
How to Evaluate the Generalization of Detection? A Benchmark for
  Comprehensive Open-Vocabulary Detection
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary DetectionAAAI Conference on Artificial Intelligence (AAAI), 2023
Yi Yao
Peng Liu
Tiancheng Zhao
Qianqian Zhang
Jiajia Liao
Chunxin Fang
Kyusong Lee
Qing Wang
VLMObjD
201
17
0
25 Aug 2023
DLIP: Distilling Language-Image Pre-training
DLIP: Distilling Language-Image Pre-training
Huafeng Kuang
Jie Wu
Xiawu Zheng
Ming Li
Xuefeng Xiao
Rui Wang
Min Zheng
Rongrong Ji
VLM
150
6
0
24 Aug 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
SCoRD: Subject-Conditional Relation Detection with Text-Augmented DataIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ziyan Yang
Kushal Kafle
Zhe Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
258
1
0
24 Aug 2023
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language
  Navigation
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language NavigationIEEE International Conference on Computer Vision (ICCV), 2023
Yibo Cui
Liang Xie
Yakun Zhang
Meishan Zhang
Ye Yan
Erwei Yin
LM&Ro
224
28
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLMMLLM
325
75
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image CaptioningACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLMCLIP
229
24
0
23 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person
  Perception of Ego4D
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4DIEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
262
23
0
23 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoEAAAI Conference on Artificial Intelligence (AAAI), 2023
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLMVLMMoE
205
20
0
23 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive
  Language-Image Pre-training
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-trainingIEEE International Conference on Computer Vision (ICCV), 2023
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIPVLM
211
3
0
22 Aug 2023
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
D. Han
Seunghyeon Seo
D. Jeon
Jiho Jang
Chaerin Kong
Nojun Kwak
ViTMoE
193
0
0
22 Aug 2023
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
VQA Therapy: Exploring Answer Differences by Visually Grounding AnswersIEEE International Conference on Computer Vision (ICCV), 2023
Chongyan Chen
Samreen Anjum
Danna Gurari
247
16
0
21 Aug 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
378
139
0
21 Aug 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
AltDiffusion: A Multilingual Text-to-Image Diffusion ModelAAAI Conference on Artificial Intelligence (AAAI), 2023
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
309
47
0
19 Aug 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
Tackling Vision Language Tasks Through Learning Inner MonologuesAAAI Conference on Artificial Intelligence (AAAI), 2023
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yujiao Shi
MLLM
234
14
0
19 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language
  Representation Learning
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
213
3
0
18 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal
  Discrimination Capability
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination CapabilityIEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
171
8
0
18 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
655
6
0
18 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
ALIP: Adaptive Language-Image Pre-training with Synthetic CaptionIEEE International Conference on Computer Vision (ICCV), 2023
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLMCLIP
224
82
0
16 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using
  Vision-Language Models
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language ModelsInternational Conference on Medical Imaging with Deep Learning (MIDL), 2023
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
564
32
0
15 Aug 2023
Vision-Language Dataset Distillation
Vision-Language Dataset Distillation
Xindi Wu
Byron Zhang
Zhiwei Deng
Olga Russakovsky
DDVLM
456
15
0
15 Aug 2023
Taming Self-Training for Open-Vocabulary Object Detection
Taming Self-Training for Open-Vocabulary Object DetectionComputer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
S. Schulter
Long Zhao
Zhixing Zhang
Vijay Kumar B.G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
VLMObjD
375
21
0
11 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Foundation Model is Efficient Multimodal Multitask Model SelectorNeural Information Processing Systems (NeurIPS), 2023
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
175
21
0
11 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic
  and Regional Comprehension
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
184
32
0
03 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive
  Vision-Language Models
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
349
549
0
02 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects
  in Cluttered Indoor Scenes
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor ScenesIEEE/RJS International Conference on Intelligent RObots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
269
59
0
01 Aug 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented
  Pseudo Sentence Generation
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence GenerationACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
193
2
0
27 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of
  Vision-Language Pre-training Models
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training ModelsIEEE International Conference on Computer Vision (ICCV), 2023
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
273
121
0
26 Jul 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
3DRP-Net: 3D Relative Position-aware Network for 3D Visual GroundingConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
3DPC
194
29
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible
  Expressions
Described Object Detection: Liberating Object Detection with Flexible ExpressionsNeural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
243
51
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based
  Centerpoint Supervision
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
276
7
0
23 Jul 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Advancing Visual Grounding with Scene Knowledge: Benchmark and MethodComputer Vision and Pattern Recognition (CVPR), 2023
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
181
30
0
21 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image
  Captioning
Embedded Heterogeneous Attention Transformer for Cross-lingual Image CaptioningIEEE transactions on multimedia (IEEE TMM), 2023
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
209
19
0
19 Jul 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
  Supervised 3D Visual Grounding
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual GroundingIEEE International Conference on Computer Vision (ICCV), 2023
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
194
29
0
18 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present,
  and Future
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and FutureIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjDVLM
511
72
0
18 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up
  Patch Summarization
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch SummarizationIEEE International Conference on Computer Vision (ICCV), 2023
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
205
9
0
17 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language
  Pre-training
Bootstrapping Vision-Language Learning with Decoupled Language Pre-trainingNeural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLMMLLM
389
44
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLMMLLM
229
42
0
13 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic
  Manipulation
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic ManipulationIEEE/RJS International Conference on Intelligent RObots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
213
9
0
12 Jul 2023
Open-Vocabulary Object Detection via Scene Graph Discovery
Open-Vocabulary Object Detection via Scene Graph DiscoveryACM Multimedia (ACM MM), 2023
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
281
16
0
07 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLMVLM
916
320
0
07 Jul 2023
Previous
123...121314...252627
Next
Page 13 of 27
Pageof 27