Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.18115
Cited By
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
23 May 2025
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion"
15 / 65 papers shown
Title
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
681
28,659
0
26 Feb 2021
VisualMRC: Machine Reading Comprehension on Document Images
Ryota Tanaka
Kyosuke Nishida
Sen Yoshida
51
143
0
27 Jan 2021
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
97
700
0
01 Jul 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
56
406
0
24 Mar 2020
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
63
245
0
06 Dec 2019
Expressing Visual Relationships via Language
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
38
65
0
18 Jun 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
57
1,050
0
31 May 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
50
1,174
0
18 Apr 2019
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
Qing Li
Qingyi Tao
Shafiq Joty
Jianfei Cai
Jiebo Luo
62
108
0
20 Mar 2018
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
297
3,187
0
02 Dec 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
58
376
0
20 Nov 2016
A Diagram Is Worth A Dozen Images
Aniruddha Kembhavi
M. Salvato
Eric Kolve
Minjoon Seo
Hannaneh Hajishirzi
Ali Farhadi
3DV
36
472
0
24 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
170
5,706
0
23 Feb 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
177
2,033
0
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
153
2,461
0
01 Apr 2015
Previous
1
2