2104.05845
Cited By
Visual Goal-Step Inference using wikiHow
12 April 2021
Yue Yang
Artemis Panagopoulou
Qing Lyu
Li Zhang
Mark Yatskar
Chris Callison-Burch
Papers citing
"Visual Goal-Step Inference using wikiHow"
32 / 32 papers shown
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
47
0
0
12 Mar 2025
RecipeGen: A Benchmark for Real-World Recipe Image Generation
Ruoxuan Zhang
Hongxia Xie
Yi Yao
Jian-Yu Jiang-Lin
Bin Wen
Ling Lo
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
62
0
0
07 Mar 2025
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
24
0
0
17 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
13
0
0
17 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
52
10
0
14 Oct 2024
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Jiefu Ou
Arda Uzunoglu
Benjamin Van Durme
Daniel Khashabi
LM&Ro
VGen
30
3
0
10 Jul 2024
Holistic Evaluation for Interleaved Text-and-Image Generation
Minqian Liu
Zhiyang Xu
Zihao Lin
Trevor Ashby
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM
36
7
0
20 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
42
9
0
11 Jun 2024
Coherent Zero-Shot Visual Instruction Generation
Quynh Phung
Songwei Ge
Jia-Bin Huang
47
2
0
06 Jun 2024
Many-to-many Image Generation with Auto-regressive Diffusion Models
Ying Shen
Yizhe Zhang
Shuangfei Zhai
Lifu Huang
J. Susskind
Jiatao Gu
38
6
0
03 Apr 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
24
286
0
08 Mar 2024
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
24
4
0
07 Dec 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Jingyuan Qi
Minqian Liu
Ying Shen
Zhiyang Xu
Lifu Huang
LRM
VGen
27
2
0
08 Oct 2023
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish
Arda Uzunoglu
Gözde Gül Sahin
13
4
0
13 Sep 2023
Rewriting the Script: Adapting Text Instructions for Voice Interaction
Alyssa Hwang
Natasha Oza
Chris Callison-Burch
Andrew Head
15
12
0
16 Jun 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou
Sha Li
Manling Li
Xudong Lin
Shih-Fu Chang
Mohit Bansal
Heng Ji
19
8
0
27 May 2023
OpenPI2.0: An Improved Dataset for Entity Tracking in Texts
Li Zhang
Hainiu Xu
Abhinav Kommula
Chris Callison-Burch
Niket Tandon
25
6
0
24 May 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
Causal Reasoning of Entities and Events in Procedural Texts
Li Zhang
Hainiu Xu
Yue Yang
Shuyan Zhou
Weiqiu You
Manni Arora
Chris Callison-Burch
ReLM
LRM
21
35
0
26 Jan 2023
Learning Action-Effect Dynamics from Pairs of Scene-graphs
Shailaja Keyur Sampat
Pratyay Banerjee
Yezhou Yang
Chitta Baral
GNN
11
0
0
07 Dec 2022
Evaluating and Improving Factuality in Multimodal Abstractive Summarization
David Wan
Mohit Bansal
13
10
0
04 Nov 2022
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
Yue Yang
Artemis Panagopoulou
Marianna Apidianaki
Mark Yatskar
Chris Callison-Burch
15
2
0
24 Oct 2022
Incorporating Task-specific Concept Knowledge into Script Learning
Chenkai Sun
Tie Xu
Chengxiang Zhai
Heng Ji
21
4
0
31 Aug 2022
Coalescing Global and Local Information for Procedural Text Understanding
Kaixin Ma
Filip Ilievski
Jonathan M Francis
Eric Nyberg
A. Oltramari
23
11
0
26 Aug 2022
Multimedia Generative Script Learning for Task Planning
Qingyun Wang
Manling Li
Hou Pong Chan
Lifu Huang
J. Hockenmaier
Girish Chowdhary
Heng Ji
VGen
19
10
0
25 Aug 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
8
12
0
15 Jul 2022
Schema-Guided Event Graph Completion
Hongwei Wang
Zixuan Zhang
Sha Li
Jiawei Han
Yizhou Sun
Hanghang Tong
Joseph P. Olive
Heng Ji
14
5
0
06 Jun 2022
Reasoning about Procedures with Natural Language Processing: A Tutorial
Li Zhang
AI4TS
22
9
0
16 May 2022
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
Shuyan Zhou
Li Zhang
Yue Yang
Qing Lyu
Pengcheng Yin
Chris Callison-Burch
Graham Neubig
27
28
0
14 Mar 2022
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
14
14
0
17 Nov 2021
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
Te-Lin Wu
Alexander Spangher
Pegah Alipoormolabashi
Marjorie Freedman
R. Weischedel
Nanyun Peng
13
20
0
16 Oct 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016