2201.05078
CLIP-Event: Connecting Text and Images with Event Structures
Computer Vision and Pattern Recognition (CVPR), 2022
13 January 2022
Pengfei Yu
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
Papers citing "CLIP-Event: Connecting Text and Images with Event Structures"
50 / 73 papers shown
Revisiting Theory of Contrastive Learning for Domain Generalization
Ali Alvandi
Mina Rezaei
OOD
SSL
267
0
0
02 Dec 2025
InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
Liangjian Wen
Qun Dai
Jianzhuang Liu
Jiangtao Zheng
Yong Dai
Dongkai Wang
Zhao Kang
Jun Wang
Z. Xu
Jiang Duan
319
1
0
28 Sep 2025
PRISM: Phase-enhanced Radial-based Image Signature Mapping framework for fingerprinting AI-generated images
Emanuele Ricco
Elia Onofri
Lorenzo Cima
S. Cresci
Roberto Di Pietro
184
0
0
18 Sep 2025
Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
Fuyu Xing
Zimu Wang
Wei Wang
Haiyang Zhang
VLM
126
2
0
16 Sep 2025
The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
Yiming Lin
Yuchen Niu
Shang Wang
K. Huang
Qiufeng Wang
Xiao-Bo Jin
168
0
0
29 Aug 2025
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
Yuchen Zhou
Jiayu Tang
Shuo Yang
Xiaoyan Xiao
Yuqin Dai
Wenhao Yang
Chao Gou
Xiaobo Xia
Tat-Seng Chua
VLM
CoGe
LRM
173
2
0
15 Aug 2025
Towards Robust Evaluation of Visual Activity Recognition: Resolving Verb Ambiguity with Sense Clustering
Louie Hong Yao
Nicholas Jarvis
Tianyu Jiang
159
0
0
07 Aug 2025
Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
Bingjie Zhang
Hongkang Li
Changlong Shi
Guowei Rong
He Zhao
Dongsheng Wang
Dandan Guo
Meng Wang
MoMe
333
1
0
10 Jun 2025
BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance
Huy Le
Nhat Chung
Tung Kieu
A. Nguyen
Ngan Le
521
3
0
04 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
AAAI Conference on Artificial Intelligence (AAAI), 2025
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
294
4
0
03 Jun 2025
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Jiancheng Huang
Yi Huang
Jianzhuang Liu
Donghao Zhou
Wenshu Fan
Shifeng Chen
DiffM
385
13
0
15 Dec 2024
Scalable Early Childhood Reading Performance Prediction
Neural Information Processing Systems (NeurIPS), 2024
Zhongkai Shangguan
Zanming Huang
Eshed Ohn-Bar
Ola Ozernov-Palchik
Derek Kosty
Michael Stoolmiller
Hank Fien
AI4Ed
393
2
0
05 Dec 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
European Conference on Computer Vision (ECCV), 2024
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
289
7
0
03 Nov 2024
Identifying Implicit Social Biases in Vision-Language Models
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Kimia Hamidieh
Haoran Zhang
Walter Gerych
Thomas Hartvigsen
Elisa Kreiss
VLM
332
41
0
01 Nov 2024
ARMADA: Attribute-Based Multimodal Data Augmentation
Xiaomeng Jin
Jeonghwan Kim
Yu Zhou
Kuan-Hao Huang
Te-Lin Wu
Nanyun Peng
Heng Ji
254
5
0
19 Aug 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
221
0
0
15 Aug 2024
Order-Preserving Dimension Reduction for Multimodal Semantic Embedding
Chengyu Gong
Gefei Shen
Luanzheng Guo
Nathan R. Tallent
Dongfang Zhao
241
1
0
15 Aug 2024
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models
ACM Multimedia (MM), 2024
Haoxuan Li
Zhengmao Yang
Yunshan Ma
Yi Bin
Yang Yang
Tat-Seng Chua
271
8
0
08 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
International Journal of Computer Vision (IJCV), 2024
Dhruv Verma
Debaditya Roy
Basura Fernando
321
3
0
30 Jul 2024
MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling
Philipp Seeberger
Dominik Wagner
Korbinian Riedhammer
336
7
0
18 Jun 2024
GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
Hritik Bansal
Po-Nien Kung
P. Brantingham
Weisheng Wang
Miao Zheng
VLM
267
3
0
07 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
IEEE Transactions on Medical Imaging (IEEE TMI), 2024
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
317
26
0
03 Apr 2024
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2024
Yuchen Suo
Fan Ma
Linchao Zhu
Yi Yang
263
51
0
24 Mar 2024
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
Zeliang Zhang
Jinyang Jiang
Zhuo Liu
Susan Liang
Yijie Peng
Chenliang Xu
164
0
0
18 Mar 2024
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Mingyu Lee
Jongwon Choi
433
23
0
10 Mar 2024
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Lin Sun
Kai Zhang
Qingyuan Li
Renze Lou
322
35
0
05 Jan 2024
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kung-Hsiang Huang
Mingyang Zhou
Hou Pong Chan
Yi R. Fung
Zhenhailong Wang
Lingyu Zhang
Shih-Fu Chang
Chenhui Xu
412
57
0
15 Dec 2023
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Suhas Srinath
Shankhanil Mitra
Shika Rao
R. Soundararajan
OOD
269
13
0
08 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
326
13
0
08 Dec 2023
Prompt Tuning for Zero-shot Compositional Learning
Lingyu Zhang
Ting Hua
Yilin Shen
Hongxia Jin
VLM
316
0
0
02 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
249
0
0
30 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yangyi Chen
Xingyao Wang
Pengfei Yu
Derek Hoiem
Heng Ji
279
15
0
22 Nov 2023
SPOT! Revisiting Video-Language Models for Event Understanding
Gengyuan Zhang
Jinhe Bi
Jindong Gu
Yanyu Chen
Volker Tresp
497
19
0
21 Nov 2023
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
Kuan-Hao Huang
I-Hung Hsu
Tanmay Parekh
Zhiyu Xie
Zixuan Zhang
Premkumar Natarajan
Kai-Wei Chang
Nanyun Peng
Heng Ji
534
32
0
16 Nov 2023
Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection
IEEE Transactions on Image Processing (IEEE TIP), 2023
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
ViT
279
16
0
03 Nov 2023
Defining a New NLP Playground
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
432
7
0
31 Oct 2023
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
International Conference on Human Factors in Computing Systems (CHI), 2023
Brett A. Halperin
S. Lukin
CoGe
255
31
0
06 Oct 2023
Multimodal Question Answering for Unified Information Extraction
Yuxuan Sun
Kai Zhang
Yu-Chuan Su
212
11
0
04 Oct 2023
Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation
Mingfu Yan
Jiancheng Huang
Yi Huang
DiffM
VLM
348
4
0
01 Oct 2023
FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
Songyan Chen
Jiancheng Huang
DiffM
187
16
0
26 Sep 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
B. Bischl
Eyke Hüllermeier
Mina Rezaei
OOD
UQCV
334
4
0
28 Aug 2023
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Alberto Baldrati
Marco Bertini
Tiberio Uricchio
CLIP
CoGe
179
72
0
22 Aug 2023
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Debaditya Roy
Dhruv Verma
Basura Fernando
VLM
CLIP
563
9
0
02 Jul 2023
Training Multimedia Event Extraction With Generated Images and Captions
ACM Multimedia (ACM MM), 2023
Zilin Du
Yunxin Li
Xu Guo
Yidan Sun
Boyang Albert Li
DiffM
408
18
0
15 Jun 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
388
10
0
28 May 2023
Few-shot Domain-Adaptive Visually-fused Event Detection from Text
Fusion (Fusion), 2023
Farhad Moghimifar
Fatemeh Shiri
Van Nguyen
Gholamreza Haffari
Yuanyou Li
VLM
252
4
0
04 May 2023
VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias
Stefanos-Iordanis Papadopoulos
C. Koutlis
Symeon Papadopoulos
P. Petrantonakis
722
44
0
27 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
547
89
0
13 Apr 2023
Subject-driven Text-to-Image Generation via Apprenticeship Learning
Neural Information Processing Systems (NeurIPS), 2023
Wenhu Chen
Hexiang Hu
Yandong Li
Nataniel Ruiz
Xuhui Jia
Ming-Wei Chang
William W. Cohen
DiffM
1.2K
241
0
01 Apr 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Computer Vision and Pattern Recognition (CVPR), 2023
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
260
150
0
29 Mar 2023