2403.03719
Cited By
Multimodal Transformer for Comics Text-Cloze
6 March 2024
Emanuele Vivoli
Joan Lafuente Baeza
Ernest Valveny Llobet
Dimosthenis Karatzas
Papers citing "Multimodal Transformer for Comics Text-Cloze"
8 / 8 papers shown
Title
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli, Artemis LLabres, Mohamed Ali Souibgui, Marco Bertini, Ernest Valveny Llobet, Dimosthenis Karatzas
55 · 0 · 0
11 Mar 2025

Toward accessible comics for blind and low vision readers
Christophe Rigaud, J. Burie, Samuel Petit
41 · 3 · 0
11 Jul 2024

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding
Emanuele Vivoli, Marco Bertini, Dimosthenis Karatzas
39 · 1 · 0
04 Jul 2024

Comics Datasets Framework: Mix of Comics datasets for detection benchmarking
Emanuele Vivoli, Irene Campaioli, Mariateresa Nardoni, Niccoló Biondi, Marco Bertini, Dimosthenis Karatzas
20 · 5 · 0
03 Jul 2024

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
MLLM
154 · 280 · 0
14 Oct 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM, MLLM
244 · 4,186 · 0
30 Jan 2023

A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition
Gurkan Soykan, Deniz Yuret, T. M. Sezgin
16 · 3 · 0
27 Dec 2022

Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal
MLLM
249 · 518 · 0
04 Feb 2021