ResearchTrend.AI
Multimodal Transformer for Comics Text-Cloze

6 March 2024
Emanuele Vivoli
Joan Lafuente Baeza
Ernest Valveny Llobet
Dimosthenis Karatzas

Papers citing "Multimodal Transformer for Comics Text-Cloze"

8 / 8 papers shown
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis Llabrés
Mohamed Ali Souibgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
55
0
0
11 Mar 2025
Toward accessible comics for blind and low vision readers
Christophe Rigaud
J. Burie
Samuel Petit
41
3
0
11 Jul 2024
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding
Emanuele Vivoli
Marco Bertini
Dimosthenis Karatzas
39
1
0
04 Jul 2024
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking
Emanuele Vivoli
Irene Campaioli
Mariateresa Nardoni
Niccolò Biondi
Marco Bertini
Dimosthenis Karatzas
20
5
0
03 Jul 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
154
280
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition
Gurkan Soykan
Deniz Yuret
T. M. Sezgin
16
3
0
27 Dec 2022
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021