ResearchTrend.AI
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion

20 June 2023
Simone Bianco
Luigi Celona
Marco Donzella
Paolo Napoletano

Papers citing "Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion"

16 / 16 papers shown
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
34
0
0
09 Apr 2025
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian
Shenzhe Zhu
Yuehan Qin
Li Li
Z. Wang
Chaowei Xiao
Yue Zhao
21
0
0
03 Apr 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Philip S. Yu
Xuming Hu
Qingsong Wen
53
1
0
23 Mar 2025
Knowledge Bridger: Towards Training-free Missing Multi-modality Completion
Guanzhou Ke
Shengfeng He
X. Wang
Bo Wang
Guoqing Chao
Y. Zhang
Yi Xie
HeXing Su
48
0
0
27 Feb 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
47
0
0
14 Feb 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
M. Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Conghui He
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
82
7
0
16 Dec 2024
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
Sara Ghazanfari
Siddharth Garg
Nicolas Flammarion
P. Krishnamurthy
Farshad Khorrami
Francesco Croce
VLM
84
0
0
13 Dec 2024
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
Shailaja Keyur Sampat
Maitreya Patel
Yezhou Yang
Chitta Baral
6
0
0
17 Oct 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM
VGen
DiffM
72
1
0
26 Aug 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge
Xiaohui Zeng
Jacob Samuel Huffman
Tsung-Yi Lin
Ming-Yu Liu
Yin Cui
CoGe
DiffM
22
14
0
30 Apr 2024
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
Yannis Tevissen
Khalil Guetari
Marine Tassel
Erwan Kerleroux
Frédéric Petitpont
22
0
0
20 Mar 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Hongyuan Zhu
Fukun Yin
Gang Yu
Tao Chen
14
23
0
17 Dec 2023
Learning Distinct and Representative Styles for Image Captioning
Qi Chen
Chaorui Deng
Qi Wu
VLM
21
23
0
17 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
180
342
0
13 Jul 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021