arXiv:2112.00974, v1 and v2 (latest)

Consensus Graph Representation Learning for Better Grounded Image Captioning

2 December 2021
Wenqiao Zhang
Haochen Shi
Siliang Tang
Jun Xiao
Qiang Yu
Yueting Zhuang

Papers citing "Consensus Graph Representation Learning for Better Grounded Image Captioning"

28 / 28 papers shown
Heartcare Suite: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding
Yihan Xie
Sijing Li
Tianwei Lin
Zhuonan Wang
Chenglin Yang
...
Haoyuan Li
Hao Jiang
Tai-wei Chang
Qishan Chen
Jun Xiao
231
2
0
24 Dec 2025
MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
Haonan Ge
Yiwei Wang
Ming-Hsuan Yang
Yujun Cai
123
5
0
14 Aug 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
84
0
0
10 Aug 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Computer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
151
2
0
03 Jun 2025
Adaptation Method for Misinformation Identification
Yangping Chen
Weijie Shi
Mengze Li
Yue Cui
Hechang Chen
Jia Zhu
Jiajie Xu
185
0
0
19 Apr 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Neural Information Processing Systems (NeurIPS), 2024
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
443
70
0
31 Dec 2024
See or Guess: Counterfactually Regularized Image Captioning
ACM Multimedia (MM), 2024
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
174
3
0
29 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
340
0
0
09 Aug 2024
Semantic Codebook Learning for Dynamic Recommendation Models
ACM Multimedia (MM), 2024
Zheqi Lv
Shaoxuan He
Ahmed Salem
Minxing Zhang
Wenqiao Zhang
Jingyuan Chen
Yang Zhang
Fei Wu
274
12
0
31 Jul 2024
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia
Wenliang Dai
Samuel Cahyawijaya
Ziwei Ji
Pascale Fung
MLLM
231
70
0
09 Oct 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
210
5
0
13 Jun 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Neural Information Processing Systems (NeurIPS), 2023
Pha Nguyen
Kha Gia Quach
Kris Kitani
Khoa Luu
267
31
0
22 May 2023
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels
IEEE International Conference on Computer Vision (ICCV), 2023
Wenqiao Zhang
Changshuo Liu
Lingze Zeng
Beng Chin Ooi
Siliang Tang
Yueting Zhuang
138
24
0
20 Apr 2023
CAusal and collaborative proxy-tasKs lEarning for Semi-Supervised Domain Adaptation
Wenqiao Zhang
Changshuo Liu
Can Cui
Beng Chin Ooi
CML
170
0
0
30 Mar 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
IEEE International Conference on Computer Vision (ICCV), 2023
Qifan Yu
Juncheng Li
Yuehua Wu
Siliang Tang
Wei Ji
Yueting Zhuang
218
47
0
23 Mar 2023
Visual Semantic Relatedness Dataset for Image Captioning
Ahmed Sabir
Francesc Moreno-Noguer
Lluís Padró
CoGe, VLM
165
4
0
20 Jan 2023
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLM, VLM
252
75
0
14 Oct 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
International Conference on Computational Linguistics (COLING), 2022
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
180
2
0
16 Sep 2022
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization
The Web Conference (WWW), 2022
Zheqi Lv
Wenqiao Zhang
Shengyu Zhang
Kun Kuang
Feng Wang
...
Ruihao Zhang
Tao Shen
Hongxia Yang
Bengchin Ooi
Leilei Gan
244
61
0
12 Sep 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
ACM Multimedia (ACM MM), 2022
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
164
14
0
03 Aug 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
205
6
0
09 Jul 2022
Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning
Knowledge Discovery and Data Mining (KDD), 2022
Jiannan Guo
Yangyang Kang
Yu Duan
Xiaozhong Liu
Siliang Tang
Wenqiao Zhang
Kun Kuang
Changlong Sun
Leilei Gan
146
4
0
07 Jun 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
213
80
0
24 Mar 2022
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Meng Li
Tianbao Wang
Haoyu Zhang
Shengyu Zhang
Zhou Zhao
...
Wenming Tan
Jin Wang
Peng Wang
Shi Pu
Leilei Gan
238
46
0
15 Mar 2022
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
223
48
0
13 Dec 2021
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
Xinze Wang
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
Wenjie Wang
175
34
0
02 Dec 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
IEEE International Conference on Computer Vision (ICCV), 2021
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Leilei Gan
Yi Yang
Yueting Zhuang
217
28
0
26 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
ACM Multimedia (ACM MM), 2021
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Jingjing Jiang
N. Zheng
OOD
189
20
0
24 Jul 2021