ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.18949
27
1

Predicting Winning Captions for Weekly New Yorker Comics

12 July 2024
Stanley Cao
Sonny Young
    ViT
    VLM
ArXivPDFHTML
Abstract

Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wit and humor of winning entries in the New Yorker Cartoon Caption Contest. This task necessitates sophisticated visual and linguistic processing, along with an understanding of cultural nuances and humor. We propose several new baselines for using vision transformer encoder-decoder models to generate captions for the New Yorker cartoon caption contest.

View on arXiv
Comments on this paper