ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.12940
9
0

GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language

22 May 2023
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
ArXivPDFHTML
Abstract

One of the essential human skills is the ability to seamlessly build an inner representation of the world. By exploiting this representation, humans are capable of easily finding consensus between visual, auditory and linguistic perspectives. In this work, we set out to understand and emulate this ability through an explicit representation for both vision and language - Graphs of Events in Space and Time (GEST). GEST alows us to measure the similarity between texts and videos in a semantic and fully explainable way, through graph matching. It also allows us to generate text and videos from a common representation that provides a well understood content. In this work we show that the graph matching similarity metrics based on GEST outperform classical text generation metrics and can also boost the performance of state of art, heavily trained metrics.

View on arXiv
Comments on this paper