MARRS: Multimodal Reference Resolution System

3 November 2023
Halim Cagri Ates, Shruti Bhargava, Site Li, Jiarui Lu, Siddhardha Maddula, Joel Ruben Antony Moniz, Anil Kumar Nalamalapu, R. Nguyen, Melis Ozyildirim, Alkesh Patel, Dhivya Piraviperumal, Vincent Renkens, Ankit Samal, Thy Tran, Bo-Hsiang Tseng, Hong-ye Yu, Yuan-kang Zhang, Rong-min Zou
Abstract

Successfully handling context is essential for any dialog-understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system responsible for handling conversational, visual, and background context. In particular, we present machine learning models that enable the handling of contextual queries: one for reference resolution, and one that handles context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.
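To make the described architecture concrete, the sketch below shows one way a reference-resolution component and a query-rewriting component could be composed behind a single entry point. Everything here is an illustrative assumption: the class and function names, the keyword-based mention detector, and the recency-based salience rule are stand-ins for the paper's learned on-device models, not the actual MARRS implementation.

    from dataclasses import dataclass, field

    # Illustrative context signals: conversational history, on-screen
    # entities, and background events (e.g. a ringing alarm).
    @dataclass
    class Context:
        history: list[str] = field(default_factory=list)          # prior turns
        screen_entities: list[str] = field(default_factory=list)  # visible items
        background: list[str] = field(default_factory=list)       # e.g. "alarm_ringing"

    REFERRING_WORDS = {"it", "this", "that", "them", "one", "him", "her"}

    def needs_reference_resolution(query: str) -> bool:
        """Cheap heuristic stand-in for a learned mention detector."""
        return any(tok in REFERRING_WORDS for tok in query.lower().split())

    def resolve_references(query: str, ctx: Context) -> str:
        """Stand-in for the reference-resolution model: link a referring
        expression to the most salient entity (screen first, then history)."""
        candidates = ctx.screen_entities or ctx.history
        if not candidates:
            return query
        antecedent = candidates[-1]  # most recent entity = most salient
        tokens = [antecedent if t.lower() in REFERRING_WORDS else t
                  for t in query.split()]
        return " ".join(tokens)

    def rewrite_query(query: str, ctx: Context) -> str:
        """Stand-in for the query-rewriting model: make the query
        self-contained by folding in background context."""
        if ctx.background:
            return f"{query} (context: {', '.join(ctx.background)})"
        return query

    def handle(query: str, ctx: Context) -> str:
        """Unified entry point: route the query through whichever
        context-handling component it needs."""
        if needs_reference_resolution(query):
            query = resolve_references(query, ctx)
        return rewrite_query(query, ctx)

    if __name__ == "__main__":
        ctx = Context(screen_entities=["Eiffel Tower"],
                      background=["alarm_ringing"])
        print(handle("how tall is it", ctx))
        # -> "how tall is Eiffel Tower (context: alarm_ringing)"

The routing design reflects the abstract's claim that the two models complement each other: queries with referring expressions pass through resolution first, and every query can still be rewritten into a self-contained form, all without leaving the device.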

arXiv: 2311.01650