Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding

1 December 2024
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
Abstract

Visual grounding aims to localize the image region described by a textual query. Given the difficulty of large-scale data curation, we investigate how to learn visual grounding effectively in data-scarce settings. To address the data scarcity, we propose a novel framework, POBF (Paint Outside the Box and Filter). POBF synthesizes images by inpainting outside the bounding box, which avoids the label misalignment issue encountered in previous work. POBF also uses a filtering scheme to select the most effective training data; the scheme combines a hardness score and an overfitting score, balanced by a penalty term. Extensive experiments on four benchmark datasets show that POBF consistently improves performance, achieving an average accuracy gain of 5.83% over the real-data-only method and outperforming leading baselines by 2.29% to 3.85%. We further validate the robustness and generalizability of POBF across generative models, training data sizes, and model architectures.
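
The idea of inpainting outside the box can be illustrated with a minimal sketch of the mask construction alone. It is not the authors' implementation: the function name `outside_box_mask`, the `(x_min, y_min, x_max, y_max)` box format with exclusive maxima, and the convention that 1 marks pixels a generative model may repaint are all assumptions made for this example. The point it shows is that pixels inside the annotated box are never repainted, so the original box label still matches the synthesized image.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), maxima exclusive.
Box = Tuple[int, int, int, int]


def outside_box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Build a binary mask: 1 = may be repainted, 0 = keep (inside the box)."""
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("box must lie inside the image and have positive area")
    return [
        [0 if (x0 <= x < x1 and y0 <= y < y1) else 1 for x in range(width)]
        for y in range(height)
    ]


# Toy 4x6 "image" with a 3x2 annotated box.
mask = outside_box_mask(4, 6, (1, 1, 4, 3))
for row in mask:
    print("".join(str(v) for v in row))
# prints:
# 111111
# 100011
# 100011
# 111111
```

A mask like this would then be passed, together with the original image and a text prompt, to whichever inpainting model is used; the mask polarity expected by a given model should be checked against its documentation.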

@article{du2025_2412.00684,
  title={Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding},
  author={Zilin Du and Haoxin Li and Jianfei Yu and Boyang Li},
  journal={arXiv preprint arXiv:2412.00684},
  year={2025}
}