Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

28 March 2024
Hyunbyung Park
Sukyung Lee
Gyoungjin Gim
Yungi Kim
Dahyun Kim
Chanjun Park
Abstract

To address the challenges of data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Its block-based interface makes it easy to add custom processors, so users can readily and efficiently build their own ETL pipelines. We hope that Dataverse will serve as a vital tool for LLM development, and we open-source the entire library to welcome community contributions. Additionally, we provide a concise, two-minute video demonstration of the system, illustrating its capabilities and implementation.
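To make the idea of a block-based ETL pipeline with custom processors concrete, here is a minimal sketch in plain Python. It is not Dataverse's actual API: the `REGISTRY` dictionary, the `register` decorator, the block names, and `run_pipeline` are all hypothetical names chosen for illustration. Each block takes a list of records and returns a list of records, so blocks can be chained in any order.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Block = Callable[[List[Record]], List[Record]]

# Hypothetical registry of named blocks (an assumption for this sketch,
# not the Dataverse library's real interface).
REGISTRY: Dict[str, Block] = {}


def register(name: str) -> Callable[[Block], Block]:
    """Decorator that adds a processing block to the registry under `name`."""
    def decorator(fn: Block) -> Block:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("extract.in_memory")
def extract(_: List[Record]) -> List[Record]:
    # Extract step: a real pipeline would read files or a dataset here.
    return [{"text": "  Hello world  "}, {"text": ""}, {"text": "Hello world"}]


@register("transform.strip")
def strip_whitespace(records: List[Record]) -> List[Record]:
    # Transform step: trim leading and trailing whitespace.
    return [{**r, "text": r["text"].strip()} for r in records]


@register("transform.drop_empty")
def drop_empty(records: List[Record]) -> List[Record]:
    # Transform step: remove records whose text is empty.
    return [r for r in records if r["text"]]


@register("transform.dedup")
def dedup(records: List[Record]) -> List[Record]:
    # Custom processor: exact-match deduplication, keeping first occurrences.
    seen = set()
    unique = []
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            unique.append(r)
    return unique


@register("load.collect")
def load(records: List[Record]) -> List[Record]:
    # Load step: a real pipeline would write to disk or object storage.
    return list(records)


def run_pipeline(block_names: List[str]) -> List[Record]:
    """Run the named blocks in order, feeding each block's output to the next."""
    data: List[Record] = []
    for name in block_names:
        data = REGISTRY[name](data)
    return data


result = run_pipeline([
    "extract.in_memory",
    "transform.strip",
    "transform.drop_empty",
    "transform.dedup",
    "load.collect",
])
print(result)
```

In this sketch, adding a custom processor only requires writing one function and decorating it; the pipeline itself is just an ordered list of block names.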

@article{park2025_2403.19340,
  title={Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models},
  author={Hyunbyung Park and Sukyung Lee and Gyoungjin Gim and Yungi Kim and Dahyun Kim and Chanjun Park},
  journal={arXiv preprint arXiv:2403.19340},
  year={2024}
}