Aligning Compound AI Systems via System-level DPO

24 February 2025
Xiangwen Wang
Yibo Zhang
Zhoujie Ding
Katherine Tsai
Sanmi Koyejo
Abstract

Compound AI systems, comprising multiple interacting components such as LLM agents and external tools, achieve state-of-the-art results across diverse tasks. It is therefore crucial to align the components within a system so that it produces consistent results that match human expectations. However, conventional alignment methods, such as Direct Preference Optimization (DPO), are not directly applicable to compound AI systems, for two reasons. First, the interactions between components are non-differentiable, making end-to-end gradient optimization infeasible. Second, system-level preferences cannot be directly translated into component-level preferences, further complicating alignment. We address these issues by formulating compound AI systems as Directed Acyclic Graphs (DAGs), capturing the connections between agents and the data generation process. We propose a system-level DPO (SysDPO) that jointly aligns a compound system by adapting DPO to operate on these DAGs. We study the joint alignment of an LLM and a diffusion model to demonstrate the effectiveness of our approach. Our exploration provides insights into the alignment of compound AI systems and lays a foundation for future advancements.
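
The paper itself defines SysDPO; as a rough illustration only, the sketch below shows one way a system-level DPO loss could be computed under the assumption that the system-level log-probability of an output factorizes as the sum of per-component log-probabilities along the DAG. The function name `sysdpo_loss`, the two-component (LLM plus diffusion model) example, and all numeric values are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def sysdpo_loss(logp_policy_win, logp_policy_lose,
                logp_ref_win, logp_ref_lose, beta=0.1):
    """DPO loss applied to system-level log-probabilities.

    Each argument is a list of per-component log-probabilities (one entry
    per node of the DAG) for the preferred ("win") and dispreferred
    ("lose") system outputs, under the current policy and a frozen
    reference system. Summing over components gives the system-level
    log-probability under the assumed factorization along the DAG.
    """
    # System-level log-probs: sum of component log-probs along the DAG.
    pi_w = torch.stack(logp_policy_win).sum()
    pi_l = torch.stack(logp_policy_lose).sum()
    ref_w = torch.stack(logp_ref_win).sum()
    ref_l = torch.stack(logp_ref_lose).sum()

    # Standard DPO objective, applied to the system-level log-ratios.
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -F.logsigmoid(margin)

# Hypothetical two-component system (LLM + diffusion model), placeholder values.
loss = sysdpo_loss(
    [torch.tensor(-3.2), torch.tensor(-50.1)],   # policy, preferred output
    [torch.tensor(-3.5), torch.tensor(-52.7)],   # policy, dispreferred output
    [torch.tensor(-3.3), torch.tensor(-50.5)],   # reference, preferred output
    [torch.tensor(-3.4), torch.tensor(-52.0)],   # reference, dispreferred output
)
```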

@article{wang2025_2502.17721,
  title={Aligning Compound AI Systems via System-level DPO},
  author={Xiangwen Wang and Yibo Jacky Zhang and Zhoujie Ding and Katherine Tsai and Sanmi Koyejo},
  journal={arXiv preprint arXiv:2502.17721},
  year={2025}
}