Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs

8 October 2025

Franck Dernoncourt

