
TangoBERT: Reducing Inference Cost by using Cascaded Architecture
arXiv: 2204.06271 · 13 April 2022
Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz

Papers citing "TangoBERT: Reducing Inference Cost by using Cascaded Architecture"

11 citing papers shown.

In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs
Vishnu Sarukkai, Asanshay Gupta, James Hong, Michael Gharbi, Kayvon Fatahalian
02 Dec 2025
MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform
Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, M. D. Choudhury, Tanushree Mitra
30 May 2025
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari
26 Feb 2025
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins
18 Feb 2025
Cascade-Aware Training of Language Models
Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, A. S. Rawat, A. Menon, Alec Go
29 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, Seungyeon Kim, Neha Gupta, A. Menon, Sanjiv Kumar
29 May 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, A. Menon, Sanjiv Kumar
15 Apr 2024
Towards Optimizing the Costs of LLM Usage
Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla
29 Jan 2024
When Does Confidence-Based Cascade Deferral Suffice?
Neural Information Processing Systems (NeurIPS), 2023
Wittawat Jitkrittum, Neha Gupta, A. Menon, Harikrishna Narasimhan, A. S. Rawat, Sanjiv Kumar
06 Jul 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Daniel Rotem, Michael Hassid, Jonathan Mamou, Roy Schwartz
04 Jun 2023
BabyBear: Cheap inference triage for expensive language models
Leila Khalili, Yao You, John Bohannon
24 May 2022