
TangoBERT: Reducing Inference Cost by using Cascaded Architecture
arXiv: 2204.06271 · 13 April 2022
Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz

Papers citing "TangoBERT: Reducing Inference Cost by using Cascaded Architecture"

11 citing papers shown.

In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs
Vishnu Sarukkai, Asanshay Gupta, James Hong, Michael Gharbi, Kayvon Fatahalian
02 Dec 2025
MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform
Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, M. D. Choudhury, Tanushree Mitra
30 May 2025
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari
26 Feb 2025
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins
18 Feb 2025
Cascade-Aware Training of Language Models
Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, A. S. Rawat, A. Menon, Alec Go
29 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, Seungyeon Kim, Neha Gupta, A. Menon, Sanjiv Kumar
29 May 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, A. Menon, Sanjiv Kumar
15 Apr 2024
Towards Optimizing the Costs of LLM Usage
Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla
29 Jan 2024
When Does Confidence-Based Cascade Deferral Suffice?
Neural Information Processing Systems (NeurIPS), 2023
Wittawat Jitkrittum, Neha Gupta, A. Menon, Harikrishna Narasimhan, A. S. Rawat, Sanjiv Kumar
06 Jul 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Daniel Rotem, Michael Hassid, Jonathan Mamou, Roy Schwartz
04 Jun 2023
BabyBear: Cheap inference triage for expensive language models
Leila Khalili, Yao You, John Bohannon
24 May 2022