ResearchTrend.AI
SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation

9 February 2024
Yifan Xiong, Yuting Jiang, Ziyue Yang, L. Qu, Guoshuai Zhao, Shuguang Liu, Dong Zhong, Boris Pinzur, Jie Zhang, Yang Wang, Jithin Jose, Hossein Pourreza, Jeff Baxter, Kushal Datta, Prabhat Ram, Luke Melton, Joe Chau, Peng Cheng, Yongqiang Xiong, Lidong Zhou

Papers citing "SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation"

Minder: Faulty Machine Detection for Large-scale Distributed Model Training
Yangtao Deng, Xiang Shi, Zhuo Jiang, X. Zhang, Lei Zhang, ..., Fuliang Li, Shuguang Wang, H. Lin, Jianxi Ye, Minlan Yu
LRM
67, 2, 0
04 Nov 2024

Densely Connected Convolutional Networks
Gao Huang, Zhuang Liu, L. van der Maaten, Kilian Q. Weinberger
PINN, 3DV
244, 35,884, 0
25 Aug 2016