arXiv:2407.05467
The infrastructure powering IBM's Gen AI model development
7 July 2024
Talia Gershon
Seetharami R. Seelam
Brian M. Belgodere
Milton Bonilla
Lan Hoang
Danny Barnett
I-Hsin Chung
Apoorve Mohan
Ming-Hung Chen
Lixiang Luo
Robert Walkup
Constantinos Evangelinos
Shweta Salaria
Marc Dombrowa
Yoonho Park
Apo Kayi
L. Schour
Alim Alim
Ali Sydney
P. Maniotis
L. Schares
Bernard Metzler
Bengi Karacali-Akyamac
Sophia Wen
Tatsuhiro Chiba
Sunyanan Choochotkaew
Takeshi Yoshimura
C. Misale
Tonia Elengikal
Kevin O Connor
Zhuoran Liu
Richard Molina
L. Schneidenbach
James Caden
Christopher Laibinis
Carlos Fonseca
Vasily Tarasov
S. Sundararaman
Frank B. Schmuck
S. Guthridge
Jeremy Cohn
Marc Eshel
Paul Muench
Runyu Liu
W. Pointer
D. Wyskida
Bob Krull
Ray Rose
Brent Wolfe
William Cornejo
John Walter
Colm Malone
Clifford Perucci
Frank Franco
Nigel Hinds
Bob Calio
Pavel Druyan
R. Kilduff
John Kienle
Connor McStay
Andrew Figueroa
Matthew Connolly
Edie Fost
Gina Roma
Jake Fonseca
Ido Levy
Michele Payne
Ryan Schenkel
Amir Malki
Lion Schneider
Aniruddha Narkhede
Shekeba Moshref
Alexandra Kisin
Olga Dodin
Bill Rippon
Henry Wrieth
John M. Ganci
Johnny Colino
Donna Habeger-Rose
Rakesh Pandey
Aditya Gidh
Aditya Gaur
Dennis Patterson
Samsuddin Salmani
Rambilas Varma
Rumana Rumana
Shubham Sharma
Aditya Gaur
Mayank Mishra
Rameswar Panda
Aditya Prasad
Matt Stallone
Gaoyuan Zhang
Yikang Shen
David D. Cox
Ruchir Puri
Dakshi Agrawal
Drew Thorstensen
Joel Belog
Brent Tang
Papers citing "The infrastructure powering IBM's Gen AI model development" (4 papers)
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis
Michael Kuchnik
John Hoffman
Adithya Kumar
Parth Malani
Faye Ma
Zachary DeVito
S.
Kalyan Saladi
Carole-Jean Wu
29 Oct 2024
FlowTracer: A Tool for Uncovering Network Path Usage Imbalance in AI Training Clusters
Hasibul Jamil
Abdul Alim
L. Schares
P. Maniotis
L. Schour
Ali Sydney
Abdullah Kayi
T. Kosar
Bengi Karacali
22 Oct 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
29 Jul 2024
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
17 Sep 2019