Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews
arXiv:2408.14677, 26 August 2024
Papers citing "Can Optimization Trajectories Explain Multi-Task Transfer?" (50 papers)
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning. Zedong Wang, Siyuan Li, Dan Xu. 28 Jul 2025.
Federated Communication-Efficient Multi-Objective Optimization. Baris Askin, Pranay Sharma, Gauri Joshi, Carlee Joe-Wong. International Conference on Artificial Intelligence and Statistics (AISTATS), 2025. 21 Oct 2024.
Scalarization for Multi-Task and Multi-Domain Learning at Scale. Amelie Royer, Tijmen Blankevoort, B. Bejnordi. Neural Information Processing Systems (NeurIPS), 2023. 13 Oct 2023.
Normalization Layers Are All That Sharpness-Aware Minimization Needs. Maximilian Mueller, Tiffany J. Vlaar, David Rolnick, Matthias Hein. Neural Information Processing Systems (NeurIPS), 2023. 07 Jun 2023.
Identification of Negative Transfers in Multitask Learning Using Surrogate Models. Dongyue Li, Huy Le Nguyen, Hongyang R. Zhang. 25 Mar 2023.
A Modern Look at the Relationship between Sharpness and Generalization. Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion. International Conference on Machine Learning (ICML), 2023. 14 Feb 2023.
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning. Junguang Jiang, Baixu Chen, Junwei Pan, Ximei Wang, Dapeng Liu, Jie Jiang, Mingsheng Long. Neural Information Processing Systems (NeurIPS), 2023. 30 Jan 2023.
Disentangling the Mechanisms Behind Implicit Regularization in SGD. Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton. International Conference on Learning Representations (ICLR), 2023. 29 Nov 2022.
Do Current Multi-Task Optimization Methods in Deep Learning Even Help? Derrick Xin, Behrooz Ghorbani, Ankush Garg, Orhan Firat, Justin Gilmer. Neural Information Processing Systems (NeurIPS), 2022. 23 Sep 2022.
On the Maximum Hessian Eigenvalue and Generalization. Simran Kaur, Jeremy M. Cohen, Zachary Chase Lipton. 21 Jun 2022.
Linear Connectivity Reveals Generalization Strategies. Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra. International Conference on Learning Representations (ICLR), 2023. 24 May 2022.
Auto-Lambda: Disentangling Dynamic Task Relationships. Shikun Liu, Stephen James, Andrew J. Davison, Edward Johns. 07 Feb 2022.
Multi-Task Learning as a Bargaining Game. Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya. International Conference on Machine Learning (ICML), 2022. 02 Feb 2022.
In Defense of the Unitary Scalarization for Deep Multi-Task Learning. Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar. Neural Information Processing Systems (NeurIPS), 2022. 11 Jan 2022.
Conflict-Averse Gradient Descent for Multi-task Learning. Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu. 26 Oct 2021.
Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
RotoGrad: Gradient Homogenization in Multitask Learning. Adrián Javaloy, Isabel Valera. International Conference on Learning Representations (ICLR), 2022. 03 Mar 2021.
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. International Conference on Learning Representations (ICLR), 2021. 26 Feb 2021.
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). Zhiyuan Li, Sadhika Malladi, Sanjeev Arora. Neural Information Processing Systems (NeurIPS), 2021. 24 Feb 2021.
On the Origin of Implicit Regularization in Stochastic Gradient Descent. Samuel L. Smith, Benoit Dherin, David Barrett, Soham De. International Conference on Learning Representations (ICLR), 2021. 28 Jan 2021.
Underspecification Presents Challenges for Credibility in Modern Machine Learning. Alexander D'Amour, Katherine A. Heller, D. Moldovan, Ben Adlam, B. Alipanahi, ..., Kellie Webster, Steve Yadlowsky, T. Yun, Xiaohua Zhai, D. Sculley. 06 Nov 2020.
Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli. Neural Information Processing Systems (NeurIPS), 2020. 28 Oct 2020.
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout. Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov. 14 Oct 2020.
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models. Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao. 12 Oct 2020.
Multi-Task Learning with Deep Neural Networks: A Survey. M. Crawshaw. 10 Sep 2020.
On the Generalization Benefit of Noise in Stochastic Gradient Descent. Samuel L. Smith, Erich Elsen, Soham De. 26 Jun 2020.
Balancing Training for Multilingual Neural Machine Translation. Xinyi Wang, Yulia Tsvetkov, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), 2020. 14 Apr 2020.
The Two Regimes of Deep Network Training. Guillaume Leclerc, Aleksander Madry. 24 Feb 2020.
The Early Phase of Neural Network Training. Jonathan Frankle, D. Schwab, Ari S. Morcos. International Conference on Learning Representations (ICLR), 2020. 24 Feb 2020.
The Break-Even Point on Optimization Trajectories of Deep Neural Networks. Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras. International Conference on Learning Representations (ICLR), 2020. 21 Feb 2020.
Gradient Surgery for Multi-Task Learning. Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn. Neural Information Processing Systems (NeurIPS), 2020. 19 Jan 2020.
Linear Mode Connectivity and the Lottery Ticket Hypothesis. Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin. International Conference on Machine Learning (ICML), 2020. 11 Dec 2019.
RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
Understanding Generalization through Visualizations. W. Ronny Huang, Z. Emam, Micah Goldblum, Liam H. Fowl, J. K. Terry, Furong Huang, Tom Goldstein. 07 Jun 2019.
Which Tasks Should Be Learned Together in Multi-task Learning? Trevor Scott Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese. International Conference on Machine Learning (ICML), 2020. 18 May 2019.
An Empirical Model of Large-Batch Training. Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team. 14 Dec 2018.
Adapting Auxiliary Losses Using Gradient Similarity. Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan. 05 Dec 2018.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
Multi-Task Learning as Multi-Objective Optimization. Ozan Sener, V. Koltun. 10 Oct 2018.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
Three Factors Influencing Minima in SGD. Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich. 07 Nov 2017.
Don't Decay the Learning Rate, Increase the Batch Size. Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le. 01 Nov 2017.
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. 25 Aug 2017.
Rethinking Atrous Convolution for Semantic Image Segmentation. Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. 17 Jun 2017.
An Overview of Multi-Task Learning in Deep Neural Networks. Sebastian Ruder. 15 Jun 2017.
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. Gintare Karolina Dziugaite, Daniel M. Roy. 31 Mar 2017.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.
The Cityscapes Dataset for Semantic Urban Scene Understanding. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. 06 Apr 2016.
Deep Learning Face Attributes in the Wild. Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang. IEEE International Conference on Computer Vision (ICCV), 2015. 28 Nov 2014.