On Losses for Modern Language Models
Stephane Aroca-Ouellette, Frank Rudzicz
arXiv:2010.01694, 4 October 2020

Papers citing "On Losses for Modern Language Models" (22 papers shown):

1. One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings. Andrea Gurioli, Federico Pennino, João Monteiro, Maurizio Gabbrielli. 4 Mar 2025.
2. Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain. Eugene Jang, Jian Cui, Dayeon Yim, Youngjin Jin, Jin-Woo Chung, Seung-Eui Shin, Yongjae Lee. 15 Mar 2024.
3. Digger: Detecting Copyright Content Mis-usage in Large Language Model Training. Haodong Li, Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu, Guoai Xu, Guosheng Xu, Haoyu Wang. 1 Jan 2024.
4. A State-Vector Framework for Dataset Effects. E. Sahak, Zining Zhu, Frank Rudzicz. 17 Oct 2023.
5. How does the task complexity of masked pretraining objectives affect downstream performance? Atsuki Yamaguchi, Hiroaki Ozaki, Terufumi Morishita, Gaku Morio, Yasuhiro Sogawa. 18 May 2023.
6. An Experimental Study on Pretraining Transformers from Scratch for IR. Carlos Lassance, Hervé Déjean, S. Clinchant. 25 Jan 2023.
7. Training self-supervised peptide sequence models on artificially chopped proteins. Gil Sadeh, Zichen Wang, J. Grewal, Huzefa Rangwala, Layne Price. 9 Nov 2022.
8. Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization. Yuxian Gu, Pei Ke, Xiaoyan Zhu, Minlie Huang. 17 Oct 2022. [ALM]
9. Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling. Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum. 10 Oct 2022.
10. RankGen: Improving Text Generation with Large Ranking Models. Kalpesh Krishna, Yapei Chang, John Wieting, Mohit Iyyer. 19 May 2022. [AIMat]
11. How does the pre-training objective affect what large language models learn about linguistic properties? Ahmed Alajrami, Nikolaos Aletras. 20 Mar 2022.
12. Assemble Foundation Models for Automatic Code Summarization. Jian Gu, P. Salza, H. Gall. 13 Jan 2022.
13. Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing. Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon. 15 Dec 2021. [LM&MA]
14. Building Chinese Biomedical Language Models via Multi-Level Text Discrimination. Quan Wang, Songtai Dai, Benfeng Xu, Yajuan Lyu, Yong Zhu, Hua-Hong Wu, Haifeng Wang. 14 Oct 2021.
15. Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative. Lucio Dery, Paul Michel, Ameet Talwalkar, Graham Neubig. 15 Sep 2021. [CLL]
16. Frustratingly Simple Pretraining Alternatives to Masked Language Modeling. Atsuki Yamaguchi, G. Chrysostomou, Katerina Margatina, Nikolaos Aletras. 4 Sep 2021.
17. Training ELECTRA Augmented with Multi-word Selection. Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han. 31 May 2021.
18. AMMU: A Survey of Transformer-based Biomedical Pretrained Language Models. Katikapalli Subramanyam Kalyan, A. Rajasekharan, S. Sangeetha. 16 Apr 2021. [LM&MA, MedIm]
19. A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives. Nils Rethmeier, Isabelle Augenstein. 25 Feb 2021. [SSL, VLM]
20. BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. Demetres Kostas, Stephane Aroca-Ouellette, Frank Rudzicz. 28 Jan 2021. [SSL]
21. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. [ELM]
22. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean. 26 Sep 2016. [AIMat]