1808.06226
Cited By
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 2,063 papers shown
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
128
38
0
22 Sep 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
261
89
0
20 Sep 2020
Will it Unblend?
Findings of the Association for Computational Linguistics: EMNLP (Findings), 2020
Yuval Pinter
Cassandra L. Jacobs
Jacob Eisenstein
151
15
0
18 Sep 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Applied Informatics (AI), 2020
Juan Cruz-Benito
Sanjay Vishwakarma
Francisco Martín-Fernández
Ismael Faro
330
32
0
16 Sep 2020
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Findings of the Association for Computational Linguistics: EMNLP (Findings), 2020
Insoo Chung
Byeongwook Kim
Yoonjung Choi
S. Kwon
Yongkweon Jeon
Baeseong Park
Sangha Kim
Dongsoo Lee
MQ
234
29
0
16 Sep 2020
Multi-span Style Extraction for Generative Reading Comprehension
Junjie Yang
Zhuosheng Zhang
Hai Zhao
SyDa
181
15
0
15 Sep 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Jason D. Lee
Raphael Shu
Dong Wang
123
28
0
15 Sep 2020
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition
Soohwan Kim
Seyoung Bae
Cheolhwang Won
VLM
168
4
0
07 Sep 2020
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
International Workshop on Semantic Evaluation (SemEval), 2020
G. Vlad
George-Eduard Zaharia
Dumitru-Clementin Cercel
Costin-Gabriel Chiru
Stefan Trausan-Matu
142
38
0
06 Sep 2020
GREEK-BERT: The Greeks visiting Sesame Street
Hellenic Conference on Artificial Intelligence (HAI), 2020
John Koutsikakis
Ilias Chalkidis
Prodromos Malakasiotis
Ion Androutsopoulos
184
104
0
27 Aug 2020
Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020
Selim F. Yilmaz
E. Kaynak
Aykut Koç
H. Dibeklioğlu
Suleyman S. Kozat
157
33
0
26 Aug 2020
JokeMeter at SemEval-2020 Task 7: Convolutional humor
International Workshop on Semantic Evaluation (SemEval), 2020
Martin Docekal
Martin Fajcik
Josef Jon
Pavel Smrz
150
2
0
25 Aug 2020
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange
Nirant Kasliwal
76
8
0
22 Aug 2020
Neural Machine Translation without Embeddings
Uri Shaham
Omer Levy
229
17
0
21 Aug 2020
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo
Marcos Piau
Israel Campiotti
Rodrigo Nogueira
R. Lotufo
LM&MA
129
64
0
20 Aug 2020
Inducing Language-Agnostic Multilingual Representations
Wei Zhao
Steffen Eger
Johannes Bjerva
Isabelle Augenstein
207
70
0
20 Aug 2020
Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes
Rodrigo Nogueira
R. Lotufo
Hélio Pedrini
100
9
0
20 Aug 2020
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
Henry Tsai
Jayden Ooi
Chun-Sung Ferng
Hyung Won Chung
Jason Riesa
ViT
145
21
0
15 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Milind Rao
A. Raju
Pranav Dheram
Bach Bui
Ariya Rastrow
127
43
0
14 Aug 2020
Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Interspeech (Interspeech), 2020
Wenyong Huang
Wenchao Hu
Y. Yeung
Xiao Chen
158
52
0
13 Aug 2020
Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
Conference on Machine Translation (WMT), 2020
Brian Thompson
Matt Post
245
61
0
11 Aug 2020
Revisiting Low Resource Status of Indian Languages in Machine Translation
Jerin Philip
Shashank Siripragada
Vinay P. Namboodiri
C. V. Jawahar
254
30
0
11 Aug 2020
A Multilingual Neural Machine Translation Model for Biomedical Data
Alexandre Berard
Min Namgung
Vassilina Nikoulina
Eunjeong Lucy Park
Matthias Gallé
136
15
0
06 Aug 2020
Designing the Business Conversation Corpus
Matīss Rikters
Ryokan Ri
Tong Li
Toshiaki Nakazawa
101
24
0
05 Aug 2020
A Survey of Orthographic Information in Machine Translation
Bharathi Raja Chakravarthi
P. Rani
Mihael Arcan
John P. McCrae
149
35
0
04 Aug 2020
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
Y. Tang
C. Tran
Xian Li
Peng-Jen Chen
Naman Goyal
Vishrav Chaudhary
Jiatao Gu
Angela Fan
CLL
348
522
0
02 Aug 2020
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu
Robert Tinn
Hao Cheng
Michael R. Lucas
Naoto Usuyama
Xiaodong Liu
Tristan Naumann
Jianfeng Gao
Hoifung Poon
LM&MA
AI4CE
669
2,143
0
31 Jul 2020
COVID-19 therapy target discovery with context-aware literature mining
IFIP Working Conference on Database Semantics (IWDS), 2020
Matej Martinc
Blaž Škrlj
S. Pirkmajer
Nada Lavrac
B. Cestnik
Martin Marzidovsek
Senja Pollak
143
9
0
30 Jul 2020
BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models
International Workshop on Semantic Evaluation (SemEval), 2020
Martin Fajcik
Josef Jon
Martin Docekal
Pavel Smrz
92
11
0
28 Jul 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
1.3K
2,520
0
28 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
Asian Conference on Computer Vision (ACCV), 2020
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
274
6
0
27 Jul 2020
Consistent Transcription and Translation of Speech
Transactions of the Association for Computational Linguistics (TACL), 2020
Matthias Sperber
Hendra Setiawan
Christian Gollan
Udhyakumar Nallasamy
Matthias Paulik
216
20
0
24 Jul 2020
FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings
International Workshop on Semantic Evaluation (SemEval), 2020
Bertelt Braaksma
R. Scholtens
Stan van Suijlekom
Remy Wang
Ahmet Üstün
256
3
0
24 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
175
90
0
20 Jul 2020
Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks
Diego de Vargas Feijó
V. Moreira
MILM
90
10
0
19 Jul 2020
Drinking from a Firehose: Continual Learning with Web-scale Natural Language
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Hexiang Hu
Ozan Sener
Fei Sha
V. Koltun
CLL
224
28
0
18 Jul 2020
A Multilingual Parallel Corpora Collection Effort for Indian Languages
International Conference on Language Resources and Evaluation (LREC), 2020
Shashank Siripragada
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
VLM
126
58
0
15 Jul 2020
Multi-Dialect Arabic BERT for Country-Level Dialect Identification
Workshop on Arabic Natural Language Processing (WANLP), 2020
Bashar Talafha
Mohammad Ali
Muhy Eddin Za'ter
Haitham Seelawi
Ibraheem Tuffaha
Mostafa Samir
Wael Farhan
Hussein T. Al-Natsheh
146
66
0
10 Jul 2020
What Can We Learn From Almost a Decade of Food Tweets
Uga Sproģis
Matīss Rikters
106
12
0
10 Jul 2020
scb-mt-en-th-2020: A Large English-Thai Parallel Corpus
Lalita Lowphansirikul
Charin Polpanumas
Attapol T. Rutherford
Sarana Nutanong
LRM
107
26
0
07 Jul 2020
Deep Contextual Embeddings for Address Classification in E-commerce
Shreyas Mangalgi
Lakshya Kumar
Ravindra Babu Tallamraju
113
8
0
06 Jul 2020
Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification
Hui Ye
Zhiyu Zoey Chen
Da-han Wang
Brian D. Davison
VLM
236
56
0
05 Jul 2020
Playing with Words at the National Library of Sweden -- Making a Swedish BERT
Martin Malmsten
Love Borjeson
Chris Haffenden
149
133
0
03 Jul 2020
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
Ryan Culkin
J. E. Hu
Elias Stengel-Eskin
Guanghui Qin
Benjamin Van Durme
113
6
0
01 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
392
1,619
0
30 Jun 2020
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie
Tengyu Ma
Abigail Z. Jacobs
354
17
0
29 Jun 2020
A High-Quality Multilingual Dataset for Structured Documentation Translation
Conference on Machine Translation (WMT), 2020
Kazuma Hashimoto
Raffaella Buschiazzo
James Bradbury
Teresa Marshall
R. Socher
Caiming Xiong
132
23
0
24 Jun 2020
Exploring Software Naturalness through Neural Language Models
Luca Buratti
Saurabh Pujar
Mihaela A. Bornea
Scott McCarley
Yunhui Zheng
...
Alessandro Morari
Jim Laredo
Veronika Thost
Yufan Zhuang
Giacomo Domeniconi
203
101
0
22 Jun 2020
ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
Bingning Wang
Ting Yao
Tao Gui
Jingfang Xu
Xiaochuan Wang
RALM
179
23
0
22 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
252
42
0
22 Jun 2020