SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,061 papers shown
A deep-learning view of chemical space designed to facilitate drug discovery
Journal of Chemical Information and Modeling (JCIM), 2020
P. Maragakis
Hunter M. Nisonoff
B. Cole
D. Shaw
187
33
0
07 Feb 2020
Graph Constrained Reinforcement Learning for Natural Language Action Spaces
International Conference on Learning Representations (ICLR), 2020
Prithviraj Ammanabrolu
Matthew J. Hausknecht
AI4CE, LLMAG
207
135
0
23 Jan 2020
Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation
Haiyue Song
Raj Dabre
Zhuoyuan Mao
Fei Cheng
Sadao Kurohashi
Eiichiro Sumita
128
2
0
23 Jan 2020
Multilingual Denoising Pre-training for Neural Machine Translation
Transactions of the Association for Computational Linguistics (TACL), 2020
Yinhan Liu
Jiatao Gu
Naman Goyal
Xian Li
Sergey Edunov
Marjan Ghazvininejad
M. Lewis
Luke Zettlemoyer
AI4CE, AIMat
576
1,963
0
22 Jan 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
121
0
0
22 Jan 2020
Unsupervised Sentiment Analysis for Code-mixed Data
Siddharth Yadav
Tanmoy Chakraborty
141
15
0
20 Jan 2020
Streaming automatic speech recognition with the transformer model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Niko Moritz
Takaaki Hori
Jonathan Le Roux
346
198
0
08 Jan 2020
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
Journal of Biomedical Informatics (JBI), 2020
E. Steinberg
Kenneth Jung
Jason Alan Fries
Conor K. Corbin
Stephen Pfohl
N. Shah
230
141
0
06 Jan 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
183
18
0
06 Jan 2020
A Comprehensive Survey of Multilingual Neural Machine Translation
Raj Dabre
Chenhui Chu
Anoop Kunchukuttan
LRM
340
34
0
04 Jan 2020
TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
Findings (Findings), 2020
Ziyi Yang
Chenguang Zhu
R. Gmyr
Michael Zeng
Xuedong Huang
Eric Darve
317
64
0
03 Jan 2020
Leveraging Lead Bias for Zero-shot Abstractive News Summarization
Chenguang Zhu
Ziyi Yang
R. Gmyr
Michael Zeng
Xuedong Huang
190
20
0
25 Dec 2019
BERTje: A Dutch BERT Model
Wietse de Vries
Andreas van Cranenburgh
Arianna Bisazza
Tommaso Caselli
Gertjan van Noord
Malvina Nissim
VLM, SSeg
191
316
0
19 Dec 2019
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
210
298
0
15 Dec 2019
Personalized Patent Claim Generation and Measurement
Jieh-Sheng Lee
166
4
0
07 Dec 2019
Neural Machine Translation: A Review and Survey
Journal of Artificial Intelligence Research (JAIR), 2019
Felix Stahlberg
3DV, AI4TS, MedIm
343
377
0
04 Dec 2019
Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities
Zimin Chen
Steve Kommrusch
Martin Monperrus
70
5
0
04 Dec 2019
Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift
International Conference on Language Resources and Evaluation (LREC), 2019
Matej Martinc
Petra Kralj Novak
Senja Pollak
255
75
0
02 Dec 2019
Fiction Sentence Expansion and Enhancement via Focused Objective and Novelty Curve Sampling
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2019
Yuri Safovich
A. Azaria
160
7
0
02 Dec 2019
Jejueo Datasets for Machine Translation and Speech Synthesis
International Conference on Language Resources and Evaluation (LREC), 2019
Kyubyong Park
Yo Joong Choe
Jiyeon Ham
62
8
0
27 Nov 2019
Simultaneous Neural Machine Translation using Connectionist Temporal Classification
Katsuki Chousa
Katsuhito Sudoh
Satoshi Nakamura
90
5
0
27 Nov 2019
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
International Conference on Language Resources and Evaluation (LREC), 2019
Makoto Morishita
Jun Suzuki
Masaaki Nagata
LRM
274
67
0
25 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL, AI4TS
466
260
0
19 Nov 2019
The Eighth Dialog System Technology Challenge
Seokhwan Kim
Michel Galley
Chulaka Gunasekara
Sungjin Lee
Adam Atkinson
...
Tim K. Marks
Abhinav Rastogi
Xiaoxue Zang
Srinivas Sunkara
Raghav Gupta
VLM
134
66
0
14 Nov 2019
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
252
289
0
10 Nov 2019
A Bilingual Generative Transformer for Semantic Sentence Embedding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
John Wieting
Graham Neubig
Taylor Berg-Kirkpatrick
174
32
0
10 Nov 2019
CamemBERT: a Tasty French Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
467
1,046
0
10 Nov 2019
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzman
Philipp Koehn
230
218
0
10 Nov 2019
Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Siddharth Dalmia
Abdel-rahman Mohamed
M. Lewis
Florian Metze
Luke Zettlemoyer
143
12
0
09 Nov 2019
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Alex Bie
Bharat Venkitesh
João Monteiro
Md. Akmal Haidar
Mehdi Rezagholizadeh
MQ
446
27
0
09 Nov 2019
Unsupervised Cross-lingual Representation Learning at Scale
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
453
7,540
0
05 Nov 2019
RNN-T For Latency Controlled ASR With Improved Beam Search
Mahaveer Jain
Kjell Schubert
Jay Mahadeokar
Ching-Feng Yeh
Kaustubh Kalgaonkar
Anuroop Sriram
Christian Fuegen
M. Seltzer
205
46
0
05 Nov 2019
Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Alexandre Berard
Ioan Calapodescu
Marc Dymetman
Claude Roux
Jean-Luc Meunier
Vassilina Nikoulina
123
29
0
31 Oct 2019
Naver Labs Europe's Systems for the Document-Level Generation and Translation Task at WNGT 2019
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Fahimeh Saleh
Alexandre Berard
Ioan Calapodescu
Laurent Besacier
VLM
152
14
0
31 Oct 2019
Fill in the Blanks: Imputing Missing Sentences for Larger-Context Neural Machine Translation
Sébastien Jean
Ankur Bapna
Orhan Firat
146
7
0
30 Oct 2019
Transformer-based Cascaded Multimodal Speech Translation
International Workshop on Spoken Language Translation (IWSLT), 2019
Zixiu "Alex" Wu
Ozan Caglayan
Julia Ive
Josiah Wang
Lucia Specia
163
7
0
29 Oct 2019
Big Bidirectional Insertion Representations for Documents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Lala Li
William Chan
132
4
0
29 Oct 2019
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer
179
159
0
28 Oct 2019
Evaluating Lottery Tickets Under Distributional Shifts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Shrey Desai
Hongyuan Zhan
Ahmed Aly
UQCV, OOD
150
42
0
28 Oct 2019
Modeling Inter-Speaker Relationship in XLNet for Contextual Spoken Language Understanding
Jonggu Kim
Jong-Hyeok Lee
87
1
0
28 Oct 2019
Training ASR models by Generation of Contextual Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Kritika Singh
Dmytro Okhonko
Jun Liu
Yongqiang Wang
Frank Zhang
...
Sergey Edunov
Fuchun Peng
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
132
7
0
27 Oct 2019
On the Cross-lingual Transferability of Monolingual Representations
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
577
851
0
25 Oct 2019
Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
Interspeech (Interspeech), 2019
Jisung Wang
Jihwan Kim
Sangki Kim
Yeha Lee
93
5
0
25 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of Machine Learning Research (JMLR), 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
1.5K
23,572
0
23 Oct 2019
Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation
Ran Tian
Shashi Narayan
Thibault Sellam
Ankur P. Parikh
HILM
280
101
0
19 Oct 2019
End-to-End Speech Recognition: A review for the French Language
Florian Boyer
Jean-Luc Rouas
AI4TS
141
10
0
18 Oct 2019
Using Whole Document Context in Neural Machine Translation
International Workshop on Spoken Language Translation (IWSLT), 2019
Valentin Macé
Christophe Servan
139
28
0
16 Oct 2019
Facebook AI's WAT19 Myanmar-English Translation Task Submission
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Peng-Jen Chen
Jiajun Shen
Matt Le
Vishrav Chaudhary
Ahmed El-Kishky
Guillaume Wenzek
Myle Ott
Marc'Aurelio Ranzato
108
29
0
15 Oct 2019
Novel Applications of Factored Neural Machine Translation
P. Wilken
E. Matusov
AAML
137
15
0
09 Oct 2019
On Leveraging the Visual Modality for Neural Machine Translation
International Conference on Natural Language Generation (INLG), 2019
Vikas Raunak
Sang Keun Choe
Quanyang Lu
Yi Xu
Florian Metze
100
11
0
07 Oct 2019