SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,061 papers shown
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
175
74
0
16 Jun 2020
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
Martin Schmitt
Leonardo F. R. Ribeiro
Philipp Dufter
Iryna Gurevych
Hinrich Schütze
GNN
211
8
0
16 Jun 2020
Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset
A. Andrusenko
A. Laptev
Ivan Medennikov
VLM
220
13
0
15 Jun 2020
Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya
Applied Computing and Intelligence (ACI), 2020
Abrhalei Tela
Abraham Woubie
Ville Hautamaki
158
17
0
13 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Computer Vision and Pattern Recognition (CVPR), 2020
Karan Desai
Justin Johnson
SSL, VLM
432
465
0
11 Jun 2020
Pre-training Polish Transformer-based Language Models at Scale
Slawomir Dadas
Michal Perelkiewicz
Rafal Poswiata
186
43
0
07 Jun 2020
Unsupervised Translation of Programming Languages
Marie-Anne Lachaux
Baptiste Roziere
L. Chanussot
Guillaume Lample
315
495
0
05 Jun 2020
ELITR Non-Native Speech Translation at IWSLT 2020
Dominik Macháček
Jonáš Kratochvíl
Sangeet Sagar
Matúš Žilinec
Ondrej Bojar
T. Nguyen
Felix Schneider
P. Williams
Yuekun Yao
112
11
0
05 Jun 2020
Contextual RNN-T For Open Domain ASR
Interspeech (Interspeech), 2020
Mahaveer Jain
Gil Keren
Jay Mahadeokar
Geoffrey Zweig
Florian Metze
Yatharth Saraf
209
119
0
04 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
248
7
0
04 Jun 2020
Self-Training for End-to-End Speech Translation
Interspeech (Interspeech), 2020
J. Pino
Qiantong Xu
Xutai Ma
M. Dousti
Yun Tang
214
68
0
03 Jun 2020
WikiBERT models: deep transfer learning for many languages
Nordic Conference of Computational Linguistics (NODALIDA), 2020
S. Pyysalo
Jenna Kanerva
Antti Virtanen
Filip Ginter
KELM
151
39
0
02 Jun 2020
Cascaded Text Generation with Markov Transformers
Neural Information Processing Systems (NeurIPS), 2020
Yuntian Deng
Alexander M. Rush
170
15
0
01 Jun 2020
Neural Simultaneous Speech Translation Using Alignment-Based Chunking
International Workshop on Spoken Language Translation (IWSLT), 2020
P. Wilken
Tamer Alkhouli
E. Matusov
Pavel Golik
127
16
0
29 May 2020
The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion
Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2020
Katharina Kann
Arya D. McCarthy
Garrett Nicolai
Mans Hulden
136
19
0
28 May 2020
Syntactic Structure Distillation Pretraining For Bidirectional Encoders
Transactions of the Association for Computational Linguistics (TACL), 2020
A. Kuncoro
Lingpeng Kong
Daniel Fried
Dani Yogatama
Laura Rimell
Chris Dyer
Phil Blunsom
186
35
0
27 May 2020
Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean
International Workshop/Conference on Parsing Technologies (IWPT), 2020
Tae Hwan Oh
J. Han
Hyonsu Choe
Seokwon Park
Han He
Jinho Choi
Na-Rae Han
Jena D. Hwang
Hansaem Kim
97
5
0
26 May 2020
GECToR -- Grammatical Error Correction: Tag, Not Rewrite
Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2020
Kostiantyn Omelianchuk
Vitaliy Atrasevych
Artem Chernodub
Oleksandr Skurzhanskyi
223
354
0
26 May 2020
BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection
International Workshop on Natural Language Processing for Social Media (NLPSM), 2020
Jihyung Moon
Won Ik Cho
Junbum Lee
148
106
0
26 May 2020
ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020
Maha Elbayad
H. Nguyen
Fethi Bougares
N. Tomashenko
Antoine Caubrière
Benjamin Lecouteux
Yannick Esteve
Laurent Besacier
106
13
0
24 May 2020
Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model
Satoru Katsumata
Mamoru Komachi
164
63
0
24 May 2020
Team Neuro at SemEval-2020 Task 8: Multi-Modal Fine Grain Emotion Classification of Memes using Multitask Learning
Sourya Dipta Das
Soumil Mandal
92
4
0
21 May 2020
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation
Shun-Po Chuang
Tzu-Wei Sung
Alexander H. Liu
Hung-yi Lee
190
23
0
21 May 2020
Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey
Ammar Rashed
Mucahid Kutlu
Kareem Darwish
Tamer Elsayed
Cansin Bayrak
128
55
0
19 May 2020
Iterative Pseudo-Labeling for Speech Recognition
Qiantong Xu
Tatiana Likhomanenko
Jacob Kahn
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
VLM
254
145
0
19 May 2020
Are All Languages Created Equal in Multilingual BERT?
Shijie Wu
Mark Dredze
236
361
0
18 May 2020
T-VSE: Transformer-Based Visual Semantic Embedding
M. Bastan
Arnau Ramisa
Mehmet Tek
ViT
119
7
0
17 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
144
10
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
176
66
0
14 May 2020
An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition
Gizem Aras
Didem Makaroglu
Seniz Demir
Altan Cakir
130
32
0
14 May 2020
Simultaneous paraphrasing and translation by fine-tuning Transformer models
Rakesh Chada
92
5
0
12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
111
35
0
12 May 2020
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation
Aditya Siddhant
Ankur Bapna
Yuan Cao
Orhan Firat
Mengzhao Chen
Sneha Kudugunta
N. Arivazhagan
Yonghui Wu
218
88
0
11 May 2020
A Multi-Perspective Architecture for Semantic Code Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Rajarshi Haldar
Lingfei Wu
Jinjun Xiong
Anjali Narayan-Chen
110
62
0
06 May 2020
Improving Truthfulness of Headline Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Kazuki Matsumaru
Sho Takase
Naoaki Okazaki
HILM
173
49
0
02 May 2020
Language Models as an Alternative Evaluator of Word Order Hypotheses: A Case Study in Japanese
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Tatsuki Kuribayashi
Takumi Ito
Jun Suzuki
Kentaro Inui
80
5
0
02 May 2020
On Faithfulness and Factuality in Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Joshua Maynez
Shashi Narayan
Bernd Bohnet
Ryan T. McDonald
HILM
255
1,237
0
02 May 2020
Imitation Attacks and Defenses for Black-box Machine Translation Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Eric Wallace
Mitchell Stern
Basel Alomair
AAML
302
130
0
30 Apr 2020
A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing
Rachel Bawden
Biao Zhang
Lisa Yankovskaya
Andre Tattar
Matt Post
168
1
0
30 Apr 2020
Data and Representation for Turkish Natural Language Inference
Emrah Budur
Rıza Özçelik
Tunga Güngör
Christopher Potts
73
1
0
30 Apr 2020
Language Model Prior for Low-Resource Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Christos Baziotis
Barry Haddow
Alexandra Birch
396
59
0
30 Apr 2020
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Arturo Oncevay
Barry Haddow
Alexandra Birch
202
37
0
30 Apr 2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Asa Cooper Stickland
Xian Li
Marjan Ghazvininejad
230
49
0
30 Apr 2020
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Samson Tan
Shafiq Joty
Lav Varshney
Min-Yen Kan
314
38
0
30 Apr 2020
Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection
Recent Advances in Natural Language Processing (RANLP), 2020
Momchil Hardalov
Ivan Koychev
Preslav Nakov
VLM
157
20
0
30 Apr 2020
Vocabulary Adaptation for Distant Domain Adaptation in Neural Machine Translation
Shoetsu Sato
Jin Sakuma
Naoki Yoshinaga
Masashi Toyoda
M. Kitsuregawa
184
3
0
30 Apr 2020
Self-Supervised and Controlled Multi-Document Opinion Summarization
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Hady ElSahar
Maximin Coavoux
Matthias Gallé
Jos Rozen
122
51
0
30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Brian Thompson
Matt Post
LRM
201
199
0
30 Apr 2020
WT5?! Training Text-to-Text Models to Explain their Predictions
Sharan Narang
Colin Raffel
Katherine Lee
Adam Roberts
Noah Fiedel
Karishma Malkan
188
212
0
30 Apr 2020
Simulated Multiple Reference Training Improves Low-Resource Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Huda Khayrallah
Brian Thompson
Matt Post
Philipp Koehn
259
39
0
30 Apr 2020