SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,063 papers shown
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
128
38
0
22 Sep 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
261
89
0
20 Sep 2020
Will it Unblend?
Findings (Findings), 2020
Yuval Pinter
Cassandra L. Jacobs
Jacob Eisenstein
151
15
0
18 Sep 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Applied Informatics (AI), 2020
Juan Cruz-Benito
Sanjay Vishwakarma
Francisco Martín-Fernández
Ismael Faro (IBM Quantum)
330
32
0
16 Sep 2020
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Findings (Findings), 2020
Insoo Chung
Byeongwook Kim
Yoonjung Choi
S. Kwon
Yongkweon Jeon
Baeseong Park
Sangha Kim
Dongsoo Lee
MQ
234
29
0
16 Sep 2020
Multi-span Style Extraction for Generative Reading Comprehension
Junjie Yang
Zhuosheng Zhang
Hai Zhao
SyDa
181
15
0
15 Sep 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Jason D. Lee
Raphael Shu
Dong Wang
123
28
0
15 Sep 2020
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition
Soohwan Kim
Seyoung Bae
Cheolhwang Won
VLM
168
4
0
07 Sep 2020
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
International Workshop on Semantic Evaluation (SemEval), 2020
G. Vlad
George-Eduard Zaharia
Dumitru-Clementin Cercel
Costin-Gabriel Chiru
Stefan Trausan-Matu
142
38
0
06 Sep 2020
GREEK-BERT: The Greeks visiting Sesame Street
Hellenic Conference on Artificial Intelligence (HAI), 2020
John Koutsikakis
Ilias Chalkidis
Prodromos Malakasiotis
Ion Androutsopoulos
184
104
0
27 Aug 2020
Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020
Selim F. Yilmaz
E. Kaynak
Aykut Koç
H. Dibeklioğlu
Suleyman S. Kozat
157
33
0
26 Aug 2020
JokeMeter at SemEval-2020 Task 7: Convolutional humor
International Workshop on Semantic Evaluation (SemEval), 2020
Martin Docekal
Martin Fajcik
Josef Jon
Pavel Smrz
150
2
0
25 Aug 2020
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange
Nirant Kasliwal
76
8
0
22 Aug 2020
Neural Machine Translation without Embeddings
Uri Shaham
Omer Levy
229
17
0
21 Aug 2020
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo
Marcos Piau
Israel Campiotti
Rodrigo Nogueira
R. Lotufo
LM&MA
129
64
0
20 Aug 2020
Inducing Language-Agnostic Multilingual Representations
Wei Zhao
Steffen Eger
Johannes Bjerva
Isabelle Augenstein
207
70
0
20 Aug 2020
Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes
Rodrigo Nogueira
R. Lotufo
Hélio Pedrini
100
9
0
20 Aug 2020
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
Henry Tsai
Jayden Ooi
Chun-Sung Ferng
Hyung Won Chung
Jason Riesa
ViT
145
21
0
15 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Milind Rao
A. Raju
Pranav Dheram
Bach Bui
Ariya Rastrow
127
43
0
14 Aug 2020
Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Interspeech (Interspeech), 2020
Wenyong Huang
Wenchao Hu
Y. Yeung
Xiao Chen
158
52
0
13 Aug 2020
Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
Conference on Machine Translation (WMT), 2020
Brian Thompson
Matt Post
245
61
0
11 Aug 2020
Revisiting Low Resource Status of Indian Languages in Machine Translation
Jerin Philip
Shashank Siripragada
Vinay P. Namboodiri
C. V. Jawahar
254
30
0
11 Aug 2020
A Multilingual Neural Machine Translation Model for Biomedical Data
Alexandre Berard
Min Namgung
Vassilina Nikoulina
Eunjeong Lucy Park
Matthias Gallé
136
15
0
06 Aug 2020
Designing the Business Conversation Corpus
Matīss Rikters
Ryokan Ri
Tong Li
Toshiaki Nakazawa
101
24
0
05 Aug 2020
A Survey of Orthographic Information in Machine Translation
Bharathi Raja Chakravarthi
P. Rani
Mihael Arcan
John P. Mccrae
149
35
0
04 Aug 2020
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
Y. Tang
C. Tran
Xian Li
Peng-Jen Chen
Naman Goyal
Vishrav Chaudhary
Jiatao Gu
Angela Fan
CLL
348
522
0
02 Aug 2020
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu
Robert Tinn
Hao Cheng
Michael R. Lucas
Naoto Usuyama
Xiaodong Liu
Tristan Naumann
Jianfeng Gao
Hoifung Poon
LM&MA
AI4CE
669
2,143
0
31 Jul 2020
COVID-19 therapy target discovery with context-aware literature mining
IFIP Working Conference on Database Semantics (IWDS), 2020
Matej Martinc
Blaž Škrlj
S. Pirkmajer
Nada Lavrac
B. Cestnik
Martin Marzidovsek
Senja Pollak
143
9
0
30 Jul 2020
BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models
International Workshop on Semantic Evaluation (SemEval), 2020
Martin Fajcik
Josef Jon
Martin Docekal
Pavel Smrz
92
11
0
28 Jul 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
1.3K
2,520
0
28 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
Asian Conference on Computer Vision (ACCV), 2020
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
274
6
0
27 Jul 2020
Consistent Transcription and Translation of Speech
Transactions of the Association for Computational Linguistics (TACL), 2020
Matthias Sperber
Hendra Setiawan
Christian Gollan
Udhyakumar Nallasamy
Matthias Paulik
216
20
0
24 Jul 2020
FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings
International Workshop on Semantic Evaluation (SemEval), 2020
Bertelt Braaksma
R. Scholtens
Stan van Suijlekom
Remy Wang
Ahmet Üstün
256
3
0
24 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
175
90
0
20 Jul 2020
Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks
Diego de Vargas Feijó
V. Moreira
MILM
90
10
0
19 Jul 2020
Drinking from a Firehose: Continual Learning with Web-scale Natural Language
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Hexiang Hu
Ozan Sener
Fei Sha
V. Koltun
CLL
224
28
0
18 Jul 2020
A Multilingual Parallel Corpora Collection Effort for Indian Languages
International Conference on Language Resources and Evaluation (LREC), 2020
Shashank Siripragada
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
VLM
126
58
0
15 Jul 2020
Multi-Dialect Arabic BERT for Country-Level Dialect Identification
Workshop on Arabic Natural Language Processing (WANLP), 2020
Bashar Talafha
Mohammad Ali
Muhy Eddin Za'ter
Haitham Seelawi
Ibraheem Tuffaha
Mostafa Samir
Wael Farhan
Hussein T. Al-Natsheh
146
66
0
10 Jul 2020
What Can We Learn From Almost a Decade of Food Tweets
Uga Sprocgis
Matīss Rikters
106
12
0
10 Jul 2020
scb-mt-en-th-2020: A Large English-Thai Parallel Corpus
Lalita Lowphansirikul
Charin Polpanumas
Attapol T. Rutherford
Sarana Nutanong
LRM
107
26
0
07 Jul 2020
Deep Contextual Embeddings for Address Classification in E-commerce
Shreyas Mangalgi
Lakshya Kumar
Ravindra Babu Tallamraju
113
8
0
06 Jul 2020
Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification
Hui Ye
Zhiyu Zoey Chen
Da-han Wang
Brian D. Davison
VLM
236
56
0
05 Jul 2020
Playing with Words at the National Library of Sweden -- Making a Swedish BERT
Martin Malmsten
Love Borjeson
Chris Haffenden
149
133
0
03 Jul 2020
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
Ryan Culkin
J. E. Hu
Elias Stengel-Eskin
Guanghui Qin
Benjamin Van Durme
113
6
0
01 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
392
1,619
0
30 Jun 2020
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie
Tengyu Ma
Abigail Z. Jacobs
354
17
0
29 Jun 2020
A High-Quality Multilingual Dataset for Structured Documentation Translation
Conference on Machine Translation (WMT), 2020
Kazuma Hashimoto
Raffaella Buschiazzo
James Bradbury
Teresa Marshall
R. Socher
Caiming Xiong
132
23
0
24 Jun 2020
Exploring Software Naturalness through Neural Language Models
Luca Buratti
Saurabh Pujar
Mihaela A. Bornea
Scott McCarley
Yunhui Zheng
...
Alessandro Morari
Jim Laredo
Veronika Thost
Yufan Zhuang
Giacomo Domeniconi
203
101
0
22 Jun 2020
ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
Bingning Wang
Ting Yao
Tao Gui
Jingfang Xu
Xiaochuan Wang
RALM
179
23
0
22 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
252
42
0
22 Jun 2020