SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

Showing 50 of 2,061 citing papers.

Cross-lingual Retrieval for Iterative Self-Supervised Training C. Tran Y. Tang Xian Li Jiatao Gu RALM 175 74 0 16 Jun 2020
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs Martin Schmitt Leonardo F. R. Ribeiro Philipp Dufter Iryna Gurevych Hinrich Schütze GNN 211 8 0 16 Jun 2020
Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset A. Andrusenko A. Laptev Ivan Medennikov VLM 220 13 0 15 Jun 2020
Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya. Applied Computing and Intelligence (ACI), 2020. Abrhalei Tela Abraham Woubie Ville Hautamaki 158 17 0 13 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations. Computer Vision and Pattern Recognition (CVPR), 2020. Karan Desai Justin Johnson SSL VLM 432 465 0 11 Jun 2020
Pre-training Polish Transformer-based Language Models at Scale Slawomir Dadas Michal Perelkiewicz Rafal Poswiata 186 43 0 07 Jun 2020
Unsupervised Translation of Programming Languages Marie-Anne Lachaux Baptiste Roziere L. Chanussot Guillaume Lample 315 495 0 05 Jun 2020
ELITR Non-Native Speech Translation at IWSLT 2020 Dominik Macháček Jonáš Kratochvíl Sangeet Sagar Matúš Žilinec Ondrej Bojar T. Nguyen Felix Schneider P. Williams Yuekun Yao 112 11 0 05 Jun 2020
Contextual RNN-T For Open Domain ASR. Interspeech (Interspeech), 2020. Mahaveer Jain Gil Keren Jay Mahadeokar Geoffrey Zweig Florian Metze Yatharth Saraf 209 119 0 04 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training Minheng Ni Haoyang Huang Lin Su Edward Cui Taroon Bharti Lijuan Wang Jianfeng Gao Dongdong Zhang Nan Duan 248 7 0 04 Jun 2020
Self-Training for End-to-End Speech Translation. Interspeech (Interspeech), 2020. J. Pino Qiantong Xu Xutai Ma M. Dousti Yun Tang 214 68 0 03 Jun 2020
WikiBERT models: deep transfer learning for many languages. Nordic Conference of Computational Linguistics (NODALIDA), 2020. S. Pyysalo Jenna Kanerva Antti Virtanen Filip Ginter KELM 151 39 0 02 Jun 2020
Cascaded Text Generation with Markov Transformers. Neural Information Processing Systems (NeurIPS), 2020. Yuntian Deng Alexander M. Rush 170 15 0 01 Jun 2020
Neural Simultaneous Speech Translation Using Alignment-Based Chunking. International Workshop on Spoken Language Translation (IWSLT), 2020. P. Wilken Tamer Alkhouli E. Matusov Pavel Golik 127 16 0 29 May 2020
The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion. Special Interest Group on Computational Morphology and Phonology Workshop (SIGMORPHON), 2020. Katharina Kann Arya D. McCarthy Garrett Nicolai Mans Hulden 136 19 0 28 May 2020
Syntactic Structure Distillation Pretraining For Bidirectional Encoders. Transactions of the Association for Computational Linguistics (TACL), 2020. A. Kuncoro Lingpeng Kong Daniel Fried Dani Yogatama Laura Rimell Chris Dyer Phil Blunsom 186 35 0 27 May 2020
Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean. International Workshop/Conference on Parsing Technologies (IWPT), 2020. Tae Hwan Oh J. Han Hyonsu Choe Seokwon Park Han He Jinho Choi Na-Rae Han Jena D. Hwang Hansaem Kim 97 5 0 26 May 2020
GECToR -- Grammatical Error Correction: Tag, Not Rewrite. Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2020. Kostiantyn Omelianchuk Vitaliy Atrasevych Artem Chernodub Oleksandr Skurzhanskyi 223 354 0 26 May 2020
BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection. International Workshop on Natural Language Processing for Social Media (NLPSM), 2020. Jihyung Moon Won Ik Cho Junbum Lee 148 106 0 26 May 2020
ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 Maha Elbayad H. Nguyen Fethi Bougares N. Tomashenko Antoine Caubrière Benjamin Lecouteux Yannick Esteve Laurent Besacier 106 13 0 24 May 2020
Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model Satoru Katsumata Mamoru Komachi 164 63 0 24 May 2020
Team Neuro at SemEval-2020 Task 8: Multi-Modal Fine Grain Emotion Classification of Memes using Multitask Learning Sourya Dipta Das Soumil Mandal 92 4 0 21 May 2020
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation Shun-Po Chuang Tzu-Wei Sung Alexander H. Liu Hung-yi Lee 190 23 0 21 May 2020
Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey Ammar Rashed Mucahid Kutlu Kareem Darwish Tamer Elsayed Cansin Bayrak 128 55 0 19 May 2020
Iterative Pseudo-Labeling for Speech Recognition Qiantong Xu Tatiana Likhomanenko Jacob Kahn Awni Y. Hannun Gabriel Synnaeve R. Collobert VLM 254 145 0 19 May 2020
Are All Languages Created Equal in Multilingual BERT? Shijie Wu Mark Dredze 236 361 0 18 May 2020
T-VSE: Transformer-Based Visual Semantic Embedding M. Bastan Arnau Ramisa Mehmet Tek ViT 119 7 0 17 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR Kritika Singh Vimal Manohar Alex Xiao Sergey Edunov Ross B. Girshick Vitaliy Liptchinsky Christian Fuegen Yatharth Saraf Geoffrey Zweig Abdel-rahman Mohamed 144 10 0 16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 176 66 0 14 May 2020
An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition Gizem Aras Didem Makaroglu Seniz Demir Altan Cakir 130 32 0 14 May 2020
Simultaneous paraphrasing and translation by fine-tuning Transformer models Rakesh Chada 92 5 0 12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem Tomoki Hayashi Shinji Watanabe 111 35 0 12 May 2020
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation Aditya Siddhant Ankur Bapna Yuan Cao Orhan Firat Mengzhao Chen Sneha Kudugunta N. Arivazhagan Yonghui Wu 218 88 0 11 May 2020
A Multi-Perspective Architecture for Semantic Code Search. Annual Meeting of the Association for Computational Linguistics (ACL), 2020. Rajarshi Haldar Lingfei Wu Jinjun Xiong Anjali Narayan-Chen 110 62 0 06 May 2020
Improving Truthfulness of Headline Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2020. Kazuki Matsumaru Sho Takase Naoaki Okazaki HILM 173 49 0 02 May 2020
Language Models as an Alternative Evaluator of Word Order Hypotheses: A Case Study in Japanese. Annual Meeting of the Association for Computational Linguistics (ACL), 2020. Tatsuki Kuribayashi Takumi Ito Jun Suzuki Kentaro Inui 80 5 0 02 May 2020
On Faithfulness and Factuality in Abstractive Summarization. Annual Meeting of the Association for Computational Linguistics (ACL), 2020. Joshua Maynez Shashi Narayan Bernd Bohnet Ryan T. McDonald HILM 255 1,237 0 02 May 2020
Imitation Attacks and Defenses for Black-box Machine Translation Systems. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Eric Wallace Mitchell Stern Basel Alomair AAML 302 130 0 30 Apr 2020
A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing Rachel Bawden Biao Zhang Lisa Yankovskaya Andre Tattar Matt Post 168 1 0 30 Apr 2020
Data and Representation for Turkish Natural Language Inference Emrah Budur Rıza Özçelik Tunga Güngör Christopher Potts 73 1 0 30 Apr 2020
Language Model Prior for Low-Resource Neural Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Christos Baziotis Barry Haddow Alexandra Birch 396 59 0 30 Apr 2020
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Arturo Oncevay Barry Haddow Alexandra Birch 202 37 0 30 Apr 2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020. Asa Cooper Stickland Xian Li Marjan Ghazvininejad 230 49 0 30 Apr 2020
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Samson Tan Shafiq Joty Lav Varshney Min-Yen Kan 314 38 0 30 Apr 2020
Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection. Recent Advances in Natural Language Processing (RANLP), 2020. Momchil Hardalov Ivan Koychev Preslav Nakov VLM 157 20 0 30 Apr 2020
Vocabulary Adaptation for Distant Domain Adaptation in Neural Machine Translation Shoetsu Sato Jin Sakuma Naoki Yoshinaga Masashi Toyoda M. Kitsuregawa 184 3 0 30 Apr 2020
Self-Supervised and Controlled Multi-Document Opinion Summarization. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020. Hady ElSahar Maximin Coavoux Matthias Gallé Jos Rozen 122 51 0 30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Brian Thompson Matt Post LRM 201 199 0 30 Apr 2020
WT5?! Training Text-to-Text Models to Explain their Predictions Sharan Narang Colin Raffel Katherine Lee Adam Roberts Noah Fiedel Karishma Malkan 188 212 0 30 Apr 2020
Simulated Multiple Reference Training Improves Low-Resource Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Huda Khayrallah Brian Thompson Matt Post Philipp Koehn 259 39 0 30 Apr 2020