SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,063 papers shown

A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020

Yerbolat Khassanov

128

22 Sep 2020

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

261

20 Sep 2020

Will it Unblend?

Findings (Findings), 2020

Yuval Pinter

Cassandra L. Jacobs

Jacob Eisenstein

151

18 Sep 2020

Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches

Applied Informatics (AI), 2020

Juan Cruz-Benito

Sanjay Vishwakarma

Francisco Martín-Fernández

Ismael Faro (IBM Quantum)

330

16 Sep 2020

Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation

Findings (Findings), 2020

234

16 Sep 2020

Multi-span Style Extraction for Generative Reading Comprehension

181

15 Sep 2020

Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Jason D. Lee

Raphael Shu

Dong Wang

123

15 Sep 2020

KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition

168

07 Sep 2020

UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis

International Workshop on Semantic Evaluation (SemEval), 2020

G. Vlad

George-Eduard Zaharia

Dumitru-Clementin Cercel

Costin-Gabriel Chiru

Stefan Trausan-Matu

142

06 Sep 2020

GREEK-BERT: The Greeks visiting Sesame Street

Hellenic Conference on Artificial Intelligence (HAI), 2020

John Koutsikakis

Ilias Chalkidis

Prodromos Malakasiotis

Ion Androutsopoulos

184

104

27 Aug 2020

Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020

157

26 Aug 2020

JokeMeter at SemEval-2020 Task 7: Convolutional humor

International Workshop on Semantic Evaluation (SemEval), 2020

150

25 Aug 2020

HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection

Meghana Bhange

Nirant Kasliwal

22 Aug 2020

Neural Machine Translation without Embeddings

Uri Shaham

Omer Levy

229

21 Aug 2020

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

Diedre Carmo

129

20 Aug 2020

Inducing Language-Agnostic Multilingual Representations

207

20 Aug 2020

Lite Training Strategies for Portuguese-English and English-Portuguese Translation

100

20 Aug 2020

Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition

145

15 Aug 2020

Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

127

14 Aug 2020

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Interspeech (Interspeech), 2020

158

13 Aug 2020

Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

Conference on Machine Translation (WMT), 2020

Brian Thompson

Matt Post

245

11 Aug 2020

Revisiting Low Resource Status of Indian Languages in Machine Translation

254

11 Aug 2020

A Multilingual Neural Machine Translation Model for Biomedical Data

136

06 Aug 2020

Designing the Business Conversation Corpus

101

05 Aug 2020

A Survey of Orthographic Information in Machine Translation

Bharathi Raja Chakravarthi

P. Rani

Mihael Arcan

John P. McCrae

149

04 Aug 2020

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

Xian Li

Angela Fan

348

522

02 Aug 2020

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Xiaodong Liu

669

2,143

31 Jul 2020

COVID-19 therapy target discovery with context-aware literature mining

IFIP Working Conference on Database Semantics (IWDS), 2020

143

30 Jul 2020

BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models

International Workshop on Semantic Evaluation (SemEval), 2020

28 Jul 2020

Big Bird: Transformers for Longer Sequences

Neural Information Processing Systems (NeurIPS), 2020

Joshua Ainslie

...

1.3K

2,520

28 Jul 2020

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

Asian Conference on Computer Vision (ACCV), 2020

David M. Chan

Sudheendra Vijayanarasimhan

David A. Ross

John F. Canny

VLM

274

27 Jul 2020

Consistent Transcription and Translation of Speech

Transactions of the Association for Computational Linguistics (TACL), 2020

216

24 Jul 2020

FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings

International Workshop on Semantic Evaluation (SemEval), 2020

256

24 Jul 2020

CoVoST 2 and Massively Multilingual Speech-to-Text Translation

175

20 Jul 2020

Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks

Diego de Vargas Feijó

V. Moreira

MILM

19 Jul 2020

Drinking from a Firehose: Continual Learning with Web-scale Natural Language

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

224

18 Jul 2020

A Multilingual Parallel Corpora Collection Effort for Indian Languages

International Conference on Language Resources and Evaluation (LREC), 2020

Shashank Siripragada

126

15 Jul 2020

Multi-Dialect Arabic BERT for Country-Level Dialect Identification

Workshop on Arabic Natural Language Processing (WANLP), 2020

Hussein T. Al-Natsheh

146

10 Jul 2020

What Can We Learn From Almost a Decade of Food Tweets

Uga Sproģis

Matīss Rikters

106

10 Jul 2020

scb-mt-en-th-2020: A Large English-Thai Parallel Corpus

Lalita Lowphansirikul

Charin Polpanumas

Attapol T. Rutherford

Sarana Nutanong

LRM

107

07 Jul 2020

Deep Contextual Embeddings for Address Classification in E-commerce

Shreyas Mangalgi

Lakshya Kumar

Ravindra Babu Tallamraju

113

06 Jul 2020

Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification

236

05 Jul 2020

Playing with Words at the National Library of Sweden -- Making a Swedish BERT

Martin Malmsten

Love Börjeson

Chris Haffenden

149

133

03 Jul 2020

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

Elias Stengel-Eskin

113

01 Jul 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

392

1,619

30 Jun 2020

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Sang Michael Xie

Tengyu Ma

Abigail Z. Jacobs

354

29 Jun 2020

A High-Quality Multilingual Dataset for Structured Documentation Translation

Conference on Machine Translation (WMT), 2020

132

24 Jun 2020

Exploring Software Naturalness through Neural Language Models

...

203

101

22 Jun 2020

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

179

22 Jun 2020

Self-Supervised Representations Improve End-to-End Speech Translation

252

22 Jun 2020