Sub-Character Tokenization for Chinese Pretrained Language Models

Transactions of the Association for Computational Linguistics (TACL), 2021
1 June 2021
Chenglei Si
Zhengyan Zhang
Yingfa Chen
Fanchao Qi
Xiaozhi Wang
Zhiyuan Liu
Yasheng Wang
Qun Liu
Maosong Sun

Papers citing "Sub-Character Tokenization for Chinese Pretrained Language Models"

6 / 6 papers shown
Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation
Mir Tafseer Nayeem
Sawsan Alqahtani
Md Tahmid Rahman Laskar
Tasnim Mohiuddin
M Saiful Bari
11 Oct 2025
Entropy-Driven Pre-Tokenization for Byte-Pair Encoding
Yifan Hu
Frank Liang
Dachuan Zhao
Jonathan Geuter
Varshini Reddy
Craig W. Schmidt
Chris Tanner
18 Jun 2025
Romanization Encoding For Multilingual ASR
Wen Ding
Fei Jia
Hainan Xu
Yu Xi
Junjie Lai
Boris Ginsburg
05 Jul 2024
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
Transactions of the Association for Computational Linguistics (TACL), 2023
Tom Sherborne
Tom Hosking
Mirella Lapata
OT
09 Jul 2023
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chenglei Si
Zhengyan Zhang
Yingfa Chen
Xiaozhi Wang
Zhiyuan Liu
Maosong Sun
AAML
14 Feb 2023
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Spoken Language Technology Workshop (SLT), 2022
Peng Shen
Xugang Lu
Hisashi Kawai
29 Jul 2022