ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.17733
  4. Cited By
Building a Large Japanese Web Corpus for Large Language Models

Building a Large Japanese Web Corpus for Large Language Models

27 April 2024
Naoaki Okazaki
Kakeru Hattori
Hirai Shota
Hiroki Iida
Masanari Ohi
Kazuki Fujii
Taishi Nakamura
Mengsay Loem
Rio Yokota
Sakae Mizuki
ArXivPDFHTML

Papers citing "Building a Large Japanese Web Corpus for Large Language Models"

12 / 12 papers shown
Title
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
45
0
0
05 May 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
27
1
0
21 Apr 2025
Application of Contrastive Learning on ECG Data: Evaluating Performance in Japanese and Classification with Around 100 Labels
Application of Contrastive Learning on ECG Data: Evaluating Performance in Japanese and Classification with Around 100 Labels
Junichiro Takahashi
JingChuan Guan
Masataka Sato
Kaito Baba
Kazuto Haruguchi
Daichi Nagashima
Satoshi Kodera
Norihiko Takeda
18
0
0
12 Apr 2025
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Youmi Ma
Sakae Mizuki
Kazuki Fujii
Taishi Nakamura
Masanari Ohi
...
Takumi Okamoto
Shigeki Ishida
Rio Yokota
Hiroya Takamura
Naoaki Okazaki
ALM
50
0
0
31 Mar 2025
How Language Model Hallucinations Can Snowball
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
75
246
0
22 May 2023
Poisoning Language Models During Instruction Tuning
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
90
124
0
01 May 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
162
320
0
06 Oct 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
1