ChatGPT, an artificial intelligence model, has the potential to revolutionize education. However, its effectiveness on questions posed in languages other than English remains uncertain. This study evaluates ChatGPT's robustness using 586 Korean mathematics questions. ChatGPT achieves 66.72% accuracy, correctly answering 391 of the 586 questions. We also assess its ability to rate mathematics questions against eleven criteria and perform a topic analysis. Our findings show that ChatGPT's ratings align with educational theory and with test-taker perspectives. While ChatGPT performs well at question classification, it struggles to solve questions in non-English contexts, which highlights areas for improvement. Future research should address linguistic biases and improve accuracy across diverse languages. Domain-specific optimizations and multilingual training could strengthen ChatGPT's role in personalized education.
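The reported accuracy follows directly from the two counts given above. As a minimal sketch, the arithmetic can be checked as follows; the function name `accuracy_percent` is a hypothetical helper introduced here for illustration and is not taken from the study.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return round(100.0 * correct / total, 2)


# Counts stated in the abstract: 391 correct answers out of 586 questions.
reported = accuracy_percent(391, 586)
print(f"{reported:.2f}%")  # prints "66.72%"
```

The same counts imply that 195 of the 586 questions were answered incorrectly.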
@article{nguyen2025_2502.11915,
  title={On the robustness of ChatGPT in teaching Korean Mathematics},
  author={Phuong-Nam Nguyen and Quang Nguyen-The and An Vu-Minh and Diep-Anh Nguyen and Xuan-Lam Pham},
  journal={arXiv preprint arXiv:2502.11915},
  year={2025}
}