Kuzushiji-MNIST (KMNIST)

by Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, David Ha

License: CC-BY-SA

Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images), provided in the original MNIST format as well as in NumPy format. Because MNIST restricts us to 10 classes, we chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST. Kuzushiji-49, as the name suggests, has 49 classes (28x28 grayscale, 270,912 images); it is a much larger but imbalanced dataset containing 48 Hiragana characters and one Hiragana iteration mark. Kuzushiji-Kanji is an imbalanced dataset of 3,832 Kanji characters in total (64x64 grayscale, 140,426 images), with class sizes ranging from 1,766 examples down to a single example.
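
Because the data ships in the original MNIST format, it can in principle be read with any IDX parser. The following is a minimal sketch using only the Python standard library, not the authors' loader. It assumes the unsigned-byte IDX layout used by MNIST (a 4-byte header, big-endian dimension sizes, then raw pixel or label bytes). The helper names `parse_idx` and `make_idx` are hypothetical, and the buffers here are synthetic stand-ins rather than real dataset files.

```python
import struct


def parse_idx(raw: bytes):
    """Parse an uncompressed unsigned-byte IDX buffer into (dims, data)."""
    # Header: two zero bytes, one data-type byte, one dimension-count byte.
    zero, dtype, ndim = struct.unpack_from(">HBB", raw, 0)
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack_from(f">{ndim}I", raw, 4)
    offset = 4 + 4 * ndim
    count = 1
    for d in dims:
        count *= d
    data = raw[offset:offset + count]
    if len(data) != count:
        raise ValueError("truncated IDX payload")
    return dims, data


def make_idx(dims, payload: bytes) -> bytes:
    """Build a synthetic IDX buffer (for illustration only)."""
    header = struct.pack(">HBB", 0, 0x08, len(dims))
    header += struct.pack(f">{len(dims)}I", *dims)
    return header + payload


# Synthetic stand-ins: two blank 28x28 images and two labels in 0..9.
images = make_idx((2, 28, 28), bytes(2 * 28 * 28))
labels = make_idx((2,), bytes([3, 7]))

img_dims, pixels = parse_idx(images)
lbl_dims, lbl_data = parse_idx(labels)

# Slice out the first image as 28 rows of 28 pixel values.
rows, cols = img_dims[1], img_dims[2]
first_image = [list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)]
```

If the downloaded files are gzip-compressed, as MNIST-format files often are, reading them through `gzip.open(path, "rb").read()` before calling `parse_idx` should yield the same kind of buffer.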

Dataset Attributes

Tasks: Classification
Categories: Japanese Characters