During my graduate education in Japanese history, interpreting handwritten primary source material from the 19th century and earlier was one of my greatest challenges. Typeset historic documents exist, especially in my period of focus during the Bakumatsu-Meiji transition. But the further back in time one’s research focus is situated, the rarer these documents become. There is a plethora of handwritten documents, written in historic cursive, but learning how to read them is a significant investment of time and resources beyond the means of most people who might otherwise have the inclination to learn.
Further, because of a variety of language and education reforms implemented since the turn of the 20th century, even many native speakers of Japanese cannot read historic cursive. Even someone in my position, who had university affiliation but only sporadic classroom instruction on the subject, will find it to be a challenge. I also couldn’t afford travel to pursue targeted study in Japanese cursive at other universities. That being the case, I had to shoestring together print resources and online glossaries by which to learn how to read cursive by self-study. To this day, I still struggle with it.
To be fair, learning to read handwriting in any language, even a contemporary one, is a challenge for a non-native speaker. Paleography, which is the study and interpretation of historic writing systems, is even more of a challenge.
But although I had to teach myself how to read Edo period script from mostly ink-and-paper sources, a new generation of apps promises to make parsing Japanese cursive script, called kuzushiji (“crushed letters”) in Japanese, a far easier proposition for current and future students of Japanese history.
Demystifying Kuzushiji
On 13 September, Tokyo-based printing company Toppan Inc. announced the next stage in the development of their new kuzushiji software. Called Fuminoha Zemi, it is an AI-driven OCR (optical character recognition) service. Toppan has been developing its kuzushiji OCR technology with the help of a number of research organizations since 2015. It was based off the Bunsho Gazō System, a character recognition database developed by Professor Terasawa Kengo of Hakodate Mirai University that gathered examples of a given character– say, a hiragana に or a kanji 阿– across different digitized documents.
When a user takes a snapshot of a kuzushiji document, the Fuminoha system analyzes the characters and superimposes a typeset equivalent for ease of reading and further analysis. This can be exported into plain text, PDF, and HTML formats, for greater convenience of reading.
Of course, even so, one still needs training in and experience with reading historic Japanese (yes, including historic kanji), which is not the same as contemporary standard Japanese. But when transformed to print form rather than kuzushiji, this is a significantly easier task. Fuminoha is currently deployed in its pre-beta proof of concept form at select museums and archives, and is not available for public use. In the new year, however, this is set to change. Its beta release for iOS is due in January of 2023, with a full public release to follow shortly thereafter in March for private use. It will be available via both app and web browser versions.
Planning a trip to Japan? Get an authentic, interpreted experience from Unseen Japan Tours and see a side of the country others miss!
"Noah [at Unseen Japan] put together an itinerary that didn’t lock us in and we could travel at our own pace. In Tokyo, he guided us personally on a walking tour. Overall, he made our Japan trip an experience not to forget." - Kate and Simon S., Australia
Keep all you devices connected in Japan - rent a pocket wifi device! Available for hotel pickup or delivered to your airport. Fast speeds and backed by excellent customer service. (Note: Affiliate link - Unseen Japan earns a commission if you make a purchase.)
A Challenger Rises
Fuminoha is not the only attempt at a kuzushiji transcribing app relying on OCR and AI models. I have in recent months used a competitor app, called Miwo, which has similar functionality and has been available for download since August 2021. Miwo was made by the Center for Open Data in the Humanities, part of the Japanese Research Organization of Information and Systems. It was trained on two recognition models, both of them derived from the National Institute of Japanese Literature’s Kuzushiji Dataset. As a historian eager to expand the range of sources at my disposal, I am curious to see how Miwo compares to Fuminoha. I plan to write a follow-up to this piece after the latter’s release, comparing their functionality and performance.
To learn more about Fuminoha, check out Toppan’s Fuminoha page for details, examples of how it works, detailed information, and much more, at https://www.toppan.co.jp/biz/fuminoha/
I, for one, look forward to having one more tool at my disposal with which to better and more easily access primary sources. I also look forward to this improved legibility of historic documents opening the large swaths of digitized material to the enjoyment, study, and appreciation of the general public in Japan and abroad.
Sources
- “About CODH” Center for Open Data in the Humanities. Accessed 16 September 2022.
- “Company Profile.” Toppan. Accessed 15 September 2022
- “Kyōin Shōkai: Terasawa Kengo.” Future University Hakodate. Accessed 16 September 2022.
- Matsuura Tatsuki. “Komonjo o kaidoku dekiru sumafo apuri– Toppan Insatsu ga Kaihatsu– kuzushiji taiō AI-OCR o Katsuyō.” IT Media, 13 September 2022. Accessed 15 September 2022.
- “Toppan Insatsu, AI-OCR de Komonjo o kaidoku suru Sumafo Apuri o Kaihatsu.” Toppan, 13 September 2022. Accessed 15 September 2022.