How To Build a Japanese Pronunciation Checker With Python and Wit.ai

A simple way to test pronunciation skills in a foreign language

Sep 02, 2024

black and blue portable speaker — Photo by Kelly Sikkema on Unsplash

As a foreign language lover, I’ve always wanted to learn Japanese on a basic level. I think that Japanese pronunciation can be a bit difficult, especially for European people. It’d be nice to have a pronunciation checker that could provide feedback for learners. There are many useful resources like Duolingo which are great for self-studying, but they don’t focus on speech.

In this article, I’m excited to show you how to implement a Japanese pronunciation checker with the help of Python and Wit.ai. The model can be easily modified to fit almost any other language.

Let’s get started!

The Speech-Evaluation Approach

First, let’s understand how our app is going to work.

There are various ways to evaluate pronunciation. For example, with the help of specialized speech-analysis software like Praat, you can check how much your speech deviates from a native speaker’s.

Although tools like that are more accurate, it’s time-consuming to collect data and analyse the speech results.

Here’s the trick: Wit.ai supports around 140 languages, so there’s no need to gather data. It’s a free NLP service that allows you to build conversational applications by converting human speech or text into structured data.

Just choose a language and say something. If Wit.ai correctly recognizes what you said, your pronunciation is good enough to be understood.

The advantage of this approach is that it’s very easy to get started with and can be used for almost any other foreign language. Sometimes, it may be inaccurate, but it’s still valuable for beginners trying to learn a new language.

Here’s the plan:

We’ll create our own list of Japanese words we want to pronounce.
We’ll record ourselves saying a chosen word. The audio will be converted to text.
The Python code will compare the expected word with our input.

Create a New App in Wit.ai

I’ve already written a tutorial about Wit.ai, so I’m going to keep things short here:

Sign up for Wit.ai with your Facebook account.
Create a new Wit app.
Select “Japanese” from the language drop-down menu.

After creating your app, make a note of your access token in the app’s Settings tab.

That’s everything we need from Wit.ai!

Record Your Speech With Python

I recommend the PyAudio library to record speech.

Install the library:

pip3 install pyaudio

Create a Python file (for example, recorder.py) and paste the following code:
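The original listing isn’t reproduced here, so what follows is only a minimal sketch of what recorder.py could look like. The function names record_audio and read_audio come from the import used later in this article; their parameters, the recording settings, and the lazy import of PyAudio are assumptions of this sketch.

```python
import wave

# Assumed recording settings; the article does not state them.
SAMPLE_RATE = 16000   # samples per second
CHANNELS = 1          # mono
CHUNK = 1024          # frames read per buffer
SAMPLE_WIDTH = 2      # bytes per sample (16-bit audio)


def record_audio(seconds, filename):
    """Record from the default microphone and save the result as a WAV file."""
    # Imported here so that read_audio() stays usable on machines
    # where PyAudio is not installed.
    import pyaudio

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=CHANNELS,
                        rate=SAMPLE_RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("Listening...")
    frames = []
    for _ in range(int(SAMPLE_RATE / CHUNK * seconds)):
        frames.append(stream.read(CHUNK))
    print("Finished recording.")

    # Release the microphone before writing the file.
    stream.stop_stream()
    stream.close()
    audio.terminate()

    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(b"".join(frames))


def read_audio(filename):
    """Return the raw bytes of an audio file, ready to be sent over HTTP."""
    with open(filename, "rb") as audio_file:
        return audio_file.read()
```

A call such as record_audio(4, "speech.wav") followed by read_audio("speech.wav") would give you the bytes to send to Wit.ai; the file name is just an example.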

Don’t worry about the details: this file is used for audio recording only. We’ll concentrate on the more interesting part.

Recognize and Evaluate the Japanese Speech

Let’s create a new Python file for speech recognition (for example, japan.py).

We’ll need a library to convert the Japanese characters (hiragana, katakana, and kanji) to Latin/Roman (rōmaji) characters. In my experience, Pykakasi is one of the best choices. If you know a better one, let me know in the comments.

pip3 install pykakasi

Add the necessary imports:

import requests
import json
import pykakasi
from recorder import record_audio, read_audio

Then, let’s add the Wit.ai configuration:

# Wit speech API endpoint
API_ENDPOINT = 'https://api.wit.ai/'
API_FUNCTION_SPEECH = 'speech'

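# Wit API token (replace the placeholder with the access token
# from your app's Settings tab, and keep it out of version control)
wit_access_token = 'YOUR_WIT_ACCESS_TOKEN'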

Write a function to recognize the speech:
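The original function isn’t shown here, so this is a hedged sketch rather than the exact code. It assumes the speech endpoint accepts a WAV upload with a Bearer token and answers with a single JSON object that has a 'text' field, as in the sample output later in this article; other Wit.ai API versions may respond differently. The helper names build_speech_request and extract_text are made up for this sketch.

```python
API_ENDPOINT = "https://api.wit.ai/"
API_FUNCTION_SPEECH = "speech"


def build_speech_request(access_token):
    """Return the URL and headers for a WAV upload to the speech endpoint."""
    url = API_ENDPOINT + API_FUNCTION_SPEECH
    headers = {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "audio/wav",
    }
    return url, headers


def extract_text(data):
    """Pull the recognized text out of the parsed JSON response."""
    return data.get("text", "")


def recognize_speech(audio_bytes, access_token):
    """Send recorded audio to Wit.ai and return the recognized text."""
    # Third-party library; imported here so the helpers above
    # can be used without it.
    import requests

    url, headers = build_speech_request(access_token)
    response = requests.post(url, headers=headers, data=audio_bytes, timeout=30)
    response.raise_for_status()  # fail loudly on an HTTP error
    data = response.json()
    print(data)
    return extract_text(data)
```

Keeping the request setup and the response parsing in small separate functions makes them easy to check without a network connection.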

Then we need to convert the Japanese characters to rōmaji. This way, we’ll be able to compare the user input with the expected word, which is written in rōmaji.

The documentation of Pykakasi is pretty clear and concise. Here’s how to use the library:
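As a rough sketch, and not necessarily the code used for the output shown later, the conversion could look like this. It assumes a recent Pykakasi release in which kakasi().convert() returns a list of dictionaries with a 'hepburn' key; the function names to_romaji and join_romaji are my own.

```python
def join_romaji(items):
    """Join the Hepburn readings from Pykakasi's list of result dicts."""
    return "".join(item["hepburn"] for item in items)


def to_romaji(text):
    """Convert Japanese text (kanji, hiragana, katakana) to rōmaji."""
    # Third-party library; imported here so join_romaji()
    # can be used without it.
    import pykakasi

    converter = pykakasi.kakasi()
    return join_romaji(converter.convert(text))
```

Older Pykakasi versions use a different, mode-based interface, so check which version you have installed before copying this.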

Now, we want to evaluate the result. To do so, we’ll compare the expected word with the text recognized from the user’s recording. If they’re equal, the pronunciation is correct.

Here’s the speech-evaluation function:
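The original function isn’t reproduced here. A minimal sketch of the comparison, assuming both values are rōmaji strings and that case and surrounding whitespace should be ignored (that normalization is my assumption), might be:

```python
def evaluate_speech(expected_word, spoken_romaji):
    """Return 'Correct' if the spoken word matches the expected one."""
    expected = expected_word.strip().lower()
    spoken = spoken_romaji.strip().lower()
    return "Correct" if spoken == expected else "Incorrect"


print(evaluate_speech("neko", "Neko"))    # Correct
print(evaluate_speech("neko", "tesuto"))  # Incorrect
```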

Finally, let’s write the main method to start the program:
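The original main method isn’t shown here, so this sketch only illustrates the flow: recognize the speech, convert it to rōmaji, and compare it with the target. The name check_pronunciation, the audio file name, and the two stand-in functions are hypothetical; in the real program you would pass in your own recognition and conversion functions instead.

```python
TARGET_WORD = "neko"           # hard-coded target word
DURATION_SECONDS = 4           # recording length in seconds
AUDIO_FILENAME = "speech.wav"  # hypothetical file name


def check_pronunciation(target_word, recognize, to_romaji):
    """Run one round: recognize speech, convert it, compare with the target."""
    japanese_text = recognize(AUDIO_FILENAME, DURATION_SECONDS)
    print("You said: " + japanese_text)
    romaji = to_romaji(japanese_text).strip().lower()
    verdict = "Correct" if romaji == target_word.lower() else "Incorrect"
    print("You said: " + romaji + " which is: " + verdict)
    return verdict


def fake_recognize(filename, seconds):
    """Stand-in for the Wit.ai call so the sketch runs offline."""
    return "猫"


def fake_to_romaji(text):
    """Stand-in for the Pykakasi conversion."""
    return {"猫": "neko"}.get(text, text)


if __name__ == "__main__":
    check_pronunciation(TARGET_WORD, fake_recognize, fake_to_romaji)
```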

Note that I have hard-coded the Japanese word “neko” (meaning cat) as a target word. This is for the sake of easier testing. Later, we’ll use a predefined word list and choose a random word.

I’ve also set the recording duration to four seconds. Shortly, we’ll add a spinner using Streamlit’s widgets so the duration can be adjusted.

Test the Prototype

Save the file, and start the program:

python3 japan.py

Say “neko” after the “Listening” prompt, and check the result:

Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': '猫', 'traits': {}}

You said: 猫
**************************************
猫[Neko]
You said: neko which is: Correct

Wit.ai recognized the word and returned it as a kanji character (see the ‘text’ field).
The recognized text has been converted to rōmaji thanks to Pykakasi.
Our input matches the expected word “neko.”

Now, say something different to test the behavior:

Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': 'テスト', 'traits': {}}

You said: テスト
**************************************
テスト[Tesuto]
You said: tesuto which is: Incorrect

Great! Our prototype is working. It’s time to build more features.

Customize the Program With Streamlit

So far, we have a program working in the command line. It’d be more interactive to see a UI in the browser.

We’ll be using Streamlit and Bokeh widgets to achieve this.

Install the necessary libraries:

pip3 install bokeh
pip3 install streamlit

Add these new imports:

import streamlit as st
import random
from bokeh.models.widgets import Button
from bokeh.models import CustomJS
import SessionState

Create a new text file with some Japanese words written in rōmaji. I prefer to see the rōmaji along with the Japanese characters in brackets. It’s extra practice, but you can leave it out.

Here’s an example of my words file:

sensei[せんせい]
neko[ねこ]
sushi[すし]

Modify the evaluate_speech function so that it compares only the part of the target word before the square brackets:
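The modified function isn’t reproduced here. One possible version, assuming entries look like neko[ねこ] and that entries without brackets should still work, is sketched below.

```python
def evaluate_speech(target_entry, spoken_romaji):
    """Compare speech with the rōmaji part of an entry like 'neko[ねこ]'."""
    # Keep only the text before the first '['; entries without
    # brackets are used as they are.
    expected = target_entry.split("[", 1)[0].strip().lower()
    spoken = spoken_romaji.strip().lower()
    return "Correct" if spoken == expected else "Incorrect"


print(evaluate_speech("neko[ねこ]", "neko"))   # Correct
print(evaluate_speech("sushi[すし]", "neko"))  # Incorrect
```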

Write a function to choose a random Japanese word from the list:
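The original function isn’t shown here. A small sketch, assuming the word list is a UTF-8 text file with one entry per line (the function names and the file layout are assumptions), could be:

```python
import random


def load_words(path):
    """Read one entry per line, skipping blank lines."""
    with open(path, encoding="utf-8") as word_file:
        return [line.strip() for line in word_file if line.strip()]


def choose_random_word(words):
    """Pick one entry at random from the word list."""
    return random.choice(words)
```

You would call it along the lines of choose_random_word(load_words("words.txt")), where words.txt is a hypothetical name for your word file.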

Finally, let’s modify the main method to create the UI using Streamlit and Bokeh:
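The original UI code isn’t reproduced here. The sketch below is a simplified alternative, not the Bokeh-based version: it uses Streamlit’s built-in st.button, st.sidebar.number_input, and st.session_state (available in recent Streamlit releases) instead of Bokeh buttons and the SessionState gist. The function render_app and its parameters are hypothetical; the Streamlit module is passed in as st so the function can be tried out without a browser.

```python
import random


def render_app(st, words, check_word):
    """Draw the UI. `st` is the streamlit module, passed in explicitly."""
    st.title("Japanese Pronunciation Checker")

    # Numeric input in the sidebar for the recording length in seconds.
    duration = st.sidebar.number_input("Duration (seconds)", min_value=1,
                                       max_value=10, value=4)

    # Streamlit reruns the script on every interaction, so the chosen
    # word is kept in session state instead of a plain variable.
    pick_clicked = st.button("Pick a random word")
    if pick_clicked or "word" not in st.session_state:
        st.session_state["word"] = random.choice(words)
    st.subheader(st.session_state["word"])

    # check_word(word, seconds) is expected to record, recognize,
    # and return a verdict string to display.
    if st.button("Start recording"):
        st.write(check_word(st.session_state["word"], duration))
```

In japan.py you would import streamlit as st and call render_app(st, your_word_list, your_check_function) at the top level of the script.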

We’ve created a spinner on the sidebar where the user can choose a duration from 1 to 10 seconds.

We’ve also added two buttons — one to start the audio recording and another one to pick a random word from the list.

Note that every time you interact with the widgets, Streamlit reruns the entire Python script from top to bottom. That’s why we need to store the selected word’s value between runs. This is where SessionState comes into play: it keeps values across reruns. To use it, just download the gist and place it in your project’s directory.

Test the Application

To run the app in your browser, execute this Streamlit command:

streamlit run japan.py

It will automatically open in a new tab.

The final result looks like this:

Animation demo of the complete app

Great! Our app understands Japanese and evaluates our speech.

If you want to create a more advanced UI, check out Streamlit’s widget guide.

Conclusion

Congratulations! You’ve made it to the end of this tutorial. Now you know how to evaluate Japanese pronunciation using Wit.ai and Python.

You can easily adjust the model to support another language by skipping the character-conversion step and adding your own implementation. Feel free to play around with my GitHub repository linked below.

You’ve also learned how to add widgets to your app and how to run it in the browser using Streamlit.

I hope that you learned something new today and this tutorial gave you inspiration for your next project.

References

Originally posted to medium.com.