Our Research

Pre/Post-test Improvement in the Classroom

Students in a first-semester Chinese course practiced with the Pinyin Tutor as part of their classroom exercises. They showed significant improvement on a test of their Pinyin skills between the start and end of the course.

In the study, the Pinyin Tutor gave one group of students practice with words from their classroom study, and another group was given practice with words outside the course curriculum. Both groups significantly improved performance after practicing with the Pinyin Tutor (see figure).

Using Machine Learning to Adapt to Each Student

A different learning path for each student

The sounds (phonemes) a student finds difficult to identify can depend on many factors: their native language, the languages they were exposed to in school, and the languages spoken by their parents, their family, and their neighborhood growing up. Exposure to other tonal languages (such as Vietnamese, Thai, and Hmong) may help or compete with learning to identify Mandarin phonemes.

The number of Mandarin phonemes and contexts in which they can be spoken is huge. For instance, a student may find tone three universally difficult to disambiguate from tone two, or perhaps only when it's combined with the final "ao" vowel sound. The Pinyin Tutor explores this sound space with each student, giving more practice on sounds it identifies as being difficult for them.

Robust Learning

Drawing on 8,300 recordings, the Pinyin Tutor does not simply repeat the particular words or phrases a student answered incorrectly. Instead, it generates new items that give more practice on the difficult skills within those words.

For instance, if a student incorrectly types "fan2 dian4" for "fan4 dian4", the tutor will not simply re-ask "fan4 dian4", but will give more practice with tone 4, tone 4 with final "an", and items with syllable "fan4". Giving practice at this level has two benefits:

Superficial mnemonics can't be used as a fragile crutch, for instance just remembering that "fan4 dian4" has two tone 4s. Instead, students practice on new words containing these difficult features, leading to more robust learning.
Students get practice on skills within different contexts, which helps identify and remediate subtle difficulties.
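
The idea of practicing at the level of features rather than whole words can be sketched in a few lines of Python. This is only an illustration: the word list, the function names, and the feature labels are hypothetical, and the real tutor generates items from its recordings rather than filtering a fixed list.

```python
# Initials, with two-letter ones first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "z", "c", "s", "r", "w", "y"]

def split_syllable(syllable):
    """Split a numbered Pinyin syllable, e.g. 'fan4' -> ('f', 'an', '4')."""
    body, tone = syllable[:-1], syllable[-1]
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # syllable with no initial consonant

def practice_features(syllable):
    """Features to practice after a tone error on this syllable."""
    _, final, tone = split_syllable(syllable)
    return {
        f"tone {tone}",
        f"final {final} + tone {tone}",
        f"syllable {syllable}",
    }

def find_practice_items(missed_syllable, word_list, exclude):
    """Return (word, shared features) for words that exercise the missed features."""
    wanted = practice_features(missed_syllable)
    matches = []
    for word in word_list:
        if word == exclude:
            continue  # do not simply re-ask the same item
        shared = set()
        for syllable in word.split():
            shared |= wanted & practice_features(syllable)
        if shared:
            matches.append((word, sorted(shared)))
    return matches

# Hypothetical word list, for illustration only.
WORDS = ["fan4 dian4", "kan4 shu1", "fan4 wan3", "ni3 hao3", "zai4 jian4"]

for word, shared in find_practice_items("fan4", WORDS, exclude="fan4 dian4"):
    print(word, shared)
```

In this sketch "kan4 shu1" is selected because it shares tone 4 and the final "an" with tone 4, "fan4 wan3" because it contains the whole syllable "fan4", and "zai4 jian4" because it offers tone 4 in different contexts.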

Technical Details

The Pinyin Tutor creates this student-adaptive approach with machine learning techniques. The tutor defines all the skills necessary to master Pinyin, tracks student performance on each of these skills, and uses statistical techniques to generate the phrase the student will learn the most from.

We define skills as all the initial consonant sounds, all possible final vowel sounds, all tones, along with all their combinations:

Initial consonant sounds:

b, p, m, f, d, t, n, l, g, k, h, j, q, x, z, c, s, r, zh, ch, sh, w, y.

Final vowel sounds:

a, e, i, o, u, v, ai, ao, an, ei, en, ia, ie, iu, in, ou, ua, uo, ui, un, ve, vn, ue, ang, eng, ian, ing, iao, ong, uai, uan, van, iang, iong, uang, ueng.

Tones:

Tone 1, Tone 2, Tone 3, Tone 4, Tone 5.

For each student input, the Pinyin Tutor records a "0" if a skill is applied incorrectly and a "1" if it is applied correctly. Suppose the tutor presents "dian4 nao3" and the student enters "tian4 no2". The tutor will record:

d:	`0`
ian:	`1`
4:	`1`
n:	`1`
ao:	`0`
3:	`0`

Then, if on the second try, the student types "tian4 nao3", the tutor updates their record:

d:	`0,0`
ian:	`1,1`
4:	`1,1`
n:	`1,1`
ao:	`0,1`
3:	`0,1`
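
The bookkeeping in this example can be sketched in Python as follows. The function names and the dictionary-of-lists storage are assumptions made for illustration, not the tutor's actual code, and the sketch assumes the answer has the same number of syllables as the target.

```python
from collections import defaultdict

# Initials, with two-letter ones first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "z", "c", "s", "r", "w", "y"]

def split_syllable(syllable):
    """Split a numbered Pinyin syllable, e.g. 'dian4' -> ('d', 'ian', '4')."""
    body, tone = syllable[:-1], syllable[-1]
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # syllable with no initial consonant

def score_attempt(target, answer):
    """Return (skill, 0 or 1) pairs, one per skill in the target phrase."""
    results = []
    for target_syl, answer_syl in zip(target.split(), answer.split()):
        pairs = zip(split_syllable(target_syl), split_syllable(answer_syl))
        for target_part, answer_part in pairs:
            if target_part:  # skip the empty initial of vowel-only syllables
                results.append((target_part, int(target_part == answer_part)))
    return results

def record(history, target, answer):
    """Append the outcome of one attempt to each skill's observation list."""
    for skill, outcome in score_attempt(target, answer):
        history[skill].append(outcome)

history = defaultdict(list)
record(history, "dian4 nao3", "tian4 no2")   # first try
record(history, "dian4 nao3", "tian4 nao3")  # second try
for skill, observations in history.items():
    print(skill, observations)
```

After the two calls, each skill holds the same sequence as the record above, for example `[0, 0]` for "d" and `[0, 1]` for "ao".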

This process continues as the tutor builds up long sequences of observations of a student's attempts at each skill. These sequences are then fed into a statistical computation (a Hidden Markov Model, or HMM) which is used to predict the likelihood that the student will get the skill correct on their next try (see figure below).

Each skill has its own HMM, as shown, which computes the probability that the skill is in the "Learned" state, P(Learned State), or the "Unlearned" state, P(Unlearned State), from the observed sequence of correct and incorrect attempts at that skill. Using these sequences and the output of the HMM computations, the Pinyin Tutor chooses the next item from among those the student is least likely to answer correctly.
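
One common way to build a two-state Learned/Unlearned HMM for a tutor is Bayesian Knowledge Tracing, and the Python sketch below assumes that form. The parameter values (`prior`, `learn`, `guess`, `slip`) are illustrative placeholders rather than values from the paper, and multiplying per-skill probabilities to score an item assumes the skills are independent. The tutor's actual model may differ in all of these respects.

```python
def p_learned(observations, prior=0.3, learn=0.2, guess=0.25, slip=0.1):
    """P(Learned State) after a sequence of 0/1 observations for one skill."""
    p = prior
    for obs in observations:
        if obs == 1:
            # Correct: either learned and no slip, or unlearned and a lucky guess.
            evidence = p * (1 - slip) + (1 - p) * guess
            posterior = p * (1 - slip) / evidence
        else:
            # Incorrect: either learned and slipped, or unlearned and no guess.
            evidence = p * slip + (1 - p) * (1 - guess)
            posterior = p * slip / evidence
        # The student may also move from Unlearned to Learned after the attempt.
        p = posterior + (1 - posterior) * learn
    return p

def p_correct(observations, guess=0.25, slip=0.1):
    """Predicted probability that the next attempt at the skill is correct."""
    p = p_learned(observations, guess=guess, slip=slip)
    return p * (1 - slip) + (1 - p) * guess

def item_probability(skills, history):
    """Probability of getting every skill in an item right (independence assumed)."""
    prob = 1.0
    for skill in skills:
        prob *= p_correct(history.get(skill, []))
    return prob

def hardest_items(items, history, k=1):
    """Return the k items the student is least likely to answer correctly."""
    ranked = sorted(items, key=lambda name: item_probability(items[name], history))
    return ranked[:k]

# Observation sequences from the "dian4 nao3" example.
history = {"d": [0, 0], "ian": [1, 1], "4": [1, 1],
           "n": [1, 1], "ao": [0, 1], "3": [0, 1]}
# Hypothetical candidate items, each listed with the skills it exercises.
items = {"dian4": ["d", "ian", "4"], "nian4": ["n", "ian", "4"]}
print(hardest_items(items, history))
```

With these observations the sketch ranks "dian4" as harder than "nian4", because "d" has been missed twice while "n" has been answered correctly twice.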

More details are in: Statistical Modeling of Student Performance to Improve Chinese Dictation Skills with an Intelligent Tutor. JEDM-Journal of Educational Data Mining 6.1 (2014).

Language Research Opportunities through Apps

A growing number of smartphone users want to improve their language skills at their convenience, which gives language researchers an opportunity to test theories while users learn. The Pinyin Tutor app is designed to work completely offline, without an Internet connection, so users are free to learn while traveling in areas without WiFi. The Pinyin Tutor can still adapt to the user and collect data through a data-logging model that stores logs locally and then sends them as soon as a connection is available.
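
A store-locally-then-send logging model of this kind might look like the following Python sketch. The class name, the JSON-lines file format, and the `send` callback are assumptions made for illustration. The app's real storage and upload mechanisms are not described here.

```python
import json
import os

class OfflineLogger:
    """Append events to a local file and upload them once a connection exists."""

    def __init__(self, path, send):
        self.path = path  # local log file, one JSON object per line
        self.send = send  # hypothetical upload function; returns True on success

    def log(self, event):
        """Always write locally first, so logging works without a connection."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def flush(self, online):
        """Upload stored events if online; return how many were sent."""
        if not online or not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        if events and self.send(events):
            os.remove(self.path)  # discard local copies only after a successful send
            return len(events)
        return 0
```

A caller would invoke `log` after every student attempt and `flush` whenever the connection state changes. Because the local file is removed only after `send` reports success, a failed upload leaves the events in place for the next attempt.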

We plan to further test and refine the Pinyin Tutor based on our findings. Also, many code modules and patterns described here can be reused in other apps to explore other language research areas. This is an exciting time!