Generating accurate, concise pronunciation feedback through large language models (LLMs) presents a distinctive challenge at the intersection of applied linguistics, conversation design, and NLP engineering. This paper reports on the iterative development of a pronunciation feedback prompt for the French course of a language learning app, where learners read a sentence aloud and receive real-time corrective feedback powered by an LLM.
The core design requirement was to compare a mispronounced French sound to a familiar sound or word in the learner’s own language. For example, if a learner mispronounces the word “aujourd’hui”, the correct output is: “We pronounce the ‘-ui’ in ‘aujourd’hui’ like the word ‘we’.”
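A minimal sketch of this requirement as a prompt template follows; the template wording and function names are illustrative, not the production prompt:

```python
# Sketch of the anchor-word feedback format (names and wording hypothetical).
# The prompt asks the model to map a mispronounced French sound onto a
# familiar sound in the learner's interface language.

ANCHOR_FEEDBACK_PROMPT = """\
The learner mispronounced the sound '{sound}' in the French word '{word}'.
Explain the correct pronunciation in one sentence, comparing the sound to
a common {interface_language} word that contains the same *sound*, not the
same spelling. Use exactly this format:
"We pronounce the '{sound}' in '{word}' like the word '<anchor>'."
"""

def build_prompt(word: str, sound: str, interface_language: str) -> str:
    """Fill the feedback template for one mispronunciation."""
    return ANCHOR_FEEDBACK_PROMPT.format(
        word=word, sound=sound, interface_language=interface_language
    )

print(build_prompt("aujourd'hui", "-ui", "English"))
# Expected model output:
# "We pronounce the '-ui' in 'aujourd'hui' like the word 'we'."
```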
However, in practice, this anchor-word approach revealed systematic failures rooted in the tension between linguistic knowledge and LLM behavior.
Key findings showed that LLMs consistently selected anchor words based on orthographic similarity rather than phonetic equivalence, a spelling bias that required explicit countermeasures including self-verification steps populated with the model’s own observed errors.
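One way such a self-verification step could be assembled is sketched below, seeded with anchor errors the model was previously observed to make; the error pairs and instruction wording here are illustrative assumptions:

```python
# Hypothetical sketch of the spelling-bias countermeasure: a self-check
# block appended to the prompt, populated from logged anchor-word errors.
# The pairs below are illustrative, not the paper's logged data.

OBSERVED_SPELLING_BIAS_ERRORS = [
    # (French sound, bad anchor chosen by spelling, why it fails)
    ("-ui", "quit", "shares the letters 'ui' but is pronounced /kwIt/, not /ɥi/"),
    ("-ou", "out", "shares the letters 'ou' but is pronounced /aʊt/, not /u/"),
]

def verification_step(errors: list[tuple[str, str, str]]) -> str:
    """Render a self-check block listing known failure patterns to avoid."""
    lines = [
        "Before answering, verify your anchor word is a SOUND match,",
        "not a spelling match. Known mistakes to avoid:",
    ]
    for sound, bad_anchor, reason in errors:
        lines.append(f"- For '{sound}', do NOT use '{bad_anchor}': it {reason}.")
    return "\n".join(lines)

print(verification_step(OBSERVED_SPELLING_BIAS_ERRORS))
```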
Further, sounds with no cross-linguistic equivalent, such as the French nasal vowels and the French “u”, demanded dedicated output templates: forcing them into the standard anchor-word pattern produced the highest rates of hallucination.
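A dispatch sketch of this template strategy follows; the sound inventory and articulatory instructions are assumptions for illustration:

```python
# Sketch of template dispatch: sounds with no interface-language equivalent
# get dedicated articulatory templates instead of an anchor-word comparison.
# Inventory and wording below are illustrative assumptions.

NO_EQUIVALENT_TEMPLATES = {
    "u": ("To make the French 'u' in '{word}', say 'ee' and then round "
          "your lips as if saying 'oo', keeping your tongue in place."),
    "on": ("The '-on' in '{word}' is a nasal vowel: let the air pass "
           "through your nose and do not pronounce a final 'n'."),
}

def select_template(sound: str, word: str, default: str) -> str:
    """Use a dedicated template when no cross-linguistic anchor exists."""
    template = NO_EQUIVALENT_TEMPLATES.get(sound, default)
    return template.format(word=word)

print(select_template("u", "tu", default="We pronounce '{word}' like ..."))
```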
Finally, adapting the system across interface languages (English, French, Spanish, German) revealed that prompt architecture could remain constant while phonetic mappings required language-pair-specific calibration, with difficulty scaling predictably according to phonological distance between the source and interface languages.
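This separation can be sketched as a constant lookup architecture over swappable, language-pair-specific mappings; all mappings below are illustrative assumptions, not the calibrated tables:

```python
# Sketch of the portability claim: one prompt architecture, with phonetic
# anchor mappings swapped per interface language. Mappings are illustrative.

PHONETIC_ANCHORS: dict[str, dict[str, str]] = {
    # interface language -> {French sound: anchor word in that language}
    "English": {"-ui": "we", "-ou": "food"},
    "Spanish": {"-ou": "tu"},   # Spanish 'u' is close to French 'ou' /u/
    "German":  {"u": "über"},   # German 'ü' /y:/ matches French 'u' directly
}

def anchor_for(sound: str, interface_language: str) -> str | None:
    """Look up a calibrated anchor; None signals a dedicated template."""
    return PHONETIC_ANCHORS.get(interface_language, {}).get(sound)

print(anchor_for("-ui", "English"))  # -> 'we'
print(anchor_for("u", "English"))    # -> None: no English equivalent
```

Note how the lookup itself never changes across interface languages: only the mapping tables do, and the closer the interface language's phonology is to French, the more sounds resolve to an anchor rather than falling through to a dedicated template.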