Preserving Pragmatic Integrity: Hesitation Markers, Epistemic Modality and Trust in LLMs
This study investigates how linguistic choices in data preprocessing shape the relational dynamics of trust in human–machine interaction. It focuses on semantic hallucination in Large Language Models (LLMs), advancing the hypothesis that the systematic suppression of hesitation markers—such as filled pauses and reformulations—may affect the integrity of epistemic modality in conversational systems.
In human interaction, hesitations function as pragmatic metadata that signal caution and the limits of knowledge. However, the common editorial “sanitization” of datasets removes these markers, potentially encouraging models such as GPT-4 and Llama-3 to exhibit “certainty hallucination” (overconfidence). As a result, expressions of uncertainty may be rendered as categorical statements, undermining user trust.
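To make concrete what such sanitization typically does, the sketch below strips a hypothetical inventory of filled pauses and hesitation markers from a transcribed utterance. The marker list, function name, and example sentence are illustrative assumptions, not the preprocessing pipeline of any particular dataset.

```python
import re

# Hypothetical inventory of Portuguese/English filled pauses and hesitation
# markers; real transcription conventions vary by corpus.
HESITATION_MARKERS = r"\b(uh|um|er|hm+|eh|ah|né|tipo|assim|quer dizer)\b"

def sanitize(utterance: str) -> str:
    """Remove filled pauses and collapse the whitespace left behind.

    This mimics the editorial "sanitization" discussed above: the
    propositional content survives, but the pragmatic signal of
    hesitation is lost.
    """
    cleaned = re.sub(HESITATION_MARKERS, "", utterance, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", cleaned).strip()

# Toy example: the hedged original becomes a more categorical statement.
print(sanitize("Eu acho que eh quer dizer talvez seja verdade"))
# -> "Eu acho que talvez seja verdade"
```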
To examine this hypothesis, we draw on the Roda Viva Corpus, a historical archive from one of Brazil’s longest-running television interview programs, on air for nearly 40 years. Comprising more than 700 long-form interviews (each exceeding one hour), the corpus provides a dense record of spontaneous speech and complex public debate. We propose a contrastive benchmark comparing original and sanitized transcriptions to assess how the removal of hesitation markers affects models’ probabilistic calibration and semantic entropy.
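The two quantities the benchmark targets can be sketched as follows: expected calibration error in the sense of Guo et al. (2017) and a count-based approximation of the semantic entropy of Kuhn et al. (2023), computed over semantic-equivalence clusters of sampled answers. The function names, the upstream clustering step, and the toy inputs are assumptions for illustration, not the study’s actual implementation.

```python
import math
from collections import Counter

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE in the sense of Guo et al. (2017): bin predictions by confidence
    and average the |accuracy - confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(acc - avg_conf)
    return ece

def semantic_entropy(cluster_ids):
    """Count-based approximation of semantic entropy (after Kuhn et al., 2023):
    entropy over semantic-equivalence clusters of sampled generations.
    Clustering (e.g. via bidirectional NLI) is assumed to be done upstream."""
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical usage: score the same prompts under original vs. sanitized
# transcriptions and compare the shift in both metrics.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [True, False, True, True]))
print(semantic_entropy([0, 0, 1, 2, 0]))  # 5 samples falling into 3 clusters
```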
By shifting the analytical focus from factual accuracy alone to the preservation of pragmatic integrity, this study contributes to the design of socially responsible conversational systems. We argue that sensitivity to linguistic markers of uncertainty is crucial for maintaining rapport and ensuring safe interaction in high-responsibility domains such as journalism and law, where distinctions between fact and tentative interpretation are central to the perceived reliability of AI.
Keywords: Human–Machine Interaction; Hesitation; Epistemic Modality; Calibration; Trust; Uncertainty.
References:
BENDER, E. M. et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). New York, NY, USA: Association for Computing Machinery, 2021. p. 610–623.
DUBEY, A. et al. The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783, 2024.
GUO, C. et al. On Calibration of Modern Neural Networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML). Sydney, Australia: PMLR, 2017. p. 1321–1330.
HYLAND, K. Metadiscourse: Exploring Interaction in Writing. London: Continuum, 2005.
JI, Z. et al. Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, v. 55, n. 12, p. 248:1–248:38, Mar. 2023.
KUHN, L.; GAL, Y.; FARQUHAR, S. Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. In: International Conference on Learning Representations (ICLR), 2023.
MARCUSCHI, L. A. Análise da conversação. 5. ed. São Paulo: Ática, 2003.
MIELKE, S. J. et al. Reducing Conversational Agents’ Overconfidence Through Linguistic Calibration. Transactions of the Association for Computational Linguistics, v. 10, p. 857–872, 2022.
SHRIBERG, E. To “errrr” is human: ecology and acoustics of speech disfluencies. In: Proceedings of the International Congress of Phonetic Sciences (ICPhS). San Francisco, 1999.
VALE, O. A. Quem fala o quê no Roda Viva? Identifica