Assessing the Quality of Modern Text-to-Speech Systems: A Conceptual Replication Study

Text-to-speech (TTS) synthesizers have captured the interest of L2 researchers and practitioners for their potential to enhance learning (e.g., Liakin et al., 2017). Previous research has suggested that TTS voices are not comparable to human speech in naturalness (Cardoso et al., 2015; Bione & Cardoso, 2020) or prosodic authenticity (John & Cardoso, 2016). However, these findings may no longer hold given recent advances in generative artificial intelligence (Gen-AI): Gen-AI-based TTS systems can now produce speech that mimics human intonation and emotion (Barakat et al., 2024).