Cross-Lingual Speech Emotion Recognition Using English and Mandarin on Thai Data

EasyChair Preprint 15443
16 pages • Date: November 20, 2024

Abstract

This study explores the efficacy of cross-lingual Speech Emotion Recognition (SER) using Thai as a target language with training sets in English and Mandarin. The study evaluates the adaptability of SER models across linguistic boundaries, emphasizing the challenges and potential of leveraging well-resourced languages to enhance emotion recognition capabilities in a language with fewer resources. Through a series of experiments, the research investigates three primary aspects: the performance of same-corpus training within Thai, cross-lingual model application from English and Mandarin to Thai, and the effectiveness of transfer learning techniques in improving model accuracy. The findings indicate that Mandarin facilitates more effective cross-lingual SER with Thai compared to English. However, despite the initial promise, models trained on Mandarin or English and applied to Thai did not outperform those trained directly on Thai in the same-corpus settings, suggesting limited benefits from cross-lingual training without sophisticated adaptation methods. Transfer learning emerged as a pivotal strategy, particularly when models pre-trained on large datasets in Mandarin were fine-tuned with Thai data, showing improved performance, and suggesting a scalable approach for deploying SER systems in multilingual contexts.

Keyphrases: Cross-lingual, Thai language, deep learning, speech emotion recognition