Machine Learning for Speech

Time: 19-23 August 2019

Campus: Joensuu

Duration and credits: 1 week, 3 ECTS (lectures + practicals + learning diary) OR 5 ECTS (1 week lectures + practicals + learning diary + optional project work)
Teaching language: English
Level: Master's and doctoral students
Max. number of attendees: not limited
Course coordinator: Tomi Kinnunen
Responsible department: School of Computing
Learning outcomes: the course is intended as a brief introduction to machine learning techniques and their use in selected speech applications. We focus in particular on speaker and language recognition and voice anti-spoofing, and will briefly touch upon other miscellaneous topics. The course will involve lectures, practicals / computer exercises, and a learning diary. While no formal prerequisites are set, sufficient programming knowledge and a certain level of mathematics/statistics (linear algebra and probability theory) will help participants get the most out of the course. The foreseen programming tools in the 2019 course edition include Python and, to a lesser extent, Matlab.


Basics of digital speech processing (1 day)
Speech as an acoustic and linguistic object, representation of the digital speech signal, Fourier transform, mel-frequency cepstral coefficients, linear prediction.
Basics of statistical pattern recognition and machine learning (1.5 days)
Elementary supervised and unsupervised learning, brief introduction to classic pattern recognition (Bayes’ rule, normal distribution, mixture models, logistic regression, dimensionality reduction, etc.) as well as modern deep learning models (feedforward, convolutional and recurrent neural networks).
Speaker and language recognition as a machine learning problem (1 day)
Representation learning for speaker and language recognition, including Gaussian mixture model supervectors, i-vectors, x-vectors, and back-end modeling techniques. Objective evaluation of speaker recognition, score calibration.
Research problems of speech technology (1.5 days)
Emerging research problems in speaker recognition and related areas, including spoofing attack detection, the ASVspoof challenge, voice conversion, and generative adversarial networks.
Modes of study: Lectures, practicals, learning diary
Study materials: delivered during the course
Evaluation criteria: pass/fail
Teachers: Lectures: Tomi Kinnunen (UEF), Ville Hautamäki (UEF), Stefan Werner (UEF), Abraham Woubie (UEF), Rosa Gonzalez-Hautamäki (UEF), post-doc N.N. and N.N. lecturers outside UEF
Practicals: Ville Vestman, Trung Ngo Trong, Anssi Kanervisto