Approved

Speaker Recognition using Biology-Inspired Feature Extraction
Edvin Andersson (2015)

Start: 2021-01-20
Presentation: 2021-06-21 11:15
Location: https://lu-se.zoom.us/j/69378152860
Finished: 2021-06-30

Master's thesis:
Abstract
Distinguishing between the voices of different people is something the human brain does naturally, using only the frequencies picked up by the inner ear. The field of speaker recognition is concerned with making machines do the same thing using digitally sampled speech and data processing. The processing extracts relevant information about the speech from the high-dimensional acoustic data, helping the machine determine which speaker a speech sample belongs to. There exist several methods for solving this problem, most of which model a speech sample as a sequence of time frames, each representing the frequency characteristics of the sound input at that moment. A very common choice of frequency characteristic is the set of mel-frequency cepstral coefficients (MFCCs), which capture the overall shape of the input's frequency spectrum during each time frame. This thesis presents a different approach, inspired by findings on how the human brain processes tactile sensory input, which lets an unsupervised learning model pick out important combinations of frequencies from the signal. These combinations of frequencies arise because they exhibit an observed spatiotemporal relationship, across multiple samples and speakers, in which their intensities correlate in time. Extracting spatiotemporal patterns between input frequencies as features, instead of the overall spectrum shape, can lead to new, more robust ways of encoding auditory data.
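The two ideas in the abstract — representing speech as a sequence of per-frame frequency spectra, and finding frequency bins whose intensities correlate over time — can be illustrated with a minimal NumPy sketch. This is not the thesis's actual model or data: the signal is synthetic (two tones sharing one amplitude envelope, standing in for frequencies produced by the same source), and the frame length, hop size, and correlation measure are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic 1-second "speech" signal: two tones (440 Hz and 880 Hz) that
# share a common amplitude envelope, so their intensities co-vary in time.
fs = 16000
t = np.arange(fs) / fs
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)  # shared slow modulation
signal = envelope * (np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t))

# Split into overlapping time frames and take each frame's magnitude
# spectrum -- the standard representation the abstract describes.
frame_len, hop = 400, 160  # 25 ms frames, 10 ms hop at 16 kHz (assumed)
frames = np.stack([signal[i:i + frame_len]
                   for i in range(0, len(signal) - frame_len, hop)])
spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

# Correlate each frequency bin's intensity trajectory over time. Bins
# driven by the same source co-vary -- the kind of spatiotemporal
# relationship the thesis proposes to learn, instead of summarizing the
# overall spectrum shape as MFCCs do.
corr = np.corrcoef(spectra.T)            # (n_bins, n_bins) matrix
bin_440 = round(440 * frame_len / fs)    # FFT bin nearest 440 Hz
bin_880 = round(880 * frame_len / fs)    # FFT bin nearest 880 Hz
print(corr[bin_440, bin_880])            # near 1: intensities correlate
```

In this toy setup the correlation between the two bins is close to 1 because both tones are scaled by the same envelope; in real speech such co-varying groups of frequencies would have to be discovered by the unsupervised model across many samples and speakers.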
Supervisor: Fredrik Edman (EIT)
Examiner: Erik Larsson (EIT)