Approved

Speaker Recognition using Biology-Inspired Feature Extraction
Edvin Andersson (2015)

Start: 2021-01-20
Presentation: 2021-06-21 11:15
Location: https://lu-se.zoom.us/j/69378152860
Finished: 2021-06-30

Master's thesis:
Abstract
Distinguishing between the voices of different people is something the human brain does naturally, using only the frequencies picked up by the inner ear. The field of speaker recognition is concerned with making machines do the same thing using digitally sampled speech and data processing. The processing extracts relevant information about the speech from the high-dimensional acoustic data, helping the machine determine which speaker a speech sample belongs to. There exist several methods for solving this problem, most of which model a speech sample as a sequence of time frames, each representing the frequency characteristics of the sound input at that moment. A very common choice of frequency characteristic is the set of mel-frequency cepstral coefficients (MFCCs), which capture the overall shape of the input's frequency spectrum during each time frame. This thesis presents a different approach, inspired by findings on how the human brain processes tactile sensory input, which lets an unsupervised learning model pick out important combinations of frequencies from the signal. These combinations of frequencies arise because they exhibit an observed spatiotemporal relationship, across multiple samples and speakers, in which their intensities correlate in time. Extracting spatiotemporal patterns between input frequencies as features, instead of the overall spectrum shape, can lead to new, more robust ways of encoding auditory data.
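The two ideas in the abstract — representing speech as a sequence of per-frame frequency spectra, and finding frequency bins whose intensities correlate over time — can be illustrated with a minimal NumPy sketch. This is not the thesis's actual model or data: the signal is synthetic (two tones sharing one amplitude envelope, standing in for frequencies produced by the same source), and the frame length, hop size, and correlation measure are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic 1-second "speech" signal: two tones (440 Hz and 880 Hz) that
# share a common amplitude envelope, so their intensities co-vary in time.
fs = 16000
t = np.arange(fs) / fs
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)  # shared slow modulation
signal = envelope * (np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t))

# Split into overlapping time frames and take each frame's magnitude
# spectrum -- the standard representation the abstract describes.
frame_len, hop = 400, 160  # 25 ms frames, 10 ms hop at 16 kHz (assumed)
frames = np.stack([signal[i:i + frame_len]
                   for i in range(0, len(signal) - frame_len, hop)])
spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

# Correlate each frequency bin's intensity trajectory over time. Bins
# driven by the same source co-vary -- the kind of spatiotemporal
# relationship the thesis proposes to learn, instead of summarizing the
# overall spectrum shape as MFCCs do.
corr = np.corrcoef(spectra.T)            # (n_bins, n_bins) matrix
bin_440 = round(440 * frame_len / fs)    # FFT bin nearest 440 Hz
bin_880 = round(880 * frame_len / fs)    # FFT bin nearest 880 Hz
print(corr[bin_440, bin_880])            # near 1: intensities correlate
```

In this toy setup the correlation between the two bins is close to 1 because both tones are scaled by the same envelope; in real speech such co-varying groups of frequencies would have to be discovered by the unsupervised model across many samples and speakers.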
Supervisor: Fredrik Edman (EIT)
Examiner: Erik Larsson (EIT)