Approved
Audio Fingerprinting - A Decomposing Study
Niklas Gälldin and Victor Hultman
Start:
2024-01-15
Presentation:
2024-06-13 11:15
Location:
E:3139
Finished:
2024-06-30
Master's thesis:
Abstract
Audio fingerprinting is a widely used technique that generates a unique fingerprint for a given audio signal, which can later be used to identify it. A well-known example is the Shazam application, which matches a short song snippet against a database to find the name of the song and the artist. Audio fingerprints are generally created by applying a time-frequency transform to the audio signal and extracting the most prominent features in the time-frequency domain. Several transforms with different properties exist, but the standard choice is the short-time Fourier transform (STFT). This study compares the performance of the STFT with that of the Hyper Localized Wavelet Transform (HLT) within an audio fingerprinting pipeline, focusing on three key metrics: the ability to identify songs correctly (accuracy), robustness to noise, and memory usage. The results indicate that while the STFT and the HLT achieve comparable accuracy, the HLT is more robust to noise and uses less memory. When creating the fingerprint database, the STFT generated approximately 1.23 times more data than the HLT.
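To illustrate the general recipe described in the abstract (a time-frequency transform followed by extraction of the most prominent features), the following is a minimal sketch of an STFT-based fingerprint. It is not the pipeline used in the thesis: the function names, the frame length, the hop size, the number of peaks kept per frame, and the peak-pairing scheme are all assumptions chosen for illustration, and the HLT variant is not shown.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude STFT with a Hann window; result has shape (frames, bins).

    frame_len and hop are illustrative values, not taken from the thesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def prominent_peaks(mag, peaks_per_frame=3):
    """Keep the strongest bins of each frame as (frame_index, bin_index) pairs."""
    top_bins = np.argsort(mag, axis=1)[:, -peaks_per_frame:]
    return [(t, int(b)) for t, row in enumerate(top_bins) for b in sorted(row)]


def fingerprint(peaks, fan_out=3):
    """Pair each peak with the next few peaks into (bin1, bin2, frame_gap) hashes."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1: i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes


# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

peaks = prominent_peaks(stft_magnitude(tone), peaks_per_frame=1)
database_entry = fingerprint(peaks)
```

In a sketch like this, identification amounts to computing the same hashes for a query snippet and counting how many of them also occur in a stored song's hash set. The number of hashes stored per song is what drives the database size, which is the memory metric the study compares between the two transforms.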
Supervisors: Henrik Jörntell, Kaan Kesgin, and Fredrik Edman (EIT)
Examiner: Erik Larsson (EIT)