Title:
Razpoznava govorcev na mobilni platformi : magistrsko delo (Speaker recognition on a mobile platform: master's thesis)
Authors:
Fartek, Jože (Author)
Holobar, Aleš (Mentor)
Files:
MAG_Fartek_Joze_2022.pdf (3.95 MB)
MD5: 73F1637C5145DED5F26B80B7A97318B8
PID:
20.500.12556/dkum/2a30c972-2729-4ef1-86d3-0a69b822b4df
Language:
Slovenian
Work type:
Master's thesis/paper
Typology:
2.09 - Master's Thesis
Organization:
FERI - Faculty of Electrical Engineering and Computer Science
Abstract:
In this master's thesis, we presented the fundamentals of speaker recognition. To this end, we first described how vocal features are computed. We presented in more detail the method for computing mel-frequency cepstral coefficients (MFCC) and its advantages over other approaches. We also described how voice models are trained, as well as two newer methods based on supervectors. Building on this, in the remainder of the thesis we developed an Android mobile application that recognizes speakers in real time. We restricted recognition to only a few people. We computed MFCCs from the audio recordings of individual speakers and used them to train a voice model with a convolutional neural network. To optimize the parameters, we compared how different parameter values affect the training of the voice model. We compared how audio recording lengths from 0.5 to 3 seconds affect recognition performance and found that model performance improves as the recording length increases up to 1.5 seconds, after which it stops improving. When varying the number of MFCCs between 16 and 128, model performance improves up to 48 MFCCs and then stops improving. When varying the dropout rate between 0 and 0.7, model accuracy improves as the dropout rate increases up to 0.5, after which performance begins to decline. Based on this comparison, we trained the voice model on 1-second audio recordings with 32 MFCCs and a dropout rate of 0.4, which gave a model accuracy of 88%. During recognition, we found that speech rate affects recognition performance, whereas speech loudness does not. Testing was carried out on an LG G7 ThinQ mobile device, where computing the MFCCs took 170 milliseconds on average and recognition with the TensorFlow Lite model took only 8 milliseconds.
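The abstract names the MFCC pipeline but not its settings. As an illustration only, here is a minimal, dependency-free Python sketch of the standard MFCC steps: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm and DCT-II. Apart from the 32 coefficients and one-second clips reported in the abstract, every setting below is an assumption and not the thesis's configuration: a 16 kHz sample rate, 25 ms frames (400 samples) with a 10 ms hop (160 samples), a 512-point FFT, 40 mel filters, the HTK mel formula and a pre-emphasis factor of 0.97. The function names (`mfcc`, `mel_filterbank`, `fft`) are hypothetical, and the sketch makes no claim about how the Android application computed its features.

```python
import cmath
import math


def hz_to_mel(f):
    """HTK-style mel scale (assumed; other mel formulas exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over FFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):  # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=16000, n_mfcc=32, n_filters=40,
         frame_len=400, hop=160, n_fft=512, pre_emphasis=0.97):
    """Return one list of n_mfcc coefficients per frame (empty if signal is too short)."""
    if len(signal) < frame_len:
        return []
    # 1. Pre-emphasis boosts the high frequencies.
    x = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1] for i in range(1, len(signal))]
    # 2. Hamming window, applied to every frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * window[i] for i in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad to the FFT size
        # 3. Power spectrum of the non-negative frequency bins.
        spec = fft(frame)
        power = [abs(spec[k]) ** 2 / n_fft for k in range(n_fft // 2 + 1)]
        # 4. Log energy in each mel band, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in bank]
        # 5. Unnormalized DCT-II of the log energies; keep the first n_mfcc values.
        coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                      for j, e in enumerate(log_e))
                  for c in range(n_mfcc)]
        features.append(coeffs)
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s, 440 Hz
    feats = mfcc(tone, sr)
    print(len(feats), len(feats[0]))  # 98 32
```

With these assumed settings, a one-second clip yields 98 frames of 32 coefficients, a 98 × 32 matrix of the kind of two-dimensional input a convolutional network can process. The hand-written FFT only keeps the example self-contained; a real implementation would use an optimized FFT and cache the filterbank.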
Keywords:
speaker recognition, mel-frequency cepstral coefficients, convolutional neural networks, Android
Place of publishing:
Maribor
Place of performance:
Maribor
Publisher:
[J. Fartek]
Year of publishing:
2021
Number of pages:
1 online resource (1 PDF file (X, 64 leaves))
PID:
20.500.12556/DKUM-81072
UDC:
004.934.8'1(043.2)
COBISS.SI-ID:
98851331
Publication date in DKUM:
31.01.2022
Views:
947
Downloads:
69
Categories:
KTFMB - FERI
Cite this work
FARTEK, Jože, 2021, Razpoznava govorcev na mobilni platformi : magistrsko delo [online]. Master’s thesis. Maribor : J. Fartek. [Accessed 25 March 2025]. Retrieved from: https://dk.um.si/IzpisGradiva.php?lang=eng&id=81072
Similar works from our repository:
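Ocenjevanje starosti osebe na osnovi digitalnih posnetkov z uporabo konvolucijskih nevronskih mrež (Estimating a person's age from digital images using convolutional neural networks)
Razpoznavanje človeških emocij na digitalnih posnetkih s pomočjo konvolucijskih nevronskih mrež (Recognizing human emotions in digital images using convolutional neural networks)
Prepoznavanje aktivnosti osebe iz zaporedja slik s pomočjo konvolucijskih nevronskih mrež (Recognizing a person's activity from an image sequence using convolutional neural networks)
Detekcija osebe v globinski sliki s pomočjo konvolucijskih nevronskih mrež (Person detection in a depth image using convolutional neural networks)
Razpoznavanje drevesnih značilnosti iz fotografije s pomočjo konvolucijskih nevronskih mrež (Recognizing tree features from a photograph using convolutional neural networks)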
Similar works from other repositories:
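Detekcija in klasifikacija objektov v vodnem okolju s pomočjo konvolucijskih nevronskih mrež (Detection and classification of objects in an aquatic environment using convolutional neural networks)
Ear detection with convolutional neural networks
Prepoznavanje starosti oseb s slik obrazov z uporabo konvolucijskih nevronskih mrež (Recognizing people's age from face images using convolutional neural networks)
Prepoznavanje šarenice s pomočjo nevronskih mrež (Iris recognition using neural networks)
Semantična segmentacija slik za razpoznavanje notranjih prostorov (Semantic image segmentation for indoor scene recognition)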
Licences
License:
CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Link:
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:
The most restrictive Creative Commons license: others may only download and share the work, crediting the author, for non-commercial purposes, and may not modify it.
Licensing start date:
22.12.2021
Secondary language
Language:
English
Title:
Speaker recognition on mobile devices
Abstract:
In this master's thesis, we review the basics of speaker recognition. We describe how audio features are extracted, examining in more detail how Mel-frequency Cepstral Coefficients (MFCC) are computed and what advantages they have over other feature extraction methods. This is followed by an overview of speaker models and newer methods based on supervectors. Building on this, we developed a mobile application for the Android operating system that recognizes speakers in real time. We limited recognition to only a few people. MFCCs were extracted from the audio recordings of individual speakers and used to train the speaker model with a convolutional neural network. To get better results in real-time recognition, we compared how different parameters affect the training of the speaker model. First, we compared how the length of the audio recording, between 0.5 and 3 seconds, affects recognition performance: performance increases with recording length up to 1.5 seconds and then stops improving. Second, we varied the number of MFCCs between 16 and 128: performance increases up to 48 MFCCs and then stops improving. Third, we compared dropout rates between 0 and 0.7: performance increases up to a dropout rate of 0.5 and then begins to decline. Based on this comparison, the mobile application uses one-second audio recordings, 32 MFCCs and a dropout rate of 0.4, with which the speaker model achieved 88% accuracy. We also measured how speech tempo and loudness affect recognition accuracy: accuracy decreases as speech becomes slower or faster, while loudness does not affect it.
We performed testing on an LG G7 ThinQ mobile device and measured that computing the MFCCs takes 170 milliseconds on average, while recognition with the TensorFlow Lite model takes only 8 milliseconds.
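The abstract reports tuning the dropout rate between 0 and 0.7 and settling on 0.4. To show what that parameter controls, here is a minimal plain-Python sketch of inverted dropout applied to one layer's activations. It is not the thesis's training code: the helper `inverted_dropout` and its signature are hypothetical, and the abstract does not say which framework trained the CNN (the TensorFlow Lite model suggests TensorFlow). For reference, the Keras `Dropout` layer documents the same behaviour: inputs are zeroed with probability `rate` during training and the rest are scaled by 1 / (1 - rate).

```python
import random


def inverted_dropout(activations, rate, rng, training=True):
    """Hypothetical helper: inverted dropout on one layer's activations.

    During training each activation is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected value of every
    activation is unchanged. At inference time the input passes through as-is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    out = inverted_dropout([1.0] * 10_000, rate=0.4, rng=rng)  # rate used in the thesis
    zeroed = sum(1 for v in out if v == 0.0) / len(out)
    # Expect roughly 0.4 of the units zeroed and a mean close to 1.0.
    print(f"zeroed: {zeroed:.3f}, mean: {sum(out) / len(out):.3f}")
```

Because the kept units are rescaled during training, dropout can simply be switched off at inference time, so an exported model for on-device recognition needs no extra scaling step.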
Keywords:
Speaker recognition, Mel-frequency Cepstral Coefficients, Convolutional neural network, Android