Title:Segmentacija in grozdenje govorcev za sisteme avtomatskega razpoznavanja spontanega govora
Authors:ID Grašič, Matej (Author)
ID Kačič, Zdravko (Mentor)
ID Žgank, Andrej (Co-mentor)
Files:DR_Grasic_Matej_2010.pdf (1,75 MB)
MD5: AFC99F34DDE5253F7ED4403BCD2F9054
PID: 20.500.12556/dkum/2f147385-d1bc-4d3d-bf14-d9569ee7d19b
 
Language:Slovenian
Work type:Dissertation
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:This doctoral dissertation addresses the problem of speaker segmentation and clustering in conversational, radio and television broadcasts for automatic spontaneous speech recognition systems. The aim of the dissertation is the definition, implementation and performance evaluation of a new procedure for speaker segmentation and clustering (speaker diarization). Within the dissertation we first implemented a reference online speaker diarization system based on the Bayesian Information Criterion (BIC). For the reference system we then defined acoustic features with better speaker discrimination in the acoustic space. Next, we added a statistical criterion to the segmentation procedure and normalized its result with a Universal Background Model (UBM). This approach is particularly useful when the acoustic information within segments is insufficient to build a complete speaker model. We evaluated two statistical criteria, the Cross Likelihood Ratio (CLR) and the Normalized Cross Likelihood Ratio (NCLR). Both criteria originate from the field of speaker verification, and the NCLR criterion was shown to perform better. In the segmentation procedure the statistical criterion served as an additional condition with which incorrect speaker transitions could be eliminated. After determining the best statistical criterion for segmentation, we applied a similar approach to clustering. For clustering, we replaced the BIC criterion of the baseline system with the statistical criterion for determining clusters. In doing so we introduced modeling of a speaker with several clusters, which captures the variation of a speaker's voice within a recording.
Finally, we optimized the entire system by normalizing the result of the selected criterion with a reference value of the criterion; this eased the choice of the threshold value and improved performance. We also improved the correct detection of short speaker segments. This was done by adapting the statistical criterion to the length of the analysis window, which improved the linearity of the criterion for short analysis windows. In the last phase we carried out the final performance evaluation of the segmentation algorithms used. The performance of the proposed online speaker diarization system was assessed by comparison with the baseline speaker diarization system based on the BIC procedure. In a second phase we extended the comparison to offline systems, using the freely available offline diarization system mClust. To build the universal background speaker model and to determine the optimal parameter values of the segmentation procedures, we used the training part of the Slovenian BNSI Broadcast News database. The online and offline procedures were evaluated on the test parts of the Slovenian and English Broadcast News speech databases.
Keywords:speaker segmentation, speaker clustering, online speaker diarization, automatic spontaneous speech recognition, speech signal processing, acoustic features, statistical criteria, speaker recognition
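
For orientation, the two statistical criteria named in the abstract are often written in the diarization literature as shown below. These are the commonly cited forms, given here as an assumption; the dissertation may define or normalize them differently. The symbols are illustrative and do not come from this record: X_i and X_j are the feature frames of two segments or clusters, N_i and N_j their frame counts, M_i and M_j the speaker models adapted to them, and M_UBM the Universal Background Model.

```latex
\mathrm{CLR}(i,j) =
  \frac{1}{N_i}\log\frac{p(X_i \mid M_j)}{p(X_i \mid M_{\mathrm{UBM}})}
  + \frac{1}{N_j}\log\frac{p(X_j \mid M_i)}{p(X_j \mid M_{\mathrm{UBM}})}

\mathrm{NCLR}(i,j) =
  \frac{1}{N_i}\log\frac{p(X_i \mid M_i)}{p(X_i \mid M_j)}
  + \frac{1}{N_j}\log\frac{p(X_j \mid M_j)}{p(X_j \mid M_i)}
```

In these forms a larger CLR suggests that the two segments belong to the same speaker, whereas NCLR behaves like a distance: smaller values suggest the same speaker.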
Place of publishing:[Maribor
Publisher:M. Grašič]
Year of publishing:2010
PID:20.500.12556/DKUM-14624
UDC:004.934
COBISS.SI-ID:251660288
NUK URN:URN:SI:UM:DK:HNBQBV9B
Publication date in DKUM:01.07.2010
Views:2776
Downloads:248
Categories:KTFMB - FERI

Secondary language

Language:English
Title:Speaker segmentation and clustering for systems of automatic spontaneous speech recognition
Abstract:The doctoral thesis addresses the problem of speaker segmentation and clustering within the Broadcast News domain for automatic spontaneous speech recognition systems. The aim of the thesis is the definition, implementation and performance evaluation of a new procedure for speaker segmentation and clustering (speaker diarization). First, an online reference system for speaker diarization based on the Bayesian Information Criterion (BIC) was implemented. Then acoustic features with better speaker discrimination in the acoustic space were defined. Next, a statistical criterion was added to the segmentation phase, with its result normalized using a Universal Background Model (UBM). This procedure is particularly useful when the segments contain too little information to construct a complete speaker model. Two statistical criteria were evaluated: the Cross Likelihood Ratio (CLR) and the Normalized Cross Likelihood Ratio (NCLR). Both criteria were introduced in the field of speaker verification, and the NCLR criterion was shown to give better verification performance. In the segmentation process, the NCLR criterion was used as an additional statistical test with which incorrect speaker transitions were eliminated. After the best statistical criterion for speaker segmentation had been determined, the criterion was also evaluated for speaker clustering. For clustering, the BIC criterion of the baseline system was replaced with the previously selected statistical criterion. In doing so, modeling of speakers with more than one cluster was introduced, which captures the changes in a speaker's voice over the course of a recording.
Finally, the entire system was optimized by normalizing the result of the selected criterion with a reference criterion value; this normalization eased the selection of the decision threshold and improved speaker diarization performance. The detection of short speaker segments was also improved. This was achieved by compensating the statistical criterion value depending on the analysis window length, which improves the linearity of the criterion for short analysis windows. In the last stage, a final performance evaluation of the segmentation algorithms used was carried out. The effectiveness of the proposed online speaker diarization system was examined by comparing it with the reference online speaker diarization system based on the BIC criterion. The evaluation was then extended to offline systems, using the publicly available offline speaker diarization system mClust. The training part of the Slovenian Broadcast News database was used to construct the UBM and to select optimal parameter values. The online and offline systems were evaluated on the test parts of the Slovenian and English Broadcast News speech databases.
Keywords:speaker segmentation, speaker clustering, on-line speaker diarization, automatic speech recognition, speaker recognition
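
As a rough illustration of the BIC-based reference approach mentioned in the abstract, the sketch below tests whether two adjacent windows of feature frames are better described by one Gaussian or by two. It is a minimal sketch under stated assumptions, not the system from the dissertation: it assumes diagonal-covariance Gaussians (so the penalty counts 2d parameters), uses synthetic random frames instead of real acoustic features, and the names `delta_bic`, `log_det_diag`, `make_frames` and `penalty_weight` are hypothetical.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance of the frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(left, right, penalty_weight=1.0):
    """Delta-BIC between modeling left+right jointly and modeling them separately.

    A positive value favors two separate models, i.e. a speaker change
    between the two windows. Assumption: diagonal covariances.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    dim = len(left[0])
    gain = 0.5 * (
        n * log_det_diag(left + right)
        - n1 * log_det_diag(left)
        - n2 * log_det_diag(right)
    )
    # Splitting adds one extra mean vector and one extra variance vector.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def make_frames(rng, mean, count, dim=4):
    """Synthetic stand-in for acoustic feature frames (hypothetical data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
speaker_a = make_frames(rng, 0.0, 200)
speaker_a_more = make_frames(rng, 0.0, 200)
speaker_b = make_frames(rng, 3.0, 200)

same = delta_bic(speaker_a, speaker_a_more)   # windows drawn from one source
different = delta_bic(speaker_a, speaker_b)   # windows drawn from two sources
print("change detected (same source):", same > 0)
print("change detected (different sources):", different > 0)
```

With short windows the variance estimates become unreliable, which is the situation in which the abstract reports adding a UBM-normalized statistical criterion as a second check on candidate speaker changes.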

