1. Samodejno odkrivanje anomalij v dnevniških zapisih omrežnega stikala z uporabo nevronskih mrež na grafih [English: Automatic anomaly detection in network switch logs using graph neural networks]
Anže Dolenc, 2025, master's thesis
Description: In this master's thesis, we address anomaly detection in network switch logs using graph neural networks. We replace classical log analysis methods with the Log2Graph approach, which converts log records into graphs using the Drain parser together with GloVe and TF-IDF vector representations. For training we use the DiGCN model and examine how hyperparameter values, the proportion of anomalies, and contamination of the training set affect anomaly detection performance. We evaluate the results with the AP, ROC AUC, and F1 metrics. The approach proves robust and adaptable for detecting anomalies in real network data.
Keywords: log records, network switch, anomaly detection, machine learning, Log2Graph
Published in DKUM: 10.07.2025; Views: 0; Downloads: 31
Full text (4.49 MB)
2. Comparative study of modern differential evolution algorithms: perspectives on mechanisms and performance
Janez Brest, Mirjam Sepesy Maučec, 2025, original scientific article
Description: Since the discovery of the Differential Evolution algorithm, new and improved versions have continuously emerged. In this paper, we review selected algorithms based on Differential Evolution that have been proposed in recent years. We examine the mechanisms integrated into them and compare the performance of algorithms. To compare their performances, statistical comparisons were used as they enable us to draw reliable conclusions about the algorithms’ performances. We use the Wilcoxon signed-rank test for pairwise comparisons and the Friedman test for multiple comparisons. Subsequently, the Mann–Whitney U-score test was added. We conducted not only a cumulative analysis of algorithms, but we also focused on their performances regarding the function family (i.e., unimodal, multimodal, hybrid, and composition functions). Experimental results of algorithms were obtained on problems defined for the CEC’24 Special Session and Competition on Single Objective Real Parameter Numerical Optimization. Problem dimensions of 10, 30, 50, and 100 were analyzed. In this paper, we highlight promising mechanisms for further development and improvements based on the study of the selected algorithms.
Keywords: global optimization, differential evolution, benchmark suite, mechanisms, statistical tests, performance
Published in DKUM: 19.05.2025; Views: 0; Downloads: 4
Full text (313.93 KB)
3. Vpliv napak razpoznavalnika govora na kakovost strojnih prevodov v sistemih prevajanja govora v govor: magistrsko delo [English: The impact of speech recognizer errors on machine translation quality in speech-to-speech translation systems: master's thesis]
Klemen Stanič, 2025, master's thesis
Description: This master's thesis analyzes the impact of automatic speech recognizer errors on the quality of machine translations in speech-to-speech translation systems. We used a translation system to translate several speech recognizer outputs that differed in recognition quality; as reference translations, we took the translation of the set of sentences that had been used as input to the recognizer. The error types we considered were word insertion, deletion, and substitution. We then broke them down further by part of speech and by the extent of the change. We assessed the impact of errors on translation quality with the BLEU metric. We found that certain types of errors affect translation quality more than others.
Keywords: speech recognition, machine translation, recognizer error, neural networks, BLEU
Published in DKUM: 31.03.2025; Views: 0; Downloads: 32
Full text (2.35 MB)
4. On the use of morpho-syntactic description tags in neural machine translation with small and large training corpora
Gregor Donaj, Mirjam Sepesy Maučec, 2022, original scientific article
Description: With the transition to neural architectures, machine translation achieves very good quality for several resource-rich languages. However, the results are still much worse for languages with complex morphology, especially if they are low-resource languages. This paper reports the results of a systematic analysis of adding morphological information into neural machine translation system training. Translation systems presented and compared in this research exploit morphological information from corpora in different formats. Some formats join semantic and grammatical information and others separate these two types of information. Semantic information is modeled using lemmas and grammatical information using Morpho-Syntactic Description (MSD) tags. Experiments were performed on corpora of different sizes for the English–Slovene language pair. The conclusions were drawn for a domain-specific translation system and for a translation system for the general domain. With MSD tags, we improved the performance by up to 1.40 and 1.68 BLEU points in the two translation directions. We found that systems with training corpora in different formats improve the performance differently depending on the translation direction and corpora size.
Keywords: neural machine translation, POS tags, MSD tags, inflected language, data sparsity, corpora size
Published in DKUM: 28.03.2025; Views: 0; Downloads: 11
Full text (448.16 KB); this item has multiple files.
5. Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus
Darinka Verdonik, Andreja Bizjak, Andrej Žgank, Mirjam Sepesy Maučec, Mitja Trojar, Jerneja Žganec Gros, Marko Bajec, Iztok Lebar Bajec, Simon Dobrišek, 2024, original scientific article
Description: Parliamentary debates represent an essential part of democratic discourse and provide insights into various socio-demographic and linguistic phenomena - parliamentary corpora, which contain transcripts of parliamentary debates and extensive metadata, are an important resource for parliamentary discourse analysis and other research areas. This paper presents the Slovenian parliamentary corpus siParl, the latest version of which contains transcripts of plenary sessions and other legislative bodies of the Assembly of the Republic of Slovenia from 1990 to 2022, comprising more than 1 million speeches and 210 million words. We outline the development history of the corpus and also mention other initiatives that have been influenced by siParl (such as the Parla-CLARIN encoding and the ParlaMint corpora of European parliaments), present the corpus creation process, ranging from the initial data collection to the structural development and encoding of the corpus, and given the growing influence of the ParlaMint corpora, compare siParl with the Slovenian ParlaMint-SI corpus. Finally, we discuss updates for the next version as well as the long-term development and enrichment of the siParl corpus.
Keywords: recording speech, transcribing speech, transcription guidelines, Less-resourced language
Published in DKUM: 04.02.2025; Views: 0; Downloads: 18
Full text (1.09 MB); this item has multiple files.
6. Sequence-to-Sequence models and their evaluation for spoken language normalization of Slovenian
Mirjam Sepesy Maučec, Darinka Verdonik, Gregor Donaj, 2024, original scientific article
Description: Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such source, namely speech from the less-resourced highly inflected Slovenian language. The paper explores speech corpora recently collected in public and private environments. We analyze the efficiencies of three sequence-to-sequence models for automatic normalization from literal transcriptions to standard forms. Experiments were performed using words, subwords, and characters as basic units for normalization. In the article, we demonstrate that the superiority of the approach is linked to the choice of the basic modeling unit. Statistical models prefer words, while neural network-based models prefer characters. The experimental results show that the best results are obtained with neural architectures based on characters. Long short-term memory and transformer architectures gave comparable results. We also present a novel analysis tool, which we use for in-depth error analysis of results obtained by character-based models. This analysis showed that systems with similar overall results can differ in the performance for different types of errors. Errors obtained with the transformer architecture are easier to correct in the post-editing process. This is an important insight, as creating speech corpora is a time-consuming and costly process. The analysis tool also incorporates two statistical significance tests: approximate randomization and bootstrap resampling. Both statistical tests confirm the improved results of neural network-based models compared to statistical ones.
Keywords: low-resource language, applications, spoken language, normalization, character unit, subword unit, statistical model, long short-term memory, transformer, error analysis
Published in DKUM: 31.01.2025; Views: 0; Downloads: 12
Full text (437.99 KB); this item has multiple files.
7. Uporaba evolucijskih algoritmov v statističnem in hibridnem strojnem prevajanju: doctoral dissertation [English: The use of evolutionary algorithms in statistical and hybrid machine translation]
Jani Dugonik, 2025, doctoral dissertation
Description: This doctoral dissertation addresses machine translation of highly inflected languages and focuses on the challenges that structural differences between highly inflected languages and English pose for both statistical and neural machine translation. Our research also includes an experimental part, carried out on the Slovenian–English language pair and covering translation in both directions. In the first experiment, we designed a new approach to parameter optimization in statistical machine translation using evolutionary algorithms. We compared statistical machine translation systems optimized with classical weight-optimization algorithms against systems optimized with evolutionary algorithms. In the second experiment, we designed and developed a hybrid approach that combines statistical and neural machine translation systems. The source sentence and the two target translations, one produced by each system, were mapped into the same vector space, from which we then obtained feature vectors. Within the dissertation, we proposed a new set of features. Using classifiers, we then selected the better of the two translations, statistical or neural. We evaluated the machine translation systems with established metrics such as BLEU, TER, chrF, and COMET. We performed a statistical analysis of the experimental results with resampling, which showed statistically significant differences in the quality of the generated translations. The experimental results confirm that the proposed approaches improved the quality of the machine translations.
Keywords: evolutionary algorithm, statistical machine translation, neural machine translation, hybrid machine translation approach, optimization, word representation, classification, back-translation
Published in DKUM: 29.01.2025; Views: 0; Downloads: 95
Full text (1.21 MB)
8. Analiza algoritmov stiskanja na primeru tekstovnih datotek v različnih jezikih [English: Analysis of compression algorithms on text files in different languages]
Klemen Arzenšek, 2024, master's thesis
Description: This master's thesis examines various algorithms for compressing text files and analyzes whether the language in which the input file is written affects compression performance with the selected algorithms. The thesis studies and presents the selected compression algorithms, identifies the advantages of using them on text files, determines the character-level entropies of the analyzed languages, runs practical tests of the selected algorithms on test samples in different languages, and establishes whether the language of the test samples affects the performance of the individual algorithms. It looks for links between language entropy and compression performance. Finally, using the Huffman algorithm, which encodes individual characters, it checks whether encoding longer strings improves coding efficiency.
Keywords: natural language, language entropy, compression algorithms, LZW algorithm, text files
Published in DKUM: 23.12.2024; Views: 0; Downloads: 28
Full text (2.04 MB)
10. Zasnova senzorskega omrežja za zaznavanje aktivnosti dnevnega življenja: magistrsko delo [English: Design of a sensor network for detecting activities of daily living: master's thesis]
Jan Cokan, 2024, master's thesis
Description: In this master's thesis, we describe the growing problem of maintaining the health and quality of life of older people and present the need for detecting activities of daily living. To this end, we designed two different sensor network systems. To both systems we added a database on a remote computer for data collection. One sensor network system uses RF communication, while the other uses WiFi. We verified their operation in a test environment. Because both networks are battery powered, we measured current consumption and supply voltage. The results are presented in the conclusion of the thesis.
Keywords: sensor network, activities of daily living, RF module, WiFi module, MQTT protocol
Published in DKUM: 01.03.2024; Views: 363; Downloads: 40
Full text (3.33 MB)