1. Comparative study of modern differential evolution algorithms : perspectives on mechanisms and performance
Janez Brest, Mirjam Sepesy Maučec, 2025, original scientific article
Description: Since the discovery of the Differential Evolution algorithm, new and improved versions have continuously emerged. In this paper, we review selected algorithms based on Differential Evolution that have been proposed in recent years. We examine the mechanisms integrated into them and compare the algorithms' performance. For the comparison we used statistical tests, as they enable us to draw reliable conclusions about the algorithms' performance. We use the Wilcoxon signed-rank test for pairwise comparisons and the Friedman test for multiple comparisons. Subsequently, the Mann–Whitney U-score test was added. We conducted not only a cumulative analysis of the algorithms but also focused on their performance by function family (i.e., unimodal, multimodal, hybrid, and composition functions). Experimental results were obtained on problems defined for the CEC'24 Special Session and Competition on Single Objective Real Parameter Numerical Optimization. Problem dimensions of 10, 30, 50, and 100 were analyzed. Based on the study of the selected algorithms, we highlight promising mechanisms for further development and improvement.
Keywords: global optimization, differential evolution, benchmark suite, mechanisms, statistical tests, performance
Published in DKUM: 19.05.2025; Views: 0; Downloads: 3
Full text (313.93 KB)
2. The impact of speech recognizer errors on the quality of machine translations in speech-to-speech translation systems : master's thesis [Vpliv napak razpoznavalnika govora na kakovost strojnih prevodov v sistemih prevajanja govora v govor : magistrsko delo]
Klemen Stanič, 2025, master's thesis
Description: The master's thesis analyzes the impact of automatic speech recognizer errors on the quality of machine translations in speech-to-speech translation systems. We translated several speech recognizer outputs that differed in recognition quality; as reference translations, we used the translation of the set of sentences that served as input to the recognizer. The error types considered were word insertion, deletion, and substitution. We then broke them down further by part of speech and by the extent of the change. The impact of errors on translation quality was assessed with the BLEU metric. We found that certain types of errors affect translation quality more than others.
Keywords: speech recognition, machine translation, recognizer error, neural networks, BLEU
Published in DKUM: 31.03.2025; Views: 0; Downloads: 27
Full text (2.35 MB)
3. On the use of morpho-syntactic description tags in neural machine translation with small and large training corpora
Gregor Donaj, Mirjam Sepesy Maučec, 2022, original scientific article
Description: With the transition to neural architectures, machine translation achieves very good quality for several resource-rich languages. However, the results are still much worse for languages with complex morphology, especially if they are low-resource languages. This paper reports the results of a systematic analysis of adding morphological information into neural machine translation system training. Translation systems presented and compared in this research exploit morphological information from corpora in different formats. Some formats join semantic and grammatical information and others separate these two types of information. Semantic information is modeled using lemmas and grammatical information using Morpho-Syntactic Description (MSD) tags. Experiments were performed on corpora of different sizes for the English–Slovene language pair. The conclusions were drawn for a domain-specific translation system and for a translation system for the general domain. With MSD tags, we improved the performance by up to 1.40 and 1.68 BLEU points in the two translation directions. We found that systems with training corpora in different formats improve the performance differently depending on the translation direction and corpora size.
Keywords: neural machine translation, POS tags, MSD tags, inflected language, data sparsity, corpora size
Published in DKUM: 28.03.2025; Views: 0; Downloads: 6
Full text (448.16 KB); this item has multiple files.
4. Strategies for managing time and costs in speech corpus creation : insights from the Slovenian ARTUR corpus
Darinka Verdonik, Andreja Bizjak, Andrej Žgank, Mirjam Sepesy Maučec, Mitja Trojar, Jerneja Žganec Gros, Marko Bajec, Iztok Lebar Bajec, Simon Dobrišek, 2024, original scientific article
Description: Parliamentary debates represent an essential part of democratic discourse and provide insights into various socio-demographic and linguistic phenomena; parliamentary corpora, which contain transcripts of parliamentary debates and extensive metadata, are an important resource for parliamentary discourse analysis and other research areas. This paper presents the Slovenian parliamentary corpus siParl, the latest version of which contains transcripts of plenary sessions and other legislative bodies of the Assembly of the Republic of Slovenia from 1990 to 2022, comprising more than 1 million speeches and 210 million words. We outline the development history of the corpus and also mention other initiatives that have been influenced by siParl (such as the Parla-CLARIN encoding and the ParlaMint corpora of European parliaments), present the corpus creation process, ranging from the initial data collection to the structural development and encoding of the corpus, and, given the growing influence of the ParlaMint corpora, compare siParl with the Slovenian ParlaMint-SI corpus. Finally, we discuss updates for the next version as well as the long-term development and enrichment of the siParl corpus.
Keywords: recording speech, transcribing speech, transcription guidelines, less-resourced language
Published in DKUM: 04.02.2025; Views: 0; Downloads: 9
Full text (1.09 MB); this item has multiple files.
5. Sequence-to-Sequence models and their evaluation for spoken language normalization of Slovenian
Mirjam Sepesy Maučec, Darinka Verdonik, Gregor Donaj, 2024, original scientific article
Keywords: low-resource language, applications, spoken language, normalization, character unit, subword unit, statistical model, long short-term memory, transformer, error analysis
Published in DKUM: 31.01.2025; Views: 0; Downloads: 6
Full text (437.99 KB)
6. The use of evolutionary algorithms in statistical and hybrid machine translation : doctoral dissertation [Uporaba evolucijskih algoritmov v statističnem in hibridnem strojnem prevajanju : doctoral dissertation]
Jani Dugonik, 2025, doctoral dissertation
Description: The doctoral dissertation addresses machine translation of highly inflected languages and focuses on the challenges that structural differences between highly inflected languages and English pose for both statistical and neural machine translation. Our research also includes an experimental part, carried out on the Slovenian-English language pair and covering translation in both directions. In the first experiment, we designed a new approach to parameter optimization in statistical machine translation using evolutionary algorithms. We compared statistical machine translation systems optimized with the classical weight-optimization algorithms used in statistical machine translation against systems optimized with evolutionary algorithms. In the second experiment, we designed and developed a hybrid approach that combines statistical and neural machine translation systems. The source sentence and the two target translations, one produced by each system, were mapped into the same vector space, from which we then obtained feature vectors. As part of the dissertation, we proposed a new feature set. Using classifiers, we then selected the better of the two translations, statistical or neural. The machine translation systems were evaluated with established metrics such as BLEU, TER, chrF, and COMET. We performed a statistical analysis of the experimental results using resampling, which showed statistically significant differences in the quality of the generated translations. The experimental results confirm that the proposed approaches improved the quality of the machine translations.
Keywords: evolutionary algorithm, statistical machine translation, neural machine translation, hybrid machine translation approach, optimization, word representation, classification, back-translation
Published in DKUM: 29.01.2025; Views: 0; Downloads: 54
Full text (1.21 MB)
7. Analysis of compression algorithms on text files in different languages [Analiza algoritmov stiskanja na primeru tekstovnih datotek v različnih jezikih]
Klemen Arzenšek, 2024, master's thesis
Description: The master's thesis deals with various text-file compression algorithms and analyzes whether the language in which the input file is written affects the compression performance of the selected algorithms. The thesis studies and presents the selected compression algorithms, identifies the advantages of using them for text files, determines the character-level entropies of the analyzed languages, carries out practical tests of the selected algorithms on test samples in different languages, and analyzes whether the language of the test samples affects the performance of the individual algorithms. It looks for links between language entropy and compression performance. Finally, using the Huffman algorithm, which encodes individual characters, it checks whether encoding longer strings improves coding efficiency.
Keywords: natural language, language entropy, compression algorithms, LZW algorithm, text files
Published in DKUM: 23.12.2024; Views: 0; Downloads: 21
Full text (2.04 MB)
8. |
9. Design of a sensor network for detecting activities of daily living : master's thesis [Zasnova senzorskega omrežja za zaznavanje aktivnosti dnevnega življenja : magistrsko delo]
Jan Cokan, 2024, master's thesis
Description: In the master's thesis, we described the growing problem of maintaining the health and quality of life of older people and presented the need for detecting activities of daily living. For this purpose, we designed two different sensor network systems. To both systems we added a database on a remote computer for data collection. One of the sensor network systems communicates over RF, the other over WiFi. We verified their operation in a test environment. Because both networks are battery-powered, we measured current consumption and supply voltage. The results are presented in the conclusion of the thesis.
Keywords: sensor network, activities of daily living, RF module, WiFi module, MQTT protocol
Published in DKUM: 01.03.2024; Views: 363; Downloads: 38
Full text (3.33 MB)
10. Reduction of Neural Machine Translation Failures by Incorporating Statistical Machine Translation
Jani Dugonik, Mirjam Sepesy Maučec, Domen Verber, Janez Brest, 2023, original scientific article
Description: This paper proposes a hybrid machine translation (HMT) system that improves the quality of neural machine translation (NMT) by incorporating statistical machine translation (SMT). To this end, two NMT systems and two SMT systems were built for the Slovenian-English language pair, each for translation in one direction. We used a multilingual language model to embed the source sentence and translations into the same vector space. From each vector, we extracted features based on the distances and similarities calculated between the source sentence and the NMT translation, and between the source sentence and the SMT translation. To select the best possible translation, we used several well-known classifiers to predict which translation system generated a better translation of the source sentence. The proposed method of combining SMT and NMT in the hybrid system is novel. Our framework is language-independent and can be applied to other languages supported by the multilingual language model. Our experiment involved empirical applications. We compared the performance of the classifiers, and the results demonstrate that our proposed HMT system achieved notable improvements in the BLEU score, with increases of 1.5 points and 10.9 points for the two translation directions, respectively.
Keywords: neural machine translation, statistical machine translation, sentence embedding, similarity, classification, hybrid machine translation
Published in DKUM: 20.02.2024; Views: 322; Downloads: 36
Full text (400.40 KB); this item has multiple files.