1. Machine translation of independent nominal phrases in technical texts
Simon Zupan, Zmago Pavličič, Melanija Fabčič, 2025, original scientific article
Abstract: This paper deals with machine translations of independent noun phrases in technical texts. Such phrases are not part of any sentence structure but function on their own, typically in tables and illustrations. They are common in technical texts because they allow technical writers to increase lexical density and precision of expression. On the other hand, they pose a challenge for machine translation engines, as their meaning depends on the context. The paper considers independent noun phrases from a service manual that were translated from English into Slovene by two machine translation engines (DeepL and Google Translate). A comparison of the translations with the originals revealed some limitations of machine translation engines in translating noun phrases, as approximately half of the phrases showed a noticeable change in meaning.
Keywords: technical texts, machine translation, nominal phrases, translation shifts, technical translation Published in DKUM: 08.07.2025; Views: 0; Downloads: 7
Full text (1,14 MB)
2. Weakly-supervised multilingual medical NER for symptom extraction for low-resource languages
Rigon Sallauka, Umut Arioz, Matej Rojc, Izidor Mlakar, 2025, original scientific article
Abstract: Patient-reported health data, especially patient-reported outcome measures, are vital for improving clinical care but are often limited by memory bias, cognitive load, and inflexible questionnaires. Patients prefer conversational symptom reporting, highlighting the need for robust methods in symptom extraction and conversational intelligence. This study presents a weakly-supervised pipeline for training and evaluating medical Named Entity Recognition (NER) models across eight languages, with a focus on low-resource settings. A merged English medical corpus, annotated using the Stanza i2b2 model, was translated into German, Greek, Spanish, Italian, Portuguese, Polish, and Slovenian, preserving the entity annotations for medical problems, diagnostic tests, and treatments. Data augmentation addressed the class imbalance, and the fine-tuned BERT-based models consistently outperformed the baselines. The English model achieved the highest F1 score (80.07%), followed by German (78.70%), Spanish (77.61%), Portuguese (77.21%), Slovenian (75.72%), Italian (75.60%), Polish (75.56%), and Greek (69.10%). Compared to the existing baselines, our models demonstrated notable performance gains, particularly in English, Spanish, and Italian. This research underscores the feasibility and effectiveness of weakly-supervised multilingual approaches for medical entity extraction, contributing to improved information access in clinical narratives, especially in under-resourced languages.
Keywords: low-resource languages, machine translation, medical entity extraction, NER, NLP, patient-reported outcomes, weakly-supervised learning Published in DKUM: 19.05.2025; Views: 0; Downloads: 4
Full text (338,94 KB)
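The F1 scores reported in record 2 combine precision and recall into one number. The abstract does not say how the scores were computed, so the following is only a minimal sketch that assumes exact-match, entity-level scoring. The function name `entity_f1` and the toy spans are hypothetical; the three labels mirror the entity types named in the abstract.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, f1) over exact-match entity spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)  # spans with matching boundaries and label
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (start_token, end_token, label)
gold = [(0, 1, "PROBLEM"), (4, 5, "TEST"), (8, 9, "TREATMENT")]
pred = [(0, 1, "PROBLEM"), (4, 5, "TREATMENT"), (8, 9, "TREATMENT")]

precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case one of the three predicted spans carries the wrong label, so only two spans count as correct under exact matching.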
3. On the use of morpho-syntactic description tags in neural machine translation with small and large training corpora
Gregor Donaj, Mirjam Sepesy Maučec, 2022, original scientific article
Abstract: With the transition to neural architectures, machine translation achieves very good quality for several resource-rich languages. However, the results are still much worse for languages with complex morphology, especially if they are low-resource languages. This paper reports the results of a systematic analysis of adding morphological information into neural machine translation system training. Translation systems presented and compared in this research exploit morphological information from corpora in different formats. Some formats join semantic and grammatical information and others separate these two types of information. Semantic information is modeled using lemmas and grammatical information using Morpho-Syntactic Description (MSD) tags. Experiments were performed on corpora of different sizes for the English–Slovene language pair. The conclusions were drawn for a domain-specific translation system and for a translation system for the general domain. With MSD tags, we improved the performance by up to 1.40 and 1.68 BLEU points in the two translation directions. We found that systems with training corpora in different formats improve the performance differently depending on the translation direction and corpora size.
Keywords: neural machine translation, POS tags, MSD tags, inflected language, data sparsity, corpora size Published in DKUM: 28.03.2025; Views: 0; Downloads: 11
Full text (448,16 KB)
4. Reduction of Neural Machine Translation Failures by Incorporating Statistical Machine Translation
Jani Dugonik, Mirjam Sepesy Maučec, Domen Verber, Janez Brest, 2023, original scientific article
Abstract: This paper proposes a hybrid machine translation (HMT) system that improves the quality of neural machine translation (NMT) by incorporating statistical machine translation (SMT). To this end, two NMT systems and two SMT systems were built for the Slovenian-English language pair, each for translation in one direction. We used a multilingual language model to embed the source sentence and translations into the same vector space. From each vector, we extracted features based on the distances and similarities calculated between the source sentence and the NMT translation, and between the source sentence and the SMT translation. To select the best possible translation, we used several well-known classifiers to predict which translation system generated a better translation of the source sentence. The proposed method of combining SMT and NMT in the hybrid system is novel. Our framework is language-independent and can be applied to other languages supported by the multilingual language model. Our experiment involved empirical applications. We compared the performance of the classifiers, and the results demonstrate that our proposed HMT system achieved notable improvements in the BLEU score, with increases of 1.5 points and 10.9 points in the two translation directions, respectively.
Keywords: neural machine translation, statistical machine translation, sentence embedding, similarity, classification, hybrid machine translation Published in DKUM: 20.02.2024; Views: 322; Downloads: 40
Full text (400,40 KB)
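Record 4 describes extracting distance and similarity features between the embedded source sentence and each candidate translation. The sketch below shows one way such a feature vector could be built. It is an illustration under assumptions, not the authors' implementation: the choice of cosine similarity and Euclidean distance, the function names, and the two-dimensional toy vectors are all hypothetical, and a real system would obtain the vectors from a multilingual language model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_features(source_vec, nmt_vec, smt_vec):
    """Features comparing the source with the NMT and the SMT translation."""
    return [
        cosine_similarity(source_vec, nmt_vec),
        euclidean_distance(source_vec, nmt_vec),
        cosine_similarity(source_vec, smt_vec),
        euclidean_distance(source_vec, smt_vec),
    ]


# Hypothetical toy embeddings standing in for real sentence vectors.
source_vec = [1.0, 0.0]
nmt_vec = [1.0, 0.0]   # identical to the source in embedding space
smt_vec = [0.0, 1.0]   # orthogonal to the source

features = pair_features(source_vec, nmt_vec, smt_vec)
print(features)
```

In the setup the abstract describes, feature vectors of this kind are passed to a classifier that predicts which system produced the better translation; the sketch stops at feature extraction.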
5. Applicability and challenges of using machine translation in translator training
Melita Koletnik, 2011, professional article
Abstract: During the last decade, translation and translator training have both changed significantly. This change has been strongly influenced by the development of the Internet and the subsequent availability of web-based translation resources, such as Google Translate. Their introduction into translation teaching and training is no longer a matter of a teacher's personal preference and IT skills, but a necessity imposed by the ever-faster advancement of technology. This article presents the experimental results of an ongoing broader research study focusing on the modes and frequency of use of the Internet, Google Translate and Google Translator Toolkit among translation students at the undergraduate level. The preliminary results presented in this article are based on a questionnaire on the use of Google Translate, prepared with the latest professional findings in mind. The article concludes with the author's observations on the applicability of these resources in translator training and the challenges involved.
Keywords: machine translation, teaching methodology, Internet, Google Translate, machine translation systems, translator training, translation didactics Published in DKUM: 12.05.2017; Views: 2030; Downloads: 268
Full text (269,02 KB)
6.