1. Using machine learning and natural language processing for unveiling similarities between microbial data
Lucija Brezočnik, Tanja Žlender, Maja Rupnik, Vili Podgorelec, 2024, original scientific article
Abstract: Microbiota analysis can provide valuable insights in various fields, including diet and nutrition, health and disease, and environmental contexts, such as understanding the role of microorganisms in different ecosystems. Based on the results, we can provide targeted therapies, personalized medicine, or detect environmental contaminants. In our research, we examined the gut microbiota of 16 animal taxa, including humans, as well as the microbiota of cattle and pig manure, focusing on the 16S rRNA V3-V4 hypervariable regions. Analyzing these regions is common in microbiome studies but can be challenging since the results are high-dimensional. Thus, we utilized machine learning techniques and demonstrated their applicability in processing microbial sequence data. Moreover, we showed that techniques commonly employed in natural language processing can be adapted for analyzing microbial text vectors. We obtained the latter through frequency analyses and applied the proposed hierarchical clustering method to them. All steps in this study were gathered in a proposed microbial sequence data processing pipeline. The results demonstrate that we not only found similarities between samples but also sorted the groups' samples into semantically related clusters. We also tested our method against other known algorithms, such as K-means and Spectral Clustering, using clustering evaluation metrics. The results demonstrate the superiority of the proposed method over them. Moreover, the proposed microbial sequence data pipeline can be utilized for different types of microbiota, such as oral, gut, and skin, demonstrating its reusability and robustness.
Keywords: machine learning, NLP, hierarchical clustering, microbial data, microbiome, n-grams
Published in DKUM: 04.09.2024; Views: 38; Downloads: 9
Full text (4.48 MB)
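As an illustration of the idea summarized in the abstract of entry 1 (frequency vectors over sequence n-grams, followed by hierarchical clustering), here is a minimal sketch in plain Python. It is not the authors' pipeline: the n-gram length `k=3`, the cosine distance, the average-linkage rule, and the toy sequences are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers (character n-grams) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns a list of index sets."""
    n = len(vectors)
    clusters = [{i} for i in range(n)]
    # Precompute pairwise distances between the individual samples.
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(c1, c2):
        # Average distance over all cross-cluster sample pairs.
        pairs = [dist[(min(i, j), max(i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


# Toy sequences (made up for illustration, not real 16S rRNA reads).
samples = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTGGGGTTTT", "TTTTGGGGTTTG"]
vectors = [kmer_counts(s, k=3) for s in samples]
clusters = agglomerative(vectors, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

On these toy inputs the two ACGT-rich sequences share most of their 3-mers, as do the two T/G-rich ones, so they end up in separate clusters. A real analysis would more likely rely on an established library implementation and on the evaluation metrics mentioned in the abstract.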
2. Optimal bus stops' allocation : a school bus routing problem with respect to terrain elevation
Klemen Prah, Abolfazl Keshavarzsaleh, Tomaž Kramberger, Borut Jereb, Dejan Dragan, 2018, original scientific article
Abstract: The paper addresses the optimal allocation of bus stops in the Laško municipality. The goal is to reduce costs by properly redesigning the mandatory transportation of pupils to their schools. The proposed heuristic optimization algorithm relies on data clustering and Monte Carlo simulation. The number of bus stops should be the smallest that still ensures a maximal service area, while keeping the walking distances from the children's homes to the nearest bus stop minimal. The working mechanism of the proposed algorithm is explained. The algorithm is driven by three-dimensional GIS data in order to take into account as many realistic properties of the terrain as possible. The results show that the proposed algorithm achieves an optimal solution with only 37 bus stops covering 94.6 % of all treated pupils, despite the diversity and extent of the municipality and the problematic characteristics of the terrain elevation. The calculated bus stops will serve as important guidelines for their actual physical implementation.
Keywords: logistics, maximal covering problems, optimization, data clustering, Monte Carlo simulation, geographic information system (GIS), reduction of transportation costs, Laško, Slovenia
Published in DKUM: 22.08.2024; Views: 35; Downloads: 13
Full text (2.40 MB)
3. Categorisation of open government data literature
Aljaž Ferencek, Mirjana Kljajić Borštnar, Ajda Pretnar Žagar, 2022, review article
Abstract: Background: Due to the emerging global interest in Open Government Data, research papers on various topics in this area have increased.
Objectives: This paper aims to categorise open government data research.
Methods/Approach: A literature review was conducted to provide a complete overview and classification of open government data research. Hierarchical clustering, a cluster analysis method, was used, and a hierarchy of clusters on selected data sets emerged.
Results: The results of this study suggest that there are two distinct clusters of research, which either focus on government perspectives and policies on OGD, initiatives, and portals or focus on regional studies, adoption of OGD, platforms, and barriers to implementation. Further findings suggest that research gaps could be segmented into many thematic areas, focusing on success factors, best practices, the impact of open government data, barriers/challenges in implementing open government data, etc.
Conclusions: This paper, an extension of work first presented at the Entrenova conference, provides a comprehensive overview of research to date on the implementation of OGD. It points out that the topic has already received research attention, which focuses on specific segments of the phenomenon, and indicates the direction in which new research should go.
Keywords: open government data, open government data research, hierarchical clustering, OGD classification, OGD literature overview
Published in DKUM: 12.06.2024; Views: 134; Downloads: 12
Full text (539.06 KB)
4. Robust clustering of languages across Wikipedia growth
Kristina Ban, Matjaž Perc, Zoran Levnajić, 2017, original scientific article
Abstract: Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy.
Keywords: Wikipedia, language, growth dynamics, data analysis, clustering
Published in DKUM: 13.11.2017; Views: 1533; Downloads: 397
Full text (1004.06 KB)