An algorithm for protecting knowledge discovery data
Boštjan Brumen, Izidor Golob, Tatjana Welzer-Družovec, Ivan Rozman, Marjan Družovec, Hannu Jaakkola, 2003, original scientific article
Description: In the paper, we present an algorithm that can be applied to protect data before a data mining process takes place. Data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built with the original data. The presented formal approach is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
Keywords: data protection algorithm, classification algorithm, disclosure control, data mining, knowledge discovery, data security
Published: 01.06.2012; Views: 1449; Downloads: 35
Link to full text
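Illustrative note on the record above: the abstract describes changing data values while keeping the resulting model equivalent. The sketch below is not the authors' algorithm. It only demonstrates one simple property that makes such protection plausible: a strictly increasing transformation of a numeric attribute preserves the ordering of records, so a threshold-based classifier splits the records in exactly the same way. The function names (`best_split`, `protect`), the transformation `3 * x + 11`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def misclassified(indices, labels):
    """Count records in `indices` that do not carry the majority label."""
    counts = Counter(labels[i] for i in indices)
    return len(indices) - max(counts.values())


def best_split(values, labels):
    """Return the set of record indices on the low side of the best threshold.

    The best threshold is the one with the fewest misclassified records
    when each side predicts its own majority label (a decision stump).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_err, best_left = None, None
    for k in range(1, len(order)):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        err = misclassified(left, labels) + misclassified(right, labels)
        if best_err is None or err < best_err:
            best_err, best_left = err, frozenset(left)
    return best_left


def protect(x):
    """Hypothetical strictly increasing transformation (an assumption)."""
    return 3 * x + 11


# Toy data, invented for this sketch.
ages = [23, 35, 47, 51, 62, 29]
labels = ["no", "no", "yes", "yes", "yes", "no"]
protected = [protect(a) for a in ages]

# The values differ, but the records fall on the same side of the split.
same_partition = best_split(ages, labels) == best_split(protected, labels)
print(same_partition)
```

A transformation this simple is easy to invert, so it offers little real protection; it is shown only to make the idea of a model-preserving change of values concrete.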
Contrasting temporal trend discovery for large healthcare databases
Goran Hrovat, Gregor Štiglic, Peter Kokol, Milan Ojsteršek, original scientific article
Description: With the increased acceptance of electronic health records, we can observe an increasing interest in the application of data mining approaches within this field. This study introduces a novel approach for exploring and comparing temporal trends within different in-patient subgroups, which is based on association rule mining using the Apriori algorithm and linear model-based recursive partitioning. The Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality was used to evaluate the proposed approach. This study presents a novel approach where visual analytics on big data is used for trend discovery in the form of a regression tree with scatter plots in the leaves of the tree. The trend lines are used for directly comparing linear trends within a specified time frame. Our results demonstrate the existence of opposite trends in relation to age and sex based subgroups that would be impossible to discover using traditional trend-tracking techniques. Such an approach can be employed in decision support applications for policy makers when organizing campaigns or by hospital management for observing trends that cannot be directly discovered using traditional analytical techniques.
Keywords: data mining, decision support, trend discovery
Published: 27.11.2014; Views: 1174; Downloads: 363
Full text (1013.97 KB)
This item has multiple files.
Algorithms for association rule learning
Renata Akhmetshakirova, 2017, undergraduate thesis
Description: One of the most popular methods of knowledge discovery in databases is the extraction of association rules. There are many different algorithms for association rule learning, which differ in space and time complexity. To perform a comparative analysis, we have implemented the Apriori, Eclat and FP-growth algorithms and compared their time and memory consumption using synthetic and real databases. The analysis has shown that the FP-growth algorithm is the most efficient in the majority of cases.
Keywords: association rules, data mining, Apriori, Eclat, FP-growth
Published: 24.02.2017; Views: 1374; Downloads: 86
Full text (1.17 MB)
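Illustrative note on the record above: of the three algorithms the thesis compares, Apriori is the simplest to state. The following is a minimal textbook-style sketch of its frequent-itemset phase, not the thesis implementation. The function name `apriori`, the absolute `min_support` count, and the toy baskets are assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    `min_support` transactions (absolute count, not a fraction)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets, invented for this sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The repeated pass over all transactions at every level is the cost that Eclat and FP-growth are designed to reduce, which is consistent with the efficiency comparison the abstract reports.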
Analyzing information seeking and drug-safety alert response by health care professionals as new methods for surveillance
Alison Callahan, Igor Pernek, Gregor Štiglic, Jurij Leskovec, Howard Strasberg, Nigam Haresh Shah, 2015, original scientific article
Description: Background: Patterns in general consumer online search logs have been used to monitor health conditions and to predict health-related activities, but the multiple contexts within which consumers perform online searches make significant associations difficult to interpret. Physician information-seeking behavior has typically been analyzed through survey-based approaches and literature reviews. Activity logs from health care professionals using online medical information resources are thus a valuable yet relatively untapped resource for large-scale medical surveillance.
Objective: To analyze health care professionals' information-seeking behavior and assess the feasibility of measuring drug-safety alert response from the usage logs of an online medical information resource.
Methods: Using two years (2011-2012) of usage logs from UpToDate, we measured the volume of searches related to medical conditions with significant burden in the United States, as well as the seasonal distribution of those searches. We quantified the relationship between searches and resulting page views. Using a large collection of online mainstream media articles and Web log posts we also characterized the uptake of a Food and Drug Administration (FDA) alert via changes in UpToDate search activity compared with general online media activity related to the subject of the alert.
Results: Diseases and symptoms dominate UpToDate searches. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. The response to an FDA alert for Celexa, characterized by a change in UpToDate search activity, differed considerably from general online media activity. Changes in search activity appeared later and persisted longer in UpToDate logs. The volume of searches and page view durations related to Celexa before the alert also differed from those after the alert.
Conclusions: Understanding the information-seeking behavior associated with online evidence sources can offer insight into the information needs of health professionals and enable large-scale medical surveillance. Our Web log mining approach has the potential to monitor responses to FDA alerts at a national level. Our findings can also inform the design and content of evidence-based medical information resources such as UpToDate.
Keywords: internet log analysis, data mining, physicians, information-seeking behavior, drug safety surveillance
Published: 02.08.2017; Views: 768; Downloads: 95
Full text (4.18 MB)
This item has multiple files.
Link prediction on Twitter
Sanda Martinčić-Ipšić, Edvin Močibob, Matjaž Perc, 2017, original scientific article
Description: With over 300 million active users, Twitter is among the largest online news and social networking services in existence today. Open access to information on Twitter makes it a valuable source of data for research on social interactions, sentiment analysis, content diffusion, link prediction, and the dynamics behind human collective behaviour in general. Here we use Twitter data to construct co-occurrence language networks based on hashtags and based on all the words in tweets, and we use these networks to study link prediction by means of different methods and evaluation metrics. In addition to using five known methods, we propose two effective weighted similarity measures, and we compare the obtained outcomes in dependence on the selected semantic context of topics on Twitter. We find that hashtag networks yield largely the same results as all-word networks, thus supporting the claim that hashtags alone robustly capture the semantic context of tweets, and as such are useful and suitable for studying the content and categorization. We also introduce ranking diagrams as an efficient tool for the comparison of the performance of different link prediction algorithms across multiple datasets. Our research indicates that successful link prediction algorithms work well in correctly foretelling highly probable links even if the information about a network structure is incomplete, and they do so even if the semantic context is rationalized to hashtags.
Keywords: link prediction, data mining, Twitter, network analysis
Published: 15.09.2017; Views: 683; Downloads: 70
Full text (6.98 MB)
This item has multiple files.
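Illustrative note on the record above: the abstract does not name the five known methods or define the two proposed weighted measures, so the sketch below uses two standard unweighted baselines, common neighbours and the Jaccard coefficient, purely as an assumption. It scores every unlinked pair in a small hashtag co-occurrence network. The hashtags, edges, and function names are invented for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def score_missing_links(graph):
    """Score each unlinked node pair by common neighbours and Jaccard.

    Returns (u, v, common_count, jaccard) tuples, best candidates first.
    """
    scores = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to predict
        common = graph[u] & graph[v]
        union = graph[u] | graph[v]
        jaccard = len(common) / len(union) if union else 0.0
        scores.append((u, v, len(common), jaccard))
    scores.sort(key=lambda s: (-s[2], -s[3], s[0], s[1]))
    return scores


# Hypothetical co-occurrence edges: two hashtags appear in the same tweet.
edges = [
    ("#ai", "#ml"), ("#ai", "#data"), ("#ml", "#data"),
    ("#ml", "#python"), ("#data", "#python"), ("#python", "#code"),
]
ranked = score_missing_links(build_graph(edges))
for u, v, common, jaccard in ranked:
    print(f"{u} - {v}: common={common}, jaccard={jaccard:.3f}")
```

In an evaluation like the one the abstract describes, some known edges would be hidden first and the ranking checked against them; that step is omitted here.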