NADOMEŠČANJE MANJKAJOČIH VREDNOSTI S POMOČJO ROTACIJSKEGA REGRESIJSKEGA GOZDA

Palfy, Miroslav

Title:	NADOMEŠČANJE MANJKAJOČIH VREDNOSTI S POMOČJO ROTACIJSKEGA REGRESIJSKEGA GOZDA
Authors:	Palfy, Miroslav (Author); Kokol, Peter (Mentor); Zorman, Milan (Co-mentor)
Files:	DR_Palfy_Miroslav_2009.pdf (5,63 MB) MD5: 20A3BDA8D38A37AC77F873348F19D2A8 PID: 20.500.12556/dkum/a9109e16-6935-4770-bc9b-3e099cd14940
Language:	Slovenian
Work type:	Dissertation
Organization:	FERI - Faculty of Electrical Engineering and Computer Science
Abstract:	Missing values are a common problem that accompanies the creation of databases, whether the data are collected through surveys or obtained from designed experiments. Regardless of how much effort is invested in ensuring that questionnaires are fully completed or in carefully designing a scientific experiment, missing values often cannot be avoided. Depending on the proportion of missing values, incomplete data can be unsuitable for further analysis, while deleting cases with missing values can be very inappropriate, especially when their percentage is not sufficiently small and those cases carry important information. To address this problem, statistical analysis uses a variety of methods for imputing missing values. To fill the gap between existing single-imputation methods and models based on multiple imputation, which require a separate statistical analysis for each imputation cycle, we developed in this dissertation a new imputation procedure based on an ensemble approach to supervised machine learning. We used an ensemble called the rotation regression forest, a variant of the rotation forest developed by Rodríguez, Kuncheva and Alonso (Rodríguez, Kuncheva, & Alonso, 2006), in which we replaced the base method intended for classification problems with a model regression tree. We compared our imputation method with 9 other popular methods, measuring the accuracy of the methods and their ability to preserve variance after inserting different proportions of missing values. The measurements were carried out on 14 publicly available datasets and one artificially generated dataset, so that all mechanisms of missingness defined by Rubin (Rubin, 1976) were covered.
Based on the experiments, we found that our method on average predicts the missing values in the selected datasets more accurately, regardless of the missingness mechanism. We also showed that, with the introduction of an additional stochastic method for variance preservation, our rotation regression forest preserves variance better than all other single-imputation methods while still surpassing all methods in accuracy. In the introductory, theoretical chapters of the dissertation we describe in more detail the problem of missing values and the existing methods most commonly used to impute them. We present the rotation regression forest and the stochastic method for variance preservation. Most attention is devoted to the experimental results, on the basis of which we formulate, in the conclusion, recommendations for using the rotation regression forest to impute missing values and outline starting points for further work.
Keywords:	machine learning, rotation forest, missing value imputation, regression tree, ensemble of regressors
Place of publishing:	Maribor
Publisher:	[M. Palfy]
Year of publishing:	2009
PID:	20.500.12556/DKUM-12750
UDC:	004.89:004.9(043.3)
COBISS.SI-ID:	13737238
NUK URN:	URN:SI:UM:DK:W7KUO4I8
Publication date in DKUM:	21.12.2009
Views:	3316
Downloads:	317
Categories:	KTFMB - FERI
Citation:	PALFY, Miroslav, 2009, NADOMEŠČANJE MANJKAJOČIH VREDNOSTI S POMOČJO ROTACIJSKEGA REGRESIJSKEGA GOZDA [online]. Doctoral dissertation. Maribor : M. Palfy. [Accessed 22 April 2025]. Retrieved from: https://dk.um.si/IzpisGradiva.php?lang=eng&id=12750

Secondary language

Language:	English
Title:	Missing values imputation using a rotation regression forest
Abstract:	Missing values are a common problem in many databases, whether the data come from surveys and questionnaires or from designed experiments. No matter how carefully the surveys are conducted or how well the experiments are designed, missing values can occur. Depending on the proportion of missing values, incomplete data can be unsuitable for further statistical analysis, while case deletion can be very inappropriate, especially when a considerable proportion of values is missing. Different methods have therefore been developed to impute missing data. The main goal of this dissertation was to develop a new imputation method that would narrow the gap between single-imputation methods and multiple-imputation models, which require standard statistical analysis to be carried out on multiple imputed data sets. For this purpose we used an ensemble-based approach to supervised machine learning. We relied on a variation of the rotation forest ensemble developed by Rodríguez, Kuncheva and Alonso (Rodríguez, Kuncheva, & Alonso, 2006), which we named “rotation regression forest” because we used a model regression tree as the base method instead of a method intended for classification. We selected 9 other popular imputation methods for comparison with our ensemble and measured their accuracy as well as their ability to preserve the variance structure within the data for different proportions of missing values. Measurements were carried out on 14 publicly available datasets and one artificial dataset to account for each of the three missingness mechanisms described by Rubin (Rubin, 1976). Based on the results of these tests we concluded that, on average, our method is the most accurate among the selected methods, regardless of which missingness mechanism is responsible for the missing values.
When an additional stochastic method for preservation of variance was used, our rotation regression forest preserved the variance structure within the data better than any other single-imputation method, while still outperforming all of them in accuracy. The introductory, more theoretical chapters of this dissertation deal with supervised machine learning, missing values and commonly used imputation methods. The rotation regression forest ensemble is then introduced, along with our stochastic method for preservation of variance. The bulk of the work focuses on the results of the empirical experiments, which we used to formulate recommendations for using the rotation regression forest ensemble to impute missing values and to identify starting points for future work.
Keywords:	machine learning, rotation forest, missing value imputation, regression tree, ensemble of regressors
