Title:Vpliv priprave nestrukturiranih podatkov na klasifikacijo : magistrsko delo
Authors:Pečnik, Špela (Author)
Podgorelec, Vili (Mentor)
Files:MAG_Pecnik_Spela_2019.pdf (1.49 MB)
Language:Slovenian
Work type:Master's thesis/paper (mb22)
Typology:2.09 - Master's Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:In everyday life, most of the data we encounter is unstructured text from a variety of sources. Its volume grows daily, so the need to organize and categorize it keeps increasing. For such data, the most important step is preprocessing it for use in machine learning algorithms. Text can be prepared with a range of preprocessing techniques: converting it to lowercase, removing stop words, applying stemming or lemmatization to individual words, combining words into phrases of different lengths (unigrams, bigrams, trigrams), or converting them into vector form (word embedding). In a laboratory experiment, we found that some preprocessing techniques affect classification performance more than others, and that the language and amount of text, as well as the classifier used for machine learning, also have a large impact on classification performance.
Keywords:unstructured data, text classification, vector representation of text, stemming, lemmatization
Year of publishing:2019
Place of performance:Maribor
Publisher:Š. Pečnik
Number of pages:VIII, 74 pp.
Source:Maribor
UDC:004.94:004.83(043.2)
URN:URN:SI:UM:DK:MLJBF77J
COBISS_ID:22489366
NUK URN:URN:SI:UM:DK:MLJBF77J
License:CC BY-NC-ND 4.0
This work is available under this license: Creative Commons Attribution Non-Commercial No Derivatives 4.0 International
Views:409
Downloads:102
Categories:KTFMB - FERI

Secondary language

Language:English
Title:The impact of preprocessing on the classification of unstructured data
Abstract:In everyday life, we mostly encounter unstructured data in the form of text from different sources. The amount of such text grows every day, so there is an increasing need to organize and categorize it. For these data, the most important step is preprocessing for use in machine learning algorithms. Various preprocessing techniques can be used to prepare the text: converting it to lowercase, removing stop words, applying stemming or lemmatization, composing words into phrases of different lengths (unigrams, bigrams, trigrams), or converting them into word embeddings. In a laboratory experiment, we found that some preprocessing techniques have a greater impact on classification performance than others, and that the language and quantity of the text, as well as the classifier used for machine learning, also strongly influence classification performance.
Keywords:unstructured data, text classification, word embedding, stemming, lemmatization

