Title:NADZOROVANO ODKRIVANJE PREDMETA TEKSTOVNIH VSEBIN Z UPORABO SELEKCIJSKIH IN STATISTIČNIH METOD
Authors:Hrnčić, Sašo (Author)
Kosar, Tomaž (Mentor)
Podgorelec, Vili (Co-mentor)
Files:VS_Hrncic_Saso_2016.pdf (2.31 MB)
 
Language:Slovenian
Work type:Undergraduate thesis (m5)
Typology:2.11 - Undergraduate Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:The aim of this thesis is to build a simple categorization system that can assign a new text document to predefined categories as accurately as possible. One of the system's functions is language identification, which was tested on corpora of Wikipedia and Europarl documents and on the language models of the LibTextCat project. The categorization system was then extended to recognize the predefined topics of the 20 Newsgroups and Reuters-21578 corpora. Documents were represented with an n-gram technique, combined with feature selection and statistical methods. The results were analyzed and documented. We describe in more detail the problem, our own experience, the properties of the methods used, and existing research.
Keywords:text categorization, n-grams, machine learning, information theory, out of place
Year of publishing:2016
Publisher:S. Hrnčić
Source:[Maribor
UDC:004.05:004.5(043.2)
COBISS_ID:19991318
NUK URN:URN:SI:UM:DK:GT5AAOMU
Views:387
Downloads:31
Categories:KTFMB - FERI

Secondary language

Language:English
Title:SUPERVISED TOPICS' DETECTION BASED ON FEATURE SELECTION AND STATISTICAL METHODS
Abstract:The main goal of this thesis is to develop a simple text classification system that can automatically classify a document into predefined categories as accurately as possible. One of the system's functions is language detection, which was tested on documents from Wikipedia and Europarl and on the language models of the LibTextCat project. The classification system was then extended to identify the predefined topics of the 20 Newsgroups and Reuters-21578 corpora. Documents were represented with an n-gram technique, combined with feature selection and statistical methods. The results were analyzed and documented. We also present the text classification problem, our experience, the properties of the methods used, and some existing research.
Keywords:text classification, n-grams, machine learning, information theory, out of place
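
The record does not show the thesis implementation. As a rough illustration of the n-gram and "out of place" ideas named in the abstract and keywords, here is a minimal sketch. It assumes the commonly used formulation in which each document and category is a rank-ordered list of its most frequent character n-grams, and the distance is the sum of rank differences. All names (`ngram_profile`, `out_of_place`, `classify`) and parameter defaults are hypothetical, and the feature selection and statistical methods mentioned in the abstract are not modelled.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k character n-grams (n = 1..n_max), most frequent first."""
    # Assumption: lowercase the text and mark word boundaries with "_".
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences between a document and a category profile.

    An n-gram missing from the category profile gets a fixed maximum
    penalty, here the length of the category profile (an assumption).
    """
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_rank[gram]) if gram in cat_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def classify(text, category_profiles):
    """Return the category whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        category_profiles,
        key=lambda cat: out_of_place(doc_profile, category_profiles[cat]),
    )


if __name__ == "__main__":
    # Toy training texts, for illustration only.
    profiles = {
        "en": ngram_profile("the quick brown fox"),
        "sl": ngram_profile("to je kratek stavek"),
    }
    print(classify("the quick brown fox", profiles))
```

The same distance can serve both language identification and topic detection. Only the training texts behind each category profile change.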