Title:Primerjava algoritmov za določanje sopojavnosti besed v besedilih : diplomsko delo
Authors:Pal, Klemen (Author)
Holobar, Aleš (Mentor)
Ojsteršek, Milan (Co-mentor)
Ferme, Marko (Co-mentor)
Files:.pdf UN_Pal_Klemen_2020.pdf (686,66 KB)
MD5: B8D3935BC525C158107EC5314D24BE95
 
Language:Slovenian
Work type:Bachelor thesis/paper (mb11)
Typology:2.11 - Undergraduate Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:The main topic of this thesis is the study and comparison of some of the most widely used algorithms for determining word co-occurrence in texts. The theoretical part explains the phenomenon of collocations, their basis, and their statistical background. It then describes the three most common algorithms, which rely on different approaches: the t-test, Pearson's chi-squared test, and the PMI algorithm. These descriptions are supported with worked examples of computing each algorithm's values. The practical part covers the implementation of text preprocessing and the collection of statistical data, followed by the application of the algorithms to that data. Finally, the algorithms are compared on the basis of the results obtained.
Keywords:algorithm, word co-occurrence, text, comparison
Year of publishing:2020
Place of performance:Maribor
Publisher:[K. Pal]
Number of pages:VI, 43 f.
Source:Maribor
UDC:004.021:004.912(043.2)
COBISS_ID:38465027
NUK URN:URN:SI:UM:DK:J8Y1WVSF
Views:97
Downloads:13
Categories:KTFMB - FERI

Licences

License:CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Link:http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:The most restrictive Creative Commons license. Others may download and share the work with attribution, but may not modify it or use it commercially.
Licensing start date:10.09.2020

Secondary language

Language:English
Title:Comparison of word collocation algorithms
Abstract:The main subject of this diploma thesis is the study and comparison of some of the most common word collocation algorithms. The theoretical part begins with a basic explanation of collocations and the underlying statistics. It then describes each of the three most common algorithms: the t-test, Pearson's chi-squared test, and the PMI algorithm. Each description is supported by a worked calculation example. The practical part consists of text preprocessing and statistical analysis, followed by an implementation of these algorithms. The final part of the thesis compares the results of these algorithms.
Keywords:algorithm, word collocation, text, comparison
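
To illustrate the three measures named in the abstract, here is a minimal Python sketch of their standard textbook forms for a two-word pair. It is not taken from the thesis: the function names, the argument names (c1, c2, c12, n) and the counts in the usage lines are assumptions made for illustration only. The t-score uses the common approximation that the sample variance equals the observed bigram probability, and all three functions assume the pair occurs at least once (c12 > 0).

```python
import math


def t_score(c1, c2, c12, n):
    """t-test score for a word pair.

    c1, c2: counts of the first and second word; c12: count of the pair;
    n: total number of bigrams in the corpus (all names are hypothetical).
    """
    x_bar = c12 / n            # observed probability of the pair
    mu = (c1 / n) * (c2 / n)   # expected probability if the words were independent
    # Variance is approximated by x_bar, which is reasonable for rare events.
    return (x_bar - mu) / math.sqrt(x_bar / n)


def chi_squared(c1, c2, c12, n):
    """Pearson's chi-squared statistic for the 2x2 contingency table of a word pair."""
    o11 = c12                  # first word followed by second word
    o12 = c1 - c12             # first word followed by something else
    o21 = c2 - c12             # second word preceded by something else
    o22 = n - c1 - c2 + c12    # neither word in its position
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


def pmi(c1, c2, c12, n):
    """Pointwise mutual information (base 2) of a word pair."""
    return math.log2(c12 * n / (c1 * c2))


# Usage with made-up counts (not data from the thesis).
counts = dict(c1=1200, c2=300, c12=45, n=1_000_000)
print("t-score:    ", t_score(**counts))
print("chi-squared:", chi_squared(**counts))
print("PMI:        ", pmi(**counts))
```

The three scores are on different scales, so they are usually compared by how they rank candidate pairs and not by their raw values.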

