Title:IZDELAVA PROGRAMSKEGA PAKETA ZA PRIDOBIVANJE IN PRIMERJANJE BESEDIL IZ INTERNETA
Authors:ID Petek, Matej (Author)
ID Ojsteršek, Milan (Mentor)
Files:.pdf UNI_Petek_Matej_2012.pdf (2,57 MB)
MD5: EFF8D4807FE6C586BA832553EC91C774
PID: 20.500.12556/dkum/bd644252-09b2-4f25-87b6-1220190add48
 
Language:Slovenian
Work type:Undergraduate thesis
Typology:2.11 - Undergraduate Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:In this thesis we covered the field of retrieving documents from various sources on the internet. In the theoretical part we presented how web crawlers, OAI-PMH and OAI-ORE servers, and the Apache Tika tool work; Tika converts various types of documents into plain text and extracts their metadata. We then briefly presented the tasks solved by systems for processing natural language texts. In the practical part we built a software package for retrieving documents from the internet and comparing the texts of those documents.
Keywords:natural language processing, metadata, web crawlers, OAI-PMH, plagiarism detection
Place of publishing:Maribor
Publisher:[M. Petek]
Year of publishing:2012
PID:20.500.12556/DKUM-22262
UDC:004.774.6(043.2)
COBISS.SI-ID:16168470
NUK URN:URN:SI:UM:DK:HAIAZ9MM
Publication date in DKUM:14.03.2012
Views:3419
Downloads:264
Categories:KTFMB - FERI

Secondary language

Language:English
Title:CREATING A SOFTWARE PACKAGE FOR ACQUIRING AND COMPARING OF TEXTS FROM THE INTERNET
Abstract:In this diploma thesis we dealt with acquiring documents from various sources on the internet. In the theoretical part we described how web crawlers, OAI-PMH and OAI-ORE servers, and the Apache Tika tool work; Tika converts various kinds of documents into text form and extracts their metadata. We then briefly described the tasks solved by systems for processing natural language texts. In the practical part we built a software package for acquiring documents from the internet and comparing their texts.
Keywords:natural language processing, metadata, web spider, OAI-PMH, plagiarism detection

