Title:IZDELAVA PROGRAMSKEGA PAKETA ZA PRIDOBIVANJE IN PRIMERJANJE BESEDIL IZ INTERNETA
Authors:Petek, Matej (Author)
Ojsteršek, Milan (Mentor)
Files:UNI_Petek_Matej_2012.pdf (2.57 MB)
 
Language:Slovenian
Work type:Undergraduate thesis (m5)
Typology:2.11 - Undergraduate Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:In this thesis we examined the retrieval of documents from various sources on the internet. In the theoretical part we described how web crawlers, OAI-PMH and OAI-ORE servers, and the Apache Tika tool work; Tika converts various types of documents to plain text and extracts their metadata. We then briefly presented the tasks addressed by systems for processing natural language texts. In the practical part we built a software package for retrieving documents from the internet and comparing the texts of those documents.
Keywords:natural language processing, metadata, web crawlers, OAI-PMH, plagiarism detection
Year of publishing:2012
Publisher:[M. Petek]
Source:Maribor
UDC:004.774.6(043.2)
COBISS_ID:16168470
NUK URN:URN:SI:UM:DK:HAIAZ9MM
Views:2550
Downloads:189
Metadata:XML RDF-CHPDL DC-XML DC-RDF
Categories:KTFMB - FERI

Secondary language

Language:English
Title:CREATING A SOFTWARE PACKAGE FOR ACQUIRING AND COMPARING OF TEXTS FROM THE INTERNET
Abstract:In this diploma thesis we dealt with acquiring documents from various sources on the internet. In the theoretical part we presented how web crawlers, OAI-PMH and OAI-ORE servers, and the Apache Tika tool work; Tika enables the conversion of various kinds of documents into text form and the extraction of metadata. We then briefly presented the tasks solved by systems for processing natural language texts. In the practical part we built a software package for acquiring documents from the internet and comparing the texts from those documents.
Keywords:natural language processing, metadata, web spider, OAI-PMH, plagiarism detection
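
As an illustration of the text-comparison task named in the abstract, the following is a minimal Python sketch and not the implementation from the thesis. It compares two texts by the overlap of their word n-grams (shingles) using Jaccard similarity. The function names `shingles` and `jaccard_similarity`, the default shingle size of 3, and the choice of Jaccard similarity itself are assumptions made for this example; the record does not state which comparison method the software package uses.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercased word n-grams (shingles) found in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Return |A ∩ B| / |A ∪ B| for the shingle sets of the two texts.

    The result is 0.0 when both texts contain no words.
    """
    set_a = shingles(text_a, n)
    set_b = shingles(text_b, n)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    original = "web crawlers collect documents from the internet"
    suspect = "web crawlers collect documents from many servers"
    print(jaccard_similarity(original, suspect))
```

A score near 1.0 suggests that the two texts share most of their word sequences, while a score near 0.0 suggests little overlap. A real plagiarism detector would typically add normalization steps such as stemming or stop-word removal before comparing.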

