Title:UPORABA NOSQL PODATKOVNIH BAZ ZA GENERIRANJE POROČIL DETEKTORJA PLAGIATOV
Authors:Dietner, Mario (Author)
Ojsteršek, Milan (Mentor)
Files:.pdf UNI_Dietner_Mario_2012.pdf (2,78 MB)
MD5: 63694AFB7A00C4FEDE821DA3402CE43F
PID: 20.500.12556/dkum/db51f7f6-101a-4740-a262-bb457f7637f3
 
Language:Slovenian
Work type:Undergraduate thesis
Typology:2.11 - Undergraduate Thesis
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:In this diploma thesis we present the concept of NoSQL databases. We list the basic groups, describe them, and name their members. We then describe the Apache Cassandra data system in more detail: its properties, operation, and architecture. We were particularly interested in Cassandra's general properties, such as elasticity, availability, and data consistency, as well as the reliability and performance of the system in a real environment. In the practical part of the thesis we developed a system that uses Apache Cassandra to generate reports on content similarity between documents. Searching for plagiarism is a time-consuming process, because the time complexity of the search grows with the number of documents. Cassandra's elasticity and data model are an ideal solution for this kind of search. For the test we used an existing database of hash values obtained from documents in the Digital Library of the University of Maribor. We transferred it to a Cassandra cluster of ten servers. We compared the report generation times with those obtained when generating reports from an MS SQL database. We showed that Apache Cassandra generates reports 2.2 times faster and is insensitive to server failures.
Keywords:NoSQL databases, Apache Cassandra, CAP theorem, plagiarism detection
Place of publishing:Maribor
Publisher:[M. Dietner]
Year of publishing:2012
PID:20.500.12556/DKUM-36499
UDC:004.65(043.2)
COBISS.SI-ID:16235030
NUK URN:URN:SI:UM:DK:C09O9IT5
Publication date in DKUM:15.06.2012
Views:3061
Downloads:276
Categories:KTFMB - FERI

Secondary language

Language:English
Title:USAGE OF NOSQL DATABASES FOR PLAGIAT DETECTOR REPORT GENERATION
Abstract:This diploma thesis introduces the concept of NoSQL databases. We describe some of the NoSQL groups and present their members. The following chapters give a detailed description of Apache Cassandra: its attributes, its architecture, and how it works in general. We were especially interested in Cassandra's general attributes, such as elastic scaling, availability, and data consistency, as well as its reliability and performance under a real workload. We then developed a system that uses Apache Cassandra to generate reports on the content similarity of documents. Searching for plagiarism is a time-consuming process, and the time complexity grows with each document added. The elasticity and the data model of Apache Cassandra are an ideal solution for this problem. For our performance test, we used an existing database of hash values obtained from the Digital Library of the University of Maribor. We transferred those values to our cluster of ten servers and compared the report generation times measured on the Apache Cassandra cluster with those measured on the MS SQL database. We show that Apache Cassandra generates reports 2.2 times faster and that it is insensitive to server failures.
Keywords:NoSQL databases, Apache Cassandra, CAP theorem, plagiarism detection
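
The abstract describes generating content-similarity reports from a store of document hash values, but it does not give the hashing scheme or the data layout. The following is a minimal sketch of the general idea only, not the thesis's implementation. It assumes, purely for illustration, that each document is split into three-word shingles hashed with MD5, and it uses an in-memory dictionary keyed by hash value as a stand-in for a database table with one row per hash. The names `shingle_hashes`, `HashIndex`, and `report` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def shingle_hashes(text, n=3):
    """Return the set of MD5 hashes of all n-word shingles in `text`.

    The shingling scheme is an assumption made for this sketch.
    """
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}


class HashIndex:
    """In-memory stand-in for a store keyed by hash value (one row per hash)."""

    def __init__(self):
        self.docs_by_hash = defaultdict(set)  # hash -> ids of documents containing it
        self.hashes_by_doc = {}               # document id -> set of its hashes

    def add(self, doc_id, text):
        """Index a document under every hash it contains."""
        hashes = shingle_hashes(text)
        self.hashes_by_doc[doc_id] = hashes
        for h in hashes:
            self.docs_by_hash[h].add(doc_id)

    def report(self, doc_id):
        """Map each other document to the fraction of doc_id's hashes it shares."""
        own = self.hashes_by_doc[doc_id]
        shared = Counter()
        for h in own:
            # One lookup per hash returns every document that contains it.
            for other in self.docs_by_hash[h]:
                if other != doc_id:
                    shared[other] += 1
        return {other: count / len(own) for other, count in shared.items()}


index = HashIndex()
index.add("a", "the quick brown fox jumps over the lazy dog")
index.add("b", "the quick brown fox sleeps under the old tree")
index.add("c", "completely unrelated sentence about database clusters here")
print(index.report("a"))
```

In this toy data, document `b` shares two of the seven shingles of document `a`, and document `c` shares none, so only `b` appears in the report for `a`. Keying the store by hash value means a report needs one lookup per hash of the queried document rather than a pairwise comparison against every stored document.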

