Title:
Implementacija avtomatiziranega pristopa k analizi podatkov DNA sekvenciranja
Authors:
Bjelić, Dragana (Author)
Gorenjak, Mario (Mentor)
Potočnik, Uroš (Comentor)
Files:
MAG_Bjelic_Dragana_2020.pdf
(7,38 MB)
MD5: F7C69F387B0A161D4E4FA63330F2646C
PID:
20.500.12556/dkum/421a6de3-c2d5-442c-8537-93a4674aeada
Language:
Slovenian
Work type:
Master's thesis/paper
Typology:
2.09 - Master's Thesis
Organization:
FZV - Faculty of Health Sciences
Abstract:
Uvod: Z razvojem tehnologije sekvenciranja DNA in naraščanjem podatkov se povečuje tudi potreba po kvalitetni analizi in interpretaciji podatkov. Prav tako sta pomembna hitrost in zanesljivost klasificiranja posameznikov za določen genotip. Pri metodi sekvenciranja naslednje generacije (NGS) to klasificiranje temelji na klicanju različic, ki je sklepanje, da na določenem mestu obstaja razlika v nukleotidu v primerjavi z referenčnim nukleotidnim zaporedjem. Surovi podatki pridobljeni z NGS analizo so podani v datoteki VCF (ang. variant call format), kjer je v tabeli potencialnih različic oziroma kandidatnih genotipov v spremenljivki Filter pogosto uporabljena oznaka PASS za različice oziroma genotipe za katere je klasifikator nevronske mreže podal višjo verjetnost nereferenčnega klica genotipa kot za referenco, tj. zanesljiv klic različice. V magistrskem delu želimo s primerjavo števila klicanih različic in PASS različic med obstoječim in nadgrajenim pristopom pokazati pomembnost posodobitev programskih orodij. Metode: V empiričnem delu smo implementirali avtomatiziran pristop k analizi podatkov DNA sekvenciranja, ki je nadgradnja obstoječega protokola analize, ki je na razpolago na aparatu Illumina Miseq. V našem nadgrajenem protokolu smo namesto modula GATK Variant Caller iz različice v1.6. obstoječega orodja na aparatu Illumina MiSeq uporabili modul Haplotype Caller pridobljenega iz programskega paketa GATK v3.8. Haplotype Caller je natančnejši, saj zavrne podatke o poravnavi okoli položaja, kjer se sumi na različico in ponovno prebere odčitke v tej regiji. Prav tako smo nadgradili algoritem poravnave nukleotidnih zaporedij iz različice 0.7.9 v obstoječem protokolu na 0.7.12, ki nam z nadgradnjo omogoča HLA tipizacijo. Protokol smo nadgradili tudi s predhodnim obrezovanjem tehničnih nukleotidnih zaporedij. Na koncu smo analizo števila klicanih različic in PASS različic med obema pristopoma ovrednotili v programskem okolju R z Wilcoxon-ovim statističnim testom. 
Rezultati: Rezultati Wilcoxon-ovega testa so pokazali močno statistično značilno razliko med odkritim številom klicanih različic in PASS različic med nadgrajenim in obstoječim pristopom, pri čemer je nadgrajen pristop v povprečju odkril 26-krat več klicanih različic in 33 krat več PASS različic, od tega 5 pozitivnih PASS različic pomembnih za diagnozo od 12, kar pomeni 41,7 %. Diskusija: Ugotovili smo, da je nadgrajen tekoči trak ukazov za analizo nukleotidnega zaporedja DNA učinkovitejši, saj odkrije več klicanih in PASS različic.
Keywords:
NGS, bioinformatika, sekvenciranje, Illumina MiSeq
Place of publishing:
Maribor
Publisher:
[D. Bjelić]
Year of publishing:
2020
PID:
20.500.12556/DKUM-77387
UDC:
575.112(043.2)
COBISS.SI-ID:
28695299
NUK URN:
URN:SI:UM:DK:XMKPJVKG
Publication date in DKUM:
21.09.2020
Views:
1484
Downloads:
263
Categories:
FZV
Cite this work
BJELIĆ, Dragana, 2020, Implementacija avtomatiziranega pristopa k analizi podatkov DNA sekvenciranja [online]. Master’s thesis. Maribor : D. Bjelić. [Accessed 27 March 2025]. Retrieved from: https://dk.um.si/IzpisGradiva.php?lang=eng&id=77387
Licences
License:
CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Link:
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:
The most restrictive Creative Commons license. This only allows people to download and share the work for no commercial gain and for no other purposes.
Licensing start date:
28.08.2020
Secondary language
Language:
English
Title:
Implementation of an automatized approach to DNA sequencing data analysis
Abstract:
Introduction: With the development of DNA sequencing technology and the growing volume of data, the need for high-quality analysis and interpretation of data is also increasing. The speed and reliability of classifying individuals for a particular genotype are equally important. In next-generation sequencing (NGS), this classification is based on variant calling, i.e. inferring that the nucleotide at a particular site differs from the reference nucleotide sequence. Raw data obtained by NGS analysis are provided in a VCF (variant call format) file, in which the Filter variable of the table of potential variants, or candidate genotypes, often carries the PASS mark. PASS marks variants or genotypes for which the neural network classifier gave a higher probability of a non-reference genotype call than of the reference, i.e. a reliable variant call. The aim of this thesis is to show the importance of software tool updates by comparing the number of called variants and PASS variants between the existing and the upgraded approach.
Methods: In the empirical part, we implemented an automated approach to DNA sequencing data analysis, which is an upgrade of the existing analysis protocol available on the Illumina MiSeq instrument. In our upgraded protocol, instead of the GATK VariantCaller module from version v1.6 of the existing tool on the Illumina MiSeq instrument, we used the HaplotypeCaller module from the GATK v3.8 software package. HaplotypeCaller is more accurate, as it discards the existing alignment information around a position where a variant is suspected and locally reassembles the reads in that region. We also upgraded the nucleotide sequence alignment algorithm from version 0.7.9 in the existing protocol to 0.7.12, which enables HLA typing. The protocol was further extended with pre-trimming of technical nucleotide sequences. Finally, the numbers of called variants and PASS variants obtained with the two approaches were compared in the R software environment using the Wilcoxon statistical test.
Results: The Wilcoxon test showed a highly statistically significant difference in the number of called variants and PASS variants between the upgraded and the existing approach, with the upgraded approach detecting on average 26-fold more called variants and 33-fold more PASS variants. Of the 12 variants relevant for diagnosis, 5 positive PASS variants (41.7 %) were missed by the existing protocol but detected by the upgraded protocol.
Conclusion: We conclude that the upgraded pipeline for DNA sequence analysis is more effective, as it detects more called and PASS variants.
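The comparison in the abstract rests on two counts per sample: the number of called variants and the number of those marked PASS in the Filter column of the VCF file. The following is a minimal Python sketch of such a count, not the thesis pipeline itself. The function name `count_variants` and the example records are hypothetical; the only assumption about the input is the standard VCF layout, in which FILTER is the seventh tab-separated column.

```python
import io


def count_variants(vcf_lines):
    """Return (called, passed) for an iterable of VCF text lines.

    'called' counts every data record; 'passed' counts records whose
    FILTER column equals PASS. Hypothetical helper for illustration.
    """
    called = 0
    passed = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # incomplete record: no FILTER column to inspect
        called += 1
        if fields[6] == "PASS":  # FILTER is the 7th fixed VCF column
            passed += 1
    return called, passed


# Hypothetical records, invented for this example (not data from the thesis).
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50.0\tPASS\t.",
    "chr1\t22345\t.\tC\tT\t12.1\tLowQual\t.",
    "chr2\t33456\t.\tG\tA\t88.3\tPASS\t.",
])

called, passed = count_variants(io.StringIO(EXAMPLE_VCF))
print(f"called variants: {called}, PASS variants: {passed}")
```

Running the same function on the VCF output of both protocols for each sample would give the paired counts that the statistical comparison needs.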
Keywords:
NGS, bioinformatics, sequencing, Illumina MiSeq
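The abstract reports that variant counts from the two protocols were compared with a Wilcoxon test in R. As an illustration only, the sketch below computes a Wilcoxon signed-rank statistic in plain Python. It assumes a paired design (the same samples analysed by both protocols), which the abstract does not state explicitly, and it uses a normal approximation without continuity or tie correction, so its p-value can differ from what R reports for small samples. The function name and all counts are hypothetical.

```python
import math


def wilcoxon_signed_rank(existing, upgraded):
    """Return (w_plus, w_minus, p) for paired samples.

    Assumes paired observations; p is a two-sided normal approximation
    without continuity or tie correction (a simplification).
    """
    # Differences per pair; zero differences are dropped, as is conventional.
    diffs = [u - e for e, u in zip(existing, upgraded) if u != e]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation of the null distribution of the rank sum.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, p


# Hypothetical per-sample variant counts, invented for this example.
existing_counts = [10, 12, 8, 15, 9, 11]
upgraded_counts = [260, 310, 215, 400, 230, 290]

w_plus, w_minus, p = wilcoxon_signed_rank(existing_counts, upgraded_counts)
print(f"W+ = {w_plus}, W- = {w_minus}, approximate p = {p:.4f}")
```

For real analyses, an established implementation such as the one used in the thesis (R) is preferable, since it handles exact p-values and ties.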