Strojno učenje in obdelava naravnega jezika za pripravo analize sentimenta na spletu

Jerin, Matija

Title:	Strojno učenje in obdelava naravnega jezika za pripravo analize sentimenta na spletu
Authors:	Jerin, Matija (Author); Kljajić Borštnar, Mirjana (Mentor)
Files:	MAG_Jerin_Matija_2025.pdf (4,85 MB) MD5: 966733DD4F7499945AEA26F3CADA8F72
Language:	Slovenian
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FOV - Faculty of Organizational Sciences in Kranj
Abstract:	Magistrsko delo obravnava področje analize sentimenta z uporabo strojnega učenja in obdelave naravnega jezika. Namen dela je razviti model, ki bi bil sposoben analizirati sentiment besedil, pridobljenih s spletnih platform, zlasti z družbenega omrežja X (prej Twitter). V delu smo uporabili različne metode strojnega učenja in obdelave naravnega jezika. Najprej smo podatke pridobili iz odprtih virov, jih očistili in normalizirali z metodami, kot sta lemmatizacija in tokenizacija. Pri obdelavi podatkov smo uporabili več tehnik, vključno z Bag of Words, s pozitivno/z negativno frekvenco in s TF-IDF, za kar smo uporabili Python knjižnice, kot sta NLTK in scikit-learn. Model smo učili z metodo logistične regresije, naivnega bayesa in z metodo podpornih vektorjev ter testirali njihovo natančnost s pomočjo ločenih testnih podatkov. Rezultati kažejo, da je logistična regresija v kombinaciji z značilkami TF-IDF dosegla najvišjo natančnost pri predvidevanju sentimenta, in sicer 88,65 %, kar pomeni, da je model sposoben zanesljivo prepoznati sentiment besedil kot pozitiven ali negativen. Kljub uspehu modela obstaja potencial za nadaljnje izboljšave. Uporaba večjih in bolj raznovrstnih podatkovnih zbirk ter naprednejših tehnik globokega učenja, kot so nevronske mreže (LSTM ali BERT), bi lahko še povečala natančnost in zmogljivost modela. Zaključki magistrskega dela potrjujejo, da je analiza sentimenta z uporabo strojnega učenja izvedljiva in uporabna v različnih okoljih. V prihodnje priporočamo implementacijo API-ja za omrežje X, kar bi omogočilo sprotno pridobivanje podatkov in avtomatizirano analizo sentimenta v realnem času. Prav tako bi lahko nadgradnja modela z globokim učenjem pripomogla k obvladovanju kompleksnejših jezikovnih struktur in kontekstov, kot sta sarkazem in večpomenskost.
Keywords:	analiza sentimenta, strojno učenje, obdelava naravnega jezika
Place of publishing:	Maribor
Year of publishing:	2025
PID:	20.500.12556/DKUM-91477
COBISS.SI-ID:	231923203
Publication date in DKUM:	09.04.2025
Views:	0
Downloads:	6
Categories:	FOV
Citation:	JERIN, Matija, 2025, Strojno učenje in obdelava naravnega jezika za pripravo analize sentimenta na spletu [online]. Master’s thesis. Maribor. [Accessed 23 April 2025]. Retrieved from: https://dk.um.si/IzpisGradiva.php?lang=eng&id=91477

Licences

License:	CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Link:	http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:	The most restrictive Creative Commons license. It only allows others to download the work and share it unmodified, with attribution, for non-commercial purposes.
Licensing start date:	06.01.2025

Secondary language

Language:	English
Title:	Machine learning and natural language processing for sentiment analysis on the internet
Abstract:	The master's thesis addresses sentiment analysis using machine learning and natural language processing. Its aim is to develop a model capable of analyzing the sentiment of texts obtained from online platforms, particularly the social network X (formerly Twitter). Data were first obtained from open sources, then cleaned and normalized using methods such as lemmatization and tokenization. Several feature-extraction techniques were applied, including Bag of Words, positive/negative frequency, and TF-IDF, using Python libraries such as NLTK and scikit-learn. Models were trained with logistic regression, naive Bayes, and support vector machines, and their accuracy was evaluated on separate test data. The results show that logistic regression combined with TF-IDF features achieved the highest accuracy in sentiment prediction, 88.65%, meaning the model can reliably classify the sentiment of texts as positive or negative. Despite this result, there is room for improvement. Larger and more diverse datasets, along with deep learning techniques such as neural networks (LSTM or BERT), could further improve the model's accuracy and performance. The conclusions of the thesis confirm that sentiment analysis using machine learning is feasible and applicable in various environments. For future work, integrating the X API is recommended, which would allow continuous data acquisition and automated sentiment analysis in real time. Upgrading the model with deep learning could also help it handle more complex linguistic structures and contexts, such as sarcasm and ambiguity.
Keywords:	Sentiment analysis, Machine learning, Natural language processing
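
Illustration:	The pipeline summarised in the abstract (tokenization, TF-IDF features, logistic regression, accuracy on separate test data) can be sketched as follows. This is not the thesis code. The thesis used NLTK and scikit-learn, whereas this sketch uses only the Python standard library so that it runs as-is, and it skips lemmatization. The six example sentences are invented, and every function name, the learning rate, and the epoch count are assumptions made for illustration. The smoothed IDF formula is one common variant and is not necessarily the one used in the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (no lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted with stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = sigmoid(score(vec, weights, bias)) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict(vec, weights, bias):
    """Return 1 (positive) or 0 (negative)."""
    return 1 if sigmoid(score(vec, weights, bias)) >= 0.5 else 0


# Invented toy data: 1 = positive sentiment, 0 = negative sentiment.
train_texts = [
    "I love this phone, it is great",
    "what a great and happy day",
    "awful service, I hate it",
    "this is a terrible, sad movie",
]
train_labels = [1, 1, 0, 0]
test_texts = ["great phone, love it", "terrible service, awful"]
test_labels = [1, 0]

# Fit IDF on the training set only, then vectorise both splits with it.
train_docs = [tokenize(t) for t in train_texts]
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
test_vecs = [tfidf(tokenize(t), idf) for t in test_texts]

weights, bias = train_logreg(train_vecs, train_labels)
predictions = [predict(v, weights, bias) for v in test_vecs]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Fitting the IDF weights on the training split alone keeps the test data from influencing the features, which is what makes the reported accuracy a held-out estimate. With a real corpus the accuracy would be computed over many more test examples than the two shown here.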
