Title:Strategies for managing time and costs in speech corpus creation : insights from the Slovenian ARTUR corpus
Authors:Verdonik, Darinka (Author)
Bizjak, Andreja (Author)
Žgank, Andrej (Author)
Sepesy Maučec, Mirjam (Author)
Trojar, Mitja (Author)
Žganec Gros, Jerneja (Author)
Bajec, Marko (Author)
Lebar Bajec, Iztok (Author)
Dobrišek, Simon (Author)
Files:URL https://link.springer.com/article/10.1007/s10579-024-09746-8#article-info
 
.pdf s10579-024-09792-2.pdf (1.09 MB)
MD5: 8463E09A330AA66A7C51EE1D380851F5
 
Language:English
Work type:Article
Typology:1.01 - Original Scientific Article
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:Parliamentary debates represent an essential part of democratic discourse and provide insights into various socio-demographic and linguistic phenomena. Parliamentary corpora, which contain transcripts of parliamentary debates and extensive metadata, are an important resource for parliamentary discourse analysis and other research areas. This paper presents the Slovenian parliamentary corpus siParl, the latest version of which contains transcripts of plenary sessions and other legislative bodies of the Assembly of the Republic of Slovenia from 1990 to 2022, comprising more than 1 million speeches and 210 million words. We outline the development history of the corpus and mention other initiatives that have been influenced by siParl (such as the Parla-CLARIN encoding and the ParlaMint corpora of European parliaments). We then present the corpus creation process, from the initial data collection to the structural development and encoding of the corpus, and, given the growing influence of the ParlaMint corpora, compare siParl with the Slovenian ParlaMint-SI corpus. Finally, we discuss updates for the next version as well as the long-term development and enrichment of the siParl corpus.
Keywords:recording speech, transcribing speech, transcription guidelines, less-resourced language
Publication status:Published
Publication version:Version of Record
Article acceptance date:30.10.2024
Publication date:30.11.2024
Publisher:Springer
Year of publishing:2024
Number of pages:26 pp.
PID:20.500.12556/DKUM-91778
UDC:004.9
ISSN on article:1574-0218
COBISS.SI-ID:217959427
DOI:10.1007/s10579-024-09792-2
Copyright:© The Author(s) 2024
Publication date in DKUM:04.02.2025
Categories:Misc.
VERDONIK, Darinka, BIZJAK, Andreja, ŽGANK, Andrej, SEPESY MAUČEC, Mirjam, TROJAR, Mitja, ŽGANEC GROS, Jerneja, BAJEC, Marko, LEBAR BAJEC, Iztok and DOBRIŠEK, Simon, 2024, Strategies for managing time and costs in speech corpus creation : insights from the Slovenian ARTUR corpus. Language resources and evaluation [online]. 2024. [Accessed 14 March 2025]. DOI 10.1007/s10579-024-09792-2. Retrieved from: https://dk.um.si/IzpisGradiva.php?lang=eng&id=91778

Record is a part of a journal

Title:Language resources and evaluation
Publisher:Kluwer, Springer
ISSN:1574-0218
COBISS.SI-ID:516101145

Document is financed by a project

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:J7-4642-2022
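Name:Basic research for the development of spoken language resources and speech technologies for the Slovenian language (Temeljne raziskave za razvoj govornih virov in tehnologij za slovenščino)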

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P2-0069-2018
Name:Advanced methods of interaction in telecommunications (Napredne metode interakcij v telekomunikacijah)

Licences

License:CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Link:http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:The most restrictive Creative Commons license. This only allows people to download and share the work for no commercial gain and for no other purposes.

Secondary language

Language:Slovenian
Keywords:govorjeni jeziki, prevajanje

