Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/22315
DC Field | Value | Language
dc.contributor.author | Markoski, Filip | en_US
dc.contributor.author | Markoska, Elena | en_US
dc.contributor.author | Ljubešić, Nikola | en_US
dc.contributor.author | Zdravevski, Eftim | en_US
dc.contributor.author | Kocarev, Ljupco | en_US
dc.date.accessioned | 2022-08-16T08:15:23Z | -
dc.date.available | 2022-08-16T08:15:23Z | -
dc.date.issued | 2021-09 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12188/22315 | -
dc.description.abstract | There is a shortage of high-quality corpora for South-Slavic languages. Such corpora are useful to computer scientists and researchers in social sciences and humanities alike, focusing on numerous linguistic, content analysis, and natural language processing applications. This paper presents a workflow for mining Wikipedia content and processing it into linguistically-processed corpora, applied on the Bosnian, Bulgarian, Croatian, Macedonian, Serbian, Serbo-Croatian and Slovenian Wikipedia. We make the resulting seven corpora publicly available. We showcase these corpora by comparing the content of the underlying Wikipedias, our assumption being that the content of the Wikipedias reflects broadly the interests in various topics in these Balkan nations. We perform the content comparison by using topic modelling algorithms and various distribution comparisons. The results show that all Wikipedias are topically rather similar, with all of them covering art, culture, and literature, whereas they contain differences in geography, politics, history and science. | en_US
dc.title | Cultural topic modelling over novel wikipedia corpora for south-slavic languages | en_US
dc.type | Proceedings | en_US
dc.relation.conference | International Conference on Recent Advances in Natural Language Processing (RANLP 2021) | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers
Files in This Item:
File | Description | Size | Format
2021.ranlp-1.104.pdf | | 247.62 kB | Adobe PDF

Page view(s): 33 (checked on 9.11.2024)

Download(s): 8 (checked on 9.11.2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.