Validation of language agnostic models for discourse marker detection

Damova, Mariana; Mishev, Kostadin; Valunaite Oleskeviciene, Giedre; Liebeskind, Chaya; da Purificação Silvano, Maria; Trajanov, Dimitar; Truica, Ciprian-Octavian; Apostol, Elena-Simona; Chiarcos, Christian; Baczkowska, Anna

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/30399

DC Field	Value	Language
dc.contributor.author	Damova, Mariana	en_US
dc.contributor.author	Mishev, Kostadin	en_US
dc.contributor.author	Valunaite Oleskeviciene, Giedre	en_US
dc.contributor.author	Liebeskind, Chaya	en_US
dc.contributor.author	da Purificação Silvano, Maria	en_US
dc.contributor.author	Trajanov, Dimitar	en_US
dc.contributor.author	Truica, Ciprian-Octavian	en_US
dc.contributor.author	Apostol, Elena-Simona	en_US
dc.contributor.author	Chiarcos, Christian	en_US
dc.contributor.author	Baczkowska, Anna	en_US
dc.date.accessioned	2024-06-05T09:23:23Z	-
dc.date.available	2024-06-05T09:23:23Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://hdl.handle.net/20.500.12188/30399	-
dc.description.abstract	Using language models to detect or predict the presence of language phenomena in text has become a mainstream research topic. With the rise of generative models, experiments using deep learning and transformer models have attracted intense interest. Aspects such as the precision of predictions, portability to other languages or phenomena, and scale have been central to the research community. Discourse markers, as language phenomena, perform important functions, such as signposting, signalling, and rephrasing, that facilitate discourse organization. Our paper addresses discourse marker detection, a complex task because it concerns a language phenomenon manifested by expressions that can occur as content words in some contexts and as discourse markers in others. We adopted a language-agnostic model trained on English to predict the presence of discourse markers in texts in eight other languages unseen by the model, with the goal of evaluating how well the model performs on languages with different structural and lexical properties. We report on the process of evaluating and validating the model's performance across European Portuguese, Hebrew, German, Polish, Romanian, Bulgarian, Macedonian, and Lithuanian, and on the results of this validation. This research is a key step towards multilingual language processing.	en_US
dc.relation.ispartof	Language, Data and Knowledge 2023 (LDK 2023): Proceedings of the 4th Conference on Language, Data and Knowledge	en_US
dc.title	Validation of language agnostic models for discourse marker detection	en_US
dc.type	Proceedings	en_US
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
crisitem.author.dept	Faculty of Computer Science and Engineering	-
Appears in Collections:	Faculty of Computer Science and Engineering: Journal Articles

Files in This Item:

File	Description	Size	Format
649361.pdf		231.86 kB	Adobe PDF

Repository of UKIM