Named Entity Discovery for the Drug Domain

Jofche, Nasi; Jovanovik, Milos; Trajanov, Dimitar

Please use this identifier to cite or link to this record: http://hdl.handle.net/20.500.12188/10169

DC Field	Value	Language
dc.contributor.author	Jofche, Nasi	en_US
dc.contributor.author	Jovanovik, Milos	en_US
dc.contributor.author	Trajanov, Dimitar	en_US
dc.date.accessioned	2021-02-18T09:06:34Z	-
dc.date.available	2021-02-18T09:06:34Z	-
dc.date.issued	2019-05	-
dc.identifier.uri	http://hdl.handle.net/20.500.12188/10169	-
dc.description.abstract	Medical datasets containing data on drugs and chemical substances tend to contain multiple variations of a generic name that denote the same drug or drug product. This ambiguity arises because a single drug, referenced by a unique code, has an active substance that can be known under different chemical names in different countries, which is an obstacle when extracting relevant and useful information. To overcome the issues presented by this ambiguity, we developed a scalable, term-frequency-based data cleaning algorithm that uses only the data available in the dataset to infer the correct generic name for each drug based on text similarities. This forms the basis for building a model that would be able to predict generic names for related and previously unseen drug records with high accuracy. This paper describes the application of the algorithm to the cleaning and standardization of an already populated drug product availability dataset, representing all variations of a substance under a single generic name and thus eliminating ambiguity. Our proposed algorithm is also evaluated against a Linked Data approach for detecting related drug products in the dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Republic of North Macedonia	en_US
dc.relation.ispartofseries	CIIT 2019;	-
dc.subject	Named entity	en_US
dc.subject	Data cleaning	en_US
dc.subject	Text similarity	en_US
dc.subject	Drug data	en_US
dc.subject	Drugs	en_US
dc.title	Named Entity Discovery for the Drug Domain	en_US
dc.type	Proceeding article	en_US
dc.relation.conference	16th International Conference on Informatics and Information Technologies - CIIT 2019	en_US
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
crisitem.author.dept	Faculty of Computer Science and Engineering	-
crisitem.author.dept	Faculty of Computer Science and Engineering	-
Appears in Collections:	Faculty of Computer Science and Engineering: Conference papers
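
The abstract describes inferring one generic name per drug code from the name variants already present in the dataset, then matching unseen records by text similarity. The record does not include the authors' algorithm, so the following is only a minimal sketch of that general idea, not the paper's method. The function names (normalize, infer_generic_names, predict_generic_name), the (drug_code, name) record shape, the use of difflib's SequenceMatcher as the similarity measure, and the 0.8 threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def infer_generic_names(records):
    """Pick the most frequent normalized name variant for each drug code.

    `records` is an iterable of (drug_code, name_variant) pairs; this shape
    is an assumption, not the schema of the dataset used in the paper.
    """
    variants = defaultdict(Counter)
    for code, name in records:
        variants[code][normalize(name)] += 1
    # most_common(1) returns the highest-count variant; ties keep first-seen order.
    return {code: counts.most_common(1)[0][0] for code, counts in variants.items()}


def predict_generic_name(name, known_names, threshold=0.8):
    """Return the most similar known generic name, or None below `threshold`.

    SequenceMatcher.ratio() stands in for whatever text similarity the
    authors use; the threshold value is illustrative.
    """
    name = normalize(name)
    best, best_score = None, 0.0
    for candidate in known_names:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


records = [
    ("A01", "Paracetamol"),
    ("A01", "paracetamol"),
    ("A01", "Acetaminophen"),
    ("B02", "Ibuprofen"),
]
canonical = infer_generic_names(records)
match = predict_generic_name("Paracetamole", canonical.values())
```

Choosing the most frequent variant keeps the sketch dependent only on the dataset itself, in line with the abstract's description; a real pipeline would likely need a stronger similarity measure and handling for codes whose variants are split evenly.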

Files in This Item:

File	Description	Size	Format
namedentitydrugs-ciit2019.pdf		101.64 kB	Adobe PDF

Page view(s): 131 (checked on 20.7.2025)

Download(s): 27 (checked on 20.7.2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

UKIM Repository of Papers