Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.12188/10169
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jofche, Nasi | en_US |
dc.contributor.author | Jovanovik, Milos | en_US |
dc.contributor.author | Trajanov, Dimitar | en_US |
dc.date.accessioned | 2021-02-18T09:06:34Z | - |
dc.date.available | 2021-02-18T09:06:34Z | - |
dc.date.issued | 2019-05 | - |
dc.identifier.uri | http://hdl.handle.net/20.500.12188/10169 | - |
dc.description.abstract | Medical datasets that contain data relating to drugs and chemical substances generally tend to contain multiple variations of a generic name that denotes the same drug or drug product. This ambiguity arises because a single drug, referenced by a unique code, has an active substance that can be known under different chemical names in different countries, which forms an obstacle when extracting relevant and useful information. To overcome the issues presented by this ambiguity, we developed a scalable, term-frequency-based data cleaning algorithm that uses solely the data available in the dataset to infer the correct generic name for each drug based on text similarities, thus laying the groundwork for a model able to predict generic names for related and previously unseen drug records with high accuracy. This paper describes the application of the algorithm to the cleaning and standardization of an already populated drug product availability dataset, representing all variations of a substance under a single generic name and thus eliminating ambiguity. Our proposed algorithm is also evaluated against a Linked Data approach for detecting related drug products in the dataset. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Republic of North Macedonia | en_US |
dc.relation.ispartofseries | CIIT 2019; | - |
dc.subject | Named entity | en_US |
dc.subject | Data cleaning | en_US |
dc.subject | Text similarity | en_US |
dc.subject | Drug data | en_US |
dc.subject | Drugs | en_US |
dc.title | Named Entity Discovery for the Drug Domain | en_US |
dc.type | Proceeding article | en_US |
dc.relation.conference | 16th International Conference on Informatics and Information Technologies - CIIT 2019 | en_US |
item.fulltext | With Fulltext | - |
item.grantfulltext | open | - |
crisitem.author.dept | Faculty of Computer Science and Engineering | - |
crisitem.author.dept | Faculty of Computer Science and Engineering | - |
Appears in Collections: | Faculty of Computer Science and Engineering: Conference papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
namedentitydrugs-ciit2019.pdf | | 101.64 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.