Analysis of Feature Selection Algorithms on High Dimensional Data

Sowmya Sanagavarapu; Mariam Jamilah; Barathkumar V

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/8268

DC Field	Value	Language
dc.contributor.author	Sowmya Sanagavarapu	en_US
dc.contributor.author	Mariam Jamilah	en_US
dc.contributor.author	Barathkumar V	en_US
dc.date.accessioned	2020-05-22T07:57:27Z	-
dc.date.available	2020-05-22T07:57:27Z	-
dc.date.issued	2020-05-08	-
dc.identifier.uri	http://hdl.handle.net/20.500.12188/8268	-
dc.description.abstract	Dimensionality of a dataset refers to the number of attributes present in the dataset. When the number of attributes exceeds the number of observations, the data is said to be high dimensional. In high dimensional data, the number of dimensions makes calculations extremely costly, which in turn increases processing and training time. Thus, it is vital to reduce the dimensionality of the data [1]. Dimensionality reduction means simplifying the data without affecting its integrity. For this study, we used the Dorothea dataset [10] from the UC Irvine Machine Learning Repository. Dorothea is a drug discovery dataset. Drugs are organic molecules that bind to a target receptor; they are classified as active or inactive based on their ability to bind. New drugs are usually developed by identifying and isolating the receptor to which the chemical compounds must bind, and then testing many small molecules for their ability to bind to this receptor. The class label indicates whether a molecule binds to the receptor. In this paper, we investigate the dimensionality reduction achieved by applying three feature selection algorithms [2] (Filter, Wrapper and Hybrid) with no loss in the integrity of the dataset. We evaluated the accuracy of the reduced data using the C4.5 classification algorithm [6], which predicts the categorical class label of unseen data after being trained on the training dataset. The results of each algorithm [1] have been compared and analyzed in order to identify the best-suited algorithm.	en_US
dc.language.iso	en	en_US
dc.publisher	Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Republic of North Macedonia	en_US
dc.relation.ispartofseries	CIIT 2020 full papers;37	-
dc.subject	classifier, relief filter, hybrid, Las Vegas wrapper, test data, training data	en_US
dc.title	Analysis of Feature Selection Algorithms on High Dimensional Data	en_US
dc.type	Proceeding article	en_US
dc.relation.conference	17th International Conference on Informatics and Information Technologies - CIIT 2020	en_US
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
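The abstract and keywords name a Relief filter, a Las Vegas wrapper, a hybrid of the two, and C4.5-based evaluation, but give no implementation details. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes binary features (as in Dorothea) and two classes, uses the basic two-class Relief update, implements a Las Vegas Wrapper (LVW) that accepts a random subset only if it lowers the error or ties it with fewer features, and, as a plainly substituted evaluator, scores subsets with a 1-nearest-neighbour holdout error instead of C4.5. The hybrid (Relief keeps the top-k features, then LVW searches among them) and all function names and parameters (`relief`, `lvw`, `hybrid`, `holdout_error`, `k`, `m`, `max_tries`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy representation: each row holds 0/1 feature values
# (Dorothea's features are binary) and each label is 0 or 1.
Matrix = List[List[int]]


def hamming(a: Sequence[int], b: Sequence[int], feats: Sequence[int]) -> int:
    """Count the selected features on which two rows differ."""
    return sum(a[f] != b[f] for f in feats)


def relief(X: Matrix, y: List[int], m: int, rng: random.Random) -> List[float]:
    """Basic two-class Relief (filter): reward features that differ from the
    nearest miss and penalise features that differ from the nearest hit."""
    n_feats = len(X[0])
    all_feats = range(n_feats)
    w = [0.0] * n_feats
    for _ in range(m):
        i = rng.randrange(len(X))
        others = [j for j in range(len(X)) if j != i]
        hit = min((j for j in others if y[j] == y[i]),
                  key=lambda j: hamming(X[i], X[j], all_feats))
        miss = min((j for j in others if y[j] != y[i]),
                   key=lambda j: hamming(X[i], X[j], all_feats))
        for f in all_feats:
            w[f] += ((X[i][f] != X[miss][f]) - (X[i][f] != X[hit][f])) / m
    return w


def holdout_error(X: Matrix, y: List[int], feats: Sequence[int],
                  split: float = 0.7) -> float:
    """Error of a 1-nearest-neighbour classifier on a fixed holdout split.
    This is a stand-in evaluator, NOT the C4.5 classifier used in the paper."""
    cut = int(len(X) * split)
    train, test = range(cut), range(cut, len(X))
    wrong = 0
    for t in test:
        nn = min(train, key=lambda j: hamming(X[t], X[j], feats))
        wrong += y[nn] != y[t]
    return wrong / len(test)


def lvw(X: Matrix, y: List[int], candidates: Sequence[int], max_tries: int,
        rng: random.Random) -> Tuple[List[int], float]:
    """Las Vegas Wrapper: draw random subsets no larger than the current best and
    keep one if it lowers the error, or matches it with fewer features."""
    best = list(candidates)
    best_err = holdout_error(X, y, best)
    for _ in range(max_tries):
        size = rng.randint(1, len(candidates))
        if size > len(best):
            continue  # only consider subsets no larger than the current best
        subset = sorted(rng.sample(list(candidates), size))
        err = holdout_error(X, y, subset)
        if err < best_err or (err == best_err and size < len(best)):
            best, best_err = subset, err
    return best, best_err


def hybrid(X: Matrix, y: List[int], k: int, m: int, max_tries: int,
           seed: int = 0):
    """Hypothetical hybrid: Relief keeps the top-k features, then LVW searches
    only among them."""
    rng = random.Random(seed)
    w = relief(X, y, m, rng)
    top_k = sorted(range(len(w)), key=lambda f: -w[f])[:k]  # stable on ties
    subset, err = lvw(X, y, top_k, max_tries, rng)
    return w, top_k, subset, err


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy data: feature 0 equals the label, features 1..7 are random noise.
    y = [i % 2 for i in range(40)]
    X = [[label] + [data_rng.randint(0, 1) for _ in range(7)] for label in y]
    w, top_k, subset, err = hybrid(X, y, k=4, m=20, max_tries=50)
    print("Relief weights:", [round(v, 2) for v in w])
    print("top-k:", top_k, "LVW subset:", subset, "holdout error:", err)
```

On the toy data, feature 0 receives a Relief weight of 1.0, because it always agrees with the nearest hit and always differs from the nearest miss. Because LVW only accepts a subset that does not increase the error, the subset it returns never scores worse than the full top-k set it started from.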
Appears in Collections:	International Conference on Informatics and Information Technologies

Files in This Item:

File	Description	Size	Format
CIIT2020_paper_37.pdf		476.98 kB	Adobe PDF

Repository of UKIM