International Conference on Informatics and Information Technologies, Faculty of Computer Science and Engineering

Analysis of Feature Selection Algorithms on High Dimensional Data

Date Issued
2020-05-08
Author(s)
Sowmya Sanagavarapu
Mariam Jamilah
Barathkumar V
Abstract
Dimensionality of a dataset refers to the number of attributes present in the dataset. At times, the number of attributes is greater than the number of observations; this gives rise to high dimensional data. In high dimensional data, the dimensions are so high that calculations become extremely difficult, which in turn increases processing and training time. Thus, it is vital to reduce the dimensionality of data [1]. Dimensionality reduction means simplifying the data without affecting data integrity. For this study, we have taken the Dorothea dataset [10] from the UC Irvine Machine Learning Repository. Dorothea is a drug discovery dataset. Drugs are organic molecules that bind to a target on a receptor; they are classified as active or inactive based on their ability to bind. New drugs are usually developed by first identifying and isolating the receptor to which the chemical compounds have to bind. Many small molecules are then tested for their ability to bind to this receptor. The class label shows whether or not a molecule binds to the receptor. In this paper, we investigate the dimensionality reduction achieved by applying three feature selection algorithms [2] (Filter, Wrapper and Hybrid) with no loss in the integrity of the dataset. We evaluated the accuracy obtained on the reduced data using the C4.5 classification algorithm [6], which, after being trained on the training dataset, predicts the categorical class label of each observation. The results of each algorithm [1] have been compared and analyzed in order to arrive at the best suited algorithm.
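The abstract names three feature selection families without detailing them. The sketch below illustrates the general distinction under assumptions that are not taken from the paper: features are discrete (for example binary), the filter score is information gain, and the classifier inside the wrapper is a leave-one-out 1-nearest-neighbour stand-in rather than C4.5. The function names (`filter_select`, `wrapper_select`, `hybrid_select`, `loo_1nn_accuracy`) and the toy data are hypothetical. A filter ranks features by a statistic computed without any classifier; a wrapper searches feature subsets (here by greedy forward selection) and scores each subset by a classifier's accuracy; a hybrid uses the cheap filter to shortlist candidates and runs the expensive wrapper only on that shortlist.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy after splitting on one discrete feature column."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def filter_select(X, y, k):
    """Filter: rank features by information gain alone (no classifier), keep the top k."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour (Hamming distance on `features`).
    Hypothetical stand-in for the classifier a wrapper would call, e.g. C4.5."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(row[f] != X[j][f] for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def wrapper_select(X, y, candidates, max_features, score=loo_1nn_accuracy):
    """Wrapper: greedy forward selection, adding whichever candidate most improves
    the classifier score; stops when no remaining candidate improves it."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        trial = {f: score(X, y, selected + [f]) for f in remaining}
        f_best = max(remaining, key=trial.get)
        if trial[f_best] <= best:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = trial[f_best]
    return selected, best


def hybrid_select(X, y, k, max_features):
    """Hybrid: filter down to k candidates, then run the wrapper on that shortlist only."""
    return wrapper_select(X, y, filter_select(X, y, k), max_features)


# Hypothetical toy data: feature 0 copies the label, features 1 and 2 are noise.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
print(filter_select(X, y, k=1))                        # [0]
print(wrapper_select(X, y, range(3), max_features=2))  # ([0], 1.0)
print(hybrid_select(X, y, k=2, max_features=2))        # ([0], 1.0)
```

The ordering in the hybrid is the point of the design: a wrapper over every attribute of a high dimensional dataset needs one classifier evaluation per remaining candidate at every step, whereas shortlisting with a filter first bounds that cost by `k`.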
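The reduced datasets are evaluated with C4.5 [6]. As a hedged illustration, not the authors' code, the sketch below computes the split criterion C4.5 uses for a discrete attribute, the gain ratio: information gain divided by the split information (the entropy of the attribute's own value distribution). The function names and toy data are hypothetical, and the sketch omits the rest of C4.5, such as thresholds for continuous attributes, missing-value handling and pruning.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of splitting on `column`, normalised by its split information."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(column)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(X, y):
    """Index of the attribute a C4.5-style tree would test at this node."""
    return max(range(len(X[0])), key=lambda j: gain_ratio([row[j] for row in X], y))


# Hypothetical toy data: column 0 is a unique ID per row, column 1 matches the label.
X = [[0, 1], [1, 1], [2, 0], [3, 0]]
y = [1, 1, 0, 0]
print(gain_ratio([row[0] for row in X], y))  # 0.5
print(gain_ratio([row[1] for row in X], y))  # 1.0
print(best_attribute(X, y))                  # 1
```

Both columns have the same information gain (1 bit), but the ID column's split information is 2 bits, so its gain ratio is halved. Penalising attributes with many distinct values in this way is what gain ratio adds over plain information gain.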
Subjects

classifier, relief fi...

File(s)
Name: CIIT2020_paper_37.pdf
Size: 476.98 KB
Format: Adobe PDF
Checksum (MD5): 5a1cf3eec9dc4f89507390b07c770f50
