Feature selection and allocation to diverse subsets for multi-label learning problems with large datasets
Date Issued
2014-09-07
Author(s)
Gjorgjevikj, Dejan
Abstract
Feature selection is an important phase in machine
learning, and in the case of multi-label classification it can be
considerably challenging. Likewise, finding the best subset
of good features is involved and difficult when the dataset has
a very large number of features (more than a thousand). In
this paper we address the problem of feature selection for
multi-label classification with a large number of features. The proposed
method is a hybrid of two phases: preliminary feature selection
based on the information value, followed by additional correlation-based
selection. We show how the first phase can reduce the feature set
from tens of thousands to a couple of hundred,
and how the second phase can then perform fine-grained feature
selection with more sophisticated but computationally intensive
methods. Finally, we analyze ways of allocating the selected
features to diverse subsets that are suitable for training
ensembles of classifiers.
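
The two-phase idea in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption rather than the authors' method: the information value uses the weight-of-evidence form over quantile bins, a feature's score is its maximum information value across labels, the correlation-based phase is replaced by a simple greedy filter on absolute Pearson correlation, and the allocation to subsets is round-robin by rank. The function names and the parameters `bins`, `k`, `max_corr`, and `n_subsets` are hypothetical.

```python
import numpy as np


def information_value(x, y, bins=5, eps=1e-6):
    """Information value of one numeric feature x against one binary label y.

    Assumed definition: sum over bins of (p - q) * ln(p / q), where p and q
    are the shares of positive and negative samples falling in each bin.
    """
    # Quantile bin edges; np.unique drops repeated edges.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, bins + 1)))
    if len(edges) < 2:
        return 0.0  # constant feature carries no information
    n_bins = len(edges) - 1
    # Bin index per sample, in the range 0 .. n_bins - 1.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    pos = np.bincount(idx, weights=y, minlength=n_bins)
    neg = np.bincount(idx, weights=1 - y, minlength=n_bins)
    # eps smoothing avoids log(0) for bins that contain a single class.
    p = (pos + eps) / (pos.sum() + eps * n_bins)
    q = (neg + eps) / (neg.sum() + eps * n_bins)
    return float(np.sum((p - q) * np.log(p / q)))


def two_phase_select(X, Y, k=200, max_corr=0.9):
    """Phase 1: keep the k features with the highest information value.
    Phase 2: greedily drop features that are highly correlated with an
    already kept, higher-ranked feature (a stand-in for the paper's
    correlation-based selection)."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    # Assumption: a feature's score is its best IV over all labels.
    scores = np.array([
        max(information_value(X[:, j], Y[:, l]) for l in range(n_labels))
        for j in range(n_features)
    ])
    top = np.argsort(scores)[::-1][:k]
    # Pairwise |Pearson correlation| among the surviving features only;
    # NaN entries (constant columns) are treated as zero correlation.
    corr = np.nan_to_num(np.abs(np.corrcoef(X[:, top], rowvar=False)))
    kept = []
    for i in range(len(top)):
        if all(corr[i, j] < max_corr for j in kept):
            kept.append(i)
    return [int(top[i]) for i in kept]


def allocate_round_robin(features, n_subsets):
    """Deal ranked features into n_subsets disjoint subsets, so that each
    subset receives a mix of high- and low-ranked features."""
    return [features[s::n_subsets] for s in range(n_subsets)]
```

The cheap per-feature scoring runs over all features, while the quadratic-cost correlation matrix is computed only on the `k` survivors, which mirrors the division of labour the abstract describes. Each subset returned by `allocate_round_robin` could then be used to train one member of an ensemble.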
File(s)
Name
500.pdf
Size
141.84 KB
Format
Adobe PDF
Checksum
(MD5):774d642618a5fcdfa207c22537a996ec
