Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter
Date Issued
2015-09-13
Author(s)
Abstract
Machine learning has received increased interest
from both the scientific community and industry. Most
machine learning algorithms rely on distance metrics that
can only be applied to numeric data. This becomes a problem
in complex datasets that contain heterogeneous data consisting of
numeric and nominal (i.e. categorical) features, hence the need
to transform nominal data into numeric data. Weight of
evidence (WoE) is one of the parameters that can be used to
transform nominal features into numeric ones. In this paper
we describe a method that uses WoE to transform the features.
Although the applicability of this method has been researched to some
extent, in this paper we extend it to multi-class
problems, which is a novelty. We compare it with the method
that generates dummy features. We test both methods on binary
and multi-class classification problems with different machine
learning algorithms. Our experiments show that the WoE-based
transformation generates a smaller number of features than
the technique based on generating dummy features, while
also improving classification accuracy, reducing memory
complexity and shortening execution time. Nevertheless,
we also point out some of its weaknesses and
recommend when to use the method based on dummy
feature generation instead.
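
The abstract gives neither the WoE formula nor the details of the multi-class extension. The following is therefore only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the common definition WoE(c) = ln(P(c | class) / P(c | other classes)), assumes a one-vs-rest scheme for the multi-class case, and adds a smoothing constant to avoid taking the logarithm of zero. The function names `woe_table` and `woe_encode_one_vs_rest` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def woe_table(values, labels, positive, smoothing=0.5):
    """Map each category c to ln(P(c | positive) / P(c | rest)).

    Assumption: additive smoothing keeps both probabilities non-zero.
    """
    categories = sorted(set(values))
    pos = Counter(v for v, y in zip(values, labels) if y == positive)
    neg = Counter(v for v, y in zip(values, labels) if y != positive)
    n_pos = sum(pos.values())
    n_neg = sum(neg.values())
    k = len(categories)
    table = {}
    for c in categories:
        p_pos = (pos[c] + smoothing) / (n_pos + smoothing * k)
        p_neg = (neg[c] + smoothing) / (n_neg + smoothing * k)
        table[c] = math.log(p_pos / p_neg)
    return table


def woe_encode_one_vs_rest(values, labels, smoothing=0.5):
    """Encode one nominal feature as one WoE column per class.

    Assumption: one-vs-rest is used here for illustration only; the
    paper's multi-class scheme may differ.
    """
    classes = sorted(set(labels))
    tables = [woe_table(values, labels, c, smoothing) for c in classes]
    encoded = [[t[v] for t in tables] for v in values]
    return classes, encoded


if __name__ == "__main__":
    colours = ["red", "red", "blue", "green", "green", "blue"]
    target = ["a", "a", "b", "c", "c", "b"]
    classes, rows = woe_encode_one_vs_rest(colours, target)
    print(classes)
    for colour, row in zip(colours, rows):
        print(colour, [round(x, 3) for x in row])
```

Under this sketch a nominal feature yields one numeric column per class, whereas dummy encoding yields one column per category, so the sketch produces fewer columns only when a feature has more distinct categories than the problem has classes.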
Subjects
File(s)
Name
Transformation_of_nominal_features_into.pdf
Size
203.67 KB
Format
Adobe PDF
Checksum
(MD5):55587966e5c911b4386a27a9404785e8
