Weight of evidence as a tool for attribute transformation in the preprocessing stage of supervised learning algorithms
Date Issued
2011-07-31
Author(s)
Abstract
Transformation of features is a common task in the data preprocessing stage when solving data mining and classification problems. Many classification algorithms prefer continuous attributes over nominal ones, and in some cases the distance between data points cannot be estimated unless the attribute values are continuous and normalized. The Weight of Evidence has some very desirable properties that make it a useful tool for the transformation of attributes, but unfortunately certain preconditions must be met in order to calculate it. In this paper we propose a modified calculation of the Weight of Evidence that overcomes these preconditions and additionally makes it usable for test examples that were not present in the training set. The proposed transformation can be used for all supervised learning problems. Finally, we present the results of the proposed transformation and discuss the benefits of the transformed nominal and continuous attributes from the PAKDD 2009 dataset. The results show that, for all tested classification algorithms, the proposed transformation yields better performance than the method that generates dummy (i.e., binary) variables for each value of the nominal attributes.
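
For orientation, the following is a minimal sketch of a conventional Weight of Evidence encoding for one nominal attribute with a binary target. It is not the modified calculation proposed in the paper, which this record does not reproduce. The function names `fit_woe` and `transform_woe`, the additive smoothing constant `alpha`, and the neutral default of 0.0 for values unseen in training are all assumptions made for illustration. The smoothing is one common way to avoid the zero-count cases in which the plain logarithm is undefined.

```python
import math
from collections import Counter


def fit_woe(values, labels, alpha=0.5):
    """Learn a smoothed Weight of Evidence for each category value.

    WoE(c) = ln( P(c | y=1) / P(c | y=0) ), with additive smoothing
    (alpha is an illustrative assumption) so that a category missing
    from one class does not produce log(0) or a division by zero.
    """
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = set(pos) | set(neg)
    k = len(categories)
    n_pos = sum(pos.values())
    n_neg = sum(neg.values())

    woe = {}
    for c in categories:
        p = (pos[c] + alpha) / (n_pos + alpha * k)  # smoothed P(c | y=1)
        q = (neg[c] + alpha) / (n_neg + alpha * k)  # smoothed P(c | y=0)
        woe[c] = math.log(p / q)
    return woe


def transform_woe(values, woe, default=0.0):
    """Replace each category by its WoE; unseen categories get `default`.

    A default of 0.0 (no evidence for either class) is an assumption of
    this sketch, not necessarily the paper's handling of unseen values.
    """
    return [woe.get(v, default) for v in values]


if __name__ == "__main__":
    train_values = ["a", "a", "b", "b", "c"]
    train_labels = [1, 0, 1, 1, 0]
    table = fit_woe(train_values, train_labels)
    print(transform_woe(["a", "b", "c", "unseen"], table))
```

Each nominal value becomes a single continuous number, so the attribute stays one column instead of expanding into one dummy variable per value.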
Subjects
File(s)
Name
Weight_of_evidence_as_a_tool_for_attribute_transformation_in_the_preprocessing_stage_in_supervised_learning_algorithms_KOREGIRANO.pdf
Size
4.1 MB
Format
Adobe PDF
Checksum
(MD5):7e295b0fffe265d425d1a3767e2b161e
