Web genre classification via hierarchical multi-label classification
Date Issued
2015-10-14
Author(s)
Madjarov, Gjorgji
Vidulin, Vedrana
Kocev, Dragi
Abstract
The growing number of web pages calls for improvements to search engines.
One such improvement is to let users specify the desired web genre of the
result pages. This creates the need to predict a page's web genre from the
information on the page itself. Typically, this task is addressed as
multi-class classification, with some recent studies advocating the use of
multi-label classification. In this paper, we propose to exploit the web
genre labels by constructing a hierarchy of web genres and then using
methods for hierarchical multi-label classification to boost the predictive
performance. We use two methods for hierarchy construction: expert-based
and data-driven. The evaluation on a benchmark dataset (the 20-Genre
Collection corpus) reveals that using a hierarchy of web genres
significantly improves the predictive performance of the classifiers, and
that the data-driven hierarchy yields performance similar to that of the
expert-based one, with the added value that it is obtained automatically
and quickly.
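The abstract does not say how the data-driven hierarchy is built. As an illustration only, the sketch below assumes one common approach: agglomerative clustering of the genre labels by how often they co-occur on the same pages (average-linkage Jaccard distance between the sets of pages carrying each label), followed by the standard hierarchical multi-label step of adding each label's ancestors to a page's label set. The function names, the distance, the linkage, and the internal node names (C0, C1, ...) are hypothetical and are not the authors' procedure.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of page indices."""
    union = a | b
    if not union:
        return 0.0  # two labels that never occur are treated as identical
    return 1.0 - len(a & b) / len(union)


def build_label_hierarchy(label_sets, labels):
    """Average-linkage agglomerative clustering of labels (assumed approach).

    label_sets: one set of genre labels per page.
    Returns a dict mapping every node to its parent; the root maps to None.
    Internal nodes get hypothetical names C0, C1, ...
    """
    # Pages carrying each label; label similarity = co-occurrence on pages.
    support = {lab: {i for i, ls in enumerate(label_sets) if lab in ls} for lab in labels}
    clusters = {lab: frozenset([lab]) for lab in labels}  # node -> leaf labels under it
    parent = {}
    next_id = 0
    while len(clusters) > 1:
        best = None
        for x, y in combinations(sorted(clusters), 2):
            # Average linkage: mean distance over all leaf-label pairs.
            pairs = [(p, q) for p in clusters[x] for q in clusters[y]]
            d = sum(jaccard_distance(support[p], support[q]) for p, q in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        node = f"C{next_id}"
        next_id += 1
        parent[x] = parent[y] = node  # merge the two closest clusters
        clusters[node] = clusters.pop(x) | clusters.pop(y)
    (root,) = clusters
    parent[root] = None
    return parent


def expand_with_ancestors(label_set, parent):
    """Add each label's ancestors (excluding the shared root) to the label set."""
    expanded = set(label_set)
    for lab in label_set:
        node = parent[lab]
        while node is not None and parent[node] is not None:
            expanded.add(node)
            node = parent[node]
    return expanded


# Toy usage with made-up genre labels (not data from the paper).
pages = [
    {"blog", "personal"},
    {"blog", "personal"},
    {"shop", "product"},
    {"shop"},
    {"shop", "product"},
]
hierarchy = build_label_hierarchy(pages, ["blog", "personal", "shop", "product"])
print(hierarchy)
print(expand_with_ancestors({"blog"}, hierarchy))
```

On this toy data, "blog" and "personal" are merged first because they always co-occur, then "shop" and "product"; expanding {"blog"} adds their shared internal node. A library routine for hierarchical clustering could replace the hand-written loop; the plain-Python version only keeps the sketch dependency-free.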
File(s)
Name
978-3-319-24834-9_2.pdf
Size
610.62 KB
Format
Adobe PDF
Checksum (MD5)
78cc5251302bd18a2b1c9aff6d5df833
