Title: Web genre classification via hierarchical multi-label classification
Authors: Madjarov, Gjorgji; Vidulin, Vedrana; Dimitrovski, Ivica; Kocev, Dragi
Keywords: Web genre classification · Hierarchy construction · Hierarchical multi-label classification
Issue Date: 14-Oct-2015
Publisher: Springer, Cham
Conference: International Conference on Intelligent Data Engineering and Automated Learning
Abstract: The growing number of web pages calls for improved search engines. One such improvement is letting users specify the desired web genre of the returned pages. This creates a need to predict a page's web genre from the information it contains. Typically, this task is addressed as multi-class classification, although some recent studies advocate multi-label classification. In this paper, we propose to exploit the web genre labels by constructing a hierarchy of web genres and then using methods for hierarchical multi-label classification to boost predictive performance. We use two methods for hierarchy construction: expert-based and data-driven. The evaluation on a benchmark dataset (the 20-Genre collection corpus) reveals that using a hierarchy of web genres significantly improves the predictive performance of the classifiers, and that the data-driven hierarchy yields performance similar to the expert-based one, with the added benefit that it is obtained automatically and quickly.
URI: http://hdl.handle.net/20.500.12188/23141
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers

Files in This Item:
File: 978-3-319-24834-9_2.pdf; Size: 610.62 kB; Format: Adobe PDF