Comparative analysis of NLP-based models for company classification

Rizinski, Maryan; Jankov, Andrej; Sankaradas, Vignesh; Pinsky, Eugene; Mishkovski, Igor; Trajanov, Dimitar

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/33956

DC Field	Value	Language
dc.contributor.author	Rizinski, Maryan	en_US
dc.contributor.author	Jankov, Andrej	en_US
dc.contributor.author	Sankaradas, Vignesh	en_US
dc.contributor.author	Pinsky, Eugene	en_US
dc.contributor.author	Mishkovski, Igor	en_US
dc.contributor.author	Trajanov, Dimitar	en_US
dc.date.accessioned	2025-08-25T09:42:35Z	-
dc.date.available	2025-08-25T09:42:35Z	-
dc.date.issued	2024-01-31	-
dc.identifier.uri	http://hdl.handle.net/20.500.12188/33956	-
dc.description.abstract	The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow, costly, and vendor-specific assignments. Therefore, we investigate recent natural language processing (NLP) advancements to automate the company classification process. In particular, we employ and evaluate various NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification. We conduct a comprehensive comparison among these models to assess their effectiveness in the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, consisting of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers surpass the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms offer the potential to automate, standardize, and continuously update classification systems in an efficient and cost-effective way. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot methodology, we use TF-IDF to enhance sector representation, yielding improved accuracy in comparison to standard zero-shot classifiers; (2) next, we use ChatGPT for dataset generation, revealing potential in scenarios where datasets of company descriptions are lacking; and (3) we also employ K-Fold to reduce noise in the WRDS dataset, followed by conducting experiments to assess the impact of noise reduction on the company classification results.	en_US
dc.publisher	MDPI	en_US
dc.relation.ispartof	Information	en_US
dc.subject	company classification; industry classification; natural language processing; machine learning; deep learning; finance; fintech	en_US
dc.title	Comparative analysis of NLP-based models for company classification	en_US
dc.type	Journal Article	en_US
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
crisitem.author.dept	Faculty of Computer Science and Engineering	-
crisitem.author.dept	Faculty of Computer Science and Engineering	-
Appears in Collections:	Faculty of Computer Science and Engineering: Journal Articles

Files in This Item:

File	Size	Format
Comparative_Analysis_of_NLP_Based_Models.pdf	684.17 kB	Adobe PDF

Repository of UKIM