Real-Time Clustering of Text Data for News Aggregation
Date Issued
2023-11-21
Author(s)
Najkov, D
Gusev, Marjan
Abstract
This paper explores real-time clustering of text data for news aggregation using parallelized K-Means algorithm variants implemented with the Message Passing Interface (MPI). We evaluate batch-based, centroid-based, and fusion-based methods, measuring their training time in two experiments: one based on cluster complexity and the other on dataset size. Our study aims to identify the most effective method and to analyze the trade-offs between parallelization strategies. Results indicate that, in this context, MPI-based solutions substantially reduce training time compared with a serial K-Means implementation.
Subjects
