Parallelizing file-type conversion for financial analysis
Date Issued
2023-11
Author(s)
Alek Jarmov
DOI
10.1109/TELFOR59449.2023.10372813
Abstract
Data analysis has gained significant traction, particularly in the era of artificial intelligence, which offers novel approaches to financial data analysis. Before analysis can begin, however, a data storage challenge arises: financial data is commonly stored in the XLSX format, whereas CSV is preferred for faster analysis and reduced server storage. This paper investigates how to accelerate XLSX-to-CSV conversion. The main content of an XLSX file is represented as a tree structure in XML format. Leveraging the independence of rows within a file and of files from one another, we propose two methods for parallelizing the conversion process: single-file parallelization and simultaneous parallel conversion of multiple files. Our results demonstrate the effectiveness of parallelization, which reduces workflow waiting times.
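The two strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`row_to_csv`, `sheet_to_csv`, `sheets_to_csv`), the `SAMPLE` worksheet, and the use of a thread pool are all assumptions made for illustration. The sketch works on worksheet XML that has already been extracted from the XLSX archive and reads only the literal `<v>` values, so shared-string lookups and cell types are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# SpreadsheetML main namespace used by XLSX worksheet XML.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical two-row worksheet used only for illustration.
SAMPLE = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2.5</v></c></row>'
    '<row r="2"><c r="A2"><v>3</v></c><c r="B2"><v>4</v></c></row>'
    "</sheetData></worksheet>"
)


def row_to_csv(row):
    """Convert one <row> element to a single CSV line (literal <v> values only)."""
    values = [c.findtext("m:v", default="", namespaces=NS)
              for c in row.findall("m:c", NS)]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue()


def sheet_to_csv(sheet_xml, workers=4):
    """Single-file parallelization: rows are independent, so convert them concurrently."""
    root = ET.fromstring(sheet_xml)
    rows = list(root.iterfind("m:sheetData/m:row", NS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so row order is preserved.
        lines = list(pool.map(row_to_csv, rows))
    return "".join(lines)


def sheets_to_csv(sheet_xmls, workers=4):
    """Multi-file parallelization: each worksheet is converted as one independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda xml: "".join(
            row_to_csv(r) for r in ET.fromstring(xml).iterfind("m:sheetData/m:row", NS)
        ), sheet_xmls))


if __name__ == "__main__":
    print(sheet_to_csv(SAMPLE), end="")
    print(sheets_to_csv([SAMPLE, SAMPLE]))
```

Threads are used here only to keep the sketch short and self-contained; in CPython, CPU-bound XML parsing is limited by the global interpreter lock, so a process pool (with rows or files passed as serialized strings) would likely be needed to see a real speedup.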