Framework for Real-Time Parallel and Distributed Natural Language Processing
Date Issued
2021
Author(s)
Mileski, D.
Abstract
In this paper, we present a new framework for
parallel and distributed processing of real-time text streams, capable of executing Natural Language Processing (NLP) algorithms.
The focus is on the acceleration achieved through careful construction of
the topology, not on the individual NLP algorithms. We
elaborate on the configuration of our specific use case and discuss
reducing the time required for system configuration in
order to benefit from virtualization and containers.
Research hypothesis: We can process more text tuples per
unit time using the newly developed framework with an algorithm
that divides the sequential algorithm into smaller jobs and
tasks, including tokenisation, part-of-speech tagging, stopword removal, and
sentiment analysis, where each of these individual jobs is a specific
node in the Apache Storm-based topology.
We have conducted an experimental proof of concept and
found the optimal configuration, confirming the validity of the
hypothesis.
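The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the paper's framework and does not use Apache Storm: it assumes plain Python threads and queues as stand-ins for topology nodes and the streams between them. The stage functions, the toy word lists, and the names `run_node` and `run_pipeline` are hypothetical, and part-of-speech tagging is omitted to keep the example dependency-free.

```python
import queue
import threading

# Toy word lists, assumed for illustration only (not from the paper).
STOPWORDS = {"the", "a", "is", "and", "this"}
POSITIVE = {"good", "great", "fast"}
NEGATIVE = {"bad", "slow", "poor"}

_DONE = object()  # sentinel marking the end of the stream


def tokenise(tup):
    # Split on whitespace; a real tokeniser would also handle punctuation.
    tup["tokens"] = tup["text"].lower().split()
    return tup


def remove_stopwords(tup):
    tup["tokens"] = [t for t in tup["tokens"] if t not in STOPWORDS]
    return tup


def sentiment(tup):
    # Lexicon count: +1 per positive token, -1 per negative token.
    tup["sentiment"] = sum((t in POSITIVE) - (t in NEGATIVE) for t in tup["tokens"])
    return tup


def run_node(fn, inbox, outbox):
    # One node: read a tuple, apply its job, pass the result downstream.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next node
            return
        outbox.put(fn(item))


def run_pipeline(texts, stages=(tokenise, remove_stopwords, sentiment)):
    # One queue between each pair of neighbouring nodes, plus input and output.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_node, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for text in texts:
        queues[0].put({"text": text})
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline(["This is a great and fast system"])` returns one tuple carrying its tokens and a sentiment score. The sketch only shows the structure of splitting a sequential algorithm into per-stage nodes; CPython threads will not by themselves reproduce the throughput gains of a distributed topology.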
Subjects
File(s)
Name
MIPRO2021_Framework_for_real_time_parallel_and_distributed_Natural_Language_Processing.pdf
Size
530.72 KB
Format
Adobe PDF
Checksum
(MD5):363dcc32f665ed84357231fc7acd9324
