Running Semantic Search Over Complete English Wikipedia on a Local Computer
Date Issued
2022
Author(s)
Tudjarski, Stojancho
Madevska Bogdanova, Ana
Abstract
We implement a system that provides human-like answers to
human-like questions, extracted from a considerable amount of
data, in a reasonable time measured in seconds. To show that
this holds for a large knowledge base in which the answers are
searched for, we used a complete English Wikipedia dump running
on a local laptop under Windows 10, exposed to software that
receives questions and returns the three most relevant answers.
The entire technology stack of the implementation is the subject
of this research.
The main conclusion of this research is that it is possible
to implement semantic search over a vast amount of text data
on a local computer with average hardware specifications,
which is of utmost importance in developing different NLP
systems.
File(s)
Name
CIIT_2022_31.pdf
Size
1.02 MB
Format
Adobe PDF
Checksum
(MD5):602722d1ba703cf74b0fdc1bcbbcce7a
