Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/33585
DC Field: Value (Language)
dc.contributor.author: Filipovska, Elena (en_US)
dc.contributor.author: Mladenovska, Ana (en_US)
dc.contributor.author: Bajrami, Merxhan (en_US)
dc.contributor.author: Dobreva, Jovana (en_US)
dc.contributor.author: Hillman, Velislava (en_US)
dc.contributor.author: Lameski, Petre (en_US)
dc.contributor.author: Zdravevski, Eftim (en_US)
dc.date.accessioned: 2025-05-21T07:28:22Z
dc.date.available: 2025-05-21T07:28:22Z
dc.date.issued: 2024-09-08
dc.identifier.uri: http://hdl.handle.net/20.500.12188/33585
dc.description.abstract: The rapid growth of document volumes and complexity in various domains necessitates advanced automated methods to enhance the efficiency and accuracy of information extraction and analysis. This paper aims to evaluate the efficiency and repeatability of OpenAI's APIs and other Large Language Models (LLMs) in automating question-answering tasks across multiple documents, specifically focusing on analyzing Data Privacy Policy (DPP) documents of selected EdTech providers. We test how well these models perform on large-scale text processing tasks using OpenAI's LLM models (GPT-3.5 Turbo, GPT-4, GPT-4o) and APIs in several frameworks: direct API calls (i.e., one-shot learning), LangChain, and Retrieval Augmented Generation (RAG) systems. We also evaluate a local deployment of quantized versions (with FAISS) of LLM models (Llama-2-13B-chat-GPTQ). Through systematic evaluation against predefined use cases and a range of metrics, including response format, execution time, and cost, our study aims to provide insights into the optimal practices for document analysis. Our findings demonstrate that using OpenAI's LLMs via API calls is a workable workaround for accelerating document analysis when a local GPU-powered infrastructure is not a viable solution, particularly for long texts. On the other hand, local deployment is quite valuable for keeping the data within a private infrastructure. Our findings show that the quantized models retain substantial relevance even with fewer parameters than ChatGPT and do not impose processing restrictions on the number of tokens. This study offers insights on maximizing the use of LLMs for better efficiency and data governance, in addition to confirming their usefulness in improving document analysis procedures. (en_US)
dc.publisher: IEEE (en_US)
dc.subject: OpenAI, LangChain, RAG, GPT, QA, LLM, Llama, Large Language Models, Multi-document, one-shot learning, few-shot learning, Q&A (en_US)
dc.title: Benchmarking OpenAI's APIs and other Large Language Models for Repeatable and Efficient Question Answering Across Multiple Documents (en_US)
dc.type: Proceedings (en_US)
dc.relation.conference: 2024 19th Conference on Computer Science and Intelligence Systems (FedCSIS) (en_US)
item.fulltext: With Fulltext
item.grantfulltext: open
crisitem.author.dept: Faculty of Computer Science and Engineering
crisitem.author.dept: Faculty of Computer Science and Engineering
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers
Files in This Item:
File: 3979.pdf
Size: 288.48 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.