Exploring Large Language Models for Data Augmentation: A Case Study for Text Style Transfer

Toshevska, Martina; Mirceva, Georgina; Gievska, Sonja

doi:10.1109/mipro65660.2025.11131729

Journal

2025 MIPRO 48th ICT and Electronics Convention

Date Issued

2025-06-02

Abstract

Text style transfer is the task of rewriting a sentence in a desired target style while preserving its original meaning. It often requires high-quality parallel datasets, which are not always available. This paper explores data augmentation techniques for text style transfer, leveraging large language models (LLMs) to address dataset scarcity. Our approach generates synthetic parallel data by prompting LLMs to paraphrase and/or rewrite sentences in diverse styles, enabling the creation of larger and more varied datasets. We demonstrate the applicability of this approach across three tasks: formality transfer with the GYAFC dataset, sentiment transfer with the Yelp dataset, and personal style transfer with the Shakespeare dataset. This work introduces an approach to enhance dataset availability, aiming to foster further research in the field and support broader application of LLMs. The experiments were performed only with English-language datasets.
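
The augmentation idea described in the abstract (prompting an LLM to rewrite source sentences in a target style and collecting the results as synthetic parallel pairs) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template, the function names `build_prompt` and `augment`, the filtering rules, and the `toy_llm` stand-in are all hypothetical, and the record does not specify the actual prompts or models used.

```python
from typing import Callable, Iterable, List, Tuple


def build_prompt(sentence: str, target_style: str) -> str:
    """Hypothetical prompt template; the paper's real prompts are not given here."""
    return (
        f"Rewrite the following sentence in a {target_style} style, "
        f"preserving its meaning.\nSentence: {sentence}\nRewrite:"
    )


def augment(
    sentences: Iterable[str],
    target_style: str,
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (source, rewrite) pairs using any text-in, text-out model callable."""
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for source in sentences:
        rewrite = generate(build_prompt(source, target_style)).strip()
        # Assumed filtering: skip empty outputs and unchanged copies of the source.
        if not rewrite or rewrite.lower() == source.strip().lower():
            continue
        # Skip pairs that were already collected.
        if (source, rewrite) in seen:
            continue
        seen.add((source, rewrite))
        pairs.append((source, rewrite))
    return pairs


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model call: returns canned rewrites for known inputs."""
    canned = {"gotta go, see ya": "I must leave now; goodbye."}
    sentence = prompt.split("Sentence: ", 1)[1].split("\nRewrite:", 1)[0]
    # Unknown inputs are echoed back unchanged, so augment() filters them out.
    return canned.get(sentence, sentence)


if __name__ == "__main__":
    for source, rewrite in augment(["gotta go, see ya", "hello"], "formal", toy_llm):
        print(f"{source!r} -> {rewrite!r}")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; in practice `toy_llm` would be replaced by a wrapper around whichever model is being prompted, and the filtering step would likely need task-specific quality checks.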

Subjects

large language models...

text style transfer

data augmentation