Exploring Large Language Models for Data Augmentation: A Case Study for Text Style Transfer
Journal
2025 MIPRO 48th ICT and Electronics Convention
Date Issued
2025-06-02
Author(s)
DOI
10.1109/mipro65660.2025.11131729
Abstract
Text style transfer is the task of rewriting a sentence in a desired target style while preserving its original meaning. It often requires high-quality parallel datasets, which are not always available. This paper explores data augmentation techniques for text style transfer, using large language models (LLMs) to address dataset scarcity. Our approach generates synthetic parallel data by prompting LLMs to paraphrase and/or rewrite sentences in diverse styles, enabling the creation of larger and more varied datasets. We demonstrate the applicability of this approach on three tasks: formality transfer with the GYAFC dataset, sentiment transfer with the Yelp dataset, and personal style transfer with the Shakespeare dataset. This work introduces an approach to improve dataset availability, aiming to foster further research in the field and support broader application of LLMs. The experiments were performed only with English-language datasets.
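
The prompting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (build_prompt, augment, toy_generate), and the filtering rule are all assumptions made for illustration, and the abstract does not state which prompts or models were used. The LLM is represented by a plain callable so that any model or client can be substituted.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the paper's actual prompts are not given
# in the abstract.
PROMPT_TEMPLATE = (
    "Rewrite the following sentence in a {style} style, "
    "preserving its meaning.\n"
    "Sentence: {sentence}\n"
    "Rewrite:"
)


def build_prompt(sentence: str, style: str) -> str:
    """Fill the (assumed) template for one source sentence."""
    return PROMPT_TEMPLATE.format(style=style, sentence=sentence)


def augment(
    sentences: List[str],
    target_style: str,
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build synthetic parallel pairs (source, target) for one target style.

    `generate` is any function mapping a prompt to model output text.
    Empty outputs, outputs identical to the source, and duplicate pairs
    are dropped (an assumed, deliberately simple quality filter).
    """
    pairs: List[Dict[str, str]] = []
    seen = set()
    for source in sentences:
        target = generate(build_prompt(source, target_style)).strip()
        if not target or target.lower() == source.strip().lower():
            continue
        if (source, target) in seen:
            continue
        seen.add((source, target))
        pairs.append(
            {"source": source, "target": target, "style": target_style}
        )
    return pairs


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call: expands one contraction-like token.

    Replace this with a real model call; it exists only so the sketch runs.
    """
    sentence = prompt.split("Sentence: ", 1)[1].split("\n", 1)[0]
    return sentence.replace("gonna", "going to")


if __name__ == "__main__":
    for pair in augment(["i'm gonna be late", "thank you"], "formal", toy_generate):
        print(pair)
```

With the toy generator, the second sentence comes back unchanged and is filtered out, so only one pair is kept. In practice the filter would likely need a stronger meaning-preservation and style check than exact-match comparison.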
