Abstract

Recent advances in Natural Language Processing (NLP) rely heavily on black-box, score-based models that expose only their final predictions and the associated scores. This opacity makes their internal decision-making processes hard to understand and complicates the identification of potential weaknesses. A powerful strategy for analyzing model vulnerabilities is the generation of text adversarial examples. These attacks introduce subtle text perturbations that cause victim models to make incorrect predictions while preserving the original semantic meaning. This paper presents a novel method for generating text adversarial examples with Large Language Models (LLMs). The proposed method uses the text generation capabilities of LLMs to modify the original input text at three granularities: character, word, and sentence level. First, sentence-level perturbations are introduced by generating paraphrases with an LLM instruction prompt. Next, character- and word-level perturbations are applied to the words that most affect the prediction, using another set of LLM instruction prompts. In particular, these vulnerable words are perturbed by replacing them with synonyms or misspelled variants, or by inserting additional neutral words adjacent to them. Experiments were conducted to assess the proposal's viability on two sentiment classification tasks: sentence-level reviews and full-length reviews. The proposal demonstrates an advantage over many well-known approaches based on LLMs. It preserves the original semantics to a similar extent while increasing the deception of victim models by 29–85% over the best of the analyzed state-of-the-art methods.
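
The abstract does not say how the words that most affect predictions are identified. As an illustration only, the sketch below ranks words by leave-one-out deletion against a score-based black box, which is one common way to do this and is an assumption here, not the authors' stated procedure. The function names and the toy victim model are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple


def toy_victim_score(text: str) -> float:
    """Hypothetical stand-in for a black-box victim model.

    Returns a positive-class score in [0, 1] from a tiny word lexicon.
    A real victim would be a trained classifier queried for its score.
    """
    positive = {"great", "good", "excellent"}
    negative = {"bad", "boring", "awful"}
    words = text.lower().split()
    raw = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(0.0, min(1.0, 0.5 + 0.25 * raw))


def rank_vulnerable_words(
    text: str, score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Rank words by how much deleting each one changes the victim's score.

    Assumption: leave-one-out deletion is used as the importance measure.
    Only the score returned by score_fn is needed, so the model stays a black box.
    """
    words = text.split()
    base = score_fn(text)
    ranked = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        ranked.append((word, abs(base - score_fn(reduced))))
    # Stable sort: ties keep their original left-to-right order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


ranking = rank_vulnerable_words("the plot was great but boring", toy_victim_score)
for word, impact in ranking:
    print(f"{word}: {impact:.2f}")
```

In a pipeline like the one described, the top-ranked words would be the candidates passed to the LLM prompts for synonym replacement, misspelling, or neutral-word insertion.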

Publisher

Elsevier

Citation

Natalia Madrueño, Alberto Fernández-Isabel, Rubén R. Fernández, Isaac Martín de Diego, Advancing text adversarial example generation using large language models, Knowledge-Based Systems, Volume 329, Part B, 2025, 114361, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2025.114361. (https://www.sciencedirect.com/science/article/pii/S0950705125014005)
