Evaluating and improving the robustness of security attack detectors generated by LLMs
Authors:
- Pasini, Samuele (Università della Svizzera italiana, Switzerland)
- Kim, Jinhan (Università della Svizzera italiana, Switzerland)
- Aiello, Tommaso (SAP Labs France, Mougins, France)
- Cabrera Lozoya, Rocío (SAP Labs France, Mougins, France)
- Sabetta, Antonino (SAP Labs France, Mougins, France)
- Tonella, Paolo (Università della Svizzera italiana, Switzerland)
Published in: Empirical Software Engineering, 2025, vol. 31, no. 2, p. 35
Language: English
Abstract: Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. A key challenge is ensuring that the LLMs have enough knowledge to address specific security requirements, such as information about existing attacks. To this end, we propose an approach that integrates Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and ranks them to select the most robust detector. Our extensive empirical study targets code generated by LLMs to detect two prevalent injection attacks in web security: Cross-Site Scripting (XSS) and SQL injection (SQLi). Results show a significant improvement in detection performance when employing RAG and Self-Ranking, with an increase of up to 71 percentage points (on average 37) and up to 43 percentage points (on average 6) in the F2-Score for XSS and SQLi detection, respectively.
Classification: Computer science and technology
Open access status: hybrid
Persistent URL: https://n2t.net/ark:/12658/srd1333995
Statistics:
- Document views: 6
- File downloads (Tonella_2025_Springer_s10664-025-10743-w): 10