Robustness of Large Language Models: Mitigating Adversarial Attacks and Input Perturbations

EasyChair Preprint 12274, 9 pages • Date: February 24, 2024

Abstract

This paper explores the robustness of large language models (LLMs) and strategies for mitigating the impact of adversarial attacks and input perturbations. Adversarial attacks, in which small, carefully crafted perturbations are added to input data to induce misclassification or other undesired behavior, can exploit vulnerabilities in LLMs and compromise their performance. Input perturbations such as typographical errors or grammatical inconsistencies can likewise degrade the accuracy and reliability of LLMs in practical settings. To address these challenges, various approaches have been proposed, including adversarial training, robust optimization techniques, and input preprocessing methods.

Keyphrases: language, large, models
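As a purely illustrative sketch (not taken from the paper), the Python snippet below shows the kind of input perturbation and input-preprocessing defense the abstract refers to: random character-level typos are injected into a sentence, and a simple edit-distance correction step snaps out-of-vocabulary words back to a known vocabulary before the text would reach a model. The function names, the toy vocabulary, and the correction strategy are all hypothetical choices made for this example.

```python
import random
import re


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Simulate an input perturbation: randomly swap adjacent letters (typos)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def preprocess(text: str, vocab: set[str]) -> str:
    """Toy preprocessing defense: map each out-of-vocabulary word to its
    nearest in-vocabulary word by edit distance before it reaches the model."""
    corrected = []
    for word in re.findall(r"\S+", text):
        key = word.lower()
        if key in vocab:
            corrected.append(word)
        else:
            corrected.append(min(vocab, key=lambda v: edit_distance(key, v)))
    return " ".join(corrected)


if __name__ == "__main__":
    # Hypothetical vocabulary for a tiny classification example.
    vocab = {"the", "model", "classifies", "this", "review", "as", "positive"}
    clean = "the model classifies this review as positive"
    noisy = perturb(clean, rate=0.3)
    print("perturbed :", noisy)
    print("recovered :", preprocess(noisy, vocab))
```

In practice such preprocessing is only one layer of defense; the adversarial training and robust optimization approaches mentioned in the abstract operate on the model itself rather than on its inputs.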