Master thesis presentation: Benchmarking Large Language Models for Vulnerability Detection: Comparing Local and Cloud LLMs
Alexandra Pykälistö and Karl Müller-Uri present their master's thesis on May 26 in E:3139.
Benchmarking Large Language Models for Vulnerability Detection: Comparing Local and Cloud LLMs
Abstract: This thesis investigates whether locally fine-tuned LLMs can be used to discover and flag memory-related security flaws in C and C++ code. Five locally fine-tuned models were examined and compared to each other, to their non-fine-tuned versions, and to proprietary cloud models. The models were given functions taken from C/C++ projects and asked to determine whether each function was vulnerable.
Two prompting methods were used during the evaluation: zero-shot prompting and few-shot prompting. After each evaluation, performance metrics such as accuracy and F1-score were calculated. We show that while fine-tuning improved the F1-score of the local models, their ability to detect vulnerabilities remained unsatisfactory. The highest-performing model, CodeLlama 7B, achieved an F1-score of only 0.12. However, the cloud models, which have orders of magnitude more parameters and more extensive pre-training, did not outperform this result, which indicates that the methods used in the thesis were suboptimal.
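For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how accuracy and F1-score can be computed for a binary "vulnerable / not vulnerable" task. It is not the evaluation code from the thesis: the function name `binary_metrics` and the toy labels are hypothetical, and the sketch assumes that each function carries a label of 1 (vulnerable) or 0 (not vulnerable).

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption for this sketch: 1 means "vulnerable", 0 means "not vulnerable".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, not data or results from the thesis.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example the accuracy is noticeably higher than the F1-score, because accuracy also rewards correctly classified non-vulnerable functions while F1 only reflects how well the vulnerable class is found. This is one reason an F1-score is commonly reported alongside accuracy.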
Supervisor: Christian Gehrmann
Examiner: Thomas Johansson
About the event
Location: E:3139
Contact: susanna [dot] lonnqvist [at] eit [dot] lth [dot] se