Apr
Master thesis presentation: A study on reasoning-enhancing fine-tuning for function-level vulnerability detection targeting JavaScript
Simone Angenius will present his master’s thesis titled:
A study on reasoning-enhancing fine-tuning for function-level vulnerability detection targeting JavaScript
Abstract: As societies become more reliant on software, ensuring its safety and correctness becomes all the more important. With the sudden rise of LLMs in 2022, researchers acquired a new tool for detecting vulnerabilities. Even more recently, DeepSeek introduced Group Relative Policy Optimization (GRPO), a new approach for fine-tuning models to improve their reasoning and logic for problem-solving. Due to its novelty, the number of studies in this area is still limited. Additionally, the majority of papers focus on C and C++, leaving a gap for other popular languages such as JavaScript. In this thesis, we explore how an LLM’s reasoning can be trained and used to improve its ability to identify vulnerabilities in JavaScript code at the function level. To ensure reliable results, a new benchmark targeting JavaScript, JSPrimeVul, has been compiled by gathering and filtering various JavaScript datasets. The performance results are compared to those of previous studies, including models that do not use reasoning and models that use reasoning but were trained on only parts of DeepSeek’s proposed training regimen. While the fine-tuning process improved the model’s recall and F1-score for the vulnerable class, it also made the model biased. Furthermore, the results demonstrate how the conditions for models using reasoning differ from those for models without reasoning. Evidently, more information than what is given by a single function is necessary for the reasoning to be helpful. Future work should study how complementing a function with context-rich structures helps an LLM’s reasoning. Additionally, the GRPO algorithm should be studied further, and greater focus should be given to JavaScript, as it has proven to be a viable target for LLM-based vulnerability detection.
About the event
Location:
E:3139
Contact:
christian [dot] gehrmann [at] eit [dot] lth [dot] se