Master's thesis presentation: LLM-Based Data Extraction and Machine Learning for CO2e Estimation of Semiconductor Components
Arvid Müller and Elias Flynn Rosenberg present their master's thesis, LLM-Based Data Extraction and Machine Learning for CO2e Estimation of Semiconductor Components, on 4 June in E:3139.
This thesis investigates the application of artificial intelligence and machine learning methods to improve carbon footprint estimation for integrated circuit (IC) components in automotive cost engineering. The work addresses two challenges faced by electrical cost engineers at Volvo Cars: the manual, time-consuming collection of component specifications, and the estimation of CO2e emissions from electronic components.
We developed an automated data collection pipeline that uses large language models (LLMs) to extract structured information from manufacturer datasheets and material content sheets. Three models, Gemma 3 4B, Llama 3.1 8B, and gpt-oss 120B, were evaluated for extraction accuracy, inference time, and memory usage. The gpt-oss 120B model achieved 98.5% extraction accuracy on the validation set. The pipeline was then applied to a larger set of datasheets and material content sheets, converting unstructured PDF documents into structured tabular data for machine learning.
We developed machine learning models to predict the masses of four hotspot metals (copper, gold, palladium, and silver), which are the primary contributors to raw-material carbon emissions. We evaluated TabPFN (a transformer-based prior-fitted network), XGBoost, and CatBoost regression models. CatBoost achieved the best overall performance, with R² values of 0.994 for gold and 0.971 for palladium. SHAP analysis revealed that total component mass is the most important feature for copper content, while pin count has the largest effect on gold and silver predictions.
We introduced a simplified hotspot scaling approach that reduces the previous model's complexity from ten function-type-specific scaling factors to a single global factor, with minimal loss in accuracy. The model was integrated into a desktop application that enables cost engineers to retrieve component specifications, predict metal content when material data are unavailable, and compute CO2e emissions automatically.
The tool reduces manual effort and provides a reproducible, consistent framework for carbon footprint estimation in component evaluation. This work demonstrates that AI and ML methods can effectively automate and improve semiconductor carbon footprint estimation, supporting Volvo Cars’ decarbonisation objectives while reducing the manual workload of cost engineering teams.
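As a rough illustration of the last step described in the abstract, the sketch below turns predicted hotspot-metal masses into a CO2e estimate using one global scaling factor. It is not the thesis implementation: the function name `estimate_co2e`, the emission factors, and the interpretation of the global factor as a multiplier on the summed hotspot emissions are all assumptions made for this example, and the numeric values are placeholders rather than results from the thesis.

```python
# Placeholder emission factors in kg CO2e per kg of metal.
# ASSUMPTION: these are dummy values for illustration only, not figures
# from the thesis or from any life-cycle assessment database.
PLACEHOLDER_FACTORS = {
    "copper": 1.0,
    "gold": 1000.0,
    "palladium": 500.0,
    "silver": 100.0,
}


def estimate_co2e(metal_mass_mg, factors, global_scale=1.0):
    """Estimate component CO2e (kg) from hotspot-metal masses given in mg.

    ASSUMPTION: the single global factor is applied as a multiplier on the
    summed hotspot-metal emissions; the thesis may define it differently.
    """
    missing = set(metal_mass_mg) - set(factors)
    if missing:
        raise KeyError(f"no emission factor for: {sorted(missing)}")

    # Convert each mass from mg to kg, then weight by its emission factor.
    hotspot_kg_co2e = sum(
        mass_mg * 1e-6 * factors[metal]
        for metal, mass_mg in metal_mass_mg.items()
    )
    return global_scale * hotspot_kg_co2e


if __name__ == "__main__":
    # Hypothetical predicted masses for one component, in mg.
    predicted = {"copper": 2000.0, "gold": 1.0}
    print(estimate_co2e(predicted, PLACEHOLDER_FACTORS, global_scale=2.0))
```

In a setup like this, the masses would come either from a material content sheet or, when none is available, from the regression model's predictions, so the same function covers both cases.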
The thesis work was carried out in collaboration with Volvo Cars.
About the event
Location:
E:3139
Contact:
susanna [dot] lonnqvist [at] eit [dot] lth [dot] se