Research

MARVEL: Multi-Agent RTL Vulnerability Extraction using Large Language Models

Luca Collini, Baleegh Ahmad, Joey Ah-kiow, Ramesh Karri

* Equal Contribution

Under Review

In this work, we introduce MARVEL, a multi-agent LLM framework for a unified approach to decision-making, tool use, and reasoning. MARVEL includes executor agents that leverage formal tools, linters, simulation tests, LLM-based detection schemes, and static analysis-based checks. We test our approach on a known buggy SoC based on OpenTitan from the Hack@DATE competition. We find that 20 of the 48 issues reported by MARVEL pose security vulnerabilities.

LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs

Baleegh Ahmad, Hammond Pearce, Ramesh Karri, Benjamin Tan

Under Review

In this work, we combine two approaches (LLMs and Static Analysis) to overcome each other's limitations for hardware security bug detection. We find that 87.5% of instances flagged by our recommended scheme are plausible CWEs. In-context learning and asking the model to 'think again' improves LASHED's precision.

FLAG: Finding Line Anomalies (in RTL code) with Generative AI

Baleegh Ahmad, Benjamin Tan, Ramesh Karri, Hammond Pearce

ACM Transactions on Design Automation of Electronic Systems, 2025

In this work, we investigate the ability of Large Language Models to detect functional and security related bugs in C, Python and Verilog. By comparing the original code with an LLM-generated alternative, we can flag notable differences as anomalies for further inspection, with features such as distance from comments and LLM confidence also aiding this classification.

On Hardware Security Bug Code Fixes By Prompting Large Language Models

Baleegh Ahmad, Shailja Thakur, Benjamin Tan, Ramesh Karri, Hammond Pearce

IEEE Transactions on Information Forensics and Security, 2024

In this work, we investigate the ability of Large Language Models to fix security related bugs in Verilog. We also present a complete framework that identifies bugs using static analysis and suggests repairs using Large Language Models.

VeriGen: A Large Language Model for Verilog Code Generation

Shailja Thakur, Baleegh Ahmad, Zhenxing Fan, Hammond Pearce, Benjamin Tan, Ramesh Karri, Brendan Dolan-Gavitt, Siddharth Garg.

ACM Transactions on Design Automation of Electronic Systems, 2024

First appeared in Design and Test in Europe, 2023 as "Benchmarking Large Language Models for Automated Verilog RTL Code Generation"

(Nominated for Best Paper)

In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty.

Examining Zero-Shot Vulnerability Repair with Large Language Models

Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, Brendan Dolan-Gavitt.

IEEE Security and Privacy, 2023

In this work, we examine the use of large language models (LLMs) for code (such as OpenAI’s Codex and AI21’s Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code.

Don’t CWEAT It: Toward CWE Analysis Techniques in Early Stages of Hardware Design

Baleegh Ahmad, Wei-Kai Liu, Luca Collini, Hammond Pearce, Jason M. Fung, Jonathan Valamehr, Mohammad Bidmeshki, Piotr Sapiecha, Steve Brown, Krishnendu Chakrabarty, Ramesh Karri, Benjamin Tan.

IEEE/ACM ICCAD, 2022

In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code that might contain one of a set of MITRE’s common weakness enumerations (CWEs).

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri.

IEEE Security and Privacy, 2022

(Won Most Distinguished Paper Award)

In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk cybersecurity weaknesses, e.g. those from MITRE’s “Top 25” Common Weakness Enumeration (CWE) list. We explore Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.

For a complete list, please visit google scholar

Page updated

Google Sites

Report abuse

Research

Luca Collini*, Baleegh Ahmad*, Joey Ah-kiow, Ramesh Karri

Under Review

Baleegh Ahmad, Hammond Pearce, Ramesh Karri, Benjamin Tan

Under Review

Baleegh Ahmad, Benjamin Tan, Ramesh Karri, Hammond Pearce

ACM Transactions on Design Automation of Electronic Systems, 2025

Baleegh Ahmad, Shailja Thakur, Benjamin Tan, Ramesh Karri, Hammond Pearce

IEEE Transactions on Information Forensics and Security, 2024

Shailja Thakur, Baleegh Ahmad, Zhenxing Fan, Hammond Pearce, Benjamin Tan, Ramesh Karri, Brendan Dolan-Gavitt, Siddharth Garg.

ACM Transactions on Design Automation of Electronic Systems, 2024

First appeared in Design and Test in Europe, 2023 as "Benchmarking Large Language Models for Automated Verilog RTL Code Generation"

Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, Brendan Dolan-Gavitt.

IEEE Security and Privacy, 2023

Baleegh Ahmad, Wei-Kai Liu, Luca Collini, Hammond Pearce, Jason M. Fung, Jonathan Valamehr, Mohammad Bidmeshki, Piotr Sapiecha, Steve Brown, Krishnendu Chakrabarty, Ramesh Karri, Benjamin Tan.

IEEE/ACM ICCAD, 2022

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri.

IEEE Security and Privacy, 2022

Luca Collini, Baleegh Ahmad, Joey Ah-kiow, Ramesh Karri