Still a WIP (Work In Progress)
It was early September when I reached out to Professor Yener to express my interest in getting involved in security research. Now, a few months into this RCA research, I’m glad that, through happenstance, I reached out.
Objective
Given a binary and a PoC (Proof of Concept) that crashes it, is it possible to create an RCA (Root Cause Analysis) of that binary?
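To make the starting point concrete, here is a minimal sketch of the first step any such pipeline needs: run the binary against the PoC and confirm that it really crashes. It assumes the target reads the PoC from stdin (real targets may take a file path or a socket instead), and `run_poc` and `FAKE_TARGET` are hypothetical names invented for this illustration. A tiny Python one-liner stands in for the binary so the sketch runs anywhere POSIX-like.

```python
import signal
import subprocess
import sys

# Stand-in "binary" (hypothetical): aborts when its input starts with b"CRASH".
FAKE_TARGET = [
    sys.executable,
    "-c",
    "import os, sys\n"
    "if sys.stdin.buffer.read().startswith(b'CRASH'):\n"
    "    os.abort()\n",
]


def run_poc(binary_cmd, poc_bytes, timeout=5.0):
    """Feed PoC bytes to a target on stdin and report whether it crashed.

    Assumption: the target reads its input from stdin.
    """
    proc = subprocess.run(
        binary_cmd,
        input=poc_bytes,
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the process was killed by signal N.
    crashed = proc.returncode < 0
    signal_name = signal.Signals(-proc.returncode).name if crashed else None
    return {
        "crashed": crashed,
        "signal": signal_name,
        "returncode": proc.returncode,
        "stderr": proc.stderr.decode(errors="replace"),
    }


if __name__ == "__main__":
    print(run_poc(FAKE_TARGET, b"CRASH me"))      # crashing input
    print(run_poc(FAKE_TARGET, b"benign input"))  # non-crashing input
```

The crash signal and stderr collected here are the kind of raw evidence an RCA would then have to explain.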
Background
First, what does RCA even mean?
RCA, short for Root Cause Analysis, is a concise report of why a system encountered an issue. In this research, the system is a binary, and the issue is a crash triggered by a PoC. The PoC is either static, in the form of a .poc file, or dynamic, in the form of a .py file. Ideally, for a closed-loop, better system,
Cursory Research
During the initial weeks, Patrick invited me to perform some cursory research on the current state of LLMs creating RCAs. At first, I searched for papers that directly combined RCA, LLMs, and binary exploitation. However, after coming up empty-handed, I expanded my search to any RCA + LLM paper. On arXiv, I found two papers that caught my eye:
- RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
- ChatDBG: Augmenting Debugging with Large Language Models
Paper 1: RCAgent
This paper covers RCA and LLMs in cloud systems, specifically Alibaba Cloud. It proposes RCAgent, a “tool-augmented LLM autonomous agent framework”. What makes it different from the typical ReAct agent?
To answer this question, one must first understand what ReAct agents even are.
ReAct, short for Reasoning + Acting, is a framework for autonomous agents powered by LLMs. It is essentially a loop in which the agent:
- thinks about what to do next
- acts by calling a tool, an API, or an API exposed via an MCP server
- observes the feedback from said action
- repeats the cycle
This lets the agent solve complex tasks through step-by-step reasoning and evidence-gathering, rather than generating the answer in a single pass.
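The loop above can be sketched in a few lines. This is a toy illustration, not the ReAct reference implementation: `react_loop`, `scripted_policy`, and `TOOLS` are hypothetical names, and the "policy" is a hard-coded function standing in for an LLM call so that the example runs without any model.

```python
from typing import Callable, Dict, List, Tuple

# A policy stands in for the LLM: given the history so far, it returns
# (thought, action_name, action_input). All names here are hypothetical.
Step = Tuple[str, str, str]
Policy = Callable[[List[str]], Step]

OBS_PREFIX = "Observation: "


def react_loop(policy: Policy, tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_steps):                       # repeat the cycle
        thought, action, arg = policy(history)       # think
        history.append(f"Thought: {thought}")
        if action == "finish":                       # policy decides it is done
            return arg
        observation = tools[action](arg)             # act
        history.append(f"Action: {action}({arg})")
        history.append(f"{OBS_PREFIX}{observation}")  # observe
    return "gave up: step budget exhausted"


def scripted_policy(history: List[str]) -> Step:
    """Toy replacement for an LLM call: look up one value, then answer."""
    observations = [h for h in history if h.startswith(OBS_PREFIX)]
    if not observations:
        return ("I need the crash signal first.", "lookup", "signal")
    value = observations[-1][len(OBS_PREFIX):]
    return ("I have what I need.", "finish", f"The process died with {value}.")


# One fake tool that answers from a fixed dictionary.
TOOLS = {"lookup": lambda key: {"signal": "SIGSEGV"}.get(key, "unknown")}

if __name__ == "__main__":
    print(react_loop(scripted_policy, TOOLS))
```

The `max_steps` budget is a design choice added for the sketch: without it, a policy that never chooses `finish` would loop forever.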
Knowing this, RCAgent employs a few key upgrades:
- privacy-conscious (though not a requirement) - the paper uses locally deployed models, such as Vicuna
- a suite of tools for context management
  - introduces an “observation snapshot key” (obsk), which compresses lengthy observations using a key-value store
  - it only shows the “head” to the agent, with hash IDs for the full data
  - check out the image below to see a visual representation
- enhanced tool system
  - [insert things here]
- a suite of stabilizing tools
  - jsonregen
  - error handling
- advanced aggregation methods: RCAgent applies self-consistency to action trajectories
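To illustrate the observation-snapshot idea from the list above, here is a minimal sketch of a key-value store that keeps the full observation and hands the agent only a head plus a short hash ID. It is an assumption-laden toy, not the paper's implementation: the class name `SnapshotStore`, the 80-character head, and the truncated SHA-256 key are all choices made for this example.

```python
import hashlib
from typing import Dict


class SnapshotStore:
    """Hypothetical key-value store for long tool observations."""

    def __init__(self, head_chars: int = 80) -> None:
        self.head_chars = head_chars
        self._store: Dict[str, str] = {}

    def put(self, observation: str) -> str:
        """Store the full text and return the short view shown to the agent."""
        if len(observation) <= self.head_chars:
            return observation  # short enough to show in full
        # Assumption: a truncated content hash is good enough as an ID here.
        key = hashlib.sha256(observation.encode()).hexdigest()[:8]
        self._store[key] = observation
        head = observation[: self.head_chars]
        return f"{head}... [truncated, full text under key {key}]"

    def get(self, key: str) -> str:
        """Expand a snapshot back to its full text (e.g. via a tool call)."""
        return self._store[key]


if __name__ == "__main__":
    store = SnapshotStore()
    long_log = "ERROR: segfault at 0x0 in parse_header\n" * 50
    print(store.put(long_log))
```

The point of the design is that the agent's context only grows by the head and the key, while the full observation stays retrievable on demand.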

Performance metrics (RCAgent vs ReAct)

Understanding Different Performance Metrics
Understanding TSC Aggregation in Detail
Paper 2: ChatDBG
Creating An Idealized System Overview
Iteration 1: The Sanity Check

Iteration 2: Attempt At Future-Proofing

Iteration 3: The Unintentional Over-engineering

Iteration 4: Back-To-Basics

Case Analysis (HTB Cyber Apocalypse 2025)
What’s Next

2. Testing With Local Models
3. Testing With Bigger Models via openrouter.com
High-level overview
- Professor / Master Agent
- Student / Slave Agent(s)