Introduction
SAF (Static Analyzer Factory) is a Rust + WebAssembly framework for building program analysis tools. It provides whole-program pointer analysis, value-flow reasoning, and graph-based program representations -- all accessible through a Python SDK or directly in the browser via WebAssembly.
Who Is SAF For?
SAF is designed for three audiences:
- Students learning program analysis. The interactive Playground lets you paste C code, visualize control-flow graphs, call graphs, and value-flow graphs, and step through pointer analysis -- all without installing anything.
- Security researchers writing custom checkers. The Python SDK exposes a schema-driven API for authoring analyzers that detect use-after-free, double free, taint-flow violations, and other vulnerability classes. You write the checker logic; SAF handles the underlying analysis infrastructure.
- AI and agent developers. The same schema-driven API that humans use is designed for LLM agents to consume. Agents can query points-to sets, traverse value-flow edges, and emit SARIF findings programmatically.
What Can You Do With SAF?
| Capability | Description |
|---|---|
| Visualize program graphs | Render CFGs, ICFGs, call graphs, def-use chains, and value-flow graphs as interactive diagrams |
| Run pointer analysis | Andersen-style inclusion-based analysis with field sensitivity and on-the-fly call graph construction |
| Detect vulnerabilities | Built-in checkers for use-after-free, double free, memory leaks, null dereference, file descriptor leaks, and more |
| Write custom analyzers | Author new checkers in Python using the SDK, with full access to points-to and value-flow results |
| Embed analysis widgets | Embed interactive analysis <iframe> widgets into any web page for live program visualization |
| Export results | Output findings as SARIF, and graphs as the unified PropertyGraph JSON format |
Key Differentiators
- Runs in the browser. The core analysis engine compiles to WebAssembly. The playground and embeddable widgets require no server -- everything executes client-side.
- Deterministic. Given identical inputs, SAF produces byte-identical outputs. All internal data structures use ordered collections, and all IDs are derived from BLAKE3 hashes. This makes results reproducible and diffable.
- Frontend-agnostic. SAF operates on its own intermediate representation (AIR), not on LLVM IR directly. Frontends translate source languages into AIR; the analysis core never sees frontend-specific types.
- Open source. SAF is designed for extensibility: the crate architecture cleanly separates the core IR, frontends, analysis passes, and language bindings.
Quick Start
The fastest way to try SAF:
- In the browser: Open the Playground and paste a C snippet. Click "Analyze" to see the CFG, call graph, and points-to results.
- Locally with Docker:

git clone https://github.com/Static-Analyzer-Factory/static-analyzer-factory.git
cd static-analyzer-factory
make shell
# Inside the container:
python3 tutorials/memory-safety/02-use-after-free/detect.py

- With the CLI:
saf run program.ll --format sarif --output results.sarif
Where to Go Next
- Installation -- Set up SAF locally with Docker or install the Python package.
- First Analysis -- Walk through analyzing a small C program end to end.
- Playground Tour -- Learn how to use the browser-based playground.
- Tutorials -- Step-by-step guides for detecting specific vulnerability classes.
- API Reference -- Full reference for the Python SDK, CLI, and PropertyGraph format.
Architecture Overview
Input (.c / .ll / .bc / .air.json)
-> Frontend (LLVM, AIR-JSON)
-> AirBundle (canonical IR)
-> Graph builders (CFG, ICFG, CallGraph, DefUse)
-> PTA solver (Andersen)
-> ValueFlow graph
-> Queries (flows, taint_flow, points_to)
-> Export (JSON, SARIF)
SAF is structured as a set of Rust crates:
| Crate | Purpose |
|---|---|
| saf-core | AIR definitions, config, deterministic ID generation |
| saf-frontends | Frontend trait + LLVM/AIR-JSON implementations |
| saf-analysis | CFG, call graph, PTA, value-flow, checkers |
| saf-cli | Command-line interface |
| saf-python | PyO3 bindings for the Python SDK |
Installation
SAF can be used in two ways: in the browser (no install needed) or locally via Docker for full analysis capabilities.
Browser (No Install)
The fastest way to try SAF is the Playground. It runs entirely in your browser using WebAssembly -- no server, no Docker, no setup.
The browser version supports:
- Writing C code or LLVM IR
- Visualizing CFGs, call graphs, def-use chains, and value-flow graphs
- Running pointer analysis
- Exporting results as JSON
For the full Python SDK, checker framework, and LLVM bitcode support, use the local installation below.
Local Installation (Docker)
The local installation uses Docker to provide a complete development environment with Rust, Python 3.12, LLVM 18, and all dependencies pre-installed.
Prerequisites
- Docker (the development environment runs entirely in a container)
- git and make (used to clone the repository and drive the Docker workflow)
Step 1: Clone the Repository
git clone https://github.com/Static-Analyzer-Factory/static-analyzer-factory.git
cd static-analyzer-factory
Step 2: Start the Development Shell
make shell
This drops you into an interactive shell inside the Docker container. The project
directory is mounted at /workspace.
On first run, the container automatically:
- Creates a Python virtual environment
- Installs pytest and development dependencies
- Builds the SAF Python SDK with `maturin develop --release`
Subsequent runs skip the build if saf is already importable.
Step 3: Verify the Installation
Inside the Docker shell:
python3 -c "import saf; print('SAF version:', saf.version())"
You should see the SAF version printed without errors.
To run the full test suite:
# Exit Docker shell first (Ctrl+D), then:
make test
What's Included
The Docker environment provides:
| Tool | Version | Purpose |
|---|---|---|
| Rust | stable | Core analysis engine |
| Python | 3.12 | SDK and tutorials |
| LLVM/Clang | 18 | C/C++ to LLVM IR compilation |
| maturin | latest | Python-Rust bridge (PyO3) |
Available Make Commands
make help # Show all available commands
make test # Run all tests (Rust + Python) in Docker
make lint # Run clippy + rustfmt check in Docker
make fmt # Auto-format all Rust code in Docker
make shell # Open interactive dev shell in Docker
make build # Build minimal runtime Docker image
make clean # Remove Docker volumes and local target/
Next Steps
- First Analysis -- Analyze a C program end to end
- Playground Tour -- Learn the browser-based playground
First Analysis
This guide walks you through analyzing a small C program from start to finish using the SAF Python SDK.
The Vulnerable Program
We will detect a command injection vulnerability (CWE-78) -- user-controlled input flowing directly to a shell command:
#include <stdlib.h>
int main(int argc, char *argv[]) {
if (argc < 2) return 1;
// SOURCE: argv[1] is user-controlled
char *user_cmd = argv[1];
// SINK: system() executes the string as a shell command
return system(user_cmd);
}
If an attacker runs ./program "rm -rf /", the program executes that command.
The Analysis Pipeline
vulnerable.c --> clang-18 --> LLVM IR (.ll) --> SAF --> Findings
                (compile)     (.ll file)     (analyze)
SAF works on LLVM IR, not source code directly. The pipeline compiles C to LLVM IR first, then runs analysis on the IR.
Step 1: Write the Detection Script
Create a file called detect.py:
import subprocess
from pathlib import Path
from saf import Project, sources, sinks
def main() -> None:
# 1. Compile C to LLVM IR
subprocess.run(
["clang-18", "-S", "-emit-llvm", "-O0", "-g",
"-o", "vulnerable.ll", "vulnerable.c"],
check=True,
)
print("Compiled: vulnerable.c -> vulnerable.ll")
# 2. Load the LLVM IR into SAF
proj = Project.open("vulnerable.ll")
print("Project loaded")
# 3. Create a query context
q = proj.query()
# 4. Run taint flow analysis
findings = q.taint_flow(
sources=sources.function_param("main", 1), # argv
sinks=sinks.call("system", arg_index=0), # system()'s argument
)
# 5. Print results
print(f"\nFound {len(findings)} taint flow(s):")
for i, f in enumerate(findings):
print(f" [{i}] finding_id={f.finding_id}")
if f.trace:
print(f" trace steps: {len(f.trace.steps)}")
if __name__ == "__main__":
main()
Step 2: Run Inside Docker
make shell
python3 detect.py
Expected output:
Compiled: vulnerable.c -> vulnerable.ll
Project loaded
Found 1 taint flow(s):
[0] finding_id=0x...
trace steps: 3
SAF found one taint flow: data from argv[1] reaches system().
Understanding the Key Concepts
Sources and Sinks
- Source: Where untrusted data enters. `sources.function_param("main", 1)` selects the second parameter of `main` (which is `argv`).
- Sink: Where untrusted data is dangerous. `sinks.call("system", arg_index=0)` selects the first argument to any call to `system()`.
Taint Flow
q.taint_flow(sources, sinks) performs a breadth-first search over the
ValueFlow graph, looking for paths from any source to any sink. Each path found
is reported as a Finding with:
- `finding_id` -- a deterministic identifier
- `trace` -- the step-by-step data flow path
Project.open()
Project.open("vulnerable.ll") does several things internally:
- Detects the `.ll` extension and selects the LLVM frontend
- Parses the LLVM IR into SAF's Analysis IR (AIR)
- Builds the call graph, def-use chains, and points-to analysis
- Constructs the ValueFlow graph
Step 3: Explore the Graphs
You can also inspect the program's structure:
# Export graphs
graphs = proj.graphs()
# Call graph
cg = graphs.export("callgraph")
print(f"Functions: {len(cg['nodes'])}")
print(f"Call edges: {len(cg['edges'])}")
# ValueFlow graph
vf = graphs.export("valueflow")
print(f"VF nodes: {len(vf['nodes'])}, edges: {len(vf['edges'])}")
Step 4: Use the Checker Framework
Instead of manually specifying sources and sinks, use the built-in checkers for common vulnerability patterns:
import saf
proj = saf.Project.open("vulnerable.ll")
# Run all built-in checkers (memory-leak, use-after-free, double-free, etc.)
all_findings = proj.check_all()
for f in all_findings:
print(f"[{f.severity}] {f.checker}: {f.message}")
# Or run a specific checker
leak_findings = proj.check("memory-leak")
Next Steps
- Playground Tour -- Try the same analysis in the browser
- Browser vs Full SAF -- Understand the differences
- Tutorials -- Detect UAF, double-free, leaks, and more
Playground Tour
The SAF Playground is a browser-based interactive environment for writing code, running analysis, and visualizing program graphs. Everything runs client-side via WebAssembly -- no server needed.
Opening the Playground
Navigate to the Playground in your browser. You will see three main areas:
- Code Editor (left) -- Write C code or LLVM IR
- Graph Viewer (right) -- Interactive visualization of analysis results
- Toolbar (top) -- Controls for compilation, analysis, and graph selection
Writing Code
The editor supports two input modes:
- C/C++: Write standard C code. The playground compiles it to LLVM IR using the Compiler Explorer API, then analyzes the resulting IR.
- LLVM IR: Write `.ll` format directly for full control over the IR structure.
Example: Command Injection
Paste this into the editor:
#include <stdlib.h>
int main(int argc, char *argv[]) {
if (argc < 2) return 1;
char *cmd = argv[1];
return system(cmd);
}
Running Analysis
Click Analyze to start the pipeline:
- C code is compiled to LLVM IR via Compiler Explorer
- The IR is parsed by tree-sitter-llvm (in WASM)
- The parsed IR is converted to AIR JSON
- SAF's WASM module runs the full analysis pipeline
- Results are displayed as interactive graphs
Viewing Graphs
Use the graph selector tabs to switch between:
| Graph | What It Shows |
|---|---|
| CFG | Control flow within each function -- basic blocks and branches |
| Call Graph | Which functions call which -- the program's call hierarchy |
| Def-Use | Where values are defined and where they are used |
| Value Flow | How data flows through the program, including across functions |
| Points-To | What each pointer may point to after analysis |
Interacting with Graphs
- Pan: Click and drag the background
- Zoom: Scroll wheel or pinch gesture
- Select: Click a node to highlight it and its connections
- Details: Selected nodes show their properties in a detail panel
Built-In Examples
The playground includes pre-loaded examples demonstrating different analysis scenarios:
- Command Injection -- Taint flow from `argv` to `system()`
- Use-After-Free -- Pointer used after `free()`
- Memory Leak -- Allocation without deallocation
- Double Free -- Same pointer freed twice
- Struct Field Access -- Field-sensitive pointer analysis
Select an example from the dropdown to load it into the editor.
Limitations
The browser playground uses a simplified analysis pipeline compared to the full Docker-based SAF:
| Feature | Browser | Full SAF |
|---|---|---|
| C/C++ compilation | Compiler Explorer API | Local clang-18 |
| LLVM bitcode (.bc) | Not supported | Supported |
| Python SDK | Not available | Full access |
| Checker framework | Not available | 9 built-in checkers |
| Z3 path refinement | Not available | Supported |
| SARIF export | Not available | Supported |
| Graph visualization | Built-in | Export to external tools |
See Browser vs Full SAF for a detailed comparison.
Next Steps
- Browser vs Full SAF -- Understand what each mode offers
- First Analysis -- Set up the full local environment
- Concepts -- Learn about the underlying analysis
Browser vs Full SAF
SAF is available in two forms: the browser-based playground (WebAssembly) and the full local installation (Docker). This page compares the two.
Feature Comparison
| Feature | Browser (WASM) | Full SAF (Docker) |
|---|---|---|
| Setup | None -- open a URL | Docker + make shell |
| LLVM frontend | No (tree-sitter) | Yes (inkwell, full LLVM API) |
| Z3 solver | No | Yes |
| Context-sensitive PTA | Limited | Yes |
| Large programs (>1000 LOC) | Slow | Fast |
| Python SDK | Pyodide (limited) | Full |
| Custom specs | Yes | Yes |
| Checker framework | 9 built-in checkers (via query protocol) | 9 built-in checkers + custom checkers |
| SARIF export | Not available | Standards-compliant output via CLI |
| Graph visualization | Built-in Cytoscape.js | Export JSON for external tools |
| Batch scanning | Single file | Multi-file projects |
| Offline use | Requires Compiler Explorer API for C | Fully offline |
When to Use the Browser
The browser playground is ideal for:
- Learning: Experiment with program analysis concepts interactively
- Quick checks: Paste a snippet and see the CFG or call graph immediately
- Demonstrations: Show analysis capabilities without requiring setup
- Embedding: Include analysis widgets in documentation or blog posts
When to Use Full SAF
The full installation is needed for:
- Production analysis: Scanning real codebases for vulnerabilities
- Custom checkers: Writing Python scripts that use the SDK
- CI/CD integration: Automated scanning with SARIF output
- Advanced analysis: Z3-based path refinement, IFDS taint tracking
- Bitcode input: Analyzing pre-compiled `.bc` files
- Batch scanning: Processing multiple files or entire projects
Architecture Differences
Browser
C code --> Compiler Explorer API --> LLVM IR (.ll text)
--> tree-sitter-llvm (WASM) --> CST
--> TypeScript converter --> AIR JSON
--> saf-wasm (WASM) --> PropertyGraph JSON
--> Cytoscape.js visualization
The browser path uses tree-sitter to parse LLVM IR text and a TypeScript layer
to convert the parse tree into AIR JSON. The saf-wasm crate contains
saf-core and saf-analysis compiled to WebAssembly, without LLVM, Z3, or
threading dependencies.
Full SAF
C code --> clang-18 --> LLVM IR (.ll or .bc)
--> inkwell (LLVM C API) --> AIR
--> saf-analysis (native) --> Analysis results
--> Python SDK / CLI / JSON / SARIF
The full path uses inkwell to parse LLVM IR/bitcode natively with full access to LLVM's type system and metadata. Analysis runs in native Rust with threading support.
Migrating from Browser to Full SAF
If you start with the playground and want to move to the full SDK:
- Install Docker and clone the repository (see Installation)
- Save your C code to a `.c` file
- Write a Python detection script using the same analysis concepts
- Run it inside `make shell`
The analysis concepts (sources, sinks, graph types) are the same in both environments. The full SDK simply provides more control and additional features.
Analysis IR (AIR)
AIR (Analysis Intermediate Representation) is SAF's canonical, frontend-agnostic intermediate representation. All analysis passes operate on AIR, never on frontend-specific types like LLVM IR or Clang ASTs.
Why a Separate IR?
SAF supports multiple frontends (LLVM bitcode, AIR JSON, and potentially source-level frontends in the future). Rather than coupling analysis algorithms to any specific input format, SAF defines AIR as a common target:
LLVM bitcode (.bc/.ll) --> LLVM Frontend --> AIR
AIR JSON (.air.json) --> JSON Frontend --> AIR
(future: Clang AST) --> AST Frontend --> AIR
|
v
Analysis passes
(CFG, PTA, ValueFlow, ...)
This design means adding a new frontend requires only implementing the mapping to AIR -- no changes to analysis algorithms.
Structure
An AIR module contains:
| Entity | Description |
|---|---|
| Module | Top-level container with a fingerprint and metadata |
| Functions | Named functions with parameters, return types, and basic blocks |
| Basic Blocks | Sequences of instructions with a terminator |
| Instructions | Individual operations (alloc, load, store, call, etc.) |
| Values | SSA registers, function parameters, constants, and globals |
| Objects | Memory objects (stack allocas, heap allocations, globals) |
Operations
AIR supports the following operation types:
| Category | Operations |
|---|---|
| Allocation | Alloca (stack), Global, HeapAlloc (malloc/calloc) |
| Memory | Load, Store, GEP (field/element access), Memcpy, Memset |
| Control | Br (branch), Switch, Ret (return) |
| SSA | Phi, Select |
| Calls | CallDirect, CallIndirect |
| Transforms | Cast, BinaryOp (arithmetic, bitwise) |
Deterministic IDs
Every AIR entity has a deterministic u128 ID derived from BLAKE3 hashes.
IDs are serialized as 0x followed by 32 lowercase hex characters (e.g.,
0x1a2b3c4d5e6f...).
The ID derivation hierarchy:
ModuleFingerprint = hash(FrontendId, input_fingerprint_bytes)
FunctionId = hash(ModuleFingerprint, "fn", function_key)
BlockId = hash(FunctionId, "bb", block_index)
InstId = hash(BlockId, "inst", inst_index, opcode_tag)
ValueId = derived from instruction results, args, globals, constants
ObjId = derived from allocas, heap allocators, globals
LocId = hash(ObjId, "loc", field_path)
This means identical inputs always produce identical IDs, regardless of when or where the analysis runs. Debug information does not affect structural IDs by default.
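The derivation scheme can be illustrated with a short Python sketch. This uses `hashlib.blake2b` truncated to 16 bytes purely as a stand-in for BLAKE3 (which is not in the standard library), and the key strings are illustrative, not SAF's actual encoding:

```python
import hashlib

def derive_id(parent: bytes, *parts: str) -> bytes:
    # Hash the parent ID together with tagged components; 16 bytes ~ u128.
    # NOTE: blake2b is a stand-in for BLAKE3 in this illustration.
    h = hashlib.blake2b(digest_size=16)
    h.update(parent)
    for part in parts:
        h.update(part.encode())
    return h.digest()

def render(raw: bytes) -> str:
    # Serialize as 0x followed by 32 lowercase hex characters.
    return "0x" + raw.hex()

module_fp = derive_id(b"", "llvm-frontend", "input-fingerprint-bytes")
fn_id = derive_id(module_fp, "fn", "main")
blk_id = derive_id(fn_id, "bb", "0")
inst_id = derive_id(blk_id, "inst", "0", "call")

# Identical inputs always yield identical IDs.
assert fn_id == derive_id(module_fp, "fn", "main")
print(render(inst_id))
```

Because each ID is seeded by its parent's ID, changing the module fingerprint changes every derived ID below it, while two runs on the same input agree byte for byte.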
Source Metadata
AIR instructions can carry optional source-level metadata:
- Span: File, line, column, byte offsets for source location
- Symbol: Display name, mangled name, namespace path
- Type representation: Frontend-specific type string
This metadata enables source-level error reporting without coupling the analysis to any particular frontend.
Next Steps
- Control Flow Graphs -- How AIR is organized into CFGs
- Points-To Analysis -- How AIR objects are analyzed for aliasing
Control Flow Graphs
A control flow graph (CFG) represents the possible execution paths within a function. An interprocedural control flow graph (ICFG) extends this to the whole program by connecting function calls to their targets.
CFG Basics
A CFG is a directed graph where:
- Nodes are basic blocks -- straight-line sequences of instructions with no branches except at the end
- Edges represent possible control flow between blocks
Key Concepts
| Concept | Description |
|---|---|
| Entry block | Where function execution begins (no predecessors) |
| Exit block | Ends with a return instruction (no successors) |
| Branch block | Has multiple successors (if/while/switch) |
| Merge block | Has multiple predecessors (join point after a branch) |
| Back-edge | Edge to an earlier block, indicating a loop |
Example
int abs(int x) {
if (x < 0) // Block 0: entry, branch
x = -x; // Block 1: negate
return x; // Block 2: merge, exit
}
The CFG for this function has three blocks:
[Block 0: entry] --true--> [Block 1: negate]
| |
+---false--> [Block 2: return] <--+
Block 2 is a merge point (two predecessors) and an exit point (return).
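As a sketch, the same three-block CFG can be represented as a plain edge list and queried for the entry, merge, and exit properties defined above (block numbering follows the example; this is illustrative, not SAF's data model):

```python
# Toy CFG for abs(): blocks 0 (entry/branch), 1 (negate), 2 (merge/exit).
edges = [(0, 1), (0, 2), (1, 2)]  # (src, dst) control-flow edges

preds = {b: [] for b in (0, 1, 2)}
succs = {b: [] for b in (0, 1, 2)}
for src, dst in edges:
    succs[src].append(dst)
    preds[dst].append(src)

entry_blocks = [b for b in preds if not preds[b]]       # no predecessors
exit_blocks = [b for b in succs if not succs[b]]        # no successors
merge_blocks = [b for b in preds if len(preds[b]) > 1]  # join points

assert entry_blocks == [0]
assert exit_blocks == [2]
assert merge_blocks == [2]
```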
ICFG
The ICFG connects CFGs across functions by adding call and return edges:
- Call edge: From a call instruction to the callee's entry block
- Return edge: From the callee's return to the instruction after the call
main:
[Block 0] --call--> validate: [entry]
...
validate: [exit] --return--> [Block 1]
The ICFG enables interprocedural analysis -- tracking data flow across function boundaries.
SAF's PropertyGraph Format
SAF exports CFGs in the unified PropertyGraph JSON format:
{
"schema_version": "0.1.0",
"graph_type": "cfg",
"nodes": [
{
"id": "0x...",
"labels": ["Block", "Entry"],
"properties": {
"name": "entry",
"function": "main"
}
}
],
"edges": [
{
"src": "0x...",
"dst": "0x...",
"edge_type": "FLOWS_TO",
"properties": {}
}
]
}
- Nodes have `labels` including `"Block"`, and optionally `"Entry"` for entry blocks
- The `properties.function` field groups blocks by function
- Edges have `edge_type: "FLOWS_TO"`
Exporting with the Python SDK
from saf import Project
proj = Project.open("program.ll")
graphs = proj.graphs()
cfg = graphs.export("cfg")
# Group blocks by function
from collections import defaultdict
by_function = defaultdict(list)
for node in cfg["nodes"]:
fn = node["properties"]["function"]
by_function[fn].append(node)
for fn, blocks in by_function.items():
entry = [b for b in blocks if "Entry" in b["labels"]]
print(f"{fn}: {len(blocks)} blocks, entry={entry[0]['id']}")
Why CFGs Matter
CFGs are the foundation for:
- Reachability analysis: Is a particular code path possible?
- Loop detection: Back-edges in the CFG reveal loops
- Dominance: Which blocks must execute before others?
- Dead code: Blocks with no path from entry are unreachable
- Taint propagation: Data flows follow control flow paths
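For example, back-edges (and hence loops) can be found with a depth-first search over the block graph. This is a generic sketch on a toy adjacency list, not SAF's implementation:

```python
# Toy CFG with a loop: entry -> header -> body -> header, header -> exit.
succs = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],   # edge back to the loop header
    "exit": [],
}

def find_back_edges(succs, entry):
    """Return CFG edges that target a block currently on the DFS stack."""
    back_edges = []
    visited, on_stack = set(), set()

    def dfs(block):
        visited.add(block)
        on_stack.add(block)
        for succ in succs[block]:
            if succ in on_stack:          # edge to an ancestor: a back-edge
                back_edges.append((block, succ))
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(block)

    dfs(entry)
    return back_edges

assert find_back_edges(succs, "entry") == [("body", "header")]
```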
Next Steps
- Call Graphs -- Function-level structure
- Value Flow -- Data flow built on top of CFGs
Call Graphs
A call graph is a directed graph representing the calling relationships between functions in a program. It answers the question: "which functions can call which other functions?"
Structure
- Nodes represent functions
- Edges represent call sites (function A calls function B)
Key Concepts
| Concept | Description |
|---|---|
| Entry point | Function with no callers (e.g., main) |
| Leaf function | Function that calls no other user-defined functions |
| Fan-out | Number of distinct functions a function calls |
| Fan-in | Number of distinct callers a function has |
| Strongly connected component | Group of functions that (transitively) call each other (recursion) |
Example
main
|-- log_error (shared utility)
\-- parse_request
|-- validate_input
| \-- log_error
|-- process_data
\-- send_response
Here, main is the entry point, and log_error has fan-in of 2 (called by both
main and validate_input). process_data, send_response, and log_error
are leaf functions.
Direct vs Indirect Calls
SAF distinguishes between:
- Direct calls: The target function is known statically (e.g.,
foo()) - Indirect calls: The target is a function pointer (e.g.,
fptr())
For indirect calls, SAF uses points-to analysis to resolve the possible targets. The call graph is refined as PTA discovers which functions a pointer may refer to.
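The refinement step can be sketched as follows: each function in the pointer's points-to set becomes a candidate call edge. The names and dictionary shapes here are illustrative only:

```python
# PTA result (illustrative): the function pointer `fptr` may point to foo or bar.
points_to = {"fptr": {"foo", "bar"}}

# Indirect call sites: (calling function, pointer used at the call site).
indirect_calls = [("main", "fptr")]

call_edges = set()
for caller, ptr in indirect_calls:
    for target in sorted(points_to.get(ptr, ())):  # sorted for determinism
        call_edges.add((caller, target))

assert call_edges == {("main", "foo"), ("main", "bar")}
```

As the PTA solver discovers more targets for a pointer, the corresponding indirect call sites gain edges, which in turn can expose new constraints to the solver -- hence "on-the-fly" call graph construction.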
SAF's PropertyGraph Format
{
"schema_version": "0.1.0",
"graph_type": "callgraph",
"nodes": [
{
"id": "0x...",
"labels": ["Function"],
"properties": {
"name": "main",
"kind": "defined"
}
}
],
"edges": [
{
"src": "0x...",
"dst": "0x...",
"edge_type": "CALLS",
"properties": {}
}
]
}
- Nodes have `labels: ["Function"]` and `properties.name` for the function name
- The `properties.kind` field indicates `"defined"` (has a body) or `"external"`
- Edges have `edge_type: "CALLS"`
Exporting with the Python SDK
from saf import Project
proj = Project.open("program.ll")
graphs = proj.graphs()
cg = graphs.export("callgraph")
nodes = cg["nodes"]
edges = cg["edges"]
# Build adjacency
callees = {}
callers = {}
for node in nodes:
nid = node["id"]
name = node["properties"]["name"]
callees[nid] = []
callers[nid] = []
for edge in edges:
callees[edge["src"]].append(edge["dst"])
callers[edge["dst"]].append(edge["src"])
# Find entry points and leaves
for node in nodes:
nid = node["id"]
name = node["properties"]["name"]
if not callers[nid]:
print(f"Entry point: {name}")
if not callees[nid]:
print(f"Leaf: {name}")
Why Call Graphs Matter
Call graphs are essential for:
- Attack surface analysis: Entry points and functions that process external input
- Dead code detection: Functions with no callers may be unused
- Interprocedural analysis: Taint and value flow follow call edges
- Modular analysis: Analyze one function at a time, using summaries for callees
- Dependency analysis: Understanding how changes propagate through a codebase
Next Steps
- Points-To Analysis -- Resolving indirect call targets
- Value Flow -- Data flow across function calls
Points-To Analysis
Points-to analysis (PTA) determines what memory locations each pointer in a program may refer to. This is fundamental to understanding aliasing, resolving indirect calls, and building accurate data flow graphs.
The Problem
Consider:
int x = 10, y = 20;
int *p = &x;
int *q = p; // q also points to x
*q = 30; // modifies x through q
Without PTA, an analysis cannot determine that modifying *q affects x.
Points-to analysis computes that both p and q point to the same object x,
enabling the analysis to track data flow through memory.
Andersen's Analysis
SAF implements Andersen-style inclusion-based points-to analysis. This is a whole-program, flow-insensitive, context-insensitive analysis that computes a conservative approximation of the points-to sets.
Constraint Types
PTA works by extracting constraints from the program and solving them:
| Constraint | Meaning | Example |
|---|---|---|
| Addr | p points to object o | p = &x |
| Copy | p points to everything q points to | p = q |
| Load | p points to what *q points to | p = *q |
| Store | *p points to what q points to | *p = q |
| GEP | p points to a field of what q points to | p = &q->field |
Solving
The solver iterates until a fixed point:
- Initialize points-to sets from `Addr` constraints
- Process `Copy` constraints: propagate points-to sets
- Process `Load`/`Store` constraints: follow indirections
- Repeat until no points-to set changes
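The fixed-point loop can be sketched in a few lines of Python. This toy solver handles the four basic constraint kinds over string-named values and objects; it illustrates the algorithm only and is not SAF's solver:

```python
from collections import defaultdict

def solve(constraints):
    """Naive Andersen-style fixed point over (kind, a, b) constraints."""
    pts = defaultdict(set)
    for kind, a, b in constraints:
        if kind == "addr":            # a = &b
            pts[a].add(b)
    changed = True
    while changed:
        changed = False
        for kind, a, b in constraints:
            if kind == "copy":        # a = b
                new = pts[b] - pts[a]
            elif kind == "load":      # a = *b
                new = set().union(*[pts[o] for o in pts[b]]) - pts[a] if pts[b] else set()
            elif kind == "store":     # *a = b
                for o in list(pts[a]):
                    extra = pts[b] - pts[o]
                    if extra:
                        pts[o] |= extra
                        changed = True
                continue
            else:
                continue              # "addr" already seeded
            if new:
                pts[a] |= new
                changed = True
    return {v: s for v, s in pts.items() if s}

# p = &x; q = p; *q = &y  -->  p and q both point to x, and x holds y.
result = solve([("addr", "p", "x"), ("copy", "q", "p"),
                ("addr", "t", "y"), ("store", "q", "t")])
assert result["p"] == {"x"} and result["q"] == {"x"}
assert result["x"] == {"y"}
```

Re-scanning every constraint each round is quadratic in the worst case; production solvers use worklists and difference propagation, but the fixed-point structure is the same.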
Field Sensitivity
SAF supports field-sensitive analysis, distinguishing different fields of a struct:
struct Pair { int *a; int *b; };
struct Pair s;
s.a = &x; // s.a -> x
s.b = &y; // s.b -> y (distinct from s.a)
Without field sensitivity, s.a and s.b would be conflated, causing false
aliasing.
Querying Points-To Results
Python SDK
from saf import Project
proj = Project.open("program.ll")
q = proj.query()
# What does pointer p point to? (pass the hex ID of the pointer value)
pts = q.points_to("0x00000000000000000000000000000001")
print(f"p points to: {pts}")
# Do p and q alias? (pass hex IDs of both pointer values)
alias = q.may_alias("0x00000000000000000000000000000001",
"0x00000000000000000000000000000002")
print(f"p and q may alias: {alias}")
Export Format
pta = proj.pta_result().export()
# Returns: {"points_to": [{"value": "0x...", "locations": ["0x...", ...]}, ...]}
Impact on Other Analyses
PTA results feed into multiple downstream analyses:
| Analysis | How PTA Helps |
|---|---|
| Call graph | Resolves indirect call targets (function pointers) |
| ValueFlow | Determines which stores affect which loads |
| Taint analysis | Tracks taint through pointer indirections |
| Alias analysis | Identifies when two pointers may refer to the same memory |
Without PTA, these analyses must conservatively assume that any pointer could point to any object, leading to excessive false positives.
Configuration
PTA behavior can be tuned via configuration:
{
"pta": {
"enabled": true,
"field_sensitivity": "struct_fields",
"max_objects": 100000
}
}
- `field_sensitivity`: `"none"` or `"struct_fields"`
- `max_objects`: Safety cap; when exceeded, the analysis collapses to a conservative approximation using `unknown_mem`
Next Steps
- Value Flow -- How PTA enables precise data flow tracking
- Taint Analysis -- Using PTA-aware taint propagation
Value Flow
The ValueFlow graph is SAF's central data flow representation. It tracks how values move through a program -- from where they are created to where they are used -- across function boundaries and through memory operations.
From Def-Use to ValueFlow
Def-Use Chains
A def-use chain connects a value's definition to its uses within a single function:
int x = 10; // definition of x
printf("%d", x); // use of x
return x; // another use of x
Def-use chains are intraprocedural (within one function) and track SSA values.
ValueFlow Graph
The ValueFlow graph extends def-use chains with:
- Interprocedural edges: Data flowing across function calls (arguments and return values)
- Memory modeling: Data flowing through stores and loads, resolved by PTA
- Transform edges: Data modified by arithmetic, casts, or other operations
Edge Types
| Edge Type | Meaning | Example |
|---|---|---|
| DefUse | SSA def-use chain (direct assignment) | y = x |
| Store | Value written to memory | *p = x |
| Load | Value read from memory | y = *p |
| CallArg | Value passed as function argument | foo(x) |
| Return | Value returned from function | return x |
| Transform | Value modified by an operation | y = x + 1 |
Node Types
| Node Kind | Meaning |
|---|---|
| Value | An SSA register value |
| Location | A memory location (from pointer analysis) |
| UnknownMem | Unknown or external memory |
Example
char *buf = malloc(64); // Value: malloc return
strcpy(buf, "hello"); // CallArg: buf -> strcpy arg 0
log_message(buf); // CallArg: buf -> log_message arg 0
free(buf); // CallArg: buf -> free arg 0
The ValueFlow graph captures all of these flows:
malloc() return
|
+--[CallArg]--> strcpy arg 0
+--[CallArg]--> log_message arg 0
+--[CallArg]--> free arg 0
PropertyGraph Format
{
"schema_version": "0.1.0",
"graph_type": "valueflow",
"metadata": {
"node_count": 42,
"edge_count": 58
},
"nodes": [
{
"id": "0x...",
"labels": ["Value"],
"properties": { "kind": "Value" }
}
],
"edges": [
{
"src": "0x...",
"dst": "0x...",
"edge_type": "DEFUSE",
"properties": {}
}
]
}
Exporting with the Python SDK
from saf import Project
proj = Project.open("program.ll")
graphs = proj.graphs()
# Def-use chains
defuse = graphs.export("defuse")
definitions = [e for e in defuse["edges"] if e["edge_type"] == "DEFINES"]
uses = [e for e in defuse["edges"] if e["edge_type"] == "USED_BY"]
print(f"Definitions: {len(definitions)}, Uses: {len(uses)}")
# Full ValueFlow graph
vf = graphs.export("valueflow")
print(f"Nodes: {len(vf['nodes'])}, Edges: {len(vf['edges'])}")
# Count edge types
from collections import Counter
edge_types = Counter(e["edge_type"] for e in vf["edges"])
for kind, count in edge_types.most_common():
print(f" {kind}: {count}")
How ValueFlow Enables Analysis
The ValueFlow graph is the foundation for SAF's query capabilities:
| Analysis | How It Uses ValueFlow |
|---|---|
| Taint flow | BFS from source nodes to sink nodes |
| Memory leak | Check if allocation nodes reach exit without passing through free |
| Use-after-free | Check if freed pointer reaches a dereference |
| Double free | Check if freed pointer reaches another free |
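Each of these checks reduces to reachability over the edge list. As an illustration, a simplified memory-leak check asks whether an allocation node can reach a `free` node at all (toy graph with made-up node names, not SAF's internals):

```python
from collections import deque

# Toy ValueFlow adjacency: malloc's return flows to strcpy, log_message, free.
edges = {
    "malloc_ret": ["strcpy_arg0", "log_arg0", "free_arg0"],
    "strcpy_arg0": [],
    "log_arg0": [],
    "free_arg0": [],
}

def reaches(edges, start, target):
    """BFS reachability over a ValueFlow-style adjacency list."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# The allocation reaches free, so no leak is reported on this path.
assert reaches(edges, "malloc_ret", "free_arg0")
```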
Next Steps
- Taint Analysis -- Querying the ValueFlow graph for vulnerabilities
- Points-To Analysis -- How PTA makes ValueFlow precise
Taint Analysis
Taint analysis tracks the flow of untrusted data through a program to determine if it can reach security-sensitive operations. It is SAF's primary technique for detecting injection vulnerabilities, information leaks, and other data-flow security issues.
Core Concepts
Sources, Sinks, and Sanitizers
| Concept | Definition | Examples |
|---|---|---|
| Source | Where untrusted ("tainted") data enters the program | argv, getenv(), read(), fgets() |
| Sink | A dangerous function that should never receive tainted data | system(), execve(), printf() format arg |
| Sanitizer | A function that validates or cleans data, removing the taint | Input validation, bounds checking, escaping |
The Question
Taint analysis answers: "Can data from a source reach a sink without passing through a sanitizer?"
If the answer is yes, a finding is reported -- a potential vulnerability.
How SAF Performs Taint Analysis
SAF implements taint analysis as a graph reachability query over the ValueFlow graph:
- Identify source nodes in the ValueFlow graph (e.g., the argv parameter, the getenv() return value)
- Identify sink nodes (e.g., the system() argument)
- Identify sanitizer nodes (optional -- nodes that "clean" the taint)
- BFS traversal from sources to sinks, skipping paths through sanitizers
- Report findings with deterministic trace paths
BFS vs IFDS
SAF provides two taint analysis modes:
| Mode | Method | Precision | Speed |
|---|---|---|---|
| BFS | q.taint_flow() | Flow-insensitive | Fast |
| IFDS | proj.ifds_taint() | Context-sensitive, flow-sensitive | Slower |
BFS is sufficient for most vulnerability detection. IFDS provides higher precision when false positives from flow-insensitive analysis are a concern.
Using the Python SDK
Basic Taint Flow
from saf import Project, sources, sinks
proj = Project.open("program.ll")
q = proj.query()
# Find flows from argv to system()
findings = q.taint_flow(
sources=sources.function_param("main", 1), # argv
sinks=sinks.call("system", arg_index=0), # system()'s first arg
)
for f in findings:
print(f"Finding: {f.finding_id}")
if f.trace:
for step in f.trace.steps:
print(f" -> {step}")
Available Selectors
Source selectors:
| Selector | Description |
|---|---|
sources.function_param(name, index) | Function parameter by name and position |
sources.function_return(name) | Return value of a named function |
sources.call(name) | Return value from calls to a named function |
Sink selectors:
| Selector | Description |
|---|---|
sinks.call(name, arg_index=N) | Argument N of calls to a named function |
sinks.arg_to(name, index) | Argument at index passed to a named function |
With Sanitizers
from saf import sources, sinks
findings = q.taint_flow(
sources=sources.function_param("main", 1),
sinks=sinks.call("system", arg_index=0),
sanitizers=sources.function_return("validate_input"), # Paths through this function are safe
)
Common Vulnerability Patterns
| Vulnerability | CWE | Source | Sink |
|---|---|---|---|
| Command injection | CWE-78 | argv, getenv() | system(), execve() |
| Format string | CWE-134 | User input | printf() format arg |
| SQL injection | CWE-89 | HTTP parameters | SQL query functions |
| Path traversal | CWE-22 | User input | fopen(), open() |
| Buffer overflow | CWE-120 | malloc() return | Unchecked memory write |
Checker Framework
For common patterns, SAF provides built-in checkers that pre-configure the appropriate sources, sinks, and modes:
# Instead of manually specifying sources and sinks:
findings = proj.check("memory-leak")
findings = proj.check("use-after-free")
findings = proj.check("double-free")
# Or run all 9 built-in checkers at once
all_findings = proj.check_all()
The checker framework supports 9 built-in checkers covering memory safety, information flow, and resource management. See the Python SDK reference for the full list.
Next Steps
- Tutorials -- Hands-on guides for UAF, leaks, double-free, and taint analysis
- Python SDK Reference -- Full API reference for selectors, checkers, and queries
Python SDK
The SAF Python SDK (import saf) provides full access to SAF's static analysis
capabilities from Python. It is built with PyO3 and installed via maturin.
Installation
The SDK is built automatically when entering the Docker environment:
make shell
# SDK is available immediately:
python3 -c "import saf; print(saf.version())"
For manual installation inside the dev container:
maturin develop --release
Core API
Project
The Project class is the main entry point for all analysis operations.
from saf import Project
# Open a project from LLVM IR
proj = Project.open("program.ll")
# Open from AIR-JSON
proj = Project.open("program.air.json")
# Open with analysis tuning parameters
proj = Project.open(
"program.ll",
vf_mode="precise", # "fast" (default) or "precise"
pta_solver="worklist", # "worklist" (default) or "datalog"
pta_max_iterations=20000, # default: 10000
field_sensitivity_depth=3, # default: 2 (0 = disabled)
max_refinement_iterations=5, # default: 10
)
Project.open() signature:
Project.open(
path: str,
*,
vf_mode: str = "fast",
pta_solver: str = "worklist",
pta_max_iterations: int | None = None,
field_sensitivity_depth: int | None = None,
max_refinement_iterations: int | None = None,
) -> Project
| Parameter | Description |
|---|---|
path | Path to input file (.air.json, .ll, or .bc). Frontend is selected automatically by extension. |
vf_mode | "fast" routes all memory through a single unknown node for robust taint analysis. "precise" uses points-to analysis to resolve memory locations (may miss flows through unresolved pointers). |
pta_solver | "worklist" uses the imperative worklist-based solver. "datalog" uses the Ascent Datalog fixpoint solver. |
pta_max_iterations | Maximum PTA solver iterations. Default: 10000. |
field_sensitivity_depth | Field sensitivity depth. 0 = disabled, default: 2. Higher values track deeper nested struct fields. |
max_refinement_iterations | Maximum CG refinement iterations. Default: 10. |
Raises: FrontendError if the input file cannot be parsed or the required frontend is not available.
Schema Discovery
schema = proj.schema()
# Returns a dict with structured information about:
# - tool_version, schema_version
# - frontends (air-json, llvm with extensions and descriptions)
# - graphs (cfg, callgraph, defuse, valueflow)
# - queries (taint_flow, flows, points_to, may_alias with parameters)
# - selectors (sources, sinks, sanitizers)
Query Context
from saf import sources, sinks, sanitizers
q = proj.query()
# Taint flow analysis
findings = q.taint_flow(
sources=sources.function_param("main", 1),
sinks=sinks.call("system", arg_index=0),
)
# With sanitizers (accepts a Selector or SelectorSet, not strings)
findings = q.taint_flow(
sources=sources.function_param("main", 1),
sinks=sinks.call("system", arg_index=0),
sanitizers=sanitizers.call("validate_input"),
limit=500, # default: 1000
)
# Data flows (without sanitizer filtering)
findings = q.flows(
sources=sources.function_param("read_data"),
sinks=sinks.arg_to("write_output", 0),
limit=100,
)
# Points-to query (takes hex value ID string)
pts = q.points_to("0x00000000000000000000000000000001")
# Alias query (takes hex value ID strings)
alias = q.may_alias("0x00000001", "0x00000002")
Query method signatures:
| Method | Parameters | Returns |
|---|---|---|
taint_flow(sources, sinks, sanitizers=None, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
flows(sources, sinks, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
points_to(ptr) | Hex string value ID | list[str] (location IDs) |
may_alias(p, q) | Hex string value IDs | bool |
Graph Export
graphs = proj.graphs()
# List available graph types
print(graphs.available()) # ["cfg", "callgraph", "defuse", "valueflow"]
# Export to PropertyGraph dict
cfg = graphs.export("cfg")
cfg_main = graphs.export("cfg", function="main") # single function
cg = graphs.export("callgraph")
du = graphs.export("defuse")
vf = graphs.export("valueflow")
# Export to Graphviz DOT string
dot_str = graphs.to_dot("callgraph")
# Export to interactive HTML (Cytoscape.js)
html_str = graphs.to_html("cfg", function="main")
All export() calls return a unified PropertyGraph dict:
{
"schema_version": "0.1.0",
"graph_type": "callgraph",
"metadata": {},
"nodes": [{"id": "0x...", "labels": [...], "properties": {...}}, ...],
"edges": [{"src": "0x...", "dst": "0x...", "edge_type": "...", "properties": {}}, ...],
}
Source and Sink Selectors
Selectors identify values in the program for taint analysis. They can be
combined using the | operator to form a SelectorSet.
Sources (saf.sources)
| Function | Description |
|---|---|
function_param(function, index=None) | Select function parameters by name pattern (glob-style). index is 0-based; None selects all parameters. |
function_return(function) | Select function return values by name pattern. |
call(callee) | Select return values of calls to a function. |
argv() | Select command-line arguments (shortcut for function_param("main", None)). |
getenv(name=None) | Select environment variable reads (shortcut for call("getenv")). |
from saf import sources
src = sources.function_param("main", 1)
src = sources.function_param("read_*") # glob pattern
src = sources.function_return("get_input")
src = sources.call("getenv")
src = sources.argv()
# Combine with |
combined = sources.argv() | sources.getenv()
Sinks (saf.sinks)
| Function | Description |
|---|---|
call(callee, *, arg_index=None) | Select calls to a function. If arg_index is given, selects that argument; otherwise selects the call result. |
arg_to(callee, index) | Select arguments passed to a function (0-based index). |
from saf import sinks
sink = sinks.call("system", arg_index=0)
sink = sinks.call("printf", arg_index=0)
sink = sinks.arg_to("free", 0)
# Without arg_index, selects the call result
sink = sinks.call("dangerous_function")
Sanitizers (saf.sanitizers)
| Function | Description |
|---|---|
call(callee, *, arg_index=None) | Select calls to a sanitizing function. If arg_index is given, selects that argument; otherwise the return value is considered sanitized. |
arg_to(callee, index) | Select arguments passed to a sanitizing function. |
from saf import sanitizers
san = sanitizers.call("escape_html", arg_index=0)
san = sanitizers.call("sanitize_input")
# Combine sanitizers
combined = sanitizers.call("sanitize") | sanitizers.call("escape")
Module-Level Selector Factories
The following factory functions are also available directly from saf._saf
(used internally by the sources, sinks, and sanitizers modules):
- function_param(function, index=None) -- Select function parameters.
- function_return(function) -- Select function return values.
- call(callee) -- Select call results.
- arg_to(callee, index=None) -- Select arguments to a callee.
Checker Framework
Running Built-In Checkers
# Run a specific checker
findings = proj.check("memory-leak")
# Run multiple checkers at once (pass a list)
findings = proj.check(["memory-leak", "use-after-free", "double-free"])
# Run all 9 built-in checkers
findings = proj.check_all()
# List available checkers and their metadata
schema = proj.checker_schema()
for checker in schema["checkers"]:
print(f"{checker['name']}: {checker['description']} (CWE-{checker['cwe']})")
Built-In Checker Table
| Checker Name | CWE | Description |
|---|---|---|
memory-leak | 401 | Allocated memory never freed |
use-after-free | 416 | Memory accessed after being freed |
double-free | 415 | Memory freed more than once |
null-deref | 476 | Null pointer dereference |
file-descriptor-leak | 403 | Opened file never closed |
uninit-use | 457 | Use of uninitialized memory |
stack-escape | 562 | Returning stack address |
lock-not-released | 764 | Mutex not unlocked |
generic-resource-leak | N/A | Custom resource tracking |
Custom Checkers
# Define a custom checker with source/sink/sanitizer roles
findings = proj.check_custom(
"my-custom-leak",
mode="must_not_reach", # "may_reach", "must_not_reach", or "never_reach_sink"
source_role="allocator", # resource role for sources
source_match_return=True, # match return value (True) or first arg (False)
sink_is_exit=True, # sinks are function exits
sink_role=None, # or a resource role string
sanitizer_role="deallocator", # or None
sanitizer_match_return=False,
cwe=401, # optional CWE ID
severity="warning", # "info", "warning", "error", "critical"
)
Path-Sensitive Checking (Z3)
# Run checkers with Z3-based path feasibility filtering
result = proj.check_path_sensitive("null-deref", z3_timeout_ms=2000, max_guards=64)
# Or run all checkers with path sensitivity
result = proj.check_all_path_sensitive(z3_timeout_ms=1000, max_guards=64)
# Result has feasible, infeasible, and unknown findings
print(f"Real bugs: {len(result.feasible)}")
print(f"False positives filtered: {len(result.infeasible)}")
print(f"Unknown: {len(result.unknown)}")
print(result.diagnostics) # dict with Z3 statistics
# Post-filter existing findings
raw_findings = proj.check_all()
result = proj.filter_infeasible(raw_findings, z3_timeout_ms=1000, max_guards=64)
CheckerFinding Attributes
Each item returned by check(), check_all(), or check_custom() is a
CheckerFinding:
| Attribute | Type | Description |
|---|---|---|
checker | str | Checker name that produced this finding |
severity | str | "info", "warning", "error", or "critical" |
cwe | int | None | CWE ID if applicable |
message | str | Human-readable description |
source | str | Source SVFG node hex ID |
sink | str | Sink SVFG node hex ID |
trace | list[str] | Path from source to sink as hex node IDs |
sink_traces | list[dict] | Per-sink traces for multi-reach findings (e.g., double-free). Each dict has "sink" and "trace" keys. |
for f in proj.check("use-after-free"):
print(f.checker, f.severity, f.message)
print(f" CWE-{f.cwe}: {f.source} -> {f.sink}")
print(f" Trace length: {len(f.trace)}")
# Convert to dict
d = f.to_dict()
Finding Objects
The taint_flow() and flows() query methods return Finding objects, which
are distinct from CheckerFinding objects.
Finding Attributes
| Attribute | Type | Description |
|---|---|---|
finding_id | str | Deterministic hex identifier |
source_location | str | Source location (file:line:col or value ID) |
sink_location | str | Sink location (file:line:col or value ID) |
source_id | str | Source value ID (hex) |
sink_id | str | Sink value ID (hex) |
rule_id | str | None | Optional rule identifier |
trace | Trace | Step-by-step data flow path |
for f in q.taint_flow(sources.argv(), sinks.call("system", arg_index=0)):
print(f"{f.source_location} -> {f.sink_location}")
print(f.trace.pretty()) # human-readable trace
print(f"Steps: {len(f.trace)}")
d = f.to_dict() # convert to dict
Trace and TraceStep
A Trace contains a list of TraceStep objects. Each step represents one
hop in the value-flow graph:
| TraceStep Attribute | Type | Description |
|---|---|---|
from_id | str | Source node ID |
from_kind | str | Source node kind |
from_symbol | str | None | Symbol name at source |
from_location | str | None | Source file:line:col |
edge | str | Edge kind (def_use, transform, store, load, etc.) |
to_id | str | Target node ID |
to_kind | str | Target node kind |
to_symbol | str | None | Symbol name at target |
to_location | str | None | Target file:line:col |
for step in finding.trace.steps:
print(f" {step.from_symbol or step.from_id} --{step.edge}-> "
f"{step.to_symbol or step.to_id}")
Resource Table
The resource table maps function names to resource management roles. It ships with built-in entries for C stdlib, C++ operators, POSIX I/O, and pthreads.
table = proj.resource_table()
table.has_role("malloc", "allocator") # True
table.has_role("free", "deallocator") # True
table.has_role("fopen", "acquire") # True
table.has_role("fclose", "release") # True
# Add custom entries
table.add("my_alloc", "allocator")
table.add("my_free", "deallocator")
# Inspect
print(table.size) # number of entries
print(table.function_names()) # sorted list of function names
entries = table.export() # list of {"name": ..., "roles": [...]}
Available roles: allocator, deallocator, reallocator, acquire,
release, lock, unlock, null_source, dereference.
Advanced Analysis
IFDS Taint Analysis
Precise interprocedural taint tracking using the IFDS framework (Reps/Horwitz/Sagiv tabulation algorithm):
result = proj.ifds_taint(
sources=sources.function_param("main", 0),
sinks=sinks.call("system", arg_index=0),
sanitizers=sanitizers.call("validate"), # optional
)
Typestate Analysis
Track per-resource state machines using the IDE framework:
# Built-in specs: "file_io", "mutex_lock", "memory_alloc"
result = proj.typestate("file_io")
# Custom typestate spec: pass a previously constructed TypestateSpec instance
from saf import TypestateSpec
result = proj.typestate_custom(spec)  # spec: a TypestateSpec
Flow-Sensitive Pointer Analysis
More precise than Andersen's flow-insensitive analysis for programs with pointer reassignment:
fs_result = proj.flow_sensitive_pta(pts_repr="auto")
# pts_repr options: "auto", "btreeset", "bitvector", "bdd"
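As a toy illustration of the gain (not SAF's algorithm), consider p = &a; use(p); p = &b; use(p). A flow-insensitive analysis merges both assignments into one points-to set, while a flow-sensitive one applies a strong update at each reassignment:

```python
# Toy program as (op, var, target) tuples; "assign" rebinds p, "use" queries it.
prog = [("assign", "p", "a"), ("use", "p", None),
        ("assign", "p", "b"), ("use", "p", None)]

def flow_insensitive(prog):
    # One points-to set per variable for the whole program.
    pts = {}
    for op, var, tgt in prog:
        if op == "assign":
            pts.setdefault(var, set()).add(tgt)
    return [sorted(pts[var]) for op, var, _ in prog if op == "use"]

def flow_sensitive(prog):
    # Points-to sets recomputed in statement order; assignment kills old targets.
    pts, results = {}, []
    for op, var, tgt in prog:
        if op == "assign":
            pts[var] = {tgt}            # strong update
        else:
            results.append(sorted(pts[var]))
    return results

print(flow_insensitive(prog))  # [['a', 'b'], ['a', 'b']] -- both uses imprecise
print(flow_sensitive(prog))    # [['a'], ['b']] -- each use sees the current target
```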
Context-Sensitive Pointer Analysis (k-CFA)
Distinguishes calls to the same function from different call sites:
cs_result = proj.context_sensitive_pta(k=1, pts_repr="auto")
Demand-Driven Pointer Analysis
Computes points-to information only for explicitly queried pointers:
dda = proj.demand_pta(
max_steps=100_000,
max_context_depth=10,
timeout_ms=5000,
enable_strong_updates=True,
pts_repr="auto",
)
Memory SSA and SVFG
mssa = proj.memory_ssa() # Memory SSA representation
svfg = proj.svfg() # Sparse Value-Flow Graph
Call Graph Refinement
Iterative CHA + PTA-based indirect call resolution:
result = proj.refine_call_graph(entry_points="all", max_iterations=10)
Abstract Interpretation
Numeric interval analysis with widening/narrowing:
result = proj.abstract_interp(
max_widening=100,
narrowing_iterations=3,
use_thresholds=True,
)
Numeric Checkers
# Individual numeric checker
findings = proj.check_numeric("buffer_overflow") # CWE-120
findings = proj.check_numeric("integer_overflow") # CWE-190
findings = proj.check_numeric("division_by_zero") # CWE-369
findings = proj.check_numeric("shift_count") # CWE-682
# All numeric checkers at once
findings = proj.check_all_numeric()
Combined PTA + Abstract Interpretation
Alias-aware numeric analysis with bidirectional refinement:
result = proj.analyze_combined(
enable_refinement=True,
max_refinement_iterations=3,
)
interval = result.interval_at("0x1234...")
alias = result.may_alias("0x5678...", "0x9abc...")
Z3 Path Refinement
# Prove/disprove assertions
result = proj.prove_assertions(z3_timeout_ms=1000, max_guards=64)
# Refine alias query with path constraints
result = proj.refine_alias("0xP", "0xQ", at_block="0xB", func_id="0xF")
# Check if a feasible path exists between two blocks
result = proj.check_path_reachable(
from_block="0xB1", to_block="0xB2", func_id="0xF",
z3_timeout_ms=1000, max_guards=64, max_paths=100,
)
JSON Protocol (LLM Agent Interface)
import json
resp = proj.request('{"action": "schema"}')
data = json.loads(resp)
AIR Module Access
The AirModule provides mid-level access to the intermediate representation:
air = proj.air()
print(air.name) # module name
print(air.id) # module ID (hex)
print(air.function_count) # number of functions
print(air.global_count) # number of globals
print(air.function_names()) # list of function names
print(air.global_names()) # list of global names
Visualization
The saf.viz module provides dependency-free graph visualization:
from saf import viz
pg = proj.graphs().export("callgraph")
# Graphviz DOT string (no dependencies)
dot_str = viz.to_dot(pg)
# Interactive HTML with Cytoscape.js (no dependencies)
html_str = viz.to_html(pg)
# Open in browser or save to file
viz.visualize(pg) # opens in browser
viz.visualize(pg, output="callgraph.html") # saves to file
# Cytoscape.js JSON (for ipycytoscape in Jupyter)
cy_json = viz.to_cytoscape_json(pg)
# NetworkX DiGraph (requires networkx)
G = viz.to_networkx(pg)
# graphviz.Digraph object (requires graphviz package)
gv = viz.to_graphviz(pg)
gv.render("graph", format="svg")
Exceptions
All SAF exceptions inherit from SafError and carry .code and .details
attributes for structured error handling:
| Exception | Description |
|---|---|
SafError | Base exception for all SAF errors |
FrontendError | Frontend ingestion errors (parsing, I/O, unsupported features) |
AnalysisError | Analysis errors (PTA timeout, ValueFlow build error) |
QueryError | Query execution errors (invalid selector, no match) |
ConfigError | Configuration errors (invalid field, incompatible options) |
from saf import Project, SafError, FrontendError
try:
proj = Project.open("nonexistent.ll")
except FrontendError as e:
print(f"Error: {e}")
Module Exports
The saf package exports the following names:
from saf import (
# Core classes
Project, Query, Finding, Trace, TraceStep,
# Selectors
Selector, SelectorSet,
# Selector modules
sources, sinks, sanitizers, viz,
# Checker types
CheckerFinding, PathSensitiveResult, ResourceTable,
# Typestate types
TypestateResult, TypestateFinding, TypestateSpec, typestate_specs,
# Exceptions
SafError, FrontendError, AnalysisError, QueryError, ConfigError,
# Resource role constants
Allocator, Deallocator, Reallocator, Acquire, Release,
NullSource, Dereference,
# Reachability mode constants
MayReach, MustNotReach,
# Severity constants
Info, Warning, Error, Critical,
# Functions
version,
)
CLI
The SAF command-line interface provides access to analysis capabilities from the
terminal. The binary is named saf.
Global Options
| Option | Description |
|---|---|
--json-errors | Output errors as JSON for machine-readable consumption |
Commands
saf index
Index input files via a frontend to produce AIR-JSON.
saf index program.ll
saf index program.ll --output bundle.air.json
saf index --frontend air-json bundle.air.json
| Option | Type | Default | Description |
|---|---|---|---|
<inputs> | positional, required | | Input files to index |
--frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
--output <path> | path | stdout | Write AIR-JSON output to file instead of stdout |
saf run
Run analysis passes on input programs. This is the primary command -- it ingests input files, builds internal representations (CFG, call graph, pointer analysis), runs the selected checkers, and produces findings.
saf run program.ll
saf run program.ll --mode fast
saf run program.ll --format sarif --output report.sarif
saf run program.ll --checkers memory-leak,null-deref --pta cspta --pta-k 2
Input Options
| Option | Type | Default | Description |
|---|---|---|---|
<inputs> | positional, required | | Input files to analyze |
--frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
--specs <path> | path | | Additional spec files or directories |
Analysis Options
| Option | Type | Default | Description |
|---|---|---|---|
--mode <mode> | fast, precise | precise | Analysis mode |
--checkers <list> | string | all | Checkers to run (comma-separated, or all / none) |
--combined | flag | | Run combined PTA + abstract interpretation |
Pointer Analysis Options
| Option | Type | Default | Description |
|---|---|---|---|
--pta <variant> | andersen, cspta, fspta, dda | andersen | PTA variant |
--solver <backend> | worklist, datalog | worklist | PTA solver backend |
--pts-repr <repr> | auto, btreeset, fxhash, roaring, bdd | auto | Points-to set representation |
--pta-k <n> | integer | 2 | k-CFA depth (cspta only) |
--field-sensitivity <level> | struct-fields, array-index, flat | struct-fields | Field sensitivity level |
--max-pta-iterations <n> | integer | | Maximum PTA iterations |
Z3 Options
| Option | Type | Default | Description |
|---|---|---|---|
--path-sensitive | flag | | Enable Z3 path-sensitive checker filtering |
--z3-prove | flag | | Prove assertions via Z3 |
--z3-refine-alias | flag | | Refine alias results via Z3 |
--z3-check-reachability | flag | | Check path reachability via Z3 |
--z3-timeout <ms> | integer | 5000 | Z3 solver timeout in milliseconds |
Typestate and Taint Options
| Option | Type | Default | Description |
|---|---|---|---|
--typestate <name> | string | | Run built-in typestate spec |
--typestate-custom <path> | path | | Run custom typestate spec from YAML |
--ifds-taint <path> | path | | Run IFDS taint analysis with config file |
Output Options
| Option | Type | Default | Description |
|---|---|---|---|
--format <fmt> | human, json, sarif | human | Output format |
--output <path> | path | stdout | Write output to file |
--diagnostics | flag | | Include checker/PTA diagnostics |
--verbose | flag | | Show timing, resource table, stats |
saf query
Query analysis results. Requires --input and a subcommand.
saf query points-to --input program.ll 0x00ab1234...
saf query alias --input program.ll 0x001a... 0x002b...
saf query flows --input program.ll 0xsource... 0xsink...
saf query taint --input program.ll 0xsource... 0xsink...
saf query reachable --input program.ll 0xfunc1... 0xfunc2...
| Option | Type | Default | Description |
|---|---|---|---|
--input <files> | path(s), required | | Input files to analyze |
--frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
Subcommands
| Subcommand | Arguments | Description |
|---|---|---|
points-to | <pointer> (hex value ID) | Points-to set for a value |
alias | <p> <q> (hex value IDs) | May-alias check between two pointers |
flows | <source> <sink> (hex value IDs) | Data-flow reachability |
taint | <source> <sink> (hex value IDs) | Taint-flow query |
reachable | <func_ids>... (hex function IDs) | Call-graph reachability from functions |
saf export
Export graphs or findings in various formats.
saf export cfg --input program.ll
saf export callgraph --input program.ll --format dot --output cg.dot
saf export findings --input program.ll --format sarif --output report.sarif
saf export valueflow --input program.ll --format json --output vfg.json
| Option | Type | Default | Description |
|---|---|---|---|
<target> | positional, required | | Export target (see below) |
--input <files> | path(s), required | | Input files to analyze |
--format <fmt> | json, sarif, dot, html | json | Output format |
--output <path> | path | stdout | Write output to file instead of stdout |
--function <name> | string | | Filter to a specific function (for CFG export) |
--frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
Export Targets
| Target | Description |
|---|---|
cfg | Control-flow graph |
callgraph | Call graph (inter-procedural) |
defuse | Def-use chains |
valueflow | Value-flow graph |
svfg | Sparse Value-Flow Graph |
findings | Analysis findings / bug reports |
pta | Points-to analysis results |
saf schema
Print the SAF schema (supported frontends, queries, checkers).
saf schema
saf schema --checkers
saf schema --frontends
saf schema --format json
| Option | Type | Default | Description |
|---|---|---|---|
--checkers | flag | | List available checkers only |
--frontends | flag | | List available frontends only |
--format <fmt> | human, json, sarif | human | Output format |
saf specs
Manage function specifications (list, validate, look up).
saf specs list
saf specs list --verbose
saf specs validate path/to/specs.yaml
saf specs lookup malloc
Subcommands
| Subcommand | Arguments | Description |
|---|---|---|
list | [--verbose] | List loaded function specifications |
validate | <path> (required) | Validate spec file or directory |
lookup | <name> (required) | Look up the spec for a function by name |
saf incremental
Run incremental analysis with caching. Only recomputes what changed between runs.
saf incremental src/*.ll
saf incremental src/*.ll --cache-dir .saf-cache
saf incremental src/*.ll --plan
saf incremental src/*.ll --clean
saf incremental src/*.ll --export-summaries summaries.yaml
| Option | Type | Default | Description |
|---|---|---|---|
<inputs> | positional, required | | Input files to analyze |
--frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
--mode <mode> | sound, best-effort | best-effort | Precision mode for incremental analysis |
--cache-dir <path> | path | .saf-cache | Cache directory for incremental state |
--plan | flag | | Dry-run: show what would be recomputed without running analysis |
--clean | flag | | Clear the cache before analysis |
--export-summaries <path> | path | | Export computed summaries as YAML to the given path |
saf help
Show topic-based help guides.
saf help # Overview of all commands and topics
saf help run # How to run analyses and configure passes
saf help checkers # Built-in bug checkers and how to select them
saf help pta # Pointer analysis variants and configuration
saf help typestate # Typestate checking and custom protocol specs
saf help taint # Taint analysis modes and configuration
saf help z3 # Z3-based path sensitivity and refinement
saf help export # Export targets, formats, and examples
saf help specs # Function specification format and discovery
saf help incremental # Incremental analysis and caching
saf help examples # Common usage patterns and recipes
Available Topics
| Topic | Description |
|---|---|
run | How to run analyses and configure passes |
checkers | Built-in bug checkers and how to select them |
pta | Pointer analysis variants and configuration |
typestate | Typestate checking and custom protocol specs |
taint | Taint analysis modes and configuration |
z3 | Z3-based path sensitivity and refinement |
export | Export targets, formats, and examples |
specs | Function specification format and discovery |
incremental | Incremental analysis and caching |
examples | Common usage patterns and recipes |
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (no findings, or findings exported) |
| 1 | Error (invalid input, analysis failure) |
| 2 | Findings detected (when used in CI mode) |
Examples
Quick Scan
# Run all checkers with defaults (Andersen PTA, precise mode)
saf run program.ll
CI/CD Integration
# Run all checkers and output SARIF for GitHub Code Scanning
saf run program.ll --format sarif --output results.sarif
# Check exit code for pass/fail
if saf run program.ll --format json | jq '.findings | length' | grep -q '^0$'; then
echo "No findings"
else
echo "Findings detected"
exit 1
fi
Batch Analysis
# Analyze multiple files
for f in src/*.ll; do
saf run "$f" --format sarif --output "results/$(basename "$f" .ll).sarif"
done
Path-Sensitive Analysis
# Reduce false positives with Z3-based path feasibility filtering
saf run program.ll --path-sensitive --z3-timeout 10000
Export Call Graph as DOT
saf export callgraph --input program.ll --format dot --output cg.dot
dot -Tpng cg.dot -o cg.png
Incremental Analysis
# First run caches results; subsequent runs only re-analyze changed files
saf incremental src/*.ll --cache-dir .saf-cache
Demand-Driven Points-To Query
saf query points-to --input program.ll 0x00ab1234abcd5678
PropertyGraph Format
All SAF graph exports use a unified PropertyGraph JSON format. This format is shared across graph types (CFG, call graph, def-use, value-flow, SVFG, PTA) for consistent downstream processing. SAF also exports findings (checker results) and a native PTA format, documented below.
Schema
{
"schema_version": "0.1.0",
"graph_type": "<type>",
"metadata": {},
"nodes": [
{
"id": "0x...",
"labels": ["Label1", "Label2"],
"properties": { "key": "value" }
}
],
"edges": [
{
"src": "0x...",
"dst": "0x...",
"edge_type": "EDGE_TYPE",
"properties": {}
}
]
}
Top-Level Fields
| Field | Type | Description |
|---|---|---|
schema_version | string | Format version (currently "0.1.0") |
graph_type | string | One of cfg, callgraph, defuse, valueflow, svfg, pta |
metadata | object | Graph-specific metadata (e.g., node/edge counts) |
nodes | array | List of node objects |
edges | array | List of edge objects |
Node Fields
| Field | Type | Description |
|---|---|---|
id | string | Deterministic hex ID (0x + 32 hex chars) |
labels | array[string] | Node type labels (e.g., ["Function"], ["Block", "Entry"]) |
properties | object | Type-specific properties |
Edge Fields
| Field | Type | Description |
|---|---|---|
src | string | Source node ID |
dst | string | Destination node ID |
edge_type | string | Edge type label |
properties | object | Edge-specific properties |
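These field contracts are easy to sanity-check when consuming an export. The sketch below uses only the documented fields; check_property_graph is a hypothetical helper, not part of the SDK:

```python
def check_property_graph(pg):
    """Return a list of problems in a PropertyGraph dict: missing required
    fields, and edges whose endpoints are not declared in "nodes"."""
    problems = []
    ids = set()
    for n in pg.get("nodes", []):
        if not {"id", "labels", "properties"} <= n.keys():
            problems.append(f"node missing fields: {n}")
        else:
            ids.add(n["id"])
    for e in pg.get("edges", []):
        if not {"src", "dst", "edge_type", "properties"} <= e.keys():
            problems.append(f"edge missing fields: {e}")
            continue
        for endpoint in (e["src"], e["dst"]):
            if endpoint not in ids:
                problems.append(f"dangling endpoint {endpoint} on {e['edge_type']} edge")
    return problems

# A call-graph fragment with one undeclared edge endpoint (hypothetical IDs).
pg = {
    "schema_version": "0.1.0",
    "graph_type": "callgraph",
    "metadata": {},
    "nodes": [{"id": "0x01", "labels": ["Function"], "properties": {"name": "main"}}],
    "edges": [{"src": "0x01", "dst": "0x02", "edge_type": "CALLS", "properties": {}}],
}
print(check_property_graph(pg))  # ['dangling endpoint 0x02 on CALLS edge']
```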
Graph Types
Call Graph (callgraph)
| Element | Details |
|---|---|
| Node labels | ["Function"] |
| Node properties | name (function name), kind ("defined" or "external") |
| Edge type | "CALLS" |
CFG (cfg)
| Element | Details |
|---|---|
| Node labels | ["Block"], optionally ["Block", "Entry"] |
| Node properties | name (block name), function (owning function) |
| Edge type | "FLOWS_TO" |
Def-Use (defuse)
| Element | Details |
|---|---|
| Node labels | ["Value"] or ["Instruction"] |
| Node properties | Varies |
| Edge types | "DEFINES", "USED_BY" |
Value Flow (valueflow)
| Element | Details |
|---|---|
| Node labels | ["Value"], ["Location"], or ["UnknownMem"] |
| Node properties | kind ("Value", "Location", "UnknownMem") |
| Edge types | "Direct", "Store", "Load", "CallArg", "Return", "Transform" |
| Metadata | node_count, edge_count |
SVFG (svfg)
The Sparse Value-Flow Graph captures both direct (top-level SSA) and indirect (memory, via MSSA) value flows. It is the foundation for SVFG-based checkers such as null-pointer dereference and use-after-free detectors.
| Element | Details |
|---|---|
| Node labels | ["Value"] or ["MemPhi"] |
| Node properties | kind ("value" or "mem_phi") |
| Edge types | DIRECT_DEF, DIRECT_TRANSFORM, CALL_ARG, RETURN, INDIRECT_DEF, INDIRECT_STORE, INDIRECT_LOAD, PHI_FLOW |
| Metadata | node_count, edge_count |
Edge type descriptions:
| Edge type | Category | Description |
|---|---|---|
DIRECT_DEF | Direct | SSA def-use chain (including phi incoming, select, copy) |
DIRECT_TRANSFORM | Direct | Binary/unary/cast/GEP operand to result |
CALL_ARG | Direct | Actual argument to formal parameter |
RETURN | Direct | Callee return value to caller result |
INDIRECT_DEF | Indirect | Store value to load result (clobber is a store) |
INDIRECT_STORE | Indirect | Store value to MemPhi node |
INDIRECT_LOAD | Indirect | MemPhi node to load result |
PHI_FLOW | Indirect | MemPhi to MemPhi (nested phi chaining) |
Example:
{
"schema_version": "0.1.0",
"graph_type": "svfg",
"metadata": { "node_count": 3, "edge_count": 2 },
"nodes": [
{ "id": "0x00000000000000000000000000000001", "labels": ["Value"], "properties": { "kind": "value" } },
{ "id": "0x00000000000000000000000000000064", "labels": ["MemPhi"], "properties": { "kind": "mem_phi" } },
{ "id": "0x00000000000000000000000000000002", "labels": ["Value"], "properties": { "kind": "value" } }
],
"edges": [
{ "src": "0x00000000000000000000000000000001", "dst": "0x00000000000000000000000000000064", "edge_type": "INDIRECT_STORE", "properties": {} },
{ "src": "0x00000000000000000000000000000064", "dst": "0x00000000000000000000000000000002", "edge_type": "INDIRECT_LOAD", "properties": {} }
]
}
Note: The SVFG also has a native (non-PropertyGraph) export format with an SvfgExport schema that includes a diagnostics object with construction statistics (direct_edge_count, indirect_edge_count, mem_phi_count, skipped_call_clobbers, skipped_live_on_entry). The PropertyGraph format shown above is what saf export svfg produces.
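Indirect value flow in this export always passes through a MemPhi node: an INDIRECT_STORE edge carries a stored value into the phi, and an INDIRECT_LOAD edge carries it out to a load result. Chaining the two recovers (stored value, loaded value) pairs, as in the example above. A sketch of that traversal, assuming only the PropertyGraph shape documented here (not part of the SAF SDK):

```python
from collections import defaultdict

def indirect_value_flows(graph):
    """Find (store value, load result) pairs connected through a MemPhi.

    Assumes the SVFG PropertyGraph shape: INDIRECT_STORE edges enter a
    MemPhi node, INDIRECT_LOAD edges leave it. Illustrative sketch only;
    nested PHI_FLOW chains would need an extra transitive step.
    """
    into_phi = defaultdict(list)    # MemPhi id -> values stored into it
    out_of_phi = defaultdict(list)  # MemPhi id -> load results fed by it
    for e in graph["edges"]:
        if e["edge_type"] == "INDIRECT_STORE":
            into_phi[e["dst"]].append(e["src"])
        elif e["edge_type"] == "INDIRECT_LOAD":
            out_of_phi[e["src"]].append(e["dst"])
    pairs = []
    for phi, sources in into_phi.items():
        for src in sources:
            for dst in out_of_phi.get(phi, []):
                pairs.append((src, dst))
    return pairs
```

Run against the three-node example above, this yields a single pair connecting the stored value to the load result.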
PTA (pta) — PropertyGraph Format
The points-to analysis can be exported as a PropertyGraph. Pointer values and abstract locations become nodes, connected by POINTS_TO edges.
| Element | Details |
|---|---|
| Node labels | ["Pointer"] or ["Location"] |
| Node properties | Location nodes carry obj (object ID hex) and optionally path (field path) |
| Edge type | "POINTS_TO" |
Example:
{
"schema_version": "0.1.0",
"graph_type": "pta",
"metadata": {},
"nodes": [
{ "id": "0x00000000000000000000000000000001", "labels": ["Pointer"], "properties": {} },
{ "id": "0x00000000000000000000000000000064", "labels": ["Location"], "properties": { "obj": "0x000000000000000000000000000000c8", "path": [".0"] } }
],
"edges": [
{ "src": "0x00000000000000000000000000000001", "dst": "0x00000000000000000000000000000064", "edge_type": "POINTS_TO", "properties": {} }
]
}
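A common consumer task is turning this edge list back into per-pointer points-to sets, from which may-alias queries follow directly (two pointers may alias if their sets intersect). A sketch over the documented shape, not part of the SAF SDK:

```python
def points_to_map(pta_graph):
    """Build {pointer id: set of location ids} from a PTA PropertyGraph.

    Illustrative sketch assuming the node labels and POINTS_TO edge type
    documented above; not part of the SAF SDK.
    """
    pointers = {n["id"] for n in pta_graph["nodes"] if "Pointer" in n["labels"]}
    pts = {p: set() for p in pointers}
    for e in pta_graph["edges"]:
        if e["edge_type"] == "POINTS_TO" and e["src"] in pts:
            pts[e["src"]].add(e["dst"])
    return pts

def may_alias(pts, a, b):
    """Two pointers may alias if their points-to sets intersect."""
    return bool(pts.get(a, set()) & pts.get(b, set()))
```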
PTA Native Export
The PTA also has a richer native export format (not PropertyGraph) that includes
analysis configuration, all abstract locations, and diagnostics. This is the
format returned by PtaResult::export() in the Rust API:
{
"schema_version": "0.1.0",
"config": {
"enabled": true,
"field_sensitivity": "struct_fields(max_depth=2)",
"max_objects": 100000,
"max_iterations": 100
},
"locations": [
{
"id": "0x...",
"obj": "0x...",
"path": [".0", "[2]"]
}
],
"points_to": [
{
"value": "0x...",
"locations": ["0x...", "0x..."]
}
],
"diagnostics": {
"iterations": 12,
"iteration_limit_hit": false,
"collapse_warning_count": 0,
"constraint_count": 150,
"location_count": 45
}
}
| Field | Type | Description |
|---|---|---|
config | object | PTA configuration used for the analysis run |
locations | array | All abstract locations with object ID and field path |
points_to | array | Points-to sets: each entry maps a value to its locations |
diagnostics | object | Solver statistics (iterations, limits, constraint/location counts) |
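The diagnostics object is the place to check whether the solver actually converged: iteration_limit_hit means results may be incomplete, and a nonzero collapse_warning_count signals lost field precision. A small health-check sketch over the field names documented above (not part of the SAF SDK):

```python
def pta_health(export):
    """Summarize solver health from a PTA native export's diagnostics.

    Illustrative sketch; field names follow the native export schema
    documented above. Returns a list of human-readable warnings.
    """
    d = export["diagnostics"]
    warnings = []
    if d.get("iteration_limit_hit"):
        warnings.append(
            f"solver hit the iteration limit after {d['iterations']} "
            "iterations; points-to results may be incomplete"
        )
    if d.get("collapse_warning_count", 0) > 0:
        warnings.append(
            f"{d['collapse_warning_count']} field-collapse warning(s); "
            "field sensitivity was degraded for some objects"
        )
    return warnings
```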
Findings Export
The findings export (saf export findings) produces a JSON array of checker
findings. This is not a PropertyGraph -- it is a flat list of diagnostic
results from all enabled checkers.
Each finding has the following structure:
[
{
"check": "null_deref",
"severity": "error",
"cwe": 476,
"message": "Pointer may be null when dereferenced",
"path": [
{ "location": "main:5", "event": "NULL assigned" },
{ "location": "main:10", "event": "pointer dereferenced" }
],
"object": "p"
}
]
| Field | Type | Description |
|---|---|---|
check | string | Name of the checker that produced this finding |
severity | string | One of info, warning, error, critical |
cwe | number? | CWE ID if applicable (omitted when not set) |
message | string | Human-readable description of the issue |
path | array | Trace events from source to sink (omitted when empty) |
object | string? | Affected object name if applicable (omitted when not set) |
Each entry in path is a PathEvent:
| Field | Type | Description |
|---|---|---|
location | string | Source location description |
event | string | What happened at this point |
state | string? | Typestate label (omitted when not applicable) |
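Because several fields are optional (cwe, path, object, state), a renderer has to handle their absence. The following sketch formats one finding as a compact text report, assuming only the fields documented above (not part of the SAF SDK):

```python
def format_finding(finding):
    """Render one finding as a compact text report.

    Illustrative sketch over the findings schema documented above.
    Optional fields (cwe, path, state) are handled when present.
    """
    lines = [f"[{finding['severity']}] {finding['check']}: {finding['message']}"]
    if "cwe" in finding:
        lines[0] += f" (CWE-{finding['cwe']})"
    for i, ev in enumerate(finding.get("path", []), 1):
        step = f"  {i}. {ev['location']}: {ev['event']}"
        if "state" in ev:
            step += f" [{ev['state']}]"
        lines.append(step)
    return "\n".join(lines)
```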
Determinism
All PropertyGraph exports are deterministic:
- Nodes are sorted by (node_kind, referenced_id_hex)
- Edges are sorted by (edge_kind, src_id_hex, dst_id_hex, label_hash_hex)
- No timestamps or wall-clock-dependent data
- Identical inputs produce byte-identical JSON output
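Byte-identical output makes change detection cheap: hashing the serialized graph is enough to tell whether anything changed between runs. A sketch (the sort_keys canonicalization is only a safety net for graphs that were re-parsed and re-serialized outside SAF):

```python
import hashlib
import json

def graph_digest(graph):
    """Stable digest of a parsed PropertyGraph export.

    Since SAF's own output is deterministic, hashing the bytes on disk is
    equivalent; this helper re-canonicalizes via json.dumps(sort_keys=True)
    so it also works on graphs round-tripped through other tools.
    """
    blob = json.dumps(graph, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()
```

Comparing digests of yesterday's and today's export of the same graph type gives a quick "did the program's structure change?" signal for CI.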
Working with PropertyGraph
Python
import json
from saf import Project
proj = Project.open("program.ll")
graphs = proj.graphs()
cg = graphs.export("callgraph")
# Access nodes and edges directly
for node in cg["nodes"]:
print(node["properties"]["name"])
for edge in cg["edges"]:
print(f"{edge['src']} -> {edge['dst']}")
# Save to file
with open("callgraph.json", "w") as f:
json.dump(cg, f, indent=2)
jq
# List function names
jq '.nodes[] | .properties.name' callgraph.json
# Count edges by type
jq '[.edges[] | .edge_type] | group_by(.) | map({type: .[0], count: length})' graph.json
# Find high fan-out nodes
jq '[.edges[] | .src] | group_by(.) | map({node: .[0], out: length}) | sort_by(-.out) | .[0:5]' callgraph.json
NetworkX
import json
import networkx as nx
with open("callgraph.json") as f:
data = json.load(f)
G = nx.DiGraph()
for node in data["nodes"]:
G.add_node(node["id"], **node.get("properties", {}))
for edge in data["edges"]:
G.add_edge(edge["src"], edge["dst"], edge_type=edge.get("edge_type", ""))
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Components: {nx.number_weakly_connected_components(G)}")
SAF Widgets
SAF provides embeddable widgets that bring interactive program analysis into any web page. Widgets are implemented as iframes pointing to the SAF playground in embed mode.
Basic Usage
Add a widget to any HTML page:
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=taint_flow&graph=cfg"
loading="lazy"
></iframe>
</div>
URL Parameters
| Parameter | Values | Description |
|---|---|---|
embed | true | Enables embed mode (hides toolbar, navigation) |
split | true | Shows both source editor and graph side by side |
example | taint_flow, pointer_alias, complex_cfg, indirect_call, etc. | Pre-loaded example slug |
graph | cfg, callgraph, defuse, valueflow, pta | Initial graph to display |
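When generating widget markup from a script or static-site build, it helps to assemble the src URL from these parameters rather than hand-editing query strings. A small helper sketch (the relative base path is whatever location the playground is hosted at from your page; the function itself is illustrative, not part of SAF):

```python
from urllib.parse import urlencode

def widget_src(example, graph, split=True, base="../../playground/"):
    """Build a playground embed URL from the parameters documented above.

    `base` is the playground's location relative to the embedding page;
    adjust for your site layout. Illustrative helper, not part of SAF.
    """
    params = {"embed": "true"}
    if split:
        params["split"] = "true"
    params["example"] = example
    params["graph"] = graph
    return f"{base}?{urlencode(params)}"
```

For example, widget_src("taint_flow", "cfg") reproduces the src attribute used in the basic-usage snippet below.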
Styling
Add the SAF widget styles to your CSS:
.saf-widget {
border: 1px solid #333;
border-radius: 8px;
overflow: hidden;
margin: 1em 0;
}
.saf-widget iframe {
width: 100%;
height: 400px;
border: none;
}
Adjust the height to fit your content. For split mode with source + graph,
500px or more is recommended.
mdBook Integration
In mdBook pages, use raw HTML:
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=taint_flow&graph=valueflow"
loading="lazy"
></iframe>
</div>
The custom.css file included in this book already provides the widget styles.
Examples
CFG Visualization
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=complex_cfg&graph=cfg"
loading="lazy"
></iframe>
</div>
Call Graph with Indirect Calls
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=indirect_call&graph=callgraph"
loading="lazy"
></iframe>
</div>
Points-To Analysis
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=pointer_alias&graph=pta"
loading="lazy"
></iframe>
</div>
Value Flow (Taint)
<div class="saf-widget">
<iframe
src="../../playground/?embed=true&split=true&example=taint_flow&graph=valueflow"
loading="lazy"
></iframe>
</div>
Sizing Recommendations
| Mode | Recommended Height | Description |
|---|---|---|
Graph only (embed=true) | 400px | Shows graph visualization only |
Split view (embed=true&split=true) | 500px+ | Shows source code + graph side by side |
| Full height | 600px+ | Best for complex graphs with many nodes |
How It Works
When embed=true is set:
- The playground hides the top navigation bar
- The specified example is loaded automatically
- The specified graph type is selected
- With split=true, both the source editor and graph panel are visible
The widget runs entirely client-side via WebAssembly. No server calls are made for the analysis itself (C compilation still uses Compiler Explorer if the example contains C code).
Responsive Design
Widgets adapt to their container width. For narrow containers, consider increasing the height:
@media (max-width: 768px) {
.saf-widget iframe {
height: 300px;
}
}
Security
Widgets use iframe sandboxing. The embedded playground cannot access the parent page's DOM or cookies. Communication between the parent page and widget is not currently supported.