Introduction

SAF (Static Analyzer Factory) is a Rust + WebAssembly framework for building program analysis tools. It provides whole-program pointer analysis, value-flow reasoning, and graph-based program representations -- all accessible through a Python SDK or directly in the browser via WebAssembly.

Who Is SAF For?

SAF is designed for three audiences:

  • Students learning program analysis. The interactive Playground lets you paste C code, visualize control-flow graphs, call graphs, and value-flow graphs, and step through pointer analysis -- all without installing anything.

  • Security researchers writing custom checkers. The Python SDK exposes a schema-driven API for authoring analyzers that detect use-after-free, double free, taint-flow violations, and other vulnerability classes. You write the checker logic; SAF handles the underlying analysis infrastructure.

  • AI and agent developers. The same schema-driven API that humans use is designed for LLM agents to consume. Agents can query points-to sets, traverse value-flow edges, and emit SARIF findings programmatically.

What Can You Do With SAF?

Capability                Description
Visualize program graphs  Render CFGs, ICFGs, call graphs, def-use chains, and value-flow graphs as interactive diagrams
Run pointer analysis      Andersen-style inclusion-based analysis with field sensitivity and on-the-fly call graph construction
Detect vulnerabilities    Built-in checkers for use-after-free, double free, memory leaks, null dereference, file descriptor leaks, and more
Write custom analyzers    Author new checkers in Python using the SDK, with full access to points-to and value-flow results
Embed analysis widgets    Drop interactive analysis <iframe> widgets into any web page for live program visualization
Export results            Output findings as SARIF, and graphs as the unified PropertyGraph JSON format

Key Differentiators

  • Runs in the browser. The core analysis engine compiles to WebAssembly. The playground and embeddable widgets require no server -- everything executes client-side.

  • Deterministic. Given identical inputs, SAF produces byte-identical outputs. All internal data structures use ordered collections, and all IDs are derived from BLAKE3 hashes. This makes results reproducible and diffable.

  • Frontend-agnostic. SAF operates on its own intermediate representation (AIR), not on LLVM IR directly. Frontends translate source languages into AIR; the analysis core never sees frontend-specific types.

  • Open source. SAF is open source and designed for extensibility. The crate architecture cleanly separates the core IR, frontends, analysis passes, and language bindings.

Quick Start

The fastest way to try SAF:

  1. In the browser: Open the Playground and paste a C snippet. Click "Analyze" to see the CFG, call graph, and points-to results.

  2. Locally with Docker:

    git clone https://github.com/Static-Analyzer-Factory/static-analyzer-factory.git
    cd static-analyzer-factory
    make shell
    # Inside the container:
    python3 tutorials/memory-safety/02-use-after-free/detect.py
    
  3. With the CLI:

    saf run program.ll --format sarif --output results.sarif
    

Where to Go Next

  • Installation -- Set up SAF locally with Docker or install the Python package.
  • First Analysis -- Walk through analyzing a small C program end to end.
  • Playground Tour -- Learn how to use the browser-based playground.
  • Tutorials -- Step-by-step guides for detecting specific vulnerability classes.
  • API Reference -- Full reference for the Python SDK, CLI, and PropertyGraph format.

Architecture Overview

Input (.c / .ll / .bc / .air.json)
  -> Frontend (LLVM, AIR-JSON)
    -> AirBundle (canonical IR)
      -> Graph builders (CFG, ICFG, CallGraph, DefUse)
        -> PTA solver (Andersen)
          -> ValueFlow graph
            -> Queries (flows, taint_flow, points_to)
              -> Export (JSON, SARIF)

SAF is structured as a set of Rust crates:

Crate          Purpose
saf-core       AIR definitions, config, deterministic ID generation
saf-frontends  Frontend trait + LLVM/AIR-JSON implementations
saf-analysis   CFG, call graph, PTA, value-flow, checkers
saf-cli        Command-line interface
saf-python     PyO3 bindings for the Python SDK

Installation

SAF can be used in two ways: in the browser (no install needed) or locally via Docker for full analysis capabilities.

Browser (No Install)

The fastest way to try SAF is the Playground. It runs entirely in your browser using WebAssembly -- no server, no Docker, no setup.

The browser version supports:

  • Writing C code or LLVM IR
  • Visualizing CFGs, call graphs, def-use chains, and value-flow graphs
  • Running pointer analysis
  • Exporting results as JSON

For the full Python SDK, checker framework, and LLVM bitcode support, use the local installation below.

Local Installation (Docker)

The local installation uses Docker to provide a complete development environment with Rust, Python 3.12, LLVM 18, and all dependencies pre-installed.

Prerequisites

  • Docker installed and running
  • Git
  • A terminal

Step 1: Clone the Repository

git clone https://github.com/Static-Analyzer-Factory/static-analyzer-factory.git
cd static-analyzer-factory

Step 2: Start the Development Shell

make shell

This drops you into an interactive shell inside the Docker container. The project directory is mounted at /workspace.

On first run, the container automatically:

  1. Creates a Python virtual environment
  2. Installs pytest and development dependencies
  3. Builds the SAF Python SDK with maturin develop --release

Subsequent runs skip the build if saf is already importable.

Step 3: Verify the Installation

Inside the Docker shell:

python3 -c "import saf; print('SAF version:', saf.version())"

You should see the SAF version printed without errors.

To run the full test suite:

# Exit Docker shell first (Ctrl+D), then:
make test

What's Included

The Docker environment provides:

Tool        Version  Purpose
Rust        stable   Core analysis engine
Python      3.12     SDK and tutorials
LLVM/Clang  18       C/C++ to LLVM IR compilation
maturin     latest   Python-Rust bridge (PyO3)

Available Make Commands

make help        # Show all available commands
make test        # Run all tests (Rust + Python) in Docker
make lint        # Run clippy + rustfmt check in Docker
make fmt         # Auto-format all Rust code in Docker
make shell       # Open interactive dev shell in Docker
make build       # Build minimal runtime Docker image
make clean       # Remove Docker volumes and local target/

Next Steps

First Analysis

This guide walks you through analyzing a small C program from start to finish using the SAF Python SDK.

The Vulnerable Program

We will detect a command injection vulnerability (CWE-78) -- user-controlled input flowing directly to a shell command:

#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;

    // SOURCE: argv[1] is user-controlled
    char *user_cmd = argv[1];

    // SINK: system() executes the string as a shell command
    return system(user_cmd);
}

If an attacker runs ./program "rm -rf /", the program executes that command.

The Analysis Pipeline

vulnerable.c --> clang-18 --> LLVM IR (.ll) --> SAF --> Findings
                 (compile)    (.ll file)        (analyze)

SAF works on LLVM IR, not source code directly. The pipeline compiles C to LLVM IR first, then runs analysis on the IR.

Step 1: Write the Detection Script

Create a file called detect.py:

import subprocess
from pathlib import Path
from saf import Project, sources, sinks

def main() -> None:
    # 1. Compile C to LLVM IR
    subprocess.run(
        ["clang-18", "-S", "-emit-llvm", "-O0", "-g",
         "-o", "vulnerable.ll", "vulnerable.c"],
        check=True,
    )
    print("Compiled: vulnerable.c -> vulnerable.ll")

    # 2. Load the LLVM IR into SAF
    proj = Project.open("vulnerable.ll")
    print("Project loaded")

    # 3. Create a query context
    q = proj.query()

    # 4. Run taint flow analysis
    findings = q.taint_flow(
        sources=sources.function_param("main", 1),  # argv
        sinks=sinks.call("system", arg_index=0),     # system()'s argument
    )

    # 5. Print results
    print(f"\nFound {len(findings)} taint flow(s):")
    for i, f in enumerate(findings):
        print(f"  [{i}] finding_id={f.finding_id}")
        if f.trace:
            print(f"       trace steps: {len(f.trace.steps)}")

if __name__ == "__main__":
    main()

Step 2: Run Inside Docker

make shell
python3 detect.py

Expected output:

Compiled: vulnerable.c -> vulnerable.ll
Project loaded

Found 1 taint flow(s):
  [0] finding_id=0x...
       trace steps: 3

SAF found one taint flow: data from argv[1] reaches system().

Understanding the Key Concepts

Sources and Sinks

  • Source: Where untrusted data enters. sources.function_param("main", 1) selects the second parameter of main (which is argv).
  • Sink: Where untrusted data is dangerous. sinks.call("system", arg_index=0) selects the first argument to any call to system().

Taint Flow

q.taint_flow(sources, sinks) performs a breadth-first search over the ValueFlow graph, looking for paths from any source to any sink. Each path found is reported as a Finding with:

  • finding_id -- a deterministic identifier
  • trace -- the step-by-step data flow path
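Conceptually, the search can be sketched as a plain BFS over an edge list. This is an illustration of the idea, not SAF's actual implementation; the graph and the node names (argv, system_arg0, etc.) are made up:

```python
from collections import deque

def find_flow(edges, source, sink):
    """BFS over (src, dst) edge pairs; returns one path from source to sink, or None."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path  # this path plays the role of the finding's trace
        for nxt in succ.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Tiny made-up ValueFlow fragment: argv -> user_cmd -> system arg 0
edges = [("argv", "user_cmd"), ("user_cmd", "system_arg0"), ("argv", "printf_arg1")]
print(find_flow(edges, "argv", "system_arg0"))  # ['argv', 'user_cmd', 'system_arg0']
```

Because BFS explores shortest paths first, the reported trace is a minimal-length witness for the flow.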

Project.open()

Project.open("vulnerable.ll") does several things internally:

  1. Detects the .ll extension and selects the LLVM frontend
  2. Parses the LLVM IR into SAF's Analysis IR (AIR)
  3. Builds the call graph, def-use chains, and points-to analysis
  4. Constructs the ValueFlow graph

Step 3: Explore the Graphs

You can also inspect the program's structure:

# Export graphs
graphs = proj.graphs()

# Call graph
cg = graphs.export("callgraph")
print(f"Functions: {len(cg['nodes'])}")
print(f"Call edges: {len(cg['edges'])}")

# ValueFlow graph
vf = graphs.export("valueflow")
print(f"VF nodes: {len(vf['nodes'])}, edges: {len(vf['edges'])}")

Step 4: Use the Checker Framework

Instead of manually specifying sources and sinks, use the built-in checkers for common vulnerability patterns:

import saf

proj = saf.Project.open("vulnerable.ll")

# Run all built-in checkers (memory-leak, use-after-free, double-free, etc.)
all_findings = proj.check_all()

for f in all_findings:
    print(f"[{f.severity}] {f.checker}: {f.message}")

# Or run a specific checker
leak_findings = proj.check("memory-leak")

Next Steps

Playground Tour

The SAF Playground is a browser-based interactive environment for writing code, running analysis, and visualizing program graphs. Everything runs client-side via WebAssembly -- no server needed.

Opening the Playground

Navigate to the Playground in your browser. You will see three main areas:

  1. Code Editor (left) -- Write C code or LLVM IR
  2. Graph Viewer (right) -- Interactive visualization of analysis results
  3. Toolbar (top) -- Controls for compilation, analysis, and graph selection

Writing Code

The editor supports two input modes:

  • C/C++: Write standard C code. The playground compiles it to LLVM IR using the Compiler Explorer API, then analyzes the resulting IR.
  • LLVM IR: Write .ll format directly for full control over the IR structure.

Example: Command Injection

Paste this into the editor:

#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;
    char *cmd = argv[1];
    return system(cmd);
}

Running Analysis

Click Analyze to start the pipeline:

  1. C code is compiled to LLVM IR via Compiler Explorer
  2. The IR is parsed by tree-sitter-llvm (in WASM)
  3. The parsed IR is converted to AIR JSON
  4. SAF's WASM module runs the full analysis pipeline
  5. Results are displayed as interactive graphs

Viewing Graphs

Use the graph selector tabs to switch between:

Graph       What It Shows
CFG         Control flow within each function -- basic blocks and branches
Call Graph  Which functions call which -- the program's call hierarchy
Def-Use     Where values are defined and where they are used
Value Flow  How data flows through the program, including across functions
Points-To   What each pointer may point to after analysis

Interacting with Graphs

  • Pan: Click and drag the background
  • Zoom: Scroll wheel or pinch gesture
  • Select: Click a node to highlight it and its connections
  • Details: Selected nodes show their properties in a detail panel

Built-In Examples

The playground includes pre-loaded examples demonstrating different analysis scenarios:

  • Command Injection -- Taint flow from argv to system()
  • Use-After-Free -- Pointer used after free()
  • Memory Leak -- Allocation without deallocation
  • Double Free -- Same pointer freed twice
  • Struct Field Access -- Field-sensitive pointer analysis

Select an example from the dropdown to load it into the editor.

Limitations

The browser playground uses a simplified analysis pipeline compared to the full Docker-based SAF:

Feature              Browser                Full SAF
C/C++ compilation    Compiler Explorer API  Local clang-18
LLVM bitcode (.bc)   Not supported          Supported
Python SDK           Not available          Full access
Checker framework    Not available          9 built-in checkers
Z3 path refinement   Not available          Supported
SARIF export         Not available          Supported
Graph visualization  Built-in               Export to external tools

See Browser vs Full SAF for a detailed comparison.

Next Steps

Browser vs Full SAF

SAF is available in two forms: the browser-based playground (WebAssembly) and the full local installation (Docker). This page compares the two.

Feature Comparison

Feature                     Browser (WASM)                            Full SAF (Docker)
Setup                       None -- open a URL                        Docker + make shell
LLVM frontend               No (tree-sitter)                          Yes (inkwell, full LLVM API)
Z3 solver                   No                                        Yes
Context-sensitive PTA       Limited                                   Yes
Large programs (>1000 LOC)  Slow                                      Fast
Python SDK                  Pyodide (limited)                         Full
Custom specs                Yes                                       Yes
Checker framework           9 built-in checkers (via query protocol)  9 built-in checkers + custom checkers
SARIF export                Not available                             Standards-compliant output via CLI
Graph visualization         Built-in Cytoscape.js                     Export JSON for external tools
Batch scanning              Single file                               Multi-file projects
Offline use                 Requires Compiler Explorer API for C      Fully offline

When to Use the Browser

The browser playground is ideal for:

  • Learning: Experiment with program analysis concepts interactively
  • Quick checks: Paste a snippet and see the CFG or call graph immediately
  • Demonstrations: Show analysis capabilities without requiring setup
  • Embedding: Include analysis widgets in documentation or blog posts

When to Use Full SAF

The full installation is needed for:

  • Production analysis: Scanning real codebases for vulnerabilities
  • Custom checkers: Writing Python scripts that use the SDK
  • CI/CD integration: Automated scanning with SARIF output
  • Advanced analysis: Z3-based path refinement, IFDS taint tracking
  • Bitcode input: Analyzing pre-compiled .bc files
  • Batch scanning: Processing multiple files or entire projects

Architecture Differences

Browser

C code --> Compiler Explorer API --> LLVM IR (.ll text)
  --> tree-sitter-llvm (WASM) --> CST
    --> TypeScript converter --> AIR JSON
      --> saf-wasm (WASM) --> PropertyGraph JSON
        --> Cytoscape.js visualization

The browser path uses tree-sitter to parse LLVM IR text and a TypeScript layer to convert the parse tree into AIR JSON. The saf-wasm crate contains saf-core and saf-analysis compiled to WebAssembly, without LLVM, Z3, or threading dependencies.

Full SAF

C code --> clang-18 --> LLVM IR (.ll or .bc)
  --> inkwell (LLVM C API) --> AIR
    --> saf-analysis (native) --> Analysis results
      --> Python SDK / CLI / JSON / SARIF

The full path uses inkwell to parse LLVM IR/bitcode natively with full access to LLVM's type system and metadata. Analysis runs in native Rust with threading support.

Migrating from Browser to Full SAF

If you start with the playground and want to move to the full SDK:

  1. Install Docker and clone the repository (see Installation)
  2. Save your C code to a .c file
  3. Write a Python detection script using the same analysis concepts
  4. Run inside make shell

The analysis concepts (sources, sinks, graph types) are the same in both environments. The full SDK simply provides more control and additional features.

Analysis IR (AIR)

AIR (Analysis Intermediate Representation) is SAF's canonical, frontend-agnostic intermediate representation. All analysis passes operate on AIR, never on frontend-specific types like LLVM IR or Clang ASTs.

Why a Separate IR?

SAF supports multiple frontends (LLVM bitcode, AIR JSON, and potentially source-level frontends in the future). Rather than coupling analysis algorithms to any specific input format, SAF defines AIR as a common target:

LLVM bitcode (.bc/.ll)  -->  LLVM Frontend  -->  AIR
AIR JSON (.air.json)    -->  JSON Frontend  -->  AIR
(future: Clang AST)     -->  AST Frontend   -->  AIR
                                                   |
                                                   v
                                              Analysis passes
                                         (CFG, PTA, ValueFlow, ...)

This design means adding a new frontend requires only implementing the mapping to AIR -- no changes to analysis algorithms.

Structure

An AIR module contains:

Entity        Description
Module        Top-level container with a fingerprint and metadata
Functions     Named functions with parameters, return types, and basic blocks
Basic Blocks  Sequences of instructions with a terminator
Instructions  Individual operations (alloc, load, store, call, etc.)
Values        SSA registers, function parameters, constants, and globals
Objects       Memory objects (stack allocas, heap allocations, globals)

Operations

AIR supports the following operation types:

Category    Operations
Allocation  Alloca (stack), Global, HeapAlloc (malloc/calloc)
Memory      Load, Store, GEP (field/element access), Memcpy, Memset
Control     Br (branch), Switch, Ret (return)
SSA         Phi, Select
Calls       CallDirect, CallIndirect
Transforms  Cast, BinaryOp (arithmetic, bitwise)

Deterministic IDs

Every AIR entity has a deterministic u128 ID derived from BLAKE3 hashes. IDs are serialized as 0x followed by 32 lowercase hex characters (e.g., 0x1a2b3c4d5e6f...).

The ID derivation hierarchy:

ModuleFingerprint = hash(FrontendId, input_fingerprint_bytes)
  FunctionId = hash(ModuleFingerprint, "fn", function_key)
    BlockId = hash(FunctionId, "bb", block_index)
      InstId = hash(BlockId, "inst", inst_index, opcode_tag)
        ValueId = derived from instruction results, args, globals, constants
    ObjId = derived from allocas, heap allocators, globals
      LocId = hash(ObjId, "loc", field_path)

This means identical inputs always produce identical IDs, regardless of when or where the analysis runs. Debug information does not affect structural IDs by default.
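The shape of this derivation can be illustrated with a short stdlib sketch. Hedged heavily: SAF uses BLAKE3 and its exact hashing inputs are internal; here blake2b stands in (it ships with Python's standard library), and the label strings and keys are illustrative only:

```python
import hashlib

def derive_id(parent: bytes, *parts: str) -> bytes:
    # Illustrative stand-in for SAF's BLAKE3-based derivation.
    # digest_size=16 gives 16 bytes, i.e. a u128-sized ID.
    h = hashlib.blake2b(digest_size=16)
    h.update(parent)
    for part in parts:
        h.update(part.encode())
    return h.digest()

# Walk the hierarchy: module -> function -> block -> instruction
module_fp = derive_id(b"", "llvm-frontend", "input-fingerprint")
fn_id     = derive_id(module_fp, "fn", "main")
block_id  = derive_id(fn_id, "bb", "0")
inst_id   = derive_id(block_id, "inst", "0", "call")

print("0x" + inst_id.hex())  # 0x + 32 lowercase hex chars, stable across runs
assert derive_id(block_id, "inst", "0", "call") == inst_id  # deterministic
```

Because each ID is a pure function of its parent and structural position, re-running the pipeline on the same input reproduces every ID, which is what makes results diffable.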

Source Metadata

AIR instructions can carry optional source-level metadata:

  • Span: File, line, column, byte offsets for source location
  • Symbol: Display name, mangled name, namespace path
  • Type representation: Frontend-specific type string

This metadata enables source-level error reporting without coupling the analysis to any particular frontend.

Next Steps

Control Flow Graphs

A control flow graph (CFG) represents the possible execution paths within a function. An interprocedural control flow graph (ICFG) extends this to the whole program by connecting function calls to their targets.

CFG Basics

A CFG is a directed graph where:

  • Nodes are basic blocks -- straight-line sequences of instructions with no branches except at the end
  • Edges represent possible control flow between blocks

Key Concepts

Concept       Description
Entry block   Where function execution begins (no predecessors)
Exit block    Ends with a return instruction (no successors)
Branch block  Has multiple successors (if/while/switch)
Merge block   Has multiple predecessors (join point after a branch)
Back-edge     Edge to an earlier block, indicating a loop

Example

int abs(int x) {
    if (x < 0)       // Block 0: entry, branch
        x = -x;      // Block 1: negate
    return x;         // Block 2: merge, exit
}

The CFG for this function has three blocks:

[Block 0: entry] --true--> [Block 1: negate]
       |                         |
       +---false--> [Block 2: return] <--+

Block 2 is a merge point (two predecessors) and an exit point (return).

ICFG

The ICFG connects CFGs across functions by adding call and return edges:

  • Call edge: From a call instruction to the callee's entry block
  • Return edge: From the callee's return to the instruction after the call

main:
  [Block 0] --call--> validate: [entry]
                                  ...
                      validate: [exit] --return--> [Block 1]

The ICFG enables interprocedural analysis -- tracking data flow across function boundaries.

SAF's PropertyGraph Format

SAF exports CFGs in the unified PropertyGraph JSON format:

{
  "schema_version": "0.1.0",
  "graph_type": "cfg",
  "nodes": [
    {
      "id": "0x...",
      "labels": ["Block", "Entry"],
      "properties": {
        "name": "entry",
        "function": "main"
      }
    }
  ],
  "edges": [
    {
      "src": "0x...",
      "dst": "0x...",
      "edge_type": "FLOWS_TO",
      "properties": {}
    }
  ]
}

  • Nodes have labels including "Block" and optionally "Entry" for entry blocks
  • The properties.function field groups blocks by function
  • Edges have edge_type: "FLOWS_TO"

Exporting with the Python SDK

from saf import Project

proj = Project.open("program.ll")
graphs = proj.graphs()
cfg = graphs.export("cfg")

# Group blocks by function
from collections import defaultdict
by_function = defaultdict(list)
for node in cfg["nodes"]:
    fn = node["properties"]["function"]
    by_function[fn].append(node)

for fn, blocks in by_function.items():
    entry = [b for b in blocks if "Entry" in b["labels"]]
    print(f"{fn}: {len(blocks)} blocks, entry={entry[0]['id']}")

Why CFGs Matter

CFGs are the foundation for:

  • Reachability analysis: Is a particular code path possible?
  • Loop detection: Back-edges in the CFG reveal loops
  • Dominance: Which blocks must execute before others?
  • Dead code: Blocks with no path from entry are unreachable
  • Taint propagation: Data flows follow control flow paths
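The dead-code check above, for instance, falls out of a simple reachability walk over the exported PropertyGraph. A sketch using the documented node/edge shape, with made-up block IDs:

```python
def unreachable_blocks(cfg):
    """Return IDs of blocks with no path from any Entry block."""
    succ = {}
    for e in cfg["edges"]:
        succ.setdefault(e["src"], []).append(e["dst"])
    # Seed the walk from every block labeled "Entry"
    stack = [n["id"] for n in cfg["nodes"] if "Entry" in n["labels"]]
    reached = set(stack)
    while stack:
        for nxt in succ.get(stack.pop(), []):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return [n["id"] for n in cfg["nodes"] if n["id"] not in reached]

# Tiny hand-built CFG: entry flows to 0x2; 0x3 has no incoming path
cfg = {
    "nodes": [
        {"id": "0x1", "labels": ["Block", "Entry"], "properties": {}},
        {"id": "0x2", "labels": ["Block"], "properties": {}},
        {"id": "0x3", "labels": ["Block"], "properties": {}},
    ],
    "edges": [{"src": "0x1", "dst": "0x2", "edge_type": "FLOWS_TO", "properties": {}}],
}
print(unreachable_blocks(cfg))  # ['0x3']
```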

Next Steps

Call Graphs

A call graph is a directed graph representing the calling relationships between functions in a program. It answers the question: "which functions can call which other functions?"

Structure

  • Nodes represent functions
  • Edges represent call sites (function A calls function B)

Key Concepts

Concept                       Description
Entry point                   Function with no callers (e.g., main)
Leaf function                 Function that calls no other user-defined functions
Fan-out                       Number of distinct functions a function calls
Fan-in                        Number of distinct callers a function has
Strongly connected component  Group of functions that (transitively) call each other (recursion)

Example

main
  |-- log_error          (shared utility)
  \-- parse_request
        |-- validate_input
        |     \-- log_error
        |-- process_data
        \-- send_response

Here, main is the entry point, and log_error has fan-in of 2 (called by both main and validate_input). process_data, send_response, and log_error are leaf functions.

Direct vs Indirect Calls

SAF distinguishes between:

  • Direct calls: The target function is known statically (e.g., foo())
  • Indirect calls: The target is a function pointer (e.g., fptr())

For indirect calls, SAF uses points-to analysis to resolve the possible targets. The call graph is refined as PTA discovers which functions a pointer may refer to.

SAF's PropertyGraph Format

{
  "schema_version": "0.1.0",
  "graph_type": "callgraph",
  "nodes": [
    {
      "id": "0x...",
      "labels": ["Function"],
      "properties": {
        "name": "main",
        "kind": "defined"
      }
    }
  ],
  "edges": [
    {
      "src": "0x...",
      "dst": "0x...",
      "edge_type": "CALLS",
      "properties": {}
    }
  ]
}

  • Nodes have labels: ["Function"] and properties.name for the function name
  • The properties.kind field indicates "defined" (has a body) or "external"
  • Edges have edge_type: "CALLS"

Exporting with the Python SDK

from saf import Project

proj = Project.open("program.ll")
graphs = proj.graphs()
cg = graphs.export("callgraph")

nodes = cg["nodes"]
edges = cg["edges"]

# Build adjacency
callees = {}
callers = {}
for node in nodes:
    nid = node["id"]
    name = node["properties"]["name"]
    callees[nid] = []
    callers[nid] = []

for edge in edges:
    callees[edge["src"]].append(edge["dst"])
    callers[edge["dst"]].append(edge["src"])

# Find entry points and leaves
for node in nodes:
    nid = node["id"]
    name = node["properties"]["name"]
    if not callers[nid]:
        print(f"Entry point: {name}")
    if not callees[nid]:
        print(f"Leaf: {name}")

Why Call Graphs Matter

Call graphs are essential for:

  • Attack surface analysis: Entry points and functions that process external input
  • Dead code detection: Functions with no callers may be unused
  • Interprocedural analysis: Taint and value flow follow call edges
  • Modular analysis: Analyze one function at a time, using summaries for callees
  • Dependency analysis: Understanding how changes propagate through a codebase
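Recursion detection (the strongly-connected-component concept from earlier) also reduces to a graph walk over CALLS edges. A conceptual sketch with an invented mutually-recursive example, not SAF's implementation:

```python
def recursive_functions(nodes, edges):
    """Return node IDs that can (transitively) call themselves, i.e. sit on a CALLS cycle."""
    succ = {n: set() for n in nodes}
    for src, dst in edges:
        succ[src].add(dst)

    def reaches(start, target, seen):
        # DFS: can we get from start back to target?
        for nxt in succ[start]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return [n for n in nodes if reaches(n, n, set())]

# Made-up example: parse and expr call each other; main is not recursive
nodes = ["main", "parse", "expr"]
edges = [("main", "parse"), ("parse", "expr"), ("expr", "parse")]
print(recursive_functions(nodes, edges))  # ['parse', 'expr']
```

On a real export you would feed in the node IDs and the (src, dst) pairs of CALLS edges from the PropertyGraph JSON shown above.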

Next Steps

Points-To Analysis

Points-to analysis (PTA) determines what memory locations each pointer in a program may refer to. This is fundamental to understanding aliasing, resolving indirect calls, and building accurate data flow graphs.

The Problem

Consider:

int x = 10, y = 20;
int *p = &x;
int *q = p;     // q also points to x
*q = 30;        // modifies x through q

Without PTA, an analysis cannot determine that modifying *q affects x. Points-to analysis computes that both p and q point to the same object x, enabling the analysis to track data flow through memory.

Andersen's Analysis

SAF implements Andersen-style inclusion-based points-to analysis. This is a whole-program, flow-insensitive, context-insensitive analysis that computes a conservative approximation of the points-to sets.

Constraint Types

PTA works by extracting constraints from the program and solving them:

Constraint  Meaning                                  Example
Addr        p points to object o                     p = &x
Copy        p points to everything q points to       p = q
Load        p points to what *q points to            p = *q
Store       *p points to what q points to            *p = q
GEP         p points to a field of what q points to  p = &q->field

Solving

The solver iterates until a fixed point:

  1. Initialize points-to sets from Addr constraints
  2. Process Copy constraints: propagate points-to sets
  3. Process Load/Store constraints: follow indirections
  4. Repeat until no points-to set changes
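The fixed-point loop above can be sketched as a toy solver over the four constraint kinds. This is a minimal illustration using string names for values and objects, not SAF's solver:

```python
def solve(addr, copy, load, store):
    """Tiny Andersen-style solver.
    addr:  list of (p, obj)  meaning p = &obj
    copy:  list of (p, q)    meaning p = q
    load:  list of (p, q)    meaning p = *q
    store: list of (p, q)    meaning *p = q
    Returns {pointer: set(objects)} at the fixed point."""
    pts = {}
    get = lambda v: pts.setdefault(v, set())
    for p, obj in addr:          # 1. initialize from Addr constraints
        get(p).add(obj)
    changed = True
    while changed:               # 4. repeat until no set changes
        changed = False

        def merge(dst, src_set):
            nonlocal changed
            before = len(get(dst))
            get(dst).update(src_set)
            changed = changed or len(get(dst)) != before

        for p, q in copy:        # 2. propagate along Copy constraints
            merge(p, get(q))
        for p, q in load:        # 3a. p = *q: p inherits pts of what q points to
            for o in list(get(q)):
                merge(p, get(o))
        for p, q in store:       # 3b. *p = q: what p points to inherits pts(q)
            for o in list(get(p)):
                merge(o, get(q))
    return pts

# int *p = &x; int *q = p;  -->  both point to x
pts = solve(addr=[("p", "x")], copy=[("q", "p")], load=[], store=[])
print(sorted(pts["q"]))  # ['x']
```

Inclusion-based means points-to sets only grow, so the loop terminates once no constraint adds anything new.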

Field Sensitivity

SAF supports field-sensitive analysis, distinguishing different fields of a struct:

struct Pair { int *a; int *b; };
struct Pair s;
s.a = &x;   // s.a -> x
s.b = &y;   // s.b -> y (distinct from s.a)

Without field sensitivity, s.a and s.b would be conflated, causing false aliasing.

Querying Points-To Results

Python SDK

from saf import Project

proj = Project.open("program.ll")
q = proj.query()

# What does pointer p point to? (pass the hex ID of the pointer value)
pts = q.points_to("0x00000000000000000000000000000001")
print(f"p points to: {pts}")

# Do p and q alias? (pass hex IDs of both pointer values)
alias = q.may_alias("0x00000000000000000000000000000001",
                     "0x00000000000000000000000000000002")
print(f"p and q may alias: {alias}")

Export Format

pta = proj.pta_result().export()
# Returns: {"points_to": [{"value": "0x...", "locations": ["0x...", ...]}, ...]}

Impact on Other Analyses

PTA results feed into multiple downstream analyses:

Analysis        How PTA Helps
Call graph      Resolves indirect call targets (function pointers)
ValueFlow       Determines which stores affect which loads
Taint analysis  Tracks taint through pointer indirections
Alias analysis  Identifies when two pointers may refer to the same memory

Without PTA, these analyses must conservatively assume that any pointer could point to any object, leading to excessive false positives.

Configuration

PTA behavior can be tuned via configuration:

{
  "pta": {
    "enabled": true,
    "field_sensitivity": "struct_fields",
    "max_objects": 100000
  }
}

  • field_sensitivity: "none" or "struct_fields"
  • max_objects: Safety cap; when exceeded, analysis collapses to a conservative approximation using unknown_mem

Next Steps

Value Flow

The ValueFlow graph is SAF's central data flow representation. It tracks how values move through a program -- from where they are created to where they are used -- across function boundaries and through memory operations.

From Def-Use to ValueFlow

Def-Use Chains

A def-use chain connects a value's definition to its uses within a single function:

int x = 10;        // definition of x
printf("%d", x);   // use of x
return x;           // another use of x

Def-use chains are intraprocedural (within one function) and track SSA values.

ValueFlow Graph

The ValueFlow graph extends def-use chains with:

  • Interprocedural edges: Data flowing across function calls (arguments and return values)
  • Memory modeling: Data flowing through stores and loads, resolved by PTA
  • Transform edges: Data modified by arithmetic, casts, or other operations

Edge Types

Edge Type  Meaning                                Example
DefUse     SSA def-use chain (direct assignment)  y = x
Store      Value written to memory                *p = x
Load       Value read from memory                 y = *p
CallArg    Value passed as function argument      foo(x)
Return     Value returned from function           return x
Transform  Value modified by an operation         y = x + 1

Node Types

Node Kind   Meaning
Value       An SSA register value
Location    A memory location (from pointer analysis)
UnknownMem  Unknown or external memory

Example

char *buf = malloc(64);    // Value: malloc return
strcpy(buf, "hello");      // CallArg: buf -> strcpy arg 0
log_message(buf);           // CallArg: buf -> log_message arg 0
free(buf);                  // CallArg: buf -> free arg 0

The ValueFlow graph captures all of these flows:

malloc() return
    |
    +--[CallArg]--> strcpy arg 0
    +--[CallArg]--> log_message arg 0
    +--[CallArg]--> free arg 0

PropertyGraph Format

{
  "schema_version": "0.1.0",
  "graph_type": "valueflow",
  "metadata": {
    "node_count": 42,
    "edge_count": 58
  },
  "nodes": [
    {
      "id": "0x...",
      "labels": ["Value"],
      "properties": { "kind": "Value" }
    }
  ],
  "edges": [
    {
      "src": "0x...",
      "dst": "0x...",
      "edge_type": "DEFUSE",
      "properties": {}
    }
  ]
}

Exporting with the Python SDK

from saf import Project

proj = Project.open("program.ll")
graphs = proj.graphs()

# Def-use chains
defuse = graphs.export("defuse")
definitions = [e for e in defuse["edges"] if e["edge_type"] == "DEFINES"]
uses = [e for e in defuse["edges"] if e["edge_type"] == "USED_BY"]
print(f"Definitions: {len(definitions)}, Uses: {len(uses)}")

# Full ValueFlow graph
vf = graphs.export("valueflow")
print(f"Nodes: {len(vf['nodes'])}, Edges: {len(vf['edges'])}")

# Count edge types
from collections import Counter
edge_types = Counter(e["edge_type"] for e in vf["edges"])
for kind, count in edge_types.most_common():
    print(f"  {kind}: {count}")

How ValueFlow Enables Analysis

The ValueFlow graph is the foundation for SAF's query capabilities:

Analysis        How It Uses ValueFlow
Taint flow      BFS from source nodes to sink nodes
Memory leak     Check if allocation nodes reach exit without passing through free
Use-after-free  Check if freed pointer reaches a dereference
Double free     Check if freed pointer reaches another free
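The memory-leak row, for instance, reduces to asking "does every allocation node reach a free() argument?" An illustrative sketch over made-up node IDs, not the built-in checker:

```python
def leaked_allocations(alloc_nodes, free_nodes, edges):
    """Report allocations with no ValueFlow path to any free() argument node."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    frees = set(free_nodes)

    def reaches_free(start):
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node in frees:
                return True
            for nxt in succ.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return [a for a in alloc_nodes if not reaches_free(a)]

# malloc_a flows into free(); malloc_b never does -> malloc_b is reported
edges = [("malloc_a", "buf"), ("buf", "free_arg0"), ("malloc_b", "data")]
print(leaked_allocations(["malloc_a", "malloc_b"], ["free_arg0"], edges))  # ['malloc_b']
```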

Next Steps

Taint Analysis

Taint analysis tracks the flow of untrusted data through a program to determine if it can reach security-sensitive operations. It is SAF's primary technique for detecting injection vulnerabilities, information leaks, and other data-flow security issues.

Core Concepts

Sources, Sinks, and Sanitizers

Concept    Definition                                                     Examples
Source     Where untrusted ("tainted") data enters the program            argv, getenv(), read(), fgets()
Sink       A dangerous function that should never receive tainted data   system(), execve(), printf() format arg
Sanitizer  A function that validates or cleans data, removing the taint  Input validation, bounds checking, escaping

The Question

Taint analysis answers: "Can data from a source reach a sink without passing through a sanitizer?"

If the answer is yes, a finding is reported -- a potential vulnerability.

How SAF Performs Taint Analysis

SAF implements taint analysis as a graph reachability query over the ValueFlow graph:

  1. Identify source nodes in the ValueFlow graph (e.g., argv parameter, getenv() return value)
  2. Identify sink nodes (e.g., system() argument)
  3. Identify sanitizer nodes (optional -- nodes that "clean" the taint)
  4. BFS traversal from sources to sinks, skipping paths through sanitizers
  5. Report findings with deterministic trace paths

BFS vs IFDS

SAF provides two taint analysis modes:

| Mode | Method | Precision | Speed |
|---|---|---|---|
| BFS | q.taint_flow() | Flow-insensitive | Fast |
| IFDS | proj.ifds_taint() | Context-sensitive, flow-sensitive | Slower |

BFS is sufficient for most vulnerability detection. IFDS provides higher precision when false positives from flow-insensitive analysis are a concern.

Using the Python SDK

Basic Taint Flow

from saf import Project, sources, sinks

proj = Project.open("program.ll")
q = proj.query()

# Find flows from argv to system()
findings = q.taint_flow(
    sources=sources.function_param("main", 1),  # argv
    sinks=sinks.call("system", arg_index=0),     # system()'s first arg
)

for f in findings:
    print(f"Finding: {f.finding_id}")
    if f.trace:
        for step in f.trace.steps:
            print(f"  -> {step}")

Available Selectors

Source selectors:

| Selector | Description |
|---|---|
| sources.function_param(name, index) | Function parameter by name and position |
| sources.function_return(name) | Return value of a named function |
| sources.call(name) | Return value from calls to a named function |

Sink selectors:

| Selector | Description |
|---|---|
| sinks.call(name, arg_index=N) | Argument N of calls to a named function |
| sinks.arg_to(name, index) | Argument at index passed to a named function |

With Sanitizers

from saf import sources, sinks

findings = q.taint_flow(
    sources=sources.function_param("main", 1),
    sinks=sinks.call("system", arg_index=0),
    sanitizers=sources.function_return("validate_input"),  # Paths through this function are safe
)

Common Vulnerability Patterns

| Vulnerability | CWE | Source | Sink |
|---|---|---|---|
| Command injection | CWE-78 | argv, getenv() | system(), execve() |
| Format string | CWE-134 | User input | printf() format arg |
| SQL injection | CWE-89 | HTTP parameters | SQL query functions |
| Path traversal | CWE-22 | User input | fopen(), open() |
| Buffer overflow | CWE-120 | malloc() return | Unchecked memory write |
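
These patterns can be kept as plain data and turned into selector calls programmatically. A hypothetical catalogue mirroring part of the table (the entries and helper are illustrative, not an exhaustive or authoritative list):

```python
# Hypothetical catalogue: for each pattern, the function names you might
# wire into sources.call(...) and sinks.call(..., arg_index=N).
TAINT_PATTERNS = {
    "command-injection": {"cwe": 78,  "source_calls": ["getenv"],
                          "sink_calls": [("system", 0), ("execve", 0)]},
    "format-string":     {"cwe": 134, "source_calls": ["fgets"],
                          "sink_calls": [("printf", 0)]},
    "path-traversal":    {"cwe": 22,  "source_calls": ["fgets"],
                          "sink_calls": [("fopen", 0), ("open", 0)]},
}

def sinks_for(pattern):
    """Return the (callee, arg_index) pairs for a named pattern."""
    return TAINT_PATTERNS[pattern]["sink_calls"]

print(sinks_for("command-injection"))  # [('system', 0), ('execve', 0)]
```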

Checker Framework

For common patterns, SAF provides built-in checkers that pre-configure the appropriate sources, sinks, and modes:

# Instead of manually specifying sources and sinks:
findings = proj.check("memory-leak")
findings = proj.check("use-after-free")
findings = proj.check("double-free")

# Or run all 9 built-in checkers at once
all_findings = proj.check_all()

The checker framework supports 9 built-in checkers covering memory safety, information flow, and resource management. See the Python SDK reference for the full list.

Next Steps

  • Tutorials -- Hands-on guides for UAF, leaks, double-free, and taint analysis
  • Python SDK Reference -- Full API reference for selectors, checkers, and queries

Python SDK

The SAF Python SDK (import saf) provides full access to SAF's static analysis capabilities from Python. It is built with PyO3 and installed via maturin.

Installation

The SDK is built automatically when entering the Docker environment:

make shell
# SDK is available immediately:
python3 -c "import saf; print(saf.version())"

For manual installation inside the dev container:

maturin develop --release

Core API

Project

The Project class is the main entry point for all analysis operations.

from saf import Project

# Open a project from LLVM IR
proj = Project.open("program.ll")

# Open from AIR-JSON
proj = Project.open("program.air.json")

# Open with analysis tuning parameters
proj = Project.open(
    "program.ll",
    vf_mode="precise",                # "fast" (default) or "precise"
    pta_solver="worklist",             # "worklist" (default) or "datalog"
    pta_max_iterations=20000,          # default: 10000
    field_sensitivity_depth=3,         # default: 2 (0 = disabled)
    max_refinement_iterations=5,       # default: 10
)

Project.open() signature:

Project.open(
    path: str,
    *,
    vf_mode: str = "fast",
    pta_solver: str = "worklist",
    pta_max_iterations: int | None = None,
    field_sensitivity_depth: int | None = None,
    max_refinement_iterations: int | None = None,
) -> Project

| Parameter | Description |
|---|---|
| path | Path to input file (.air.json, .ll, or .bc). Frontend is selected automatically by extension. |
| vf_mode | "fast" routes all memory through a single unknown node for robust taint analysis. "precise" uses points-to analysis to resolve memory locations (may miss flows through unresolved pointers). |
| pta_solver | "worklist" uses the imperative worklist-based solver. "datalog" uses the Ascent Datalog fixpoint solver. |
| pta_max_iterations | Maximum PTA solver iterations. Default: 10000. |
| field_sensitivity_depth | Field sensitivity depth. 0 = disabled, default: 2. Higher values track deeper nested struct fields. |
| max_refinement_iterations | Maximum CG refinement iterations. Default: 10. |

Raises: FrontendError if the input file cannot be parsed or the required frontend is not available.

Schema Discovery

schema = proj.schema()
# Returns a dict with structured information about:
# - tool_version, schema_version
# - frontends (air-json, llvm with extensions and descriptions)
# - graphs (cfg, callgraph, defuse, valueflow)
# - queries (taint_flow, flows, points_to, may_alias with parameters)
# - selectors (sources, sinks, sanitizers)

Query Context

from saf import sources, sinks, sanitizers

q = proj.query()

# Taint flow analysis
findings = q.taint_flow(
    sources=sources.function_param("main", 1),
    sinks=sinks.call("system", arg_index=0),
)

# With sanitizers (accepts a Selector or SelectorSet, not strings)
findings = q.taint_flow(
    sources=sources.function_param("main", 1),
    sinks=sinks.call("system", arg_index=0),
    sanitizers=sanitizers.call("validate_input"),
    limit=500,  # default: 1000
)

# Data flows (without sanitizer filtering)
findings = q.flows(
    sources=sources.function_param("read_data"),
    sinks=sinks.arg_to("write_output", 0),
    limit=100,
)

# Points-to query (takes hex value ID string)
pts = q.points_to("0x00000000000000000000000000000001")

# Alias query (takes hex value ID strings)
alias = q.may_alias("0x00000001", "0x00000002")

Query method signatures:

| Method | Parameters | Returns |
|---|---|---|
| taint_flow(sources, sinks, sanitizers=None, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
| flows(sources, sinks, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
| points_to(ptr) | Hex string value ID | list[str] (location IDs) |
| may_alias(p, q) | Hex string value IDs | bool |

Graph Export

graphs = proj.graphs()

# List available graph types
print(graphs.available())  # ["cfg", "callgraph", "defuse", "valueflow"]

# Export to PropertyGraph dict
cfg = graphs.export("cfg")
cfg_main = graphs.export("cfg", function="main")  # single function
cg = graphs.export("callgraph")
du = graphs.export("defuse")
vf = graphs.export("valueflow")

# Export to Graphviz DOT string
dot_str = graphs.to_dot("callgraph")

# Export to interactive HTML (Cytoscape.js)
html_str = graphs.to_html("cfg", function="main")

All export() calls return a unified PropertyGraph dict:

{
    "schema_version": "0.1.0",
    "graph_type": "callgraph",
    "metadata": {},
    "nodes": [{"id": "0x...", "labels": [...], "properties": {...}}, ...],
    "edges": [{"src": "0x...", "dst": "0x...", "edge_type": "...", "properties": {}}, ...],
}

Source and Sink Selectors

Selectors identify values in the program for taint analysis. They can be combined using the | operator to form a SelectorSet.

Sources (saf.sources)

| Function | Description |
|---|---|
| function_param(function, index=None) | Select function parameters by name pattern (glob-style). index is 0-based; None selects all parameters. |
| function_return(function) | Select function return values by name pattern. |
| call(callee) | Select return values of calls to a function. |
| argv() | Select command-line arguments (shortcut for function_param("main", None)). |
| getenv(name=None) | Select environment variable reads (shortcut for call("getenv")). |

from saf import sources

src = sources.function_param("main", 1)
src = sources.function_param("read_*")      # glob pattern
src = sources.function_return("get_input")
src = sources.call("getenv")
src = sources.argv()

# Combine with |
combined = sources.argv() | sources.getenv()

Sinks (saf.sinks)

| Function | Description |
|---|---|
| call(callee, *, arg_index=None) | Select calls to a function. If arg_index is given, selects that argument; otherwise selects the call result. |
| arg_to(callee, index) | Select arguments passed to a function (0-based index). |

from saf import sinks

sink = sinks.call("system", arg_index=0)
sink = sinks.call("printf", arg_index=0)
sink = sinks.arg_to("free", 0)

# Without arg_index, selects the call result
sink = sinks.call("dangerous_function")

Sanitizers (saf.sanitizers)

| Function | Description |
|---|---|
| call(callee, *, arg_index=None) | Select calls to a sanitizing function. If arg_index is given, selects that argument; otherwise the return value is considered sanitized. |
| arg_to(callee, index) | Select arguments passed to a sanitizing function. |

from saf import sanitizers

san = sanitizers.call("escape_html", arg_index=0)
san = sanitizers.call("sanitize_input")

# Combine sanitizers
combined = sanitizers.call("sanitize") | sanitizers.call("escape")

Module-Level Selector Factories

The following factory functions are also available directly from saf._saf (used internally by sources, sinks, and sanitizers modules):

  • function_param(function, index=None) -- Select function parameters.
  • function_return(function) -- Select function return values.
  • call(callee) -- Select call results.
  • arg_to(callee, index=None) -- Select arguments to a callee.

Checker Framework

Running Built-In Checkers

# Run a specific checker
findings = proj.check("memory-leak")

# Run multiple checkers at once (pass a list)
findings = proj.check(["memory-leak", "use-after-free", "double-free"])

# Run all 9 built-in checkers
findings = proj.check_all()

# List available checkers and their metadata
schema = proj.checker_schema()
for checker in schema["checkers"]:
    print(f"{checker['name']}: {checker['description']} (CWE-{checker['cwe']})")

Built-In Checker Table

| Checker Name | CWE | Description |
|---|---|---|
| memory-leak | 401 | Allocated memory never freed |
| use-after-free | 416 | Memory accessed after being freed |
| double-free | 415 | Memory freed more than once |
| null-deref | 476 | Null pointer dereference |
| file-descriptor-leak | 403 | Opened file never closed |
| uninit-use | 457 | Use of uninitialized memory |
| stack-escape | 562 | Returning stack address |
| lock-not-released | 764 | Mutex not unlocked |
| generic-resource-leak | N/A | Custom resource tracking |
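
Findings from several checkers are often triaged together by CWE or severity. A small sketch over plain dicts shaped like CheckerFinding.to_dict() output (the sample findings and key names are assumptions for illustration):

```python
from collections import defaultdict

def by_cwe(findings):
    """Group finding dicts by their CWE ID."""
    groups = defaultdict(list)
    for f in findings:
        groups[f.get("cwe")].append(f)
    return dict(groups)

# Synthetic findings standing in for [f.to_dict() for f in proj.check_all()]
findings = [
    {"checker": "memory-leak", "cwe": 401, "severity": "warning", "message": "leak"},
    {"checker": "double-free", "cwe": 415, "severity": "error",   "message": "double free"},
    {"checker": "memory-leak", "cwe": 401, "severity": "warning", "message": "leak 2"},
]
print({cwe: len(fs) for cwe, fs in by_cwe(findings).items()})  # {401: 2, 415: 1}
```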

Custom Checkers

# Define a custom checker with source/sink/sanitizer roles
findings = proj.check_custom(
    "my-custom-leak",
    mode="must_not_reach",        # "may_reach", "must_not_reach", or "never_reach_sink"
    source_role="allocator",       # resource role for sources
    source_match_return=True,      # match return value (True) or first arg (False)
    sink_is_exit=True,             # sinks are function exits
    sink_role=None,                # or a resource role string
    sanitizer_role="deallocator",  # or None
    sanitizer_match_return=False,
    cwe=401,                       # optional CWE ID
    severity="warning",            # "info", "warning", "error", "critical"
)

Path-Sensitive Checking (Z3)

# Run checkers with Z3-based path feasibility filtering
result = proj.check_path_sensitive("null-deref", z3_timeout_ms=2000, max_guards=64)

# Or run all checkers with path sensitivity
result = proj.check_all_path_sensitive(z3_timeout_ms=1000, max_guards=64)

# Result has feasible, infeasible, and unknown findings
print(f"Real bugs: {len(result.feasible)}")
print(f"False positives filtered: {len(result.infeasible)}")
print(f"Unknown: {len(result.unknown)}")
print(result.diagnostics)  # dict with Z3 statistics

# Post-filter existing findings
raw_findings = proj.check_all()
result = proj.filter_infeasible(raw_findings, z3_timeout_ms=1000, max_guards=64)

CheckerFinding Attributes

Each item returned by check(), check_all(), or check_custom() is a CheckerFinding:

| Attribute | Type | Description |
|---|---|---|
| checker | str | Checker name that produced this finding |
| severity | str | "info", "warning", "error", or "critical" |
| cwe | int \| None | CWE ID if applicable |
| message | str | Human-readable description |
| source | str | Source SVFG node hex ID |
| sink | str | Sink SVFG node hex ID |
| trace | list[str] | Path from source to sink as hex node IDs |
| sink_traces | list[dict] | Per-sink traces for multi-reach findings (e.g., double-free). Each dict has "sink" and "trace" keys. |

for f in proj.check("use-after-free"):
    print(f.checker, f.severity, f.message)
    print(f"  CWE-{f.cwe}: {f.source} -> {f.sink}")
    print(f"  Trace length: {len(f.trace)}")
    # Convert to dict
    d = f.to_dict()

Finding Objects

The taint_flow() and flows() query methods return Finding objects, which are distinct from CheckerFinding objects.

Finding Attributes

| Attribute | Type | Description |
|---|---|---|
| finding_id | str | Deterministic hex identifier |
| source_location | str | Source location (file:line:col or value ID) |
| sink_location | str | Sink location (file:line:col or value ID) |
| source_id | str | Source value ID (hex) |
| sink_id | str | Sink value ID (hex) |
| rule_id | str \| None | Optional rule identifier |
| trace | Trace | Step-by-step data flow path |

for f in q.taint_flow(sources.argv(), sinks.call("system", arg_index=0)):
    print(f"{f.source_location} -> {f.sink_location}")
    print(f.trace.pretty())       # human-readable trace
    print(f"Steps: {len(f.trace)}")
    d = f.to_dict()               # convert to dict

Trace and TraceStep

A Trace contains a list of TraceStep objects. Each step represents one hop in the value-flow graph:

| TraceStep Attribute | Type | Description |
|---|---|---|
| from_id | str | Source node ID |
| from_kind | str | Source node kind |
| from_symbol | str \| None | Symbol name at source |
| from_location | str \| None | Source file:line:col |
| edge | str | Edge kind (def_use, transform, store, load, etc.) |
| to_id | str | Target node ID |
| to_kind | str | Target node kind |
| to_symbol | str \| None | Symbol name at target |
| to_location | str \| None | Target file:line:col |

for step in finding.trace.steps:
    print(f"  {step.from_symbol or step.from_id} --{step.edge}-> "
          f"{step.to_symbol or step.to_id}")

Resource Table

The resource table maps function names to resource management roles. It ships with built-in entries for C stdlib, C++ operators, POSIX I/O, and pthreads.

table = proj.resource_table()

table.has_role("malloc", "allocator")    # True
table.has_role("free", "deallocator")    # True
table.has_role("fopen", "acquire")       # True
table.has_role("fclose", "release")      # True

# Add custom entries
table.add("my_alloc", "allocator")
table.add("my_free", "deallocator")

# Inspect
print(table.size)                 # number of entries
print(table.function_names())     # sorted list of function names
entries = table.export()          # list of {"name": ..., "roles": [...]}

Available roles: allocator, deallocator, reallocator, acquire, release, lock, unlock, null_source, dereference.

Advanced Analysis

IFDS Taint Analysis

Precise interprocedural taint tracking using the IFDS framework (Reps/Horwitz/Sagiv tabulation algorithm):

result = proj.ifds_taint(
    sources=sources.function_param("main", 0),
    sinks=sinks.call("system", arg_index=0),
    sanitizers=sanitizers.call("validate"),  # optional
)

Typestate Analysis

Track per-resource state machines using the IDE framework:

# Built-in specs: "file_io", "mutex_lock", "memory_alloc"
result = proj.typestate("file_io")

# Custom typestate spec
from saf import TypestateSpec
result = proj.typestate_custom(spec)

Flow-Sensitive Pointer Analysis

More precise than Andersen's flow-insensitive analysis for programs with pointer reassignment:

fs_result = proj.flow_sensitive_pta(pts_repr="auto")
# pts_repr options: "auto", "btreeset", "bitvector", "bdd"

Context-Sensitive Pointer Analysis (k-CFA)

Distinguishes calls to the same function from different call sites:

cs_result = proj.context_sensitive_pta(k=1, pts_repr="auto")

Demand-Driven Pointer Analysis

Computes points-to information only for explicitly queried pointers:

dda = proj.demand_pta(
    max_steps=100_000,
    max_context_depth=10,
    timeout_ms=5000,
    enable_strong_updates=True,
    pts_repr="auto",
)

Memory SSA and SVFG

mssa = proj.memory_ssa()      # Memory SSA representation
svfg = proj.svfg()            # Sparse Value-Flow Graph

Call Graph Refinement

Iterative CHA + PTA-based indirect call resolution:

result = proj.refine_call_graph(entry_points="all", max_iterations=10)

Abstract Interpretation

Numeric interval analysis with widening/narrowing:

result = proj.abstract_interp(
    max_widening=100,
    narrowing_iterations=3,
    use_thresholds=True,
)

Numeric Checkers

# Individual numeric checker
findings = proj.check_numeric("buffer_overflow")     # CWE-120
findings = proj.check_numeric("integer_overflow")    # CWE-190
findings = proj.check_numeric("division_by_zero")    # CWE-369
findings = proj.check_numeric("shift_count")         # CWE-682

# All numeric checkers at once
findings = proj.check_all_numeric()

Combined PTA + Abstract Interpretation

Alias-aware numeric analysis with bidirectional refinement:

result = proj.analyze_combined(
    enable_refinement=True,
    max_refinement_iterations=3,
)
interval = result.interval_at("0x1234...")
alias = result.may_alias("0x5678...", "0x9abc...")

Z3 Path Refinement

# Prove/disprove assertions
result = proj.prove_assertions(z3_timeout_ms=1000, max_guards=64)

# Refine alias query with path constraints
result = proj.refine_alias("0xP", "0xQ", at_block="0xB", func_id="0xF")

# Check if a feasible path exists between two blocks
result = proj.check_path_reachable(
    from_block="0xB1", to_block="0xB2", func_id="0xF",
    z3_timeout_ms=1000, max_guards=64, max_paths=100,
)

JSON Protocol (LLM Agent Interface)

import json
resp = proj.request('{"action": "schema"}')
data = json.loads(resp)

AIR Module Access

The AirModule provides mid-level access to the intermediate representation:

air = proj.air()

print(air.name)              # module name
print(air.id)                # module ID (hex)
print(air.function_count)    # number of functions
print(air.global_count)      # number of globals
print(air.function_names())  # list of function names
print(air.global_names())    # list of global names

Visualization

The saf.viz module provides dependency-free graph visualization:

from saf import viz

pg = proj.graphs().export("callgraph")

# Graphviz DOT string (no dependencies)
dot_str = viz.to_dot(pg)

# Interactive HTML with Cytoscape.js (no dependencies)
html_str = viz.to_html(pg)

# Open in browser or save to file
viz.visualize(pg)                              # opens in browser
viz.visualize(pg, output="callgraph.html")     # saves to file

# Cytoscape.js JSON (for ipycytoscape in Jupyter)
cy_json = viz.to_cytoscape_json(pg)

# NetworkX DiGraph (requires networkx)
G = viz.to_networkx(pg)

# graphviz.Digraph object (requires graphviz package)
gv = viz.to_graphviz(pg)
gv.render("graph", format="svg")

Exceptions

All SAF exceptions inherit from SafError and carry .code and .details attributes for structured error handling:

| Exception | Description |
|---|---|
| SafError | Base exception for all SAF errors |
| FrontendError | Frontend ingestion errors (parsing, I/O, unsupported features) |
| AnalysisError | Analysis errors (PTA timeout, ValueFlow build error) |
| QueryError | Query execution errors (invalid selector, no match) |
| ConfigError | Configuration errors (invalid field, incompatible options) |

from saf import Project, SafError, FrontendError

try:
    proj = Project.open("nonexistent.ll")
except FrontendError as e:
    print(f"Error: {e}")

Module Exports

The saf package exports the following names:

from saf import (
    # Core classes
    Project, Query, Finding, Trace, TraceStep,
    # Selectors
    Selector, SelectorSet,
    # Selector modules
    sources, sinks, sanitizers, viz,
    # Checker types
    CheckerFinding, PathSensitiveResult, ResourceTable,
    # Typestate types
    TypestateResult, TypestateFinding, TypestateSpec, typestate_specs,
    # Exceptions
    SafError, FrontendError, AnalysisError, QueryError, ConfigError,
    # Resource role constants
    Allocator, Deallocator, Reallocator, Acquire, Release,
    NullSource, Dereference,
    # Reachability mode constants
    MayReach, MustNotReach,
    # Severity constants
    Info, Warning, Error, Critical,
    # Functions
    version,
)

CLI

The SAF command-line interface provides access to analysis capabilities from the terminal. The binary is named saf.

Global Options

| Option | Description |
|---|---|
| --json-errors | Output errors as JSON for machine-readable consumption |

Commands

saf index

Index input files via a frontend to produce AIR-JSON.

saf index program.ll
saf index program.ll --output bundle.air.json
saf index --frontend air-json bundle.air.json

| Option | Type | Default | Description |
|---|---|---|---|
| <inputs> | positional, required | | Input files to index |
| --frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
| --output <path> | path | stdout | Write AIR-JSON output to file instead of stdout |

saf run

Run analysis passes on input programs. This is the primary command -- it ingests input files, builds internal representations (CFG, call graph, pointer analysis), runs the selected checkers, and produces findings.

saf run program.ll
saf run program.ll --mode fast
saf run program.ll --format sarif --output report.sarif
saf run program.ll --checkers memory-leak,null-deref --pta cspta --pta-k 2

Input Options

| Option | Type | Default | Description |
|---|---|---|---|
| <inputs> | positional, required | | Input files to analyze |
| --frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
| --specs <path> | path | | Additional spec files or directories |

Analysis Options

| Option | Type | Default | Description |
|---|---|---|---|
| --mode <mode> | fast, precise | precise | Analysis mode |
| --checkers <list> | string | all | Checkers to run (comma-separated, or all / none) |
| --combined | flag | | Run combined PTA + abstract interpretation |

Pointer Analysis Options

| Option | Type | Default | Description |
|---|---|---|---|
| --pta <variant> | andersen, cspta, fspta, dda | andersen | PTA variant |
| --solver <backend> | worklist, datalog | worklist | PTA solver backend |
| --pts-repr <repr> | auto, btreeset, fxhash, roaring, bdd | auto | Points-to set representation |
| --pta-k <n> | integer | 2 | k-CFA depth (cspta only) |
| --field-sensitivity <level> | struct-fields, array-index, flat | struct-fields | Field sensitivity level |
| --max-pta-iterations <n> | integer | | Maximum PTA iterations |

Z3 Options

| Option | Type | Default | Description |
|---|---|---|---|
| --path-sensitive | flag | | Enable Z3 path-sensitive checker filtering |
| --z3-prove | flag | | Prove assertions via Z3 |
| --z3-refine-alias | flag | | Refine alias results via Z3 |
| --z3-check-reachability | flag | | Check path reachability via Z3 |
| --z3-timeout <ms> | integer | 5000 | Z3 solver timeout in milliseconds |

Typestate and Taint Options

| Option | Type | Default | Description |
|---|---|---|---|
| --typestate <name> | string | | Run built-in typestate spec |
| --typestate-custom <path> | path | | Run custom typestate spec from YAML |
| --ifds-taint <path> | path | | Run IFDS taint analysis with config file |

Output Options

| Option | Type | Default | Description |
|---|---|---|---|
| --format <fmt> | human, json, sarif | human | Output format |
| --output <path> | path | stdout | Write output to file |
| --diagnostics | flag | | Include checker/PTA diagnostics |
| --verbose | flag | | Show timing, resource table, stats |

saf query

Query analysis results. Requires --input and a subcommand.

saf query points-to --input program.ll 0x00ab1234...
saf query alias --input program.ll 0x001a... 0x002b...
saf query flows --input program.ll 0xsource... 0xsink...
saf query taint --input program.ll 0xsource... 0xsink...
saf query reachable --input program.ll 0xfunc1... 0xfunc2...

| Option | Type | Default | Description |
|---|---|---|---|
| --input <files> | path(s), required | | Input files to analyze |
| --frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |

Subcommands

| Subcommand | Arguments | Description |
|---|---|---|
| points-to | <pointer> (hex value ID) | Points-to set for a value |
| alias | <p> <q> (hex value IDs) | May-alias check between two pointers |
| flows | <source> <sink> (hex value IDs) | Data-flow reachability |
| taint | <source> <sink> (hex value IDs) | Taint-flow query |
| reachable | <func_ids>... (hex function IDs) | Call-graph reachability from functions |

saf export

Export graphs or findings in various formats.

saf export cfg --input program.ll
saf export callgraph --input program.ll --format dot --output cg.dot
saf export findings --input program.ll --format sarif --output report.sarif
saf export valueflow --input program.ll --format json --output vfg.json

| Option | Type | Default | Description |
|---|---|---|---|
| <target> | positional, required | | Export target (see below) |
| --input <files> | path(s), required | | Input files to analyze |
| --format <fmt> | json, sarif, dot, html | json | Output format |
| --output <path> | path | stdout | Write output to file instead of stdout |
| --function <name> | string | | Filter to a specific function (for CFG export) |
| --frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |

Export Targets

| Target | Description |
|---|---|
| cfg | Control-flow graph |
| callgraph | Call graph (inter-procedural) |
| defuse | Def-use chains |
| valueflow | Value-flow graph |
| svfg | Sparse Value-Flow Graph |
| findings | Analysis findings / bug reports |
| pta | Points-to analysis results |

saf schema

Print the SAF schema (supported frontends, queries, checkers).

saf schema
saf schema --checkers
saf schema --frontends
saf schema --format json

| Option | Type | Default | Description |
|---|---|---|---|
| --checkers | flag | | List available checkers only |
| --frontends | flag | | List available frontends only |
| --format <fmt> | human, json, sarif | human | Output format |

saf specs

Manage function specifications (list, validate, look up).

saf specs list
saf specs list --verbose
saf specs validate path/to/specs.yaml
saf specs lookup malloc

Subcommands

| Subcommand | Arguments | Description |
|---|---|---|
| list | [--verbose] | List loaded function specifications |
| validate | <path> (required) | Validate spec file or directory |
| lookup | <name> (required) | Look up the spec for a function by name |

saf incremental

Run incremental analysis with caching. Only recomputes what changed between runs.

saf incremental src/*.ll
saf incremental src/*.ll --cache-dir .saf-cache
saf incremental src/*.ll --plan
saf incremental src/*.ll --clean
saf incremental src/*.ll --export-summaries summaries.yaml

| Option | Type | Default | Description |
|---|---|---|---|
| <inputs> | positional, required | | Input files to analyze |
| --frontend <frontend> | llvm, air-json | llvm | Frontend to use for ingestion |
| --mode <mode> | sound, best-effort | best-effort | Precision mode for incremental analysis |
| --cache-dir <path> | path | .saf-cache | Cache directory for incremental state |
| --plan | flag | | Dry-run: show what would be recomputed without running analysis |
| --clean | flag | | Clear the cache before analysis |
| --export-summaries <path> | path | | Export computed summaries as YAML to the given path |

saf help

Show topic-based help guides.

saf help               # Overview of all commands and topics
saf help run           # How to run analyses and configure passes
saf help checkers      # Built-in bug checkers and how to select them
saf help pta           # Pointer analysis variants and configuration
saf help typestate     # Typestate checking and custom protocol specs
saf help taint         # Taint analysis modes and configuration
saf help z3            # Z3-based path sensitivity and refinement
saf help export        # Export targets, formats, and examples
saf help specs         # Function specification format and discovery
saf help incremental   # Incremental analysis and caching
saf help examples      # Common usage patterns and recipes

Available Topics

| Topic | Description |
|---|---|
| run | How to run analyses and configure passes |
| checkers | Built-in bug checkers and how to select them |
| pta | Pointer analysis variants and configuration |
| typestate | Typestate checking and custom protocol specs |
| taint | Taint analysis modes and configuration |
| z3 | Z3-based path sensitivity and refinement |
| export | Export targets, formats, and examples |
| specs | Function specification format and discovery |
| incremental | Incremental analysis and caching |
| examples | Common usage patterns and recipes |

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success (no findings, or findings exported) |
| 1 | Error (invalid input, analysis failure) |
| 2 | Findings detected (when used in CI mode) |

Examples

Quick Scan

# Run all checkers with defaults (Andersen PTA, precise mode)
saf run program.ll

CI/CD Integration

# Run all checkers and output SARIF for GitHub Code Scanning
saf run program.ll --format sarif --output results.sarif

# Check exit code for pass/fail
if saf run program.ll --format json | jq '.findings | length' | grep -q '^0$'; then
    echo "No findings"
else
    echo "Findings detected"
    exit 1
fi

Batch Analysis

# Analyze multiple files
for f in src/*.ll; do
    saf run "$f" --format sarif --output "results/$(basename "$f" .ll).sarif"
done

Path-Sensitive Analysis

# Reduce false positives with Z3-based path feasibility filtering
saf run program.ll --path-sensitive --z3-timeout 10000

Export Call Graph as DOT

saf export callgraph --input program.ll --format dot --output cg.dot
dot -Tpng cg.dot -o cg.png

Incremental Analysis

# First run caches results; subsequent runs only re-analyze changed files
saf incremental src/*.ll --cache-dir .saf-cache

Demand-Driven Points-To Query

saf query points-to --input program.ll 0x00ab1234abcd5678

PropertyGraph Format

All SAF graph exports use a unified PropertyGraph JSON format. This format is shared across graph types (CFG, call graph, def-use, value-flow, SVFG, PTA) for consistent downstream processing. SAF also exports findings (checker results) and a native PTA format, documented below.

Schema

{
  "schema_version": "0.1.0",
  "graph_type": "<type>",
  "metadata": {},
  "nodes": [
    {
      "id": "0x...",
      "labels": ["Label1", "Label2"],
      "properties": { "key": "value" }
    }
  ],
  "edges": [
    {
      "src": "0x...",
      "dst": "0x...",
      "edge_type": "EDGE_TYPE",
      "properties": {}
    }
  ]
}

Top-Level Fields

| Field | Type | Description |
|---|---|---|
| schema_version | string | Format version (currently "0.1.0") |
| graph_type | string | One of cfg, callgraph, defuse, valueflow, svfg, pta |
| metadata | object | Graph-specific metadata (e.g., node/edge counts) |
| nodes | array | List of node objects |
| edges | array | List of edge objects |

Node Fields

| Field | Type | Description |
|---|---|---|
| id | string | Deterministic hex ID (0x + 32 hex chars) |
| labels | array[string] | Node type labels (e.g., ["Function"], ["Block", "Entry"]) |
| properties | object | Type-specific properties |

Edge Fields

| Field | Type | Description |
|---|---|---|
| src | string | Source node ID |
| dst | string | Destination node ID |
| edge_type | string | Edge type label |
| properties | object | Edge-specific properties |
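
Because nodes and edges are plain dicts keyed by stable IDs, downstream tooling can index an export directly. A small sketch using a synthetic two-node call graph (field names follow the schema described in this section):

```python
def index_graph(pg):
    """Build O(1) lookups for a PropertyGraph dict:
    node ID -> node, and src node ID -> outgoing edges."""
    nodes = {n["id"]: n for n in pg["nodes"]}
    out_edges = {}
    for e in pg["edges"]:
        out_edges.setdefault(e["src"], []).append(e)
    return nodes, out_edges

# Synthetic call graph export (shortened IDs for readability)
pg = {
    "schema_version": "0.1.0",
    "graph_type": "callgraph",
    "metadata": {},
    "nodes": [
        {"id": "0xa", "labels": ["Function"], "properties": {"name": "main", "kind": "defined"}},
        {"id": "0xb", "labels": ["Function"], "properties": {"name": "system", "kind": "external"}},
    ],
    "edges": [
        {"src": "0xa", "dst": "0xb", "edge_type": "CALLS", "properties": {}},
    ],
}

nodes, out_edges = index_graph(pg)
callee = nodes[out_edges["0xa"][0]["dst"]]
print(callee["properties"]["name"])  # system
```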

Graph Types

Call Graph (callgraph)

| Element | Details |
|---|---|
| Node labels | ["Function"] |
| Node properties | name (function name), kind ("defined" or "external") |
| Edge type | "CALLS" |

CFG (cfg)

| Element | Details |
|---|---|
| Node labels | ["Block"], optionally ["Block", "Entry"] |
| Node properties | name (block name), function (owning function) |
| Edge type | "FLOWS_TO" |

Def-Use (defuse)

| Element | Details |
|---|---|
| Node labels | ["Value"] or ["Instruction"] |
| Node properties | Varies |
| Edge types | "DEFINES", "USED_BY" |

Value Flow (valueflow)

| Element | Details |
|---|---|
| Node labels | ["Value"], ["Location"], or ["UnknownMem"] |
| Node properties | kind ("Value", "Location", "UnknownMem") |
| Edge types | "Direct", "Store", "Load", "CallArg", "Return", "Transform" |
| Metadata | node_count, edge_count |

SVFG (svfg)

The Sparse Value-Flow Graph captures both direct (top-level SSA) and indirect (memory, via MSSA) value flows. It is the foundation for SVFG-based checkers such as null-pointer dereference and use-after-free detectors.

| Element | Details |
|---|---|
| Node labels | ["Value"] or ["MemPhi"] |
| Node properties | kind ("value" or "mem_phi") |
| Edge types | DIRECT_DEF, DIRECT_TRANSFORM, CALL_ARG, RETURN, INDIRECT_DEF, INDIRECT_STORE, INDIRECT_LOAD, PHI_FLOW |
| Metadata | node_count, edge_count |

Edge type descriptions:

| Edge type | Category | Description |
|---|---|---|
| `DIRECT_DEF` | Direct | SSA def-use chain (including phi incoming, select, copy) |
| `DIRECT_TRANSFORM` | Direct | Binary/unary/cast/GEP operand to result |
| `CALL_ARG` | Direct | Actual argument to formal parameter |
| `RETURN` | Direct | Callee return value to caller result |
| `INDIRECT_DEF` | Indirect | Store value to load result (clobber is a store) |
| `INDIRECT_STORE` | Indirect | Store value to MemPhi node |
| `INDIRECT_LOAD` | Indirect | MemPhi node to load result |
| `PHI_FLOW` | Indirect | MemPhi to MemPhi (nested phi chaining) |
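
Because every SVFG edge type belongs to exactly one category, a consumer can recover the direct/indirect split from an export without any extra metadata. A minimal Python sketch (the helper name and the tiny hand-built graph are illustrative, not part of the SDK):

```python
from collections import Counter

# Categories per the table above.
DIRECT = {"DIRECT_DEF", "DIRECT_TRANSFORM", "CALL_ARG", "RETURN"}
INDIRECT = {"INDIRECT_DEF", "INDIRECT_STORE", "INDIRECT_LOAD", "PHI_FLOW"}

def edge_summary(graph: dict) -> Counter:
    """Count SVFG edges by category ('direct' / 'indirect' / 'unknown')."""
    counts = Counter()
    for edge in graph["edges"]:
        t = edge["edge_type"]
        if t in DIRECT:
            counts["direct"] += 1
        elif t in INDIRECT:
            counts["indirect"] += 1
        else:
            counts["unknown"] += 1
    return counts

# Tiny hand-written graph for illustration.
sample = {
    "edges": [
        {"src": "0x01", "dst": "0x64", "edge_type": "INDIRECT_STORE"},
        {"src": "0x64", "dst": "0x02", "edge_type": "INDIRECT_LOAD"},
        {"src": "0x03", "dst": "0x04", "edge_type": "DIRECT_DEF"},
    ]
}
print(edge_summary(sample))  # indirect: 2, direct: 1
```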

Example:

{
  "schema_version": "0.1.0",
  "graph_type": "svfg",
  "metadata": { "node_count": 3, "edge_count": 2 },
  "nodes": [
    { "id": "0x00000000000000000000000000000001", "labels": ["Value"], "properties": { "kind": "value" } },
    { "id": "0x00000000000000000000000000000064", "labels": ["MemPhi"], "properties": { "kind": "mem_phi" } },
    { "id": "0x00000000000000000000000000000002", "labels": ["Value"], "properties": { "kind": "value" } }
  ],
  "edges": [
    { "src": "0x00000000000000000000000000000001", "dst": "0x00000000000000000000000000000064", "edge_type": "INDIRECT_STORE", "properties": {} },
    { "src": "0x00000000000000000000000000000064", "dst": "0x00000000000000000000000000000002", "edge_type": "INDIRECT_LOAD", "properties": {} }
  ]
}

Note: The SVFG also has a native (non-PropertyGraph) export format with an SvfgExport schema that includes a diagnostics object with construction statistics (direct_edge_count, indirect_edge_count, mem_phi_count, skipped_call_clobbers, skipped_live_on_entry). The PropertyGraph format shown above is what saf export svfg produces.

PTA (pta) — PropertyGraph Format

The points-to analysis can be exported as a PropertyGraph. Pointer values and abstract locations both become nodes, connected by POINTS_TO edges.

| Element | Details |
|---|---|
| Node labels | `["Pointer"]` or `["Location"]` |
| Node properties | Location nodes carry `obj` (object ID hex) and optionally `path` (field path) |
| Edge type | `"POINTS_TO"` |

Example:

{
  "schema_version": "0.1.0",
  "graph_type": "pta",
  "metadata": {},
  "nodes": [
    { "id": "0x00000000000000000000000000000001", "labels": ["Pointer"], "properties": {} },
    { "id": "0x00000000000000000000000000000064", "labels": ["Location"], "properties": { "obj": "0x000000000000000000000000000000c8", "path": [".0"] } }
  ],
  "edges": [
    { "src": "0x00000000000000000000000000000001", "dst": "0x00000000000000000000000000000064", "edge_type": "POINTS_TO", "properties": {} }
  ]
}
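
For programmatic use, the edge list is easy to invert into a pointer-to-locations map. A sketch using the node and edge data from the example above (the helper name is mine, not part of the SDK):

```python
def points_to_map(graph: dict) -> dict:
    """Map each Pointer node ID to the set of Location node IDs it may point to."""
    pts = {n["id"]: set() for n in graph["nodes"] if "Pointer" in n["labels"]}
    for edge in graph["edges"]:
        if edge["edge_type"] == "POINTS_TO" and edge["src"] in pts:
            pts[edge["src"]].add(edge["dst"])
    return pts

# The PTA PropertyGraph example from above.
sample = {
    "nodes": [
        {"id": "0x00000000000000000000000000000001", "labels": ["Pointer"], "properties": {}},
        {"id": "0x00000000000000000000000000000064", "labels": ["Location"],
         "properties": {"obj": "0x000000000000000000000000000000c8", "path": [".0"]}},
    ],
    "edges": [
        {"src": "0x00000000000000000000000000000001",
         "dst": "0x00000000000000000000000000000064",
         "edge_type": "POINTS_TO", "properties": {}},
    ],
}
print(points_to_map(sample))
# {'0x00000000000000000000000000000001': {'0x00000000000000000000000000000064'}}
```

Two pointers alias if their location sets intersect, so this map is the natural starting point for simple alias queries.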

PTA Native Export

The PTA also has a richer native export format (not PropertyGraph) that includes analysis configuration, all abstract locations, and diagnostics. This is the format returned by PtaResult::export() in the Rust API:

{
  "schema_version": "0.1.0",
  "config": {
    "enabled": true,
    "field_sensitivity": "struct_fields(max_depth=2)",
    "max_objects": 100000,
    "max_iterations": 100
  },
  "locations": [
    {
      "id": "0x...",
      "obj": "0x...",
      "path": [".0", "[2]"]
    }
  ],
  "points_to": [
    {
      "value": "0x...",
      "locations": ["0x...", "0x..."]
    }
  ],
  "diagnostics": {
    "iterations": 12,
    "iteration_limit_hit": false,
    "collapse_warning_count": 0,
    "constraint_count": 150,
    "location_count": 45
  }
}
| Field | Type | Description |
|---|---|---|
| `config` | object | PTA configuration used for the analysis run |
| `locations` | array | All abstract locations with object ID and field path |
| `points_to` | array | Points-to sets: each entry maps a value to its locations |
| `diagnostics` | object | Solver statistics (iterations, limits, constraint/location counts) |
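
When consuming the native export, it is worth inspecting the diagnostics before trusting the points-to sets. A hedged sketch (the helper name and warning wording are mine; the field names follow the schema above):

```python
def pta_warnings(export: dict) -> list:
    """Derive human-readable warnings from a native PTA export's diagnostics."""
    d = export["diagnostics"]
    warnings = []
    if d["iteration_limit_hit"]:
        # The solver stopped early, so the points-to sets may be incomplete.
        warnings.append(
            f"solver hit the iteration limit at {d['iterations']} iterations; "
            "points-to sets may be incomplete"
        )
    if d["collapse_warning_count"] > 0:
        warnings.append(
            f"{d['collapse_warning_count']} field-collapse warning(s); "
            "field sensitivity may be degraded"
        )
    return warnings

# The diagnostics object from the example above.
clean = {"diagnostics": {"iterations": 12, "iteration_limit_hit": False,
                         "collapse_warning_count": 0,
                         "constraint_count": 150, "location_count": 45}}
print(pta_warnings(clean))  # []
```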

Findings Export

The findings export (saf export findings) produces a JSON array of checker findings. This is not a PropertyGraph -- it is a flat list of diagnostic results from all enabled checkers.

Each finding has the following structure:

[
  {
    "check": "null_deref",
    "severity": "error",
    "cwe": 476,
    "message": "Pointer may be null when dereferenced",
    "path": [
      { "location": "main:5", "event": "NULL assigned" },
      { "location": "main:10", "event": "pointer dereferenced" }
    ],
    "object": "p"
  }
]
| Field | Type | Description |
|---|---|---|
| `check` | string | Name of the checker that produced this finding |
| `severity` | string | One of `info`, `warning`, `error`, `critical` |
| `cwe` | number? | CWE ID if applicable (omitted when not set) |
| `message` | string | Human-readable description of the issue |
| `path` | array | Trace events from source to sink (omitted when empty) |
| `object` | string? | Affected object name if applicable (omitted when not set) |

Each entry in path is a PathEvent:

| Field | Type | Description |
|---|---|---|
| `location` | string | Source location description |
| `event` | string | What happened at this point |
| `state` | string? | Typestate label (omitted when not applicable) |
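
Since the severity levels form an ordered scale, a consumer can triage findings with a simple threshold filter. A sketch over the example finding above (the function name and output format are illustrative):

```python
# Ordered severity scale, lowest to highest, per the findings schema.
SEVERITIES = ["info", "warning", "error", "critical"]

def filter_findings(findings: list, min_severity: str = "warning") -> list:
    """Keep findings at or above a severity threshold and print their traces."""
    threshold = SEVERITIES.index(min_severity)
    kept = [f for f in findings if SEVERITIES.index(f["severity"]) >= threshold]
    for f in kept:
        cwe = f" (CWE-{f['cwe']})" if "cwe" in f else ""
        print(f"[{f['severity']}] {f['check']}{cwe}: {f['message']}")
        for step in f.get("path", []):  # path is omitted when empty
            print(f"    {step['location']}: {step['event']}")
    return kept

# The example finding from above.
findings = [{
    "check": "null_deref",
    "severity": "error",
    "cwe": 476,
    "message": "Pointer may be null when dereferenced",
    "path": [
        {"location": "main:5", "event": "NULL assigned"},
        {"location": "main:10", "event": "pointer dereferenced"},
    ],
    "object": "p",
}]
filter_findings(findings, "error")
```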

Determinism

All PropertyGraph exports are deterministic:

  • Nodes are sorted by (node_kind, referenced_id_hex)
  • Edges are sorted by (edge_kind, src_id_hex, dst_id_hex, label_hash_hex)
  • No timestamps or wall-clock-dependent data
  • Identical inputs produce byte-identical JSON output
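
The byte-identical guarantee makes exports easy to diff and cache: hashing the raw file bytes is enough to detect any change. A minimal sketch of such a check (the helper and the canned graph are illustrative; two real export runs over the same input should hash identically):

```python
import hashlib
import json
import os
import tempfile

def file_digest(path: str) -> str:
    """SHA-256 over raw bytes: byte-identical exports hash identically."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Simulate two export runs over the same input by writing the same
# graph twice with a fixed serialization.
graph = {"schema_version": "0.1.0", "graph_type": "callgraph",
         "metadata": {}, "nodes": [], "edges": []}
paths = []
for _ in range(2):
    fd, p = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(graph, f, sort_keys=True)
    paths.append(p)

assert file_digest(paths[0]) == file_digest(paths[1])
```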

Working with PropertyGraph

Python

import json
from saf import Project

proj = Project.open("program.ll")
graphs = proj.graphs()
cg = graphs.export("callgraph")

# Access nodes and edges directly
for node in cg["nodes"]:
    print(node["properties"]["name"])

for edge in cg["edges"]:
    print(f"{edge['src']} -> {edge['dst']}")

# Save to file
with open("callgraph.json", "w") as f:
    json.dump(cg, f, indent=2)

jq

# List function names
jq '.nodes[] | .properties.name' callgraph.json

# Count edges by type
jq '[.edges[] | .edge_type] | group_by(.) | map({type: .[0], count: length})' graph.json

# Find high fan-out nodes
jq '[.edges[] | .src] | group_by(.) | map({node: .[0], out: length}) | sort_by(-.out) | .[0:5]' callgraph.json

NetworkX

import json
import networkx as nx

with open("callgraph.json") as f:
    data = json.load(f)

G = nx.DiGraph()
for node in data["nodes"]:
    G.add_node(node["id"], **node.get("properties", {}))
for edge in data["edges"]:
    G.add_edge(edge["src"], edge["dst"], edge_type=edge.get("edge_type", ""))

print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Components: {nx.number_weakly_connected_components(G)}")

SAF Widgets

SAF provides embeddable widgets that bring interactive program analysis into any web page. Widgets are implemented as iframes pointing to the SAF playground in embed mode.

Basic Usage

Add a widget to any HTML page:

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=taint_flow&graph=cfg"
    loading="lazy"
  ></iframe>
</div>

URL Parameters

| Parameter | Values | Description |
|---|---|---|
| `embed` | `true` | Enables embed mode (hides toolbar, navigation) |
| `split` | `true` | Shows both source editor and graph side by side |
| `example` | `taint_flow`, `pointer_alias`, `complex_cfg`, `indirect_call`, etc. | Pre-loaded example slug |
| `graph` | `cfg`, `callgraph`, `defuse`, `valueflow`, `pta` | Initial graph to display |
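
If you generate widget markup (e.g., from a static-site build script), the `src` URL can be assembled from these parameters programmatically. A small Python sketch (the helper name is mine, not part of the SDK):

```python
from urllib.parse import urlencode

def widget_src(base: str, example: str, graph: str, split: bool = True) -> str:
    """Build a playground iframe src with embed-mode URL parameters."""
    params = {"embed": "true", "example": example, "graph": graph}
    if split:
        params["split"] = "true"
    return f"{base}?{urlencode(params)}"

src = widget_src("../../playground/", "taint_flow", "cfg")
print(src)  # ../../playground/?embed=true&example=taint_flow&graph=cfg&split=true
```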

Styling

Add the SAF widget styles to your CSS:

.saf-widget {
  border: 1px solid #333;
  border-radius: 8px;
  overflow: hidden;
  margin: 1em 0;
}

.saf-widget iframe {
  width: 100%;
  height: 400px;
  border: none;
}

Adjust the height to fit your content. For split mode with source + graph, 500px or more is recommended.

mdBook Integration

In mdBook pages, use raw HTML:

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=taint_flow&graph=valueflow"
    loading="lazy"
  ></iframe>
</div>

The custom.css file included in this book already provides the widget styles.

Examples

CFG Visualization

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=complex_cfg&graph=cfg"
    loading="lazy"
  ></iframe>
</div>

Call Graph with Indirect Calls

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=indirect_call&graph=callgraph"
    loading="lazy"
  ></iframe>
</div>

Points-To Analysis

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=pointer_alias&graph=pta"
    loading="lazy"
  ></iframe>
</div>

Value Flow (Taint)

<div class="saf-widget">
  <iframe
    src="../../playground/?embed=true&split=true&example=taint_flow&graph=valueflow"
    loading="lazy"
  ></iframe>
</div>

Sizing Recommendations

| Mode | Recommended height | Description |
|---|---|---|
| Graph only (`embed=true`) | 400px | Shows the graph visualization only |
| Split view (`embed=true&split=true`) | 500px+ | Shows source code + graph side by side |
| Full height | 600px+ | Best for complex graphs with many nodes |

How It Works

When embed=true is set:

  1. The playground hides the top navigation bar
  2. The specified example is loaded automatically
  3. The specified graph type is selected
  4. With split=true, both the source editor and graph panel are visible

The widget runs entirely client-side via WebAssembly. No server calls are made for the analysis itself (C compilation still uses Compiler Explorer if the example contains C code).

Responsive Design

Widgets adapt to their container width. On narrow viewports, you may want to reduce the height so the widget does not dominate the screen:

@media (max-width: 768px) {
  .saf-widget iframe {
    height: 300px;
  }
}

Security

Widgets use iframe sandboxing. The embedded playground cannot access the parent page's DOM or cookies. Communication between the parent page and widget is not currently supported.