Python SDK
The SAF Python SDK (import saf) provides full access to SAF's static analysis
capabilities from Python. It is built with PyO3 and installed via maturin.
Installation
The SDK is built automatically when entering the Docker environment:
make shell
# SDK is available immediately:
python3 -c "import saf; print(saf.version())"
For manual installation inside the dev container:
maturin develop --release
Core API
Project
The Project class is the main entry point for all analysis operations.
from saf import Project
# Open a project from LLVM IR
proj = Project.open("program.ll")
# Open from AIR-JSON
proj = Project.open("program.air.json")
# Open with analysis tuning parameters
proj = Project.open(
"program.ll",
vf_mode="precise", # "fast" (default) or "precise"
pta_solver="worklist", # "worklist" (default) or "datalog"
pta_max_iterations=20000, # default: 10000
field_sensitivity_depth=3, # default: 2 (0 = disabled)
max_refinement_iterations=5, # default: 10
)
Project.open() signature:
Project.open(
path: str,
*,
vf_mode: str = "fast",
pta_solver: str = "worklist",
pta_max_iterations: int | None = None,
field_sensitivity_depth: int | None = None,
max_refinement_iterations: int | None = None,
) -> Project
| Parameter | Description |
|---|---|
path | Path to input file (.air.json, .ll, or .bc). Frontend is selected automatically by extension. |
vf_mode | "fast" routes all memory through a single unknown node for robust taint analysis. "precise" uses points-to analysis to resolve memory locations (may miss flows through unresolved pointers). |
pta_solver | "worklist" uses the imperative worklist-based solver. "datalog" uses the Ascent Datalog fixpoint solver. |
pta_max_iterations | Maximum PTA solver iterations. Default: 10000. |
field_sensitivity_depth | Field sensitivity depth. 0 = disabled, default: 2. Higher values track deeper nested struct fields. |
max_refinement_iterations | Maximum CG refinement iterations. Default: 10. |
Raises: FrontendError if the input file cannot be parsed or the required frontend is not available.
Schema Discovery
schema = proj.schema()
# Returns a dict with structured information about:
# - tool_version, schema_version
# - frontends (air-json, llvm with extensions and descriptions)
# - graphs (cfg, callgraph, defuse, valueflow)
# - queries (taint_flow, flows, points_to, may_alias with parameters)
# - selectors (sources, sinks, sanitizers)
Query Context
from saf import sources, sinks, sanitizers
q = proj.query()
# Taint flow analysis
findings = q.taint_flow(
sources=sources.function_param("main", 1),
sinks=sinks.call("system", arg_index=0),
)
# With sanitizers (accepts a Selector or SelectorSet, not strings)
findings = q.taint_flow(
sources=sources.function_param("main", 1),
sinks=sinks.call("system", arg_index=0),
sanitizers=sanitizers.call("validate_input"),
limit=500, # default: 1000
)
# Data flows (without sanitizer filtering)
findings = q.flows(
sources=sources.function_param("read_data"),
sinks=sinks.arg_to("write_output", 0),
limit=100,
)
# Points-to query (takes hex value ID string)
pts = q.points_to("0x00000000000000000000000000000001")
# Alias query (takes hex value ID strings)
alias = q.may_alias("0x00000001", "0x00000002")
Query method signatures:
| Method | Parameters | Returns |
|---|---|---|
taint_flow(sources, sinks, sanitizers=None, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
flows(sources, sinks, *, limit=1000) | Selector/SelectorSet for each | list[Finding] |
points_to(ptr) | Hex string value ID | list[str] (location IDs) |
may_alias(p, q) | Hex string value IDs | bool |
Graph Export
graphs = proj.graphs()
# List available graph types
print(graphs.available()) # ["cfg", "callgraph", "defuse", "valueflow"]
# Export to PropertyGraph dict
cfg = graphs.export("cfg")
cfg_main = graphs.export("cfg", function="main") # single function
cg = graphs.export("callgraph")
du = graphs.export("defuse")
vf = graphs.export("valueflow")
# Export to Graphviz DOT string
dot_str = graphs.to_dot("callgraph")
# Export to interactive HTML (Cytoscape.js)
html_str = graphs.to_html("cfg", function="main")
All export() calls return a unified PropertyGraph dict:
{
"schema_version": "0.1.0",
"graph_type": "callgraph",
"metadata": {},
"nodes": [{"id": "0x...", "labels": [...], "properties": {...}}, ...],
"edges": [{"src": "0x...", "dst": "0x...", "edge_type": "...", "properties": {}}, ...],
}
Source and Sink Selectors
Selectors identify values in the program for taint analysis. They can be
combined using the | operator to form a SelectorSet.
Sources (saf.sources)
| Function | Description |
|---|---|
function_param(function, index=None) | Select function parameters by name pattern (glob-style). index is 0-based; None selects all parameters. |
function_return(function) | Select function return values by name pattern. |
call(callee) | Select return values of calls to a function. |
argv() | Select command-line arguments (shortcut for function_param("main", None)). |
getenv(name=None) | Select environment variable reads (shortcut for call("getenv")). |
from saf import sources
src = sources.function_param("main", 1)
src = sources.function_param("read_*") # glob pattern
src = sources.function_return("get_input")
src = sources.call("getenv")
src = sources.argv()
# Combine with |
combined = sources.argv() | sources.getenv()
Sinks (saf.sinks)
| Function | Description |
|---|---|
call(callee, *, arg_index=None) | Select calls to a function. If arg_index is given, selects that argument; otherwise selects the call result. |
arg_to(callee, index) | Select arguments passed to a function (0-based index). |
from saf import sinks
sink = sinks.call("system", arg_index=0)
sink = sinks.call("printf", arg_index=0)
sink = sinks.arg_to("free", 0)
# Without arg_index, selects the call result
sink = sinks.call("dangerous_function")
Sanitizers (saf.sanitizers)
| Function | Description |
|---|---|
call(callee, *, arg_index=None) | Select calls to a sanitizing function. If arg_index is given, selects that argument; otherwise the return value is considered sanitized. |
arg_to(callee, index) | Select arguments passed to a sanitizing function. |
from saf import sanitizers
san = sanitizers.call("escape_html", arg_index=0)
san = sanitizers.call("sanitize_input")
# Combine sanitizers
combined = sanitizers.call("sanitize") | sanitizers.call("escape")
Module-Level Selector Factories
The following factory functions are also available directly from saf._saf
(used internally by sources, sinks, and sanitizers modules):
function_param(function, index=None)-- Select function parameters.function_return(function)-- Select function return values.call(callee)-- Select call results.arg_to(callee, index=None)-- Select arguments to a callee.
Checker Framework
Running Built-In Checkers
# Run a specific checker
findings = proj.check("memory-leak")
# Run multiple checkers at once (pass a list)
findings = proj.check(["memory-leak", "use-after-free", "double-free"])
# Run all 9 built-in checkers
findings = proj.check_all()
# List available checkers and their metadata
schema = proj.checker_schema()
for checker in schema["checkers"]:
print(f"{checker['name']}: {checker['description']} (CWE-{checker['cwe']})")
Built-In Checker Table
| Checker Name | CWE | Description |
|---|---|---|
memory-leak | 401 | Allocated memory never freed |
use-after-free | 416 | Memory accessed after being freed |
double-free | 415 | Memory freed more than once |
null-deref | 476 | Null pointer dereference |
file-descriptor-leak | 403 | Opened file never closed |
uninit-use | 457 | Use of uninitialized memory |
stack-escape | 562 | Returning stack address |
lock-not-released | 764 | Mutex not unlocked |
generic-resource-leak | N/A | Custom resource tracking |
Custom Checkers
# Define a custom checker with source/sink/sanitizer roles
findings = proj.check_custom(
"my-custom-leak",
mode="must_not_reach", # "may_reach", "must_not_reach", or "never_reach_sink"
source_role="allocator", # resource role for sources
source_match_return=True, # match return value (True) or first arg (False)
sink_is_exit=True, # sinks are function exits
sink_role=None, # or a resource role string
sanitizer_role="deallocator", # or None
sanitizer_match_return=False,
cwe=401, # optional CWE ID
severity="warning", # "info", "warning", "error", "critical"
)
Path-Sensitive Checking (Z3)
# Run checkers with Z3-based path feasibility filtering
result = proj.check_path_sensitive("null-deref", z3_timeout_ms=2000, max_guards=64)
# Or run all checkers with path sensitivity
result = proj.check_all_path_sensitive(z3_timeout_ms=1000, max_guards=64)
# Result has feasible, infeasible, and unknown findings
print(f"Real bugs: {len(result.feasible)}")
print(f"False positives filtered: {len(result.infeasible)}")
print(f"Unknown: {len(result.unknown)}")
print(result.diagnostics) # dict with Z3 statistics
# Post-filter existing findings
raw_findings = proj.check_all()
result = proj.filter_infeasible(raw_findings, z3_timeout_ms=1000, max_guards=64)
CheckerFinding Attributes
Each item returned by check(), check_all(), or check_custom() is a
CheckerFinding:
| Attribute | Type | Description |
|---|---|---|
checker | str | Checker name that produced this finding |
severity | str | "info", "warning", "error", or "critical" |
cwe | int | None | CWE ID if applicable |
message | str | Human-readable description |
source | str | Source SVFG node hex ID |
sink | str | Sink SVFG node hex ID |
trace | list[str] | Path from source to sink as hex node IDs |
sink_traces | list[dict] | Per-sink traces for multi-reach findings (e.g., double-free). Each dict has "sink" and "trace" keys. |
for f in proj.check("use-after-free"):
print(f.checker, f.severity, f.message)
print(f" CWE-{f.cwe}: {f.source} -> {f.sink}")
print(f" Trace length: {len(f.trace)}")
# Convert to dict
d = f.to_dict()
Finding Objects
The taint_flow() and flows() query methods return Finding objects, which
are distinct from CheckerFinding objects.
Finding Attributes
| Attribute | Type | Description |
|---|---|---|
finding_id | str | Deterministic hex identifier |
source_location | str | Source location (file:line:col or value ID) |
sink_location | str | Sink location (file:line:col or value ID) |
source_id | str | Source value ID (hex) |
sink_id | str | Sink value ID (hex) |
rule_id | str | None | Optional rule identifier |
trace | Trace | Step-by-step data flow path |
for f in q.taint_flow(sources.argv(), sinks.call("system", arg_index=0)):
print(f"{f.source_location} -> {f.sink_location}")
print(f.trace.pretty()) # human-readable trace
print(f"Steps: {len(f.trace)}")
d = f.to_dict() # convert to dict
Trace and TraceStep
A Trace contains a list of TraceStep objects. Each step represents one
hop in the value-flow graph:
| TraceStep Attribute | Type | Description |
|---|---|---|
from_id | str | Source node ID |
from_kind | str | Source node kind |
from_symbol | str | None | Symbol name at source |
from_location | str | None | Source file:line:col |
edge | str | Edge kind (def_use, transform, store, load, etc.) |
to_id | str | Target node ID |
to_kind | str | Target node kind |
to_symbol | str | None | Symbol name at target |
to_location | str | None | Target file:line:col |
for step in finding.trace.steps:
print(f" {step.from_symbol or step.from_id} --{step.edge}-> "
f"{step.to_symbol or step.to_id}")
Resource Table
The resource table maps function names to resource management roles. It ships with built-in entries for C stdlib, C++ operators, POSIX I/O, and pthreads.
table = proj.resource_table()
table.has_role("malloc", "allocator") # True
table.has_role("free", "deallocator") # True
table.has_role("fopen", "acquire") # True
table.has_role("fclose", "release") # True
# Add custom entries
table.add("my_alloc", "allocator")
table.add("my_free", "deallocator")
# Inspect
print(table.size) # number of entries
print(table.function_names()) # sorted list of function names
entries = table.export() # list of {"name": ..., "roles": [...]}
Available roles: allocator, deallocator, reallocator, acquire,
release, lock, unlock, null_source, dereference.
Advanced Analysis
IFDS Taint Analysis
Precise interprocedural taint tracking using the IFDS framework (Reps/Horwitz/Sagiv tabulation algorithm):
result = proj.ifds_taint(
sources=sources.function_param("main", 0),
sinks=sinks.call("system", arg_index=0),
sanitizers=sanitizers.call("validate"), # optional
)
Typestate Analysis
Track per-resource state machines using the IDE framework:
# Built-in specs: "file_io", "mutex_lock", "memory_alloc"
result = proj.typestate("file_io")
# Custom typestate spec
from saf import TypestateSpec
result = proj.typestate_custom(spec)
Flow-Sensitive Pointer Analysis
More precise than Andersen's flow-insensitive analysis for programs with pointer reassignment:
fs_result = proj.flow_sensitive_pta(pts_repr="auto")
# pts_repr options: "auto", "btreeset", "bitvector", "bdd"
Context-Sensitive Pointer Analysis (k-CFA)
Distinguishes calls to the same function from different call sites:
cs_result = proj.context_sensitive_pta(k=1, pts_repr="auto")
Demand-Driven Pointer Analysis
Computes points-to information only for explicitly queried pointers:
dda = proj.demand_pta(
max_steps=100_000,
max_context_depth=10,
timeout_ms=5000,
enable_strong_updates=True,
pts_repr="auto",
)
Memory SSA and SVFG
mssa = proj.memory_ssa() # Memory SSA representation
svfg = proj.svfg() # Sparse Value-Flow Graph
Call Graph Refinement
Iterative CHA + PTA-based indirect call resolution:
result = proj.refine_call_graph(entry_points="all", max_iterations=10)
Abstract Interpretation
Numeric interval analysis with widening/narrowing:
result = proj.abstract_interp(
max_widening=100,
narrowing_iterations=3,
use_thresholds=True,
)
Numeric Checkers
# Individual numeric checker
findings = proj.check_numeric("buffer_overflow") # CWE-120
findings = proj.check_numeric("integer_overflow") # CWE-190
findings = proj.check_numeric("division_by_zero") # CWE-369
findings = proj.check_numeric("shift_count") # CWE-682
# All numeric checkers at once
findings = proj.check_all_numeric()
Combined PTA + Abstract Interpretation
Alias-aware numeric analysis with bidirectional refinement:
result = proj.analyze_combined(
enable_refinement=True,
max_refinement_iterations=3,
)
interval = result.interval_at("0x1234...")
alias = result.may_alias("0x5678...", "0x9abc...")
Z3 Path Refinement
# Prove/disprove assertions
result = proj.prove_assertions(z3_timeout_ms=1000, max_guards=64)
# Refine alias query with path constraints
result = proj.refine_alias("0xP", "0xQ", at_block="0xB", func_id="0xF")
# Check if a feasible path exists between two blocks
result = proj.check_path_reachable(
from_block="0xB1", to_block="0xB2", func_id="0xF",
z3_timeout_ms=1000, max_guards=64, max_paths=100,
)
JSON Protocol (LLM Agent Interface)
import json
resp = proj.request('{"action": "schema"}')
data = json.loads(resp)
AIR Module Access
The AirModule provides mid-level access to the intermediate representation:
air = proj.air()
print(air.name) # module name
print(air.id) # module ID (hex)
print(air.function_count) # number of functions
print(air.global_count) # number of globals
print(air.function_names()) # list of function names
print(air.global_names()) # list of global names
Visualization
The saf.viz module provides dependency-free graph visualization:
from saf import viz
pg = proj.graphs().export("callgraph")
# Graphviz DOT string (no dependencies)
dot_str = viz.to_dot(pg)
# Interactive HTML with Cytoscape.js (no dependencies)
html_str = viz.to_html(pg)
# Open in browser or save to file
viz.visualize(pg) # opens in browser
viz.visualize(pg, output="callgraph.html") # saves to file
# Cytoscape.js JSON (for ipycytoscape in Jupyter)
cy_json = viz.to_cytoscape_json(pg)
# NetworkX DiGraph (requires networkx)
G = viz.to_networkx(pg)
# graphviz.Digraph object (requires graphviz package)
gv = viz.to_graphviz(pg)
gv.render("graph", format="svg")
Exceptions
All SAF exceptions inherit from SafError and carry .code and .details
attributes for structured error handling:
| Exception | Description |
|---|---|
SafError | Base exception for all SAF errors |
FrontendError | Frontend ingestion errors (parsing, I/O, unsupported features) |
AnalysisError | Analysis errors (PTA timeout, ValueFlow build error) |
QueryError | Query execution errors (invalid selector, no match) |
ConfigError | Configuration errors (invalid field, incompatible options) |
from saf import Project, SafError, FrontendError
try:
proj = Project.open("nonexistent.ll")
except FrontendError as e:
print(f"Error: {e}")
Module Exports
The saf package exports the following names:
from saf import (
# Core classes
Project, Query, Finding, Trace, TraceStep,
# Selectors
Selector, SelectorSet,
# Selector modules
sources, sinks, sanitizers, viz,
# Checker types
CheckerFinding, PathSensitiveResult, ResourceTable,
# Typestate types
TypestateResult, TypestateFinding, TypestateSpec, typestate_specs,
# Exceptions
SafError, FrontendError, AnalysisError, QueryError, ConfigError,
# Resource role constants
Allocator, Deallocator, Reallocator, Acquire, Release,
NullSource, Dereference,
# Reachability mode constants
MayReach, MustNotReach,
# Severity constants
Info, Warning, Error, Critical,
# Functions
version,
)