Audio Graph Primitives Architecture

This document describes the target architecture for audio effects: primitives as graph nodes with parameter modulation, eliminating dedicated composition structs.

Problem Statement

Current Architecture

Primitives (DelayLine, PhaseOsc, EnvelopeFollower, Allpass1, Smoother)
    ↓ hardcoded wiring in structs
Compositions (ModulatedDelay, AmplitudeMod, AllpassBank)
    ↓ impl AudioNode
Chain/AudioGraph (runtime, dyn dispatch)

Issues

Arbitrary groupings - Why is ModulatedDelay a composition but DynamicsProcessor isn't? The boundaries feel arbitrary.
Extensibility burden - New effects require: new struct, impl AudioNode, export from lib.rs. Users can't easily create custom effects.
Duplicate implementation paths - Hardcoded compositions duplicate what the graph could do. Maintaining both is wasteful.
Alternate implementations - GPU/WASM backends must implement both primitives AND compositions.

Target Architecture

Primitives (implement AudioNode directly)
    ↓
Graph with parameter modulation
    ↓
Effects = graph configurations (data, not code)

Benefits

Aspect	Current	Target
New effect	New struct + impl + export	Graph configuration
Custom modulation	Not possible	Wire any output → any param
Alternate backends	Must impl compositions	Only impl primitives
Serialization	Code	Data (graph description)

Design

Primitives as AudioNode

Each primitive implements AudioNode with modulatable parameters:

rust

pub trait AudioNode: Send {
    /// Process one sample. Parameters come from context or modulation.
    fn process(&mut self, input: f32, ctx: &AudioContext) -> f32;

    /// Reset internal state.
    fn reset(&mut self) {}

    /// Declare modulatable parameters.
    fn params(&self) -> &[ParamDescriptor] { &[] }

    /// Set a parameter value (called by graph before process).
    fn set_param(&mut self, index: usize, value: f32);
}

pub struct ParamDescriptor {
    pub name: &'static str,
    pub default: f32,
    pub min: f32,
    pub max: f32,
}

BlockProcessor Trait

For tier-agnostic code, use BlockProcessor which all tiers implement:

rust

pub trait BlockProcessor: Send {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext);
    fn reset(&mut self);
}

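// Blanket impl for AudioNode types (Tier 1/2/4)
impl<T: AudioNode> BlockProcessor for T {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext) {
        // Run the per-sample path once per frame, advancing the context each time.
        for (inp, out) in input.iter().zip(output.iter_mut()) {
            *out = self.process(*inp, ctx);
            ctx.advance();
        }
    }

    // Required by BlockProcessor; forward to the node's own reset.
    fn reset(&mut self) {
        AudioNode::reset(self);
    }
}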

// JIT (Tier 3) implements BlockProcessor natively
impl BlockProcessor for CompiledGraph { ... }

Code written against BlockProcessor works with any tier:

rust

fn apply_effect<P: BlockProcessor>(effect: &mut P, audio: &mut [f32], sample_rate: f32) {
    let mut output = vec![0.0; audio.len()];
    let mut ctx = AudioContext::new(sample_rate);
    effect.process_block(audio, &mut output, &mut ctx);
    audio.copy_from_slice(&output);
}
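To make the blanket impl concrete, here is a minimal self-contained sketch that compiles on its own. The `AudioContext` below is a stand-in (the real type's fields are not shown in this document), and `HalfGain` is a hypothetical node that exists only for the demonstration.

```rust
/// Stand-in for the real AudioContext (assumption: it tracks rate and position).
pub struct AudioContext {
    pub sample_rate: f32,
    pub sample_index: u64,
}

impl AudioContext {
    pub fn new(sample_rate: f32) -> Self {
        Self { sample_rate, sample_index: 0 }
    }

    pub fn advance(&mut self) {
        self.sample_index += 1;
    }
}

pub trait AudioNode: Send {
    fn process(&mut self, input: f32, ctx: &AudioContext) -> f32;
    fn reset(&mut self) {}
}

pub trait BlockProcessor: Send {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext);
    fn reset(&mut self);
}

// Every per-sample node gets block processing for free.
impl<T: AudioNode> BlockProcessor for T {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext) {
        for (inp, out) in input.iter().zip(output.iter_mut()) {
            *out = self.process(*inp, ctx);
            ctx.advance();
        }
    }

    fn reset(&mut self) {
        AudioNode::reset(self);
    }
}

/// Hypothetical node: halves every sample.
pub struct HalfGain;

impl AudioNode for HalfGain {
    fn process(&mut self, input: f32, _ctx: &AudioContext) -> f32 {
        input * 0.5
    }
}

/// Tier-agnostic helper: works with anything that implements BlockProcessor.
pub fn apply_effect<P: BlockProcessor>(effect: &mut P, audio: &mut [f32], sample_rate: f32) {
    let mut output = vec![0.0; audio.len()];
    let mut ctx = AudioContext::new(sample_rate);
    effect.process_block(audio, &mut output, &mut ctx);
    audio.copy_from_slice(&output);
}
```

Calling `apply_effect(&mut HalfGain, &mut buffer, 44100.0)` halves the buffer in place. Because both traits declare `reset`, callers with both in scope need the qualified form (`AudioNode::reset(&mut node)` or `BlockProcessor::reset(&mut node)`).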

Example primitive implementations:

rust

// DelayLine - modulatable delay time
impl AudioNode for DelayLine<true> {
    fn process(&mut self, input: f32, _ctx: &AudioContext) -> f32 {
        self.write(input);
        self.read_interp(self.time)  // time set via set_param
    }

    fn params(&self) -> &[ParamDescriptor] {
        &[ParamDescriptor { name: "time", default: 0.0, min: 0.0, max: 1.0 }]
    }

    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 { self.time = value; }
    }
}

// PhaseOsc - outputs control signal, modulatable rate
impl AudioNode for PhaseOsc {
    fn process(&mut self, _input: f32, _ctx: &AudioContext) -> f32 {
        self.tick(self.phase_inc)  // phase_inc set via set_param
    }

    fn params(&self) -> &[ParamDescriptor] {
        &[ParamDescriptor { name: "rate", default: 1.0, min: 0.01, max: 100.0 }]
    }

    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 { self.phase_inc = value; }
    }
}

Parameter Modulation in Graph

The graph supports two connection types:

Audio connections - sample data flows between nodes
Param connections - one node's output modulates another's parameter

rust

pub struct AudioGraph {
    nodes: Vec<Box<dyn AudioNode>>,
    audio_wires: Vec<AudioWire>,
    param_wires: Vec<ParamWire>,
    /// Index of the node whose output process() returns.
    output_node: usize,
}

pub struct AudioWire {
    from_node: usize,
    to_node: usize,
}

pub struct ParamWire {
    from_node: usize,
    to_node: usize,
    to_param: usize,
    /// Transform: base + modulation * scale
    scale: f32,
    base: f32,
}

impl AudioGraph {
    /// Connect audio output → audio input
    pub fn connect(&mut self, from: usize, to: usize) { ... }

    /// Connect audio output → parameter (modulation)
    pub fn modulate(&mut self, from: usize, to: usize, param: &str, scale: f32, base: f32) { ... }
}
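`modulate()` takes a parameter name while `ParamWire` stores an index, so the name has to be resolved somewhere. The sketch below shows one way that lookup could work against `params()`. The helper names `param_index` and `modulated_value` are hypothetical, and clamping to the declared range is an assumption; this document does not say whether the graph clamps.

```rust
pub struct ParamDescriptor {
    pub name: &'static str,
    pub default: f32,
    pub min: f32,
    pub max: f32,
}

/// Resolve a parameter name to its index in a node's descriptor list.
/// Returns None if the node declares no parameter with that name.
pub fn param_index(params: &[ParamDescriptor], name: &str) -> Option<usize> {
    params.iter().position(|p| p.name == name)
}

/// Apply the wire transform `base + mod_value * scale`, then clamp the
/// result to the declared range (the clamp is an assumption of this sketch).
pub fn modulated_value(desc: &ParamDescriptor, base: f32, scale: f32, mod_value: f32) -> f32 {
    (base + mod_value * scale).clamp(desc.min, desc.max)
}
```

Resolving the name once, when the wire is created, keeps string comparison out of the per-sample path; only the integer index reaches `set_param`.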

Graph Execution

Per-sample execution with modulation:

rust

impl AudioGraph {
    pub fn process(&mut self, input: f32, ctx: &AudioContext) -> f32 {
        // Store node outputs
        let mut outputs = vec![0.0; self.nodes.len()];
        outputs[0] = input;  // Input node

        for (i, node) in self.nodes.iter_mut().enumerate().skip(1) {
            // Apply parameter modulation from connected sources
            for wire in &self.param_wires {
                if wire.to_node == i {
                    let mod_value = outputs[wire.from_node];
                    let param_value = wire.base + mod_value * wire.scale;
                    node.set_param(wire.to_param, param_value);
                }
            }

            // Get audio input (sum of all connected sources)
            let audio_in: f32 = self.audio_wires
                .iter()
                .filter(|w| w.to_node == i)
                .map(|w| outputs[w.from_node])
                .sum();

            outputs[i] = node.process(audio_in, ctx);
        }

        outputs[self.output_node]
    }
}
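The loop above can be exercised end to end with a stripped-down, self-contained version. This is a sketch under stated assumptions: `Node` is a simplified stand-in for `AudioNode` (no context argument), `Constant` stands in for an LFO, and `MiniGraph` reuses one output buffer instead of allocating per sample. Like the loop above, it relies on nodes being stored in dependency order, so a modulator must come before its target.

```rust
pub trait Node {
    fn process(&mut self, input: f32) -> f32;
    fn set_param(&mut self, _index: usize, _value: f32) {}
}

/// Placeholder for the input slot at index 0 (never processed).
pub struct Passthrough;
impl Node for Passthrough {
    fn process(&mut self, input: f32) -> f32 { input }
}

/// Emits a fixed control value, standing in for an LFO.
pub struct Constant(pub f32);
impl Node for Constant {
    fn process(&mut self, _input: f32) -> f32 { self.0 }
}

/// Multiplies its input by `gain`; parameter 0 is the gain.
pub struct Gain { pub gain: f32 }
impl Node for Gain {
    fn process(&mut self, input: f32) -> f32 { input * self.gain }
    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 { self.gain = value; }
    }
}

pub struct AudioWire { pub from_node: usize, pub to_node: usize }
pub struct ParamWire {
    pub from_node: usize,
    pub to_node: usize,
    pub to_param: usize,
    pub scale: f32,
    pub base: f32,
}

pub struct MiniGraph {
    nodes: Vec<Box<dyn Node>>,
    audio_wires: Vec<AudioWire>,
    param_wires: Vec<ParamWire>,
    output_node: usize,
    outputs: Vec<f32>, // reused across calls to avoid per-sample allocation
}

impl MiniGraph {
    pub fn new(
        nodes: Vec<Box<dyn Node>>,
        audio_wires: Vec<AudioWire>,
        param_wires: Vec<ParamWire>,
        output_node: usize,
    ) -> Self {
        let outputs = vec![0.0; nodes.len()];
        Self { nodes, audio_wires, param_wires, output_node, outputs }
    }

    pub fn process(&mut self, input: f32) -> f32 {
        self.outputs[0] = input; // slot 0 holds the graph input
        for i in 1..self.nodes.len() {
            // Modulation first: base + source_output * scale.
            for wire in &self.param_wires {
                if wire.to_node == i {
                    let value = wire.base + self.outputs[wire.from_node] * wire.scale;
                    self.nodes[i].set_param(wire.to_param, value);
                }
            }
            // Then sum every audio source feeding this node.
            let mut audio_in = 0.0_f32;
            for wire in &self.audio_wires {
                if wire.to_node == i {
                    audio_in += self.outputs[wire.from_node];
                }
            }
            let out = self.nodes[i].process(audio_in);
            self.outputs[i] = out;
        }
        self.outputs[self.output_node]
    }
}
```

With nodes `[Passthrough, Constant(0.5), Gain]`, an audio wire 0 → 2, and a param wire 1 → 2 using base 0.5 and scale 0.5, an input of 1.0 yields 0.75: the gain is set to 0.5 + 0.5 × 0.5 before the gain node runs.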

Effects as Graph Constructors

Effects become functions that build graphs:

rust

pub fn chorus(sample_rate: f32) -> AudioGraph {
    let mut g = AudioGraph::new();

    let input = g.add_input();
    let lfo = g.add(PhaseOsc::new());
    let delay = g.add(DelayLine::new(4096));
    let output = g.add_output();

    // LFO modulates delay time: base 20ms, depth ±5ms
    g.modulate(lfo, delay, "time", 0.005 * sample_rate, 0.020 * sample_rate);

    // Audio path: input → delay → output (mixed with dry)
    g.connect(input, delay);
    g.connect_mix(input, delay, output, 0.5);  // 50% wet

    g
}

pub fn flanger(sample_rate: f32) -> AudioGraph {
    // Same structure as chorus, different parameters!
    let mut g = AudioGraph::new();
    // ... shorter delay, more feedback, faster LFO
    g
}
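As a quick check on the chorus numbers, the helper below computes the range of delay times the param wire can produce. It assumes the LFO output is bipolar, in [-1.0, 1.0], which this document does not state; `delay_range_samples` and `fits_in_buffer` are hypothetical names used only for this arithmetic.

```rust
/// Range of delay times in samples produced by `base + lfo * scale`,
/// assuming the LFO output lies in [-1.0, 1.0].
pub fn delay_range_samples(sample_rate: f32, base_secs: f32, depth_secs: f32) -> (f32, f32) {
    let base = base_secs * sample_rate;   // e.g. 0.020 s -> 882 samples at 44.1 kHz
    let scale = depth_secs * sample_rate; // e.g. 0.005 s -> 220.5 samples at 44.1 kHz
    (base - scale, base + scale)
}

/// True if every delay in the range can be read from a buffer of `buffer_len` samples.
pub fn fits_in_buffer(range: (f32, f32), buffer_len: usize) -> bool {
    range.0 >= 0.0 && range.1 < buffer_len as f32
}
```

At 44.1 kHz the delay sweeps roughly 661.5 to 1102.5 samples, comfortably inside the 4096-sample line. Under the same assumption, at 192 kHz the upper bound would be 4800 samples, so a fixed 4096-sample buffer would be too small and the length would need to scale with the sample rate.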

Relationship to resin-core Graph

The resin-core Graph is for general node-based pipelines (meshes, textures, etc.) with typed ports and single execution.

The audio graph needs:

Per-sample execution (not single-shot)
Parameter modulation (control signals → params)
Stateful nodes (delay lines, filters)

Options:

Separate AudioGraph (current direction) - Audio-specific, optimized for sample processing
Extend resin-core - Add rate/modulation concepts to general graph
Shared modulation abstraction - Modulatable<T> type used by both

Recommendation: Keep AudioGraph separate for now, but design Modulatable<T> as a shared concept that both can use later. The execution models are too different to force into one graph type.

Migration Path

Phase 1: Primitives as AudioNode

Add params() and set_param() to AudioNode trait
Implement AudioNode for primitives in primitive.rs
Keep existing compositions working

Phase 2: Graph with Modulation

Add param_wires to AudioGraph
Add modulate() method
Update execution to apply modulation

Phase 3: Effects as Graphs

Rewrite chorus(), flanger(), etc. to return AudioGraph
Deprecate composition structs (ModulatedDelay, etc.)
Update benchmarks to verify no regression

Phase 4: Cleanup

Remove composition structs
Primitives become the only building blocks
Document graph-based effect creation

Implementation Status

Phase	Status	Notes
Phase 1: Primitives as AudioNode	✅ Complete	`ParamDescriptor`, `params()`, `set_param()` added to AudioNode. Primitive nodes: `DelayNode`, `LfoNode`, `EnvelopeNode`, `AllpassNode`, `GainNode`, `MixNode`
Phase 2: Graph with Modulation	✅ Complete	`AudioGraph` with `modulate()` and `modulate_named()`. Param wires apply `base + output * scale` before each node processes.
Phase 3: Effects as Graphs	🟡 Partial	`tremolo_graph()`, `chorus_graph()`, `flanger_graph()` demonstrate the pattern. Old composition structs still exist alongside.
Phase 4: Cleanup	⏳ Pending	Composition structs (ModulatedDelay, AmplitudeMod, AllpassBank) can be removed once graph versions are validated.

Remaining Work

Benchmark comparison - Compare graph-based vs composition-based effects performance
Dry/wet routing - MixNode needs cleaner way to receive dry signal in graph context
Remove composition structs - After validation, delete old types
Phaser as graph - AllpassBank with modulated coefficients needs multi-stage allpass

Performance Considerations

Benchmark Results

Initial benchmarks showed significant overhead. After optimization:

Effect	Composition	Graph (initial)	Graph (optimized)	Final Overhead
Tremolo	189 µs	718 µs	587 µs	3.11x (was 3.76x)
Chorus	735 µs	1,637 µs	1,163 µs	1.58x (was 2.19x)
Flanger	754 µs	1,224 µs	984 µs	1.31x (was 1.65x)

(Times are for processing 1 second of audio at 44.1kHz)

Optimizations implemented:

Pre-computed wire lookups (per-node execution info, ~3-4% improvement)
Control-rate parameter updates (every 64 samples, ~16-26% improvement)

Overhead Sources

The remaining overhead comes from:

Source	Impact	Notes
Dynamic dispatch	MODERATE	`Box<dyn AudioNode>` vtable lookup per node per sample
Node loop	LOW	Iterating node vec vs hardcoded struct fields
Output vector	LOW	Writing to `outputs[i]` vs direct field access

Wire iteration and set_param were addressed by optimizations.

Optimization Strategies

1. Pre-compute wire lookups ✅ Implemented

rust

struct NodeExecInfo {
    audio_inputs: Vec<NodeIndex>,  // Which nodes feed this one
    param_mods: Vec<(NodeIndex, usize, f32, f32)>,  // (from, param, base, scale)
}

Built once when graph structure changes via build_exec_info().
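A minimal sketch of what such a build step might look like, assuming `NodeIndex` is an alias for `usize` and that wires are plain structs as in the Design section. The free function and its signature are illustrative, not the actual `build_exec_info()`.

```rust
pub type NodeIndex = usize; // assumption: the real alias is not shown in this document

pub struct AudioWire { pub from_node: NodeIndex, pub to_node: NodeIndex }
pub struct ParamWire {
    pub from_node: NodeIndex,
    pub to_node: NodeIndex,
    pub to_param: usize,
    pub scale: f32,
    pub base: f32,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct NodeExecInfo {
    pub audio_inputs: Vec<NodeIndex>,                  // which nodes feed this one
    pub param_mods: Vec<(NodeIndex, usize, f32, f32)>, // (from, param, base, scale)
}

/// Group wires by destination node so the per-sample loop never scans the
/// full wire lists. Panics if a wire references a node index >= node_count.
pub fn build_exec_info(
    node_count: usize,
    audio_wires: &[AudioWire],
    param_wires: &[ParamWire],
) -> Vec<NodeExecInfo> {
    let mut info = vec![NodeExecInfo::default(); node_count];
    for w in audio_wires {
        info[w.to_node].audio_inputs.push(w.from_node);
    }
    for w in param_wires {
        info[w.to_node]
            .param_mods
            .push((w.from_node, w.to_param, w.base, w.scale));
    }
    info
}
```

The per-sample loop then reads `info[i]` directly instead of filtering every wire for every node, turning an O(nodes × wires) scan into O(wires) per sample.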

2. Control-rate parameter updates ✅ Implemented

rust

graph.set_control_rate(64);  // Update params every 64 samples

Default effect constructors use control rate 64 (~1.5ms at 44.1kHz).
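The idea behind control-rate updates can be sketched with a small counter that signals when parameters are due for recomputation. `ControlRateClock` and `update_period_ms` are hypothetical names; how `set_control_rate()` is implemented internally is not shown in this document.

```rust
/// Fires once every `interval` samples, starting with the first sample.
pub struct ControlRateClock {
    interval: usize,
    counter: usize,
}

impl ControlRateClock {
    pub fn new(interval: usize) -> Self {
        // Guard against a zero interval, which would never advance.
        Self { interval: interval.max(1), counter: 0 }
    }

    /// Call once per sample; returns true when params should be recomputed.
    pub fn tick(&mut self) -> bool {
        let due = self.counter == 0;
        self.counter = (self.counter + 1) % self.interval;
        due
    }
}

/// Time between parameter updates, in milliseconds.
pub fn update_period_ms(interval: usize, sample_rate: f32) -> f32 {
    interval as f32 * 1000.0 / sample_rate
}
```

With an interval of 64 at 44.1 kHz, parameters are recomputed about every 1.45 ms, which means 690 updates per second of audio instead of 44,100.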

Why Tier 2 Beats Hand-Written Structs

Counterintuitively, pattern-matched optimized structs (Tier 2) are faster than the original hand-written effect structs:

Effect	Original Struct	Tier 2 Optimized	Improvement
Tremolo	189 µs	175 µs	8% faster
Chorus	735 µs	611 µs	17% faster
Flanger	754 µs	647 µs	14% faster

Root cause: algebraic simplification.

The original AmplitudeMod (tremolo) does this per sample:

rust

let lfo_val = self.lfo.sine_uni();  // = sine() * 0.5 + 0.5
let mod_amount = 1.0 - self.depth * lfo_val;
input * mod_amount
// 5 ops after sine: *0.5, +0.5, *depth, 1.0-, *input

The optimized TremoloOptimized pre-computes constants:

rust

let lfo_out = self.lfo.sine();
let gain = self.base + lfo_out * self.scale;  // base/scale computed once
input * gain
// 3 ops after sine: *scale, +base, *input

Saving 2 FLOPs per sample × 44,100 samples/sec = 88,200 fewer operations/sec.
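The folding follows from expanding the original expression: 1 - depth × (0.5 × s + 0.5) = (1 - 0.5 × depth) + (-0.5 × depth) × s, where s is the bipolar sine. The sketch below checks that identity numerically. The struct name `FoldedTremolo` is hypothetical, and the exact way `TremoloOptimized` computes its `base` and `scale` is an assumption derived from this algebra.

```rust
/// Gain as computed per sample by the original form (5 ops after the sine).
pub fn tremolo_gain_original(sine: f32, depth: f32) -> f32 {
    let lfo_val = sine * 0.5 + 0.5; // bipolar -> unipolar
    1.0 - depth * lfo_val
}

/// Same gain with the constants folded at construction time.
pub struct FoldedTremolo {
    base: f32,
    scale: f32,
}

impl FoldedTremolo {
    pub fn new(depth: f32) -> Self {
        Self {
            base: 1.0 - 0.5 * depth, // constant term of the expansion
            scale: -0.5 * depth,     // coefficient of the sine
        }
    }

    /// One multiply and one add per sample, before the multiply by the input.
    pub fn gain(&self, sine: f32) -> f32 {
        self.base + sine * self.scale
    }
}
```

Both forms agree to within floating-point rounding for any sine value in [-1, 1], so the folded version is a drop-in replacement as long as `depth` changes only at control time, when `base` and `scale` can be recomputed.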

Implication for codegen: Real codegen should perform:

Constant folding (pre-compute what's known at graph construction)
Algebraic simplification (eliminate redundant conversions)
Dead code elimination (remove unused paths)

3. Monomorphization / code generation (not implemented)

For small fixed graphs, generate specialized code that inlines the graph structure. Options:

Proc macro at compile time
Cranelift JIT at runtime (see below)

4. SIMD batching (not implemented)

Process N samples at once, amortizing per-node overhead.

Cranelift JIT Consideration

Cranelift could JIT-compile graph configurations into native code, eliminating:

Wire iteration (baked into generated code)
Dynamic dispatch (concrete function calls)
set_param overhead (params computed inline)

Pros:

Near-zero overhead for any graph configuration
Users get composition-level performance without writing Rust
Graph changes trigger recompilation (acceptable for effect design, not live modulation)

Cons:

Significant complexity (cranelift dependency, IR generation)
Compile latency when graph changes (~1-10ms typical)
Platform support considerations (cranelift targets)
Debugging JIT code is harder

Verdict: Worth exploring if simpler optimizations don't close the gap. Start with pre-computed wire lookups.

Acceptable Overhead?

For context, 1.6 ms to process 1 second of audio is 0.16% CPU at real-time rate. Even the slowest graph measurement (unoptimized chorus at 1,637 µs) is well under real-time requirements.

However, effects are often chained. A chain of 10 graph-based effects at 2x overhead could matter. And the principle of "graph should be as fast as hardcoded" is worth pursuing.
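The load figures above are simple ratios, and a small helper makes the chained case easy to estimate. The function names are hypothetical; the inputs are the benchmark times from the table in Benchmark Results.

```rust
/// Fraction of real time spent processing: time taken / audio duration.
pub fn real_time_load(process_time_us: f32, audio_secs: f32) -> f32 {
    process_time_us / (audio_secs * 1_000_000.0)
}

/// Combined load of several effects run in series on the same audio.
pub fn chain_load(process_times_us: &[f32], audio_secs: f32) -> f32 {
    process_times_us
        .iter()
        .map(|&t| real_time_load(t, audio_secs))
        .sum()
}
```

For example, one unoptimized graph chorus (1,637 µs per second of audio) is about 0.16% load, while ten optimized graph choruses in series (10 × 1,163 µs) come to about 1.2%, compared with about 0.7% for ten composition-based ones (10 × 735 µs).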

Performance Tiers

Four compilation tiers offer different tradeoffs:

Tier	Feature	Overhead	Flexibility	Complexity
1. Dynamic graph	default	~1.3-3x	Full runtime flexibility	Low
2. Pattern matching	`optimize`	~0% (faster than concrete!)	Common idioms	Low
3. Cranelift JIT	`cranelift`	~0-5%	Runtime flexibility	High
4. Build.rs codegen	`codegen` module	~0-7%	Compile-time only	Low

Tier 1: Dynamic AudioGraph (default)

The default. Graphs are built at runtime, nodes are boxed trait objects.

Overhead sources: dyn dispatch per node per sample, output vector indexing.

When to use: Most cases. Real-time audio is feasible: the unoptimized graph chorus takes about 1.6 ms per second of audio, roughly 0.16% CPU.

Tier 2: Pattern Matching + Optimized Compositions (`optimize` feature)

Graph fingerprinting identifies common idioms (tremolo, chorus, flanger) and replaces subgraphs with hand-optimized struct implementations.

Benchmark results:

Tremolo: 175µs (8% faster than concrete struct baseline!)
Chorus: 611µs (17% faster than concrete struct!)
Flanger: 647µs (14% faster than concrete struct!)

The optimized compositions use primitives directly with no abstraction overhead, allowing better compiler optimization than even hand-written effect structs.

When to use: Transparent optimization of dynamic graphs. Enable optimize feature and call optimize_graph() after building.

Status: Implemented in optimize.rs. Patterns for tremolo, chorus, flanger.
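To illustrate the fingerprinting idea, here is a deliberately small sketch that classifies a graph by its node kinds and param wiring. Everything in it (`NodeKind`, `GraphShape`, `Pattern`, `recognize`) is hypothetical and much simpler than whatever optimize.rs does; it only shows the shape of the technique.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind { Lfo, Gain, Delay, Mix }

#[derive(Debug, PartialEq, Eq)]
pub enum Pattern {
    Tremolo,        // LFO modulating a gain
    ModulatedDelay, // LFO modulating a delay (chorus and flanger share this shape)
}

/// Structural summary of a graph: node kinds plus (from, to) param wires.
pub struct GraphShape {
    pub kinds: Vec<NodeKind>,
    pub param_wires: Vec<(usize, usize)>,
}

/// Return the recognized idiom, or None to fall back to the dynamic graph.
pub fn recognize(shape: &GraphShape) -> Option<Pattern> {
    // This sketch only considers graphs with exactly one modulation wire.
    if shape.param_wires.len() != 1 {
        return None;
    }
    let (from, to) = shape.param_wires[0];
    match (shape.kinds.get(from)?, shape.kinds.get(to)?) {
        (NodeKind::Lfo, NodeKind::Gain) if shape.kinds.len() == 2 => Some(Pattern::Tremolo),
        (NodeKind::Lfo, NodeKind::Delay) => Some(Pattern::ModulatedDelay),
        _ => None,
    }
}
```

Returning `None` for anything unrecognized is what keeps the optimization transparent: an unmatched graph simply keeps running on the Tier 1 path.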

Tier 3: Cranelift JIT (`cranelift` feature)

JIT-compile graphs to native code at runtime. Eliminates dyn dispatch and wire iteration.

Block Processing: JIT uses block-based processing via BlockProcessor::process_block() to amortize function call overhead. Per-sample JIT calls have ~15 cycle overhead that dominates simple effects. Block processing amortizes this across 64-512 samples.

rust

// JIT only supports block processing (no per-sample process())
compiled.process_block(&input, &mut output, &mut ctx);

Trade-off: ~1-10ms compilation latency when graph changes, significant implementation complexity, harder to debug.

When to use: User-designed effects at runtime (not known at compile time). For compile-time known graphs, prefer Tier 4 (codegen).

Status: Proof-of-concept in jit.rs. Simple gain/tremolo tests pass. Full graph compilation with block processing implemented.

Tier 4: Build.rs Code Generation (`codegen` module)

Generate optimized Rust code from graphs at build time. Define graphs as SerialAudioGraph in build.rs, generate specialized struct code, and include it in your crate.

Benchmark results:

Effect	Tier 2 Optimized	Tier 4 Codegen	Difference
Tremolo	176 µs	176 µs	~0%
Chorus	610 µs	653 µs	~7%
Flanger	684 µs	697 µs	~2%

The codegen is generic (topological sort + node processing) rather than pattern-matching specific effects. In these benchmarks it lands within 0-7% of the hand-optimized Tier 2 code.

Example build.rs:

rust

use rhizome_resin_audio_codegen::{
    SerialAudioGraph, SerialAudioNode, SerialParamWire, generate_effect, generate_header,
};
use std::env;
use std::fs;
use std::path::Path;

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("my_effects.rs");

    let mut code = generate_header();

    // Tremolo: LFO modulating gain
    let tremolo_graph = SerialAudioGraph {
        nodes: vec![
            SerialAudioNode::Lfo { rate: 5.0 },
            SerialAudioNode::Gain { gain: 1.0 },
        ],
        audio_wires: vec![],
        param_wires: vec![SerialParamWire {
            from: 0,
            to: 1,
            param: 0,
            base: 0.5,
            scale: 0.5,
        }],
        input_node: Some(1),
        output_node: Some(1),
    };
    code.push_str(&generate_effect(&tremolo_graph, "MyTremolo"));

    fs::write(&dest, code).unwrap();
    println!("cargo:rerun-if-changed=build.rs");
}

Using in your crate:

rust

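// Include generated code
mod generated {
    include!(concat!(env!("OUT_DIR"), "/my_effects.rs"));
}

use generated::MyTremolo;
use rhizome_resin_audio::graph::{AudioContext, AudioNode};

fn main() {
    let sample_rate = 44100.0;
    let mut tremolo = MyTremolo::new(sample_rate);
    let ctx = AudioContext::new(sample_rate);

    // Process a single sample; real code would loop over a buffer.
    let input = 0.5_f32;
    let output = tremolo.process(input, &ctx);
    println!("{output}");
}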

Trade-off: Graphs must be known at compile time. The generated code uses topological sort to determine processing order - works with arbitrary graph structures, not just recognized patterns.

When to use: Library authors providing optimized effects, maximum performance for fixed graph structures.

Status: ✅ Complete. Implementation in resin-audio-codegen crate.

Codegen Implementation

Status: ✅ Complete in resin-audio-codegen crate

Architecture:

SerialAudioGraph (Rust structs)
    → generate_effect()
    → topological sort
    → Rust code with concrete types + inlined processing

Key types (resin-audio-codegen):

rust

pub struct SerialAudioGraph {
    pub nodes: Vec<SerialAudioNode>,
    pub audio_wires: Vec<(usize, usize)>,
    pub param_wires: Vec<SerialParamWire>,
    pub input_node: Option<usize>,
    pub output_node: Option<usize>,
}

pub enum SerialAudioNode {
    Lfo { rate: f32 },
    Gain { gain: f32 },
    Delay { max_samples: usize, feedback: f32, mix: f32 },
    Mix { mix: f32 },
    Envelope { attack: f32, release: f32 },
    Allpass { coefficient: f32 },
    Passthrough,
}

Implementation approach:

Topological sort (Kahn's algorithm) determines processing order from dependencies
Each node generates concrete struct fields and processing code
Param modulation applies base + output * scale before node processing
Constant folding for known values (e.g., wet/dry mix pre-computed)
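The ordering step can be sketched as a standalone function. This is a generic Kahn's algorithm over node indices, not the code from resin-audio-codegen; `topo_order` is a hypothetical name, and it assumes both audio wires and param wires are passed in as (from, to) edges, since a modulator must run before the node it modulates.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm: returns a processing order in which every node comes
/// after all nodes that feed it, or None if the graph has a cycle or an
/// edge references a node index outside 0..node_count.
pub fn topo_order(node_count: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0_usize; node_count];
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); node_count];

    for &(from, to) in edges {
        if from >= node_count || to >= node_count {
            return None;
        }
        successors[from].push(to);
        indegree[to] += 1;
    }

    // Start with every node that has no unprocessed dependencies.
    let mut ready: VecDeque<usize> = (0..node_count).filter(|&n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);

    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in &successors[node] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Nodes left unvisited are part of (or downstream of) a cycle.
    if order.len() == node_count { Some(order) } else { None }
}
```

For the tremolo graph in the build.rs example (LFO at index 0 modulating the gain at index 1), the single edge (0, 1) yields the order [0, 1]. A cycle such as a direct feedback connection yields `None`, which is one reason feedback needs explicit handling.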

Files:

crates/resin-audio-codegen/src/lib.rs - Main implementation (~780 lines)
crates/resin-audio/build.rs - Uses codegen for benchmark effects
crates/resin-audio/src/codegen.rs - Re-exports for library consumers

Future: Control Rate

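Graph-wide control-rate updates via set_control_rate() are implemented (see Optimization Strategies). Per-node control rates are currently deferred, but the design supports them: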

rust

pub trait AudioNode {
    /// How often params need updating (default: every sample)
    fn control_rate(&self) -> ControlRate { ControlRate::Audio }
}

pub enum ControlRate {
    Audio,           // Every sample
    Block(usize),    // Every N samples
    OnChange,        // Only when modulator changes
}

The graph executor can batch param updates for Block rate nodes, reducing overhead for LFOs that don't need sample-accurate modulation.

Open Questions

Feedback loops - Audio effects often have feedback (delay output → input). The graph needs to handle this without cycle errors. Solution: explicit Feedback node that provides one-sample delay.
Multi-output nodes - Some primitives (like stereo processors) have multiple outputs. Current design assumes single output. May need outputs: Vec<f32> instead of single return.
Preset serialization - Effects-as-graphs are naturally serializable (node types + connections). Need schema for saving/loading effect presets.
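For the feedback question, a one-sample delay node might look like the sketch below. `Feedback` and `feedback_step` are hypothetical names, and how such a node would plug into `AudioGraph` is still open. The point is that reading the previous sample before writing the current one breaks the cycle, so the rest of the graph stays acyclic.

```rust
/// One-sample delay: `read` returns whatever was written on the previous sample.
pub struct Feedback {
    last: f32,
}

impl Feedback {
    pub fn new() -> Self {
        Self { last: 0.0 }
    }

    /// Value from the previous sample (0.0 before the first write).
    pub fn read(&self) -> f32 {
        self.last
    }

    /// Store the current sample for the next call to `read`.
    pub fn write(&mut self, value: f32) {
        self.last = value;
    }
}

/// One step of a simple feedback loop: y[n] = x[n] + gain * y[n-1].
pub fn feedback_step(fb: &mut Feedback, input: f32, gain: f32) -> f32 {
    let out = input + gain * fb.read(); // read side: source with no inputs
    fb.write(out);                      // write side: sink at the end of the pass
    out
}
```

Feeding an impulse through `feedback_step` with a gain of 0.5 produces 1.0, 0.5, 0.25, 0.125, the expected decaying response. In graph terms the read side acts as a source and the write side as a sink, so a topological sort never sees a cycle.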

Related Documents

unification.md - Original analysis of audio/graphics effects
../domains/audio.md - Audio domain overview
../open-questions.md - Original modulation decisions (unimplemented)
