Audio Graph Primitives Architecture

This document describes the target architecture for audio effects: primitives as graph nodes with parameter modulation, eliminating dedicated composition structs.

Problem Statement

Current Architecture

Primitives (DelayLine, PhaseOsc, EnvelopeFollower, Allpass1, Smoother)
    ↓ hardcoded wiring in structs
Compositions (ModulatedDelay, AmplitudeMod, AllpassBank)
    ↓ impl AudioNode
Chain/AudioGraph (runtime, dyn dispatch)

Issues

  1. Arbitrary groupings - Why is ModulatedDelay a composition but DynamicsProcessor isn't? The boundaries feel arbitrary.

  2. Extensibility burden - New effects require: new struct, impl AudioNode, export from lib.rs. Users can't easily create custom effects.

  3. Duplicate implementation paths - Hardcoded compositions duplicate what the graph could do. Maintaining both is wasteful.

  4. Alternate implementations - GPU/WASM backends must implement both primitives AND compositions.

Target Architecture

Primitives (implement AudioNode directly)

Graph with parameter modulation

Effects = graph configurations (data, not code)

Benefits

| Aspect | Current | Target |
|---|---|---|
| New effect | New struct + impl + export | Graph configuration |
| Custom modulation | Not possible | Wire any output → any param |
| Alternate backends | Must impl compositions | Only impl primitives |
| Serialization | Code | Data (graph description) |

Design

Primitives as AudioNode

Each primitive implements AudioNode with modulatable parameters:

rust
pub trait AudioNode: Send {
    /// Process one sample. Parameters come from context or modulation.
    fn process(&mut self, input: f32, ctx: &AudioContext) -> f32;

    /// Reset internal state.
    fn reset(&mut self) {}

    /// Declare modulatable parameters.
    fn params(&self) -> &[ParamDescriptor] { &[] }

    /// Set a parameter value (called by graph before process).
    fn set_param(&mut self, index: usize, value: f32);
}

pub struct ParamDescriptor {
    pub name: &'static str,
    pub default: f32,
    pub min: f32,
    pub max: f32,
}
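To make the contract concrete, here is a compilable sketch of the trait with `AudioContext` stubbed down to a sample rate. `GainNode` is illustrative (the real primitive implementations follow below); only the trait shape matches the definition above.

```rust
// Minimal self-contained sketch of the AudioNode contract.
// AudioContext is stubbed; GainNode is an illustrative node.

pub struct AudioContext {
    pub sample_rate: f32,
}

pub struct ParamDescriptor {
    pub name: &'static str,
    pub default: f32,
    pub min: f32,
    pub max: f32,
}

pub trait AudioNode: Send {
    fn process(&mut self, input: f32, ctx: &AudioContext) -> f32;
    fn reset(&mut self) {}
    fn params(&self) -> &[ParamDescriptor] {
        &[]
    }
    fn set_param(&mut self, index: usize, value: f32);
}

/// A gain stage whose single parameter ("gain") can be modulated.
pub struct GainNode {
    gain: f32,
}

impl GainNode {
    pub fn new(gain: f32) -> Self {
        Self { gain }
    }
}

impl AudioNode for GainNode {
    fn process(&mut self, input: f32, _ctx: &AudioContext) -> f32 {
        input * self.gain
    }

    fn params(&self) -> &[ParamDescriptor] {
        &[ParamDescriptor { name: "gain", default: 1.0, min: 0.0, max: 4.0 }]
    }

    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 {
            self.gain = value;
        }
    }
}
```

The graph calls `set_param` before `process` each sample (or each control block), so a node never needs to know whether its parameter is static or modulated.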

BlockProcessor Trait

For tier-agnostic code, use BlockProcessor which all tiers implement:

rust
pub trait BlockProcessor: Send {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext);
    fn reset(&mut self);
}

// Blanket impl for AudioNode types (Tier 1/2/4)
impl<T: AudioNode> BlockProcessor for T {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext) {
        for (inp, out) in input.iter().zip(output.iter_mut()) {
            *out = self.process(*inp, ctx);
            ctx.advance();
        }
    }
}

// JIT (Tier 3) implements BlockProcessor natively
impl BlockProcessor for CompiledGraph { ... }

Code written against BlockProcessor works with any tier:

rust
fn apply_effect<P: BlockProcessor>(effect: &mut P, audio: &mut [f32], sample_rate: f32) {
    let mut output = vec![0.0; audio.len()];
    let mut ctx = AudioContext::new(sample_rate);
    effect.process_block(audio, &mut output, &mut ctx);
    audio.copy_from_slice(&output);
}

Example primitive implementations:

rust
// DelayLine - modulatable delay time
impl AudioNode for DelayLine<true> {
    fn process(&mut self, input: f32, _ctx: &AudioContext) -> f32 {
        self.write(input);
        self.read_interp(self.time)  // time set via set_param
    }

    fn params(&self) -> &[ParamDescriptor] {
        &[ParamDescriptor { name: "time", default: 0.0, min: 0.0, max: 1.0 }]
    }

    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 { self.time = value; }
    }
}

// PhaseOsc - outputs control signal, modulatable rate
impl AudioNode for PhaseOsc {
    fn process(&mut self, _input: f32, _ctx: &AudioContext) -> f32 {
        self.tick(self.phase_inc)  // phase_inc set via set_param
    }

    fn params(&self) -> &[ParamDescriptor] {
        &[ParamDescriptor { name: "rate", default: 1.0, min: 0.01, max: 100.0 }]
    }

    fn set_param(&mut self, index: usize, value: f32) {
        if index == 0 { self.phase_inc = value; }
    }
}

Parameter Modulation in Graph

The graph supports two connection types:

  • Audio connections - sample data flows between nodes
  • Param connections - one node's output modulates another's parameter
rust
pub struct AudioGraph {
    nodes: Vec<Box<dyn AudioNode>>,
    audio_wires: Vec<AudioWire>,
    param_wires: Vec<ParamWire>,
    output_node: usize,
}

pub struct AudioWire {
    from_node: usize,
    to_node: usize,
}

pub struct ParamWire {
    from_node: usize,
    to_node: usize,
    to_param: usize,
    /// Transform: base + modulation * scale
    scale: f32,
    base: f32,
}

impl AudioGraph {
    /// Connect audio output → audio input
    pub fn connect(&mut self, from: usize, to: usize) { ... }

    /// Connect audio output → parameter (modulation)
    pub fn modulate(&mut self, from: usize, to: usize, param: &str, scale: f32, base: f32) { ... }
}

Graph Execution

Per-sample execution with modulation:

rust
impl AudioGraph {
    pub fn process(&mut self, input: f32, ctx: &AudioContext) -> f32 {
        // Store node outputs
        let mut outputs = vec![0.0; self.nodes.len()];
        outputs[0] = input;  // Input node

        for (i, node) in self.nodes.iter_mut().enumerate().skip(1) {
            // Apply parameter modulation from connected sources
            for wire in &self.param_wires {
                if wire.to_node == i {
                    let mod_value = outputs[wire.from_node];
                    let param_value = wire.base + mod_value * wire.scale;
                    node.set_param(wire.to_param, param_value);
                }
            }

            // Get audio input (sum of all connected sources)
            let audio_in: f32 = self.audio_wires
                .iter()
                .filter(|w| w.to_node == i)
                .map(|w| outputs[w.from_node])
                .sum();

            outputs[i] = node.process(audio_in, ctx);
        }

        outputs[self.output_node]
    }
}

Effects as Graph Constructors

Effects become functions that build graphs:

rust
pub fn chorus(sample_rate: f32) -> AudioGraph {
    let mut g = AudioGraph::new();

    let input = g.add_input();
    let lfo = g.add(PhaseOsc::new());
    let delay = g.add(DelayLine::new(4096));
    let output = g.add_output();

    // LFO modulates delay time: base 20ms, depth ±5ms
    g.modulate(lfo, delay, "time", 0.005 * sample_rate, 0.020 * sample_rate);

    // Audio path: input → delay → output (mixed with dry)
    g.connect(input, delay);
    g.connect_mix(input, delay, output, 0.5);  // 50% wet

    g
}
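Since `modulate()` takes `scale` and `base` in samples, it is worth checking the delay range the chorus settings above actually produce. This sketch (names are illustrative, not the library API) assumes a bipolar LFO in [-1, 1], which is what makes the depth read as ±5 ms around the 20 ms base:

```rust
// Worked check of the chorus modulation settings: base 20 ms, depth ±5 ms,
// applied as base + lfo * scale with both values converted to samples.

fn modulated_delay_samples(lfo: f32, sample_rate: f32) -> f32 {
    let scale = 0.005 * sample_rate; // ±5 ms depth, in samples
    let base = 0.020 * sample_rate;  // 20 ms base, in samples
    base + lfo * scale
}
```

At 44.1 kHz the delay sweeps between 661.5 samples (15 ms) and 1102.5 samples (25 ms), comfortably inside the 4096-sample `DelayLine` allocated above.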

pub fn flanger(sample_rate: f32) -> AudioGraph {
    // Same structure as chorus, different parameters!
    let mut g = AudioGraph::new();
    // ... shorter delay, more feedback, faster LFO
    g
}

Relationship to resin-core Graph

The resin-core Graph is for general node-based pipelines (meshes, textures, etc.) with typed ports and single execution.

The audio graph needs:

  • Per-sample execution (not single-shot)
  • Parameter modulation (control signals → params)
  • Stateful nodes (delay lines, filters)

Options:

  1. Separate AudioGraph (current direction) - Audio-specific, optimized for sample processing
  2. Extend resin-core - Add rate/modulation concepts to general graph
  3. Shared modulation abstraction - Modulatable<T> type used by both

Recommendation: Keep AudioGraph separate for now, but design Modulatable<T> as a shared concept that both can use later. The execution models are too different to force into one graph type.

Migration Path

Phase 1: Primitives as AudioNode

  • Add params() and set_param() to AudioNode trait
  • Implement AudioNode for primitives in primitive.rs
  • Keep existing compositions working

Phase 2: Graph with Modulation

  • Add param_wires to AudioGraph
  • Add modulate() method
  • Update execution to apply modulation

Phase 3: Effects as Graphs

  • Rewrite chorus(), flanger(), etc. to return AudioGraph
  • Deprecate composition structs (ModulatedDelay, etc.)
  • Update benchmarks to verify no regression

Phase 4: Cleanup

  • Remove composition structs
  • Primitives become the only building blocks
  • Document graph-based effect creation

Implementation Status

| Phase | Status | Notes |
|---|---|---|
| Phase 1: Primitives as AudioNode | ✅ Complete | ParamDescriptor, params(), set_param() added to AudioNode. Primitive nodes: DelayNode, LfoNode, EnvelopeNode, AllpassNode, GainNode, MixNode |
| Phase 2: Graph with Modulation | ✅ Complete | AudioGraph with modulate() and modulate_named(). Param wires apply base + output * scale before each node processes. |
| Phase 3: Effects as Graphs | 🚧 Partial | tremolo_graph(), chorus_graph(), flanger_graph() demonstrate the pattern. Old composition structs still exist alongside. |
| Phase 4: Cleanup | ⏳ Pending | Composition structs (ModulatedDelay, AmplitudeMod, AllpassBank) can be removed once graph versions are validated. |

Remaining Work

  1. Benchmark comparison - Compare graph-based vs composition-based effects performance
  2. Dry/wet routing - MixNode needs cleaner way to receive dry signal in graph context
  3. Remove composition structs - After validation, delete old types
  4. Phaser as graph - AllpassBank with modulated coefficients needs multi-stage allpass

Performance Considerations

Benchmark Results

Initial benchmarks showed significant overhead. After optimization:

| Effect | Composition | Graph (initial) | Graph (optimized) | Final Overhead |
|---|---|---|---|---|
| Tremolo | 189 µs | 718 µs | 587 µs | 3.11x (was 3.76x) |
| Chorus | 735 µs | 1,637 µs | 1,163 µs | 1.58x (was 2.19x) |
| Flanger | 754 µs | 1,224 µs | 984 µs | 1.31x (was 1.65x) |

(Times are for processing 1 second of audio at 44.1kHz)

Optimizations implemented:

  1. Pre-computed wire lookups (per-node execution info, ~3-4% improvement)
  2. Control-rate parameter updates (every 64 samples, ~16-26% improvement)

Overhead Sources

The remaining overhead comes from:

| Source | Impact | Notes |
|---|---|---|
| Dynamic dispatch | Moderate | Box<dyn AudioNode> vtable lookup per node per sample |
| Node loop | Low | Iterating node vec vs hardcoded struct fields |
| Output vector | Low | Writing to outputs[i] vs direct field access |

Wire iteration and set_param were addressed by optimizations.

Optimization Strategies

1. Pre-compute wire lookups ✅ Implemented

rust
struct NodeExecInfo {
    audio_inputs: Vec<NodeIndex>,  // Which nodes feed this one
    param_mods: Vec<(NodeIndex, usize, f32, f32)>,  // (from, param, base, scale)
}

Built once when graph structure changes via build_exec_info().

2. Control-rate parameter updates ✅ Implemented

rust
graph.set_control_rate(64);  // Update params every 64 samples

Default effect constructors use control rate 64 (~1.5ms at 44.1kHz).
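The mechanism is simple: the per-sample loop skips the param-wire pass except on every Nth sample. A sketch, with closures standing in for the real wire and node machinery (names are illustrative):

```rust
// Control-rate parameter updates: recompute modulated params only every
// `control_rate` samples, while audio is still processed per sample.

fn process_block_control_rate(
    input: &[f32],
    output: &mut [f32],
    control_rate: usize,
    mut update_params: impl FnMut(usize), // stands in for the param-wire pass
    mut process: impl FnMut(f32) -> f32,  // stands in for per-sample node work
) {
    for (i, (inp, out)) in input.iter().zip(output.iter_mut()).enumerate() {
        if i % control_rate == 0 {
            update_params(i); // param wires applied here, not every sample
        }
        *out = process(*inp);
    }
}
```

With `control_rate = 64`, a 256-sample block triggers only 4 parameter passes instead of 256, which is where the ~16-26% improvement comes from.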

Why Tier 2 Beats Hand-Written Structs

Counterintuitively, pattern-matched optimized structs (Tier 2) are faster than the original hand-written effect structs:

| Effect | Original Struct | Tier 2 Optimized | Improvement |
|---|---|---|---|
| Tremolo | 189 µs | 175 µs | 8% faster |
| Chorus | 735 µs | 611 µs | 17% faster |
| Flanger | 754 µs | 647 µs | 14% faster |

Root cause: algebraic simplification.

The original AmplitudeMod (tremolo) does this per sample:

rust
let lfo_val = self.lfo.sine_uni();  // = sine() * 0.5 + 0.5
let mod_amount = 1.0 - self.depth * lfo_val;
input * mod_amount
// 5 ops after sine: *0.5, +0.5, *depth, 1.0-, *input

The optimized TremoloOptimized pre-computes constants:

rust
let lfo_out = self.lfo.sine();
let gain = self.base + lfo_out * self.scale;  // base/scale computed once
input * gain
// 3 ops after sine: *scale, +base, *input

Saving 2 FLOPs per sample × 44,100 samples/sec = 88,200 fewer operations/sec.
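The two formulations are algebraically identical when the constants are folded as base = 1 − 0.5·depth and scale = −0.5·depth; a sketch verifying that equivalence:

```rust
// Check that the folded tremolo gain equals the original per-sample math:
//   1 - depth * (sine * 0.5 + 0.5)  ==  base + sine * scale
// with base = 1 - 0.5 * depth and scale = -0.5 * depth.

fn tremolo_gain_original(sine: f32, depth: f32) -> f32 {
    let lfo_uni = sine * 0.5 + 0.5; // unipolar conversion, every sample
    1.0 - depth * lfo_uni
}

fn tremolo_gain_folded(sine: f32, depth: f32) -> f32 {
    let base = 1.0 - 0.5 * depth; // folded once at construction
    let scale = -0.5 * depth;     // folded once at construction
    base + sine * scale
}
```

The folding moves the unipolar conversion and the depth multiply out of the hot loop entirely.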

Implication for codegen: Real codegen should perform:

  • Constant folding (pre-compute what's known at graph construction)
  • Algebraic simplification (eliminate redundant conversions)
  • Dead code elimination (remove unused paths)

3. Monomorphization / code generation (not implemented)

For small fixed graphs, generate specialized code that inlines the graph structure. Options:

  • Proc macro at compile time
  • Cranelift JIT at runtime (see below)

4. SIMD batching (not implemented)

Process N samples at once, amortizing per-node overhead.

Cranelift JIT Consideration

Cranelift could JIT-compile graph configurations into native code, eliminating:

  • Wire iteration (baked into generated code)
  • Dynamic dispatch (concrete function calls)
  • set_param overhead (params computed inline)

Pros:

  • Near-zero overhead for any graph configuration
  • Users get composition-level performance without writing Rust
  • Graph changes trigger recompilation (acceptable for effect design, not live modulation)

Cons:

  • Significant complexity (cranelift dependency, IR generation)
  • Compile latency when graph changes (~1-10ms typical)
  • Platform support considerations (cranelift targets)
  • Debugging JIT code is harder

Verdict: Worth exploring if simpler optimizations don't close the gap. Start with pre-computed wire lookups.

Acceptable Overhead?

For context, 1.6ms to process 1 second of audio = 0.16% CPU at real-time rate. Even the slowest graph version (chorus at 1.6ms) is well under real-time requirements.

However, effects are often chained. A chain of 10 graph-based effects at 2x overhead could matter. And the principle of "graph should be as fast as hardcoded" is worth pursuing.

Performance Tiers

Four compilation tiers offer different tradeoffs:

| Tier | Feature | Overhead | Flexibility | Complexity |
|---|---|---|---|---|
| 1. Dynamic graph | default | ~1.3-3x | Full runtime flexibility | Low |
| 2. Pattern matching | optimize | ~0% (faster than concrete!) | Common idioms | Low |
| 3. Cranelift JIT | cranelift | ~0-5% | Runtime flexibility | High |
| 4. Build.rs codegen | codegen module | ~0-7% | Compile-time only | Low |

Tier 1: Dynamic AudioGraph (default)

The default. Graphs are built at runtime, nodes are boxed trait objects.

Overhead sources: dyn dispatch per node per sample, output vector indexing.

When to use: Most cases. Real-time audio is feasible (0.16% CPU for 1 second of chorus).

Tier 2: Pattern Matching + Optimized Compositions (optimize feature)

Graph fingerprinting identifies common idioms (tremolo, chorus, flanger) and replaces subgraphs with hand-optimized struct implementations.

Benchmark results:

  • Tremolo: 175µs (8% faster than concrete struct baseline!)
  • Chorus: 611µs (17% faster than concrete struct!)
  • Flanger: 647µs (14% faster than concrete struct!)

The optimized compositions use primitives directly with no abstraction overhead, allowing better compiler optimization than even hand-written effect structs.

When to use: Transparent optimization of dynamic graphs. Enable optimize feature and call optimize_graph() after building.

Status: Implemented in optimize.rs. Patterns for tremolo, chorus, flanger.

Tier 3: Cranelift JIT (cranelift feature)

JIT-compile graphs to native code at runtime. Eliminates dyn dispatch and wire iteration.

Block Processing: JIT uses block-based processing via BlockProcessor::process_block() to amortize function call overhead. Per-sample JIT calls have ~15 cycle overhead that dominates simple effects. Block processing amortizes this across 64-512 samples.

rust
// JIT only supports block processing (no per-sample process())
compiled.process_block(&input, &mut output, &mut ctx);

Trade-off: ~1-10ms compilation latency when graph changes, significant implementation complexity, harder to debug.

When to use: User-designed effects at runtime (not known at compile time). For compile-time known graphs, prefer Tier 4 (codegen).

Status: Proof-of-concept in jit.rs. Simple gain/tremolo tests pass. Full graph compilation with block processing implemented.

Tier 4: Build.rs Code Generation (codegen module)

Generate optimized Rust code from graphs at build time. Define graphs as SerialAudioGraph in build.rs, generate specialized struct code, and include it in your crate.

Benchmark results:

| Effect | Tier 2 Optimized | Tier 4 Codegen | Difference |
|---|---|---|---|
| Tremolo | 176 µs | 176 µs | ~0% |
| Chorus | 610 µs | 653 µs | ~7% |
| Flanger | 684 µs | 697 µs | ~2% |

The codegen is generic (topological sort + node processing) rather than pattern-matching specific effects. Performance is within a few percent of the hand-optimized Tier 2 code.

Example build.rs:

rust
use rhizome_resin_audio_codegen::{
    SerialAudioGraph, SerialAudioNode, SerialParamWire, generate_effect, generate_header,
};
use std::env;
use std::fs;
use std::path::Path;

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("my_effects.rs");

    let mut code = generate_header();

    // Tremolo: LFO modulating gain
    let tremolo_graph = SerialAudioGraph {
        nodes: vec![
            SerialAudioNode::Lfo { rate: 5.0 },
            SerialAudioNode::Gain { gain: 1.0 },
        ],
        audio_wires: vec![],
        param_wires: vec![SerialParamWire {
            from: 0,
            to: 1,
            param: 0,
            base: 0.5,
            scale: 0.5,
        }],
        input_node: Some(1),
        output_node: Some(1),
    };
    code.push_str(&generate_effect(&tremolo_graph, "MyTremolo"));

    fs::write(&dest, code).unwrap();
    println!("cargo:rerun-if-changed=build.rs");
}

Using in your crate:

rust
// Include generated code
mod generated {
    include!(concat!(env!("OUT_DIR"), "/my_effects.rs"));
}

use generated::MyTremolo;
use rhizome_resin_audio::graph::{AudioContext, AudioNode};

let mut tremolo = MyTremolo::new(44100.0);
let output = tremolo.process(input, &ctx);

Trade-off: Graphs must be known at compile time. The generated code uses topological sort to determine processing order - works with arbitrary graph structures, not just recognized patterns.

When to use: Library authors providing optimized effects, maximum performance for fixed graph structures.

Status: ✅ Complete. Implementation in resin-audio-codegen crate.

Codegen Implementation

Status: ✅ Complete in resin-audio-codegen crate

Architecture:

SerialAudioGraph (Rust structs)
    → generate_effect()
    → topological sort
    → Rust code with concrete types + inlined processing

Key types (resin-audio-codegen):

rust
pub struct SerialAudioGraph {
    pub nodes: Vec<SerialAudioNode>,
    pub audio_wires: Vec<(usize, usize)>,
    pub param_wires: Vec<SerialParamWire>,
    pub input_node: Option<usize>,
    pub output_node: Option<usize>,
}

pub enum SerialAudioNode {
    Lfo { rate: f32 },
    Gain { gain: f32 },
    Delay { max_samples: usize, feedback: f32, mix: f32 },
    Mix { mix: f32 },
    Envelope { attack: f32, release: f32 },
    Allpass { coefficient: f32 },
    Passthrough,
}

Implementation approach:

  1. Topological sort (Kahn's algorithm) determines processing order from dependencies
  2. Each node generates concrete struct fields and processing code
  3. Param modulation applies base + output * scale before node processing
  4. Constant folding for known values (e.g., wet/dry mix pre-computed)
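The topological sort in step 1 can be sketched as plain Kahn's algorithm over the wire list; this standalone version (function name illustrative, not the crate's API) returns `None` when the graph contains a cycle:

```rust
// Kahn's algorithm over (from, to) audio wires: repeatedly emit nodes
// whose inputs have all been processed. Returns None on a cycle.

fn topo_sort(node_count: usize, wires: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; node_count];
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); node_count];
    for &(from, to) in wires {
        indegree[to] += 1;
        successors[from].push(to);
    }
    // Seed with nodes that have no inputs (sources like LFOs, the input node).
    let mut ready: Vec<usize> =
        (0..node_count).filter(|&n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);
    while let Some(n) = ready.pop() {
        order.push(n);
        for &s in &successors[n] {
            indegree[s] -= 1;
            if indegree[s] == 0 {
                ready.push(s);
            }
        }
    }
    // If a cycle exists, some nodes never reach indegree 0.
    (order.len() == node_count).then_some(order)
}
```

The cycle case is exactly where the explicit Feedback node from the open questions becomes necessary: breaking the cycle with a one-sample delay restores a valid ordering.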

Files:

  • crates/resin-audio-codegen/src/lib.rs - Main implementation (~780 lines)
  • crates/resin-audio/build.rs - Uses codegen for benchmark effects
  • crates/resin-audio/src/codegen.rs - Re-exports for library consumers

Future: Control Rate

Currently deferred, but the design supports it:

rust
pub trait AudioNode {
    /// How often params need updating (default: every sample)
    fn control_rate(&self) -> ControlRate { ControlRate::Audio }
}

pub enum ControlRate {
    Audio,           // Every sample
    Block(usize),    // Every N samples
    OnChange,        // Only when modulator changes
}

The graph executor can batch param updates for Block rate nodes, reducing overhead for LFOs that don't need sample-accurate modulation.

Open Questions

  1. Feedback loops - Audio effects often have feedback (delay output → input). The graph needs to handle this without cycle errors. Solution: explicit Feedback node that provides one-sample delay.

  2. Multi-output nodes - Some primitives (like stereo processors) have multiple outputs. Current design assumes single output. May need outputs: Vec<f32> instead of single return.

  3. Preset serialization - Effects-as-graphs are naturally serializable (node types + connections). Need schema for saving/loading effect presets.
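For open question 1, the proposed Feedback node is small enough to sketch in full: it breaks a cycle by always emitting the value written on the previous sample (this is an assumption about its semantics, not the shipped implementation):

```rust
// One-sample delay element for breaking feedback cycles in the graph.

pub struct Feedback {
    prev: f32,
}

impl Feedback {
    pub fn new() -> Self {
        Self { prev: 0.0 }
    }

    /// Return last sample's value, then store the current one.
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.prev;
        self.prev = input;
        out
    }

    pub fn reset(&mut self) {
        self.prev = 0.0;
    }
}
```

Because its output depends only on the previous sample, the topological sort can treat its output as available before its input, which removes the cycle from the dependency graph.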

Related Documents

  • unification.md - Original analysis of audio/graphics effects
  • ../domains/audio.md - Audio domain overview
  • ../open-questions.md - Original modulation decisions (unimplemented)