Audio Graph Primitives Architecture
This document describes the target architecture for audio effects: primitives as graph nodes with parameter modulation, eliminating dedicated composition structs.
Problem Statement
Current Architecture
Primitives (DelayLine, PhaseOsc, EnvelopeFollower, Allpass1, Smoother)
↓ hardcoded wiring in structs
Compositions (ModulatedDelay, AmplitudeMod, AllpassBank)
↓ impl AudioNode
Chain/AudioGraph (runtime, dyn dispatch)
Issues
- Arbitrary groupings - Why is ModulatedDelay a composition but DynamicsProcessor isn't? The boundaries feel arbitrary.
- Extensibility burden - New effects require a new struct, an impl AudioNode, and an export from lib.rs. Users can't easily create custom effects.
- Duplicate implementation paths - Hardcoded compositions duplicate what the graph could do. Maintaining both is wasteful.
- Alternate implementations - GPU/WASM backends must implement both primitives AND compositions.
Target Architecture
Primitives (implement AudioNode directly)
↓
Graph with parameter modulation
↓
Effects = graph configurations (data, not code)
Benefits
| Aspect | Current | Target |
|---|---|---|
| New effect | New struct + impl + export | Graph configuration |
| Custom modulation | Not possible | Wire any output → any param |
| Alternate backends | Must impl compositions | Only impl primitives |
| Serialization | Code | Data (graph description) |
Design
Primitives as AudioNode
Each primitive implements AudioNode with modulatable parameters:
pub trait AudioNode: Send {
/// Process one sample. Parameters come from context or modulation.
fn process(&mut self, input: f32, ctx: &AudioContext) -> f32;
/// Reset internal state.
fn reset(&mut self) {}
/// Declare modulatable parameters.
fn params(&self) -> &[ParamDescriptor] { &[] }
/// Set a parameter value (called by graph before process).
fn set_param(&mut self, index: usize, value: f32);
}
pub struct ParamDescriptor {
pub name: &'static str,
pub default: f32,
pub min: f32,
pub max: f32,
}
BlockProcessor Trait
For tier-agnostic code, use BlockProcessor which all tiers implement:
pub trait BlockProcessor: Send {
fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext);
fn reset(&mut self);
}
// Blanket impl for AudioNode types (Tier 1/2/4)
impl<T: AudioNode> BlockProcessor for T {
    fn process_block(&mut self, input: &[f32], output: &mut [f32], ctx: &mut AudioContext) {
        for (inp, out) in input.iter().zip(output.iter_mut()) {
            *out = self.process(*inp, ctx);
            ctx.advance();
        }
    }
    // BlockProcessor::reset has no default body, so forward to AudioNode::reset.
    fn reset(&mut self) {
        AudioNode::reset(self);
    }
}
// JIT (Tier 3) implements BlockProcessor natively
impl BlockProcessor for CompiledGraph { ... }
Code written against BlockProcessor works with any tier:
fn apply_effect<P: BlockProcessor>(effect: &mut P, audio: &mut [f32], sample_rate: f32) {
let mut output = vec![0.0; audio.len()];
let mut ctx = AudioContext::new(sample_rate);
effect.process_block(audio, &mut output, &mut ctx);
audio.copy_from_slice(&output);
}
Example primitive implementations:
// DelayLine - modulatable delay time
impl AudioNode for DelayLine<true> {
fn process(&mut self, input: f32, _ctx: &AudioContext) -> f32 {
self.write(input);
self.read_interp(self.time) // time set via set_param
}
fn params(&self) -> &[ParamDescriptor] {
&[ParamDescriptor { name: "time", default: 0.0, min: 0.0, max: 1.0 }]
}
fn set_param(&mut self, index: usize, value: f32) {
if index == 0 { self.time = value; }
}
}
// PhaseOsc - outputs control signal, modulatable rate
impl AudioNode for PhaseOsc {
fn process(&mut self, _input: f32, _ctx: &AudioContext) -> f32 {
self.tick(self.phase_inc) // phase_inc set via set_param
}
fn params(&self) -> &[ParamDescriptor] {
&[ParamDescriptor { name: "rate", default: 1.0, min: 0.01, max: 100.0 }]
}
fn set_param(&mut self, index: usize, value: f32) {
if index == 0 { self.phase_inc = value; }
}
}
Parameter Modulation in Graph
The graph supports two connection types:
- Audio connections - sample data flows between nodes
- Param connections - one node's output modulates another's parameter
pub struct AudioGraph {
    nodes: Vec<Box<dyn AudioNode>>,
    audio_wires: Vec<AudioWire>,
    param_wires: Vec<ParamWire>,
    /// Index of the node whose output process() returns.
    output_node: usize,
}
pub struct AudioWire {
from_node: usize,
to_node: usize,
}
pub struct ParamWire {
from_node: usize,
to_node: usize,
to_param: usize,
/// Transform: base + modulation * scale
scale: f32,
base: f32,
}
impl AudioGraph {
/// Connect audio output → audio input
pub fn connect(&mut self, from: usize, to: usize) { ... }
/// Connect audio output → parameter (modulation)
pub fn modulate(&mut self, from: usize, to: usize, param: &str, scale: f32, base: f32) { ... }
}
Graph Execution
Per-sample execution with modulation:
impl AudioGraph {
pub fn process(&mut self, input: f32, ctx: &AudioContext) -> f32 {
// Store node outputs
let mut outputs = vec![0.0; self.nodes.len()];
outputs[0] = input; // Input node
for (i, node) in self.nodes.iter_mut().enumerate().skip(1) {
// Apply parameter modulation from connected sources
for wire in &self.param_wires {
if wire.to_node == i {
let mod_value = outputs[wire.from_node];
let param_value = wire.base + mod_value * wire.scale;
node.set_param(wire.to_param, param_value);
}
}
// Get audio input (sum of all connected sources)
let audio_in: f32 = self.audio_wires
.iter()
.filter(|w| w.to_node == i)
.map(|w| outputs[w.from_node])
.sum();
outputs[i] = node.process(audio_in, ctx);
}
outputs[self.output_node]
}
}
Effects as Graph Constructors
Effects become functions that build graphs:
pub fn chorus(sample_rate: f32) -> AudioGraph {
let mut g = AudioGraph::new();
let input = g.add_input();
let lfo = g.add(PhaseOsc::new());
let delay = g.add(DelayLine::new(4096));
let output = g.add_output();
// LFO modulates delay time: base 20ms, depth ±5ms
g.modulate(lfo, delay, "time", 0.005 * sample_rate, 0.020 * sample_rate);
// Audio path: input → delay → output (mixed with dry)
g.connect(input, delay);
g.connect_mix(input, delay, output, 0.5); // 50% wet
g
}
pub fn flanger(sample_rate: f32) -> AudioGraph {
// Same structure as chorus, different parameters!
let mut g = AudioGraph::new();
// ... shorter delay, more feedback, faster LFO
g
}
Relationship to resin-core Graph
The resin-core Graph is for general node-based pipelines (meshes, textures, etc.) with typed ports and single execution.
The audio graph needs:
- Per-sample execution (not single-shot)
- Parameter modulation (control signals → params)
- Stateful nodes (delay lines, filters)
Options:
- Separate AudioGraph (current direction) - Audio-specific, optimized for sample processing
- Extend resin-core - Add rate/modulation concepts to general graph
- Shared modulation abstraction - Modulatable<T> type used by both
Recommendation: Keep AudioGraph separate for now, but design Modulatable<T> as a shared concept that both can use later. The execution models are too different to force into one graph type.
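One possible shape for the shared concept, as a sketch only: Modulatable<T> does not exist yet, and the variant names, fields, and resolve() method below are hypothetical. It reuses the base + modulation * scale transform that ParamWire already defines.

```rust
/// Hypothetical shared modulation abstraction; not part of the current codebase.
/// A value is either fixed or driven by a source output through the same
/// `base + modulation * scale` transform that `ParamWire` uses.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Modulatable<T> {
    /// A fixed value that never changes.
    Constant(T),
    /// A value driven by the output of node `source`.
    Driven { source: usize, base: T, scale: T },
}

impl Modulatable<f32> {
    /// Resolve to a concrete value given the current node outputs.
    /// An out-of-range `source` is treated as a modulator of 0.0,
    /// so the result falls back to `base`.
    pub fn resolve(&self, outputs: &[f32]) -> f32 {
        match *self {
            Modulatable::Constant(v) => v,
            Modulatable::Driven { source, base, scale } => {
                let m = outputs.get(source).copied().unwrap_or(0.0);
                base + m * scale
            }
        }
    }
}
```

Keeping the type generic over T would let a non-audio graph reuse the enum with its own resolve logic, while the audio graph only needs the f32 case.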
Migration Path
Phase 1: Primitives as AudioNode
- Add params() and set_param() to AudioNode trait
- Implement AudioNode for primitives in primitive.rs
- Keep existing compositions working
Phase 2: Graph with Modulation
- Add param_wires to AudioGraph
- Add modulate() method
- Update execution to apply modulation
Phase 3: Effects as Graphs
- Rewrite chorus(), flanger(), etc. to return AudioGraph
- Deprecate composition structs (ModulatedDelay, etc.)
- Update benchmarks to verify no regression
Phase 4: Cleanup
- Remove composition structs
- Primitives become the only building blocks
- Document graph-based effect creation
Implementation Status
| Phase | Status | Notes |
|---|---|---|
| Phase 1: Primitives as AudioNode | ✅ Complete | ParamDescriptor, params(), set_param() added to AudioNode. Primitive nodes: DelayNode, LfoNode, EnvelopeNode, AllpassNode, GainNode, MixNode |
| Phase 2: Graph with Modulation | ✅ Complete | AudioGraph with modulate() and modulate_named(). Param wires apply base + output * scale before each node processes. |
| Phase 3: Effects as Graphs | ✅ Partial | tremolo_graph(), chorus_graph(), flanger_graph() demonstrate the pattern. Old composition structs still exist alongside. |
| Phase 4: Cleanup | ⏳ Pending | Composition structs (ModulatedDelay, AmplitudeMod, AllpassBank) can be removed once graph versions are validated. |
Remaining Work
- Benchmark comparison - Compare graph-based vs composition-based effects performance
- Dry/wet routing - MixNode needs cleaner way to receive dry signal in graph context
- Remove composition structs - After validation, delete old types
- Phaser as graph - AllpassBank with modulated coefficients needs multi-stage allpass
Performance Considerations
Benchmark Results
Initial benchmarks showed significant overhead. After optimization:
| Effect | Composition | Graph (initial) | Graph (optimized) | Final Overhead |
|---|---|---|---|---|
| Tremolo | 189 µs | 718 µs | 587 µs | 3.11x (was 3.76x) |
| Chorus | 735 µs | 1,637 µs | 1,163 µs | 1.58x (was 2.19x) |
| Flanger | 754 µs | 1,224 µs | 984 µs | 1.31x (was 1.65x) |
(Times are for processing 1 second of audio at 44.1kHz)
Optimizations implemented:
- Pre-computed wire lookups (per-node execution info, ~3-4% improvement)
- Control-rate parameter updates (every 64 samples, ~16-26% improvement)
Overhead Sources
The remaining overhead comes from:
| Source | Impact | Notes |
|---|---|---|
| Dynamic dispatch | MODERATE | Box<dyn AudioNode> vtable lookup per node per sample |
| Node loop | LOW | Iterating node vec vs hardcoded struct fields |
| Output vector | LOW | Writing to outputs[i] vs direct field access |
Wire iteration and set_param were addressed by optimizations.
Optimization Strategies
1. Pre-compute wire lookups ✅ Implemented
struct NodeExecInfo {
audio_inputs: Vec<NodeIndex>, // Which nodes feed this one
param_mods: Vec<(NodeIndex, usize, f32, f32)>, // (from, param, base, scale)
}
Built once when the graph structure changes, via build_exec_info().
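A minimal sketch of how such a pre-computation could work, assuming node indices are plain usize and that the wire structs look like the ones defined earlier. The free-function signature here is an assumption for illustration; the real build_exec_info() may be a method with a different shape.

```rust
type NodeIndex = usize; // assumption: node indices are plain usize

pub struct AudioWire {
    pub from_node: NodeIndex,
    pub to_node: NodeIndex,
}

pub struct ParamWire {
    pub from_node: NodeIndex,
    pub to_node: NodeIndex,
    pub to_param: usize,
    pub scale: f32,
    pub base: f32,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct NodeExecInfo {
    pub audio_inputs: Vec<NodeIndex>,
    pub param_mods: Vec<(NodeIndex, usize, f32, f32)>, // (from, param, base, scale)
}

/// Group wires by destination node once, so the per-sample loop never has
/// to scan the full wire lists. Panics if a wire targets a node index that
/// is not below `node_count`.
pub fn build_exec_info(
    node_count: usize,
    audio_wires: &[AudioWire],
    param_wires: &[ParamWire],
) -> Vec<NodeExecInfo> {
    let mut info = vec![NodeExecInfo::default(); node_count];
    for w in audio_wires {
        info[w.to_node].audio_inputs.push(w.from_node);
    }
    for w in param_wires {
        info[w.to_node]
            .param_mods
            .push((w.from_node, w.to_param, w.base, w.scale));
    }
    info
}
```

With this table in place, processing node i only touches info[i] instead of filtering every wire on every sample.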
2. Control-rate parameter updates ✅ Implemented
graph.set_control_rate(64); // Update params every 64 samples
Default effect constructors use control rate 64 (~1.5ms at 44.1kHz).
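The idea behind control-rate updates can be sketched as a counter that recomputes a modulated parameter only every N samples and holds it in between. ControlRateParam and its methods are hypothetical names for illustration, not the graph's actual internals.

```rust
/// Hypothetical helper: recomputes a modulated parameter only once
/// every `interval` samples and holds the value in between.
pub struct ControlRateParam {
    interval: usize,
    counter: usize,
    value: f32,
}

impl ControlRateParam {
    pub fn new(interval: usize, initial: f32) -> Self {
        // An interval of 0 would never update; treat it as 1 (audio rate).
        Self { interval: interval.max(1), counter: 0, value: initial }
    }

    /// Call once per sample. `modulator` is the source node's current output.
    /// Returns the held parameter value and whether it was refreshed.
    pub fn tick(&mut self, modulator: f32, base: f32, scale: f32) -> (f32, bool) {
        let refreshed = self.counter == 0;
        if refreshed {
            self.value = base + modulator * scale;
        }
        self.counter = (self.counter + 1) % self.interval;
        (self.value, refreshed)
    }
}
```

With an interval of 64, the transform and the downstream set_param call run 64 times less often, at the cost of stepped rather than sample-accurate modulation.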
Why Tier 2 Beats Hand-Written Structs
Counterintuitively, pattern-matched optimized structs (Tier 2) are faster than the original hand-written effect structs:
| Effect | Original Struct | Tier 2 Optimized | Improvement |
|---|---|---|---|
| Tremolo | 189 µs | 175 µs | 8% faster |
| Chorus | 735 µs | 611 µs | 17% faster |
| Flanger | 754 µs | 647 µs | 14% faster |
Root cause: algebraic simplification.
The original AmplitudeMod (tremolo) does this per sample:
let lfo_val = self.lfo.sine_uni(); // = sine() * 0.5 + 0.5
let mod_amount = 1.0 - self.depth * lfo_val;
input * mod_amount
// 5 ops after sine: *0.5, +0.5, *depth, 1.0-, *input
The optimized TremoloOptimized pre-computes constants:
let lfo_out = self.lfo.sine();
let gain = self.base + lfo_out * self.scale; // base/scale computed once
input * gain
// 3 ops after sine: *scale, +base, *input
Saving 2 FLOPs per sample × 44,100 samples/sec = 88,200 fewer operations/sec.
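The folding can be checked by expanding the original expression: 1 - depth * (0.5 * s + 0.5) = (1 - depth / 2) + s * (-depth / 2). The sketch below assumes base and scale are derived exactly that way; the function names are illustrative and the source does not show how TremoloOptimized actually computes them.

```rust
/// Original form: unipolar LFO, then depth scaling (5 ops after the sine).
pub fn tremolo_original(input: f32, sine: f32, depth: f32) -> f32 {
    let lfo_val = sine * 0.5 + 0.5;
    let mod_amount = 1.0 - depth * lfo_val;
    input * mod_amount
}

/// Folded constants, computed once whenever `depth` changes.
/// Assumed derivation: 1 - depth * (0.5 * s + 0.5)
///                   = (1 - depth / 2) + s * (-depth / 2).
pub fn fold_tremolo(depth: f32) -> (f32, f32) {
    let base = 1.0 - 0.5 * depth;
    let scale = -0.5 * depth;
    (base, scale)
}

/// Optimized form: 3 ops after the sine.
pub fn tremolo_optimized(input: f32, sine: f32, base: f32, scale: f32) -> f32 {
    input * (base + sine * scale)
}
```

Both forms agree up to floating-point rounding for any sine value in [-1, 1], which is what makes the substitution safe for a pattern matcher or code generator.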
Implication for codegen: Real codegen should perform:
- Constant folding (pre-compute what's known at graph construction)
- Algebraic simplification (eliminate redundant conversions)
- Dead code elimination (remove unused paths)
3. Monomorphization / code generation (not implemented)
For small fixed graphs, generate specialized code that inlines the graph structure. Options:
- Proc macro at compile time
- Cranelift JIT at runtime (see below)
4. SIMD batching (not implemented)
Process N samples at once, amortizing per-node overhead.
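As a rough illustration of batching, the sketch below applies a gain in fixed-size chunks and relies on the compiler's auto-vectorizer rather than explicit SIMD intrinsics. LANES and apply_gain_batched are hypothetical; a real implementation would batch whole nodes, not a single gain.

```rust
/// Samples processed per batch; a hypothetical choice, not taken from the codebase.
const LANES: usize = 8;

/// Apply a gain in fixed-size chunks. The fixed inner trip count gives the
/// compiler a chance to auto-vectorize; no explicit SIMD intrinsics are used.
pub fn apply_gain_batched(input: &[f32], output: &mut [f32], gain: f32) {
    assert_eq!(input.len(), output.len());
    let mut in_chunks = input.chunks_exact(LANES);
    let mut out_chunks = output.chunks_exact_mut(LANES);
    for (inp, out) in (&mut in_chunks).zip(&mut out_chunks) {
        for i in 0..LANES {
            out[i] = inp[i] * gain;
        }
    }
    // Handle the tail that does not fill a whole batch.
    for (inp, out) in in_chunks.remainder().iter().zip(out_chunks.into_remainder()) {
        *out = *inp * gain;
    }
}
```

Stateful nodes such as delay lines are harder to batch this way, because each output sample can depend on state written by the previous one.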
Cranelift JIT Consideration
Cranelift could JIT-compile graph configurations into native code, eliminating:
- Wire iteration (baked into generated code)
- Dynamic dispatch (concrete function calls)
- set_param overhead (params computed inline)
Pros:
- Near-zero overhead for any graph configuration
- Users get composition-level performance without writing Rust
- Graph changes trigger recompilation (acceptable for effect design, not live modulation)
Cons:
- Significant complexity (cranelift dependency, IR generation)
- Compile latency when graph changes (~1-10ms typical)
- Platform support considerations (cranelift targets)
- Debugging JIT code is harder
Verdict: Worth exploring if simpler optimizations don't close the gap. Start with pre-computed wire lookups.
Acceptable Overhead?
For context, 1.6ms to process 1 second of audio = 0.16% CPU at real-time rate. Even the slowest graph version (chorus at 1.6ms) is well under real-time requirements.
However, effects are often chained. A chain of 10 graph-based effects at 2x overhead could matter. And the principle of "graph should be as fast as hardcoded" is worth pursuing.
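The arithmetic behind these figures is simply processing time divided by audio duration. A small sketch, assuming the costs of chained effects add linearly (the function names are illustrative):

```rust
/// Fraction of one core used at real-time rate:
/// processing time divided by the duration of audio processed.
pub fn cpu_fraction(processing_us: f64, audio_seconds: f64) -> f64 {
    processing_us / (audio_seconds * 1_000_000.0)
}

/// Total fraction for a serial chain of effects, assuming costs simply add.
pub fn chain_cpu_fraction(per_effect_us: &[f64], audio_seconds: f64) -> f64 {
    per_effect_us
        .iter()
        .map(|&us| cpu_fraction(us, audio_seconds))
        .sum()
}
```

With the benchmark table's numbers, the initial graph chorus (1,637 µs per second of audio) comes to about 0.16% of one core, and a chain of ten optimized graph choruses (1,163 µs each) to about 1.2%.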
Performance Tiers
Four compilation tiers offer different tradeoffs:
| Tier | Feature | Overhead | Flexibility | Complexity |
|---|---|---|---|---|
| 1. Dynamic graph | default | ~1.3-3x | Full runtime flexibility | Low |
| 2. Pattern matching | optimize | ~0% (faster than concrete!) | Common idioms | Low |
| 3. Cranelift JIT | cranelift | ~0-5% | Runtime flexibility | High |
| 4. Build.rs codegen | codegen module | ~0-7% | Compile-time only | Low |
Tier 1: Dynamic AudioGraph (default)
The default. Graphs are built at runtime, nodes are boxed trait objects.
Overhead sources: dyn dispatch per node per sample, output vector indexing.
When to use: Most cases. Real-time audio is feasible: graph-based chorus takes about 1.6ms per second of audio, roughly 0.16% CPU at real-time rate.
Tier 2: Pattern Matching + Optimized Compositions (optimize feature)
Graph fingerprinting identifies common idioms (tremolo, chorus, flanger) and replaces subgraphs with hand-optimized struct implementations.
Benchmark results:
- Tremolo: 175µs (8% faster than concrete struct baseline!)
- Chorus: 611µs (17% faster than concrete struct!)
- Flanger: 647µs (14% faster than concrete struct!)
The optimized compositions use primitives directly with no abstraction overhead, allowing better compiler optimization than even hand-written effect structs.
When to use: Transparent optimization of dynamic graphs. Enable optimize feature and call optimize_graph() after building.
Status: Implemented in optimize.rs. Patterns for tremolo, chorus, flanger.
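To make the fingerprinting idea concrete, here is a deliberately small sketch that recognizes only the tremolo idiom. NodeKind, Pattern, and match_pattern are hypothetical names; the actual matching in optimize.rs may work quite differently.

```rust
/// Hypothetical node-kind tag; the real `optimize.rs` may fingerprint differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Lfo,
    Gain,
    Delay,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Pattern {
    Tremolo,
    Unknown,
}

/// A param wire reduced to what matters for matching: (from_node, to_node).
pub type ParamEdge = (usize, usize);

/// Recognize the tremolo idiom: exactly two nodes, with a single param wire
/// running from an LFO to a gain node.
pub fn match_pattern(kinds: &[NodeKind], param_edges: &[ParamEdge]) -> Pattern {
    if kinds.len() != 2 || param_edges.len() != 1 {
        return Pattern::Unknown;
    }
    let (from, to) = param_edges[0];
    match (kinds.get(from), kinds.get(to)) {
        (Some(NodeKind::Lfo), Some(NodeKind::Gain)) => Pattern::Tremolo,
        _ => Pattern::Unknown,
    }
}
```

A graph that matches can be swapped for the optimized struct; anything that returns Unknown simply stays on the Tier 1 dynamic path.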
Tier 3: Cranelift JIT (cranelift feature)
JIT-compile graphs to native code at runtime. Eliminates dyn dispatch and wire iteration.
Block Processing: JIT uses block-based processing via BlockProcessor::process_block() to amortize function call overhead. Per-sample JIT calls have ~15 cycle overhead that dominates simple effects. Block processing amortizes this across 64-512 samples.
// JIT only supports block processing (no per-sample process())
compiled.process_block(&input, &mut output, &mut ctx);
Trade-off: ~1-10ms compilation latency when graph changes, significant implementation complexity, harder to debug.
When to use: User-designed effects at runtime (not known at compile time). For compile-time known graphs, prefer Tier 4 (codegen).
Status: Proof-of-concept in jit.rs. Simple gain/tremolo tests pass. Full graph compilation with block processing implemented.
Tier 4: Build.rs Code Generation (codegen module)
Generate optimized Rust code from graphs at build time. Define graphs as SerialAudioGraph in build.rs, generate specialized struct code, and include it in your crate.
Benchmark results:
| Effect | Tier 2 Optimized | Tier 4 Codegen | Difference |
|---|---|---|---|
| Tremolo | 176 µs | 176 µs | ~0% |
| Chorus | 610 µs | 653 µs | ~7% |
| Flanger | 684 µs | 697 µs | ~2% |
The codegen is generic (topological sort + node processing) rather than pattern-matching specific effects. In these benchmarks its performance is within about 0-7% of hand-optimized Tier 2 code.
Example build.rs:
use rhizome_resin_audio_codegen::{
SerialAudioGraph, SerialAudioNode, SerialParamWire, generate_effect, generate_header,
};
use std::env;
use std::fs;
use std::path::Path;
fn main() {
let out_dir = env::var("OUT_DIR").unwrap();
let dest = Path::new(&out_dir).join("my_effects.rs");
let mut code = generate_header();
// Tremolo: LFO modulating gain
let tremolo_graph = SerialAudioGraph {
nodes: vec![
SerialAudioNode::Lfo { rate: 5.0 },
SerialAudioNode::Gain { gain: 1.0 },
],
audio_wires: vec![],
param_wires: vec![SerialParamWire {
from: 0,
to: 1,
param: 0,
base: 0.5,
scale: 0.5,
}],
input_node: Some(1),
output_node: Some(1),
};
code.push_str(&generate_effect(&tremolo_graph, "MyTremolo"));
fs::write(&dest, code).unwrap();
println!("cargo:rerun-if-changed=build.rs");
}
Using in your crate:
// Include generated code
mod generated {
include!(concat!(env!("OUT_DIR"), "/my_effects.rs"));
}
use generated::MyTremolo;
use rhizome_resin_audio::graph::{AudioContext, AudioNode};
let ctx = AudioContext::new(44100.0); // same sample rate as the effect
let mut tremolo = MyTremolo::new(44100.0);
let output = tremolo.process(input, &ctx); // input: one f32 sample
Trade-off: Graphs must be known at compile time. The generated code uses topological sort to determine processing order, so it works with arbitrary graph structures, not just recognized patterns.
When to use: Library authors providing optimized effects, maximum performance for fixed graph structures.
Status: ✅ Complete. Implementation in resin-audio-codegen crate.
Codegen Implementation
Status: ✅ Complete in resin-audio-codegen crate
Architecture:
SerialAudioGraph (Rust structs)
→ generate_effect()
→ topological sort
→ Rust code with concrete types + inlined processing
Key types (resin-audio-codegen):
pub struct SerialAudioGraph {
pub nodes: Vec<SerialAudioNode>,
pub audio_wires: Vec<(usize, usize)>,
pub param_wires: Vec<SerialParamWire>,
pub input_node: Option<usize>,
pub output_node: Option<usize>,
}
pub enum SerialAudioNode {
Lfo { rate: f32 },
Gain { gain: f32 },
Delay { max_samples: usize, feedback: f32, mix: f32 },
Mix { mix: f32 },
Envelope { attack: f32, release: f32 },
Allpass { coefficient: f32 },
Passthrough,
}
Implementation approach:
- Topological sort (Kahn's algorithm) determines processing order from dependencies
- Each node generates concrete struct fields and processing code
- Param modulation applies base + output * scale before node processing
- Constant folding for known values (e.g., wet/dry mix pre-computed)
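The ordering step can be sketched with a standard Kahn's algorithm over (from, to) edges. This is a generic illustration, not the crate's code; it assumes that both audio wires and param wires count as dependency edges.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm over `(from, to)` edges. Returns the processing order,
/// or `None` if the graph contains a cycle (e.g. an unbroken feedback loop).
/// Panics if an edge references a node index not below `node_count`.
pub fn topological_order(node_count: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut in_degree = vec![0usize; node_count];
    let mut successors = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        successors[from].push(to);
        in_degree[to] += 1;
    }
    // Start with every node that has no dependencies.
    let mut ready: VecDeque<usize> =
        (0..node_count).filter(|&n| in_degree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);
    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in &successors[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }
    // Any node left unprocessed sits on a cycle.
    if order.len() == node_count {
        Some(order)
    } else {
        None
    }
}
```

For a chorus-like graph where an LFO (0) and the input (1) both feed a delay (2) that feeds the output (3), the resulting order processes the LFO and input before the delay, so the modulator value is ready when the delay runs.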
Files:
- crates/resin-audio-codegen/src/lib.rs - Main implementation (~780 lines)
- crates/resin-audio/build.rs - Uses codegen for benchmark effects
- crates/resin-audio/src/codegen.rs - Re-exports for library consumers
Future: Control Rate
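Per-node control rates are currently deferred (only the graph-wide set_control_rate() described above is implemented), but the design supports them: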
pub trait AudioNode {
/// How often params need updating (default: every sample)
fn control_rate(&self) -> ControlRate { ControlRate::Audio }
}
pub enum ControlRate {
Audio, // Every sample
Block(usize), // Every N samples
OnChange, // Only when modulator changes
}
The graph executor can batch param updates for Block rate nodes, reducing overhead for LFOs that don't need sample-accurate modulation.
Open Questions
- Feedback loops - Audio effects often have feedback (delay output → input). The graph needs to handle this without cycle errors. Solution: explicit Feedback node that provides one-sample delay.
- Multi-output nodes - Some primitives (like stereo processors) have multiple outputs. Current design assumes single output. May need outputs: Vec<f32> instead of single return.
- Preset serialization - Effects-as-graphs are naturally serializable (node types + connections). Need schema for saving/loading effect presets.
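For the feedback-loop question, a minimal sketch of what a one-sample-delay node might look like. The Feedback type and feedback_step helper are hypothetical; the point is only that reading last sample's value breaks the same-sample dependency that would otherwise form a cycle.

```rust
/// Hypothetical feedback node: outputs what was written on the previous sample,
/// so a loop through it has no same-sample dependency.
#[derive(Debug, Default)]
pub struct Feedback {
    held: f32,
}

impl Feedback {
    /// Value from the previous sample (0.0 before the first write).
    pub fn read(&self) -> f32 {
        self.held
    }

    /// Store this sample's value for the next sample to read.
    pub fn write(&mut self, value: f32) {
        self.held = value;
    }
}

/// One step of a feedback loop: y[n] = x[n] + amount * y[n-1].
pub fn feedback_step(fb: &mut Feedback, input: f32, amount: f32) -> f32 {
    let out = input + amount * fb.read();
    fb.write(out);
    out
}
```

In graph terms, the read side acts as a source with no incoming dependency edge, so a topological sort still sees an acyclic graph.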
Related Documents
- unification.md - Original analysis of audio/graphics effects
- ../domains/audio.md - Audio domain overview
- ../open-questions.md - Original modulation decisions (unimplemented)