1. Introduction: The Bio-Acoustic Paradigm

Conchordal represents a fundamental divergence from established norms in generative music and computational audio. Where traditional systems rely on symbolic manipulation—operating on grids of quantized pitch (MIDI, Equal Temperament) and discretized time (BPM, measures)—Conchordal functions as a continuous, biologically grounded simulation of auditory perception. It posits that musical structure is not an artifact of abstract composition but an emergent property of acoustic survival.

This technical note serves as an exhaustive reference for the system's architecture, signal processing algorithms, and artificial life strategies. It details how Conchordal synthesizes the principles of psychoacoustics—specifically critical band theory, virtual pitch perception, and neural entrainment—with the dynamics of an autonomous ecosystem. In this environment, sound is treated as a living organism, an "Individual" possessing metabolism, sensory processing capabilities, and the autonomy to navigate a hostile spectral terrain.

The emergent behavior of the system is driven by a unified fitness function: the pursuit of Consonance. Agents within the Conchordal ecosystem do not follow a pre-written score. Instead, they continuously analyze their environment to maximize their "Spectral Comfort"—defined as the minimization of sensory roughness—and their "Harmonic Stability," or the maximization of virtual root strength. The result is a self-organizing soundscape where harmony, rhythm, and timbre evolve organically through the interactions of physical laws rather than deterministic sequencing.

This document explores the three foundational pillars of the Conchordal architecture:

  1. The Psychoacoustic Coordinate System: The mathematical framework of Log2Space and ERB scales that replaces linear Hertz and integer MIDI notes.
  2. The Cognitive Landscape: The real-time DSP pipeline that computes Roughness ($R$) and Harmonicity ($H$) fields from the raw audio stream.
  3. The Life Engine: The agent-based model governing the metabolism, movement, and neural entrainment of the audio entities.

2. The Psychoacoustic Coordinate System

A critical innovation in Conchordal is the rejection of the linear frequency scale ($f$) for internal processing. Human auditory perception is inherently logarithmic; our perception of pitch interval is based on frequency ratios rather than differences. To model this accurately and efficiently, Conchordal establishes a custom coordinate system, Log2Space, which aligns the computational grid with the tonotopic map of the cochlea.

2.1 The Log2 Space Foundation

The Log2Space struct serves as the backbone for all spectral analysis, kernel convolution, and agent positioning within the system. It maps the physical frequency domain ($f$ in Hz) to a perceptual logarithmic domain ($l$).

2.1.1 Mathematical Definition

The transformation from Hertz to the internal log-coordinate is defined as the base-2 logarithm of the frequency. This choice is deliberate: in base-2, an increment of 1.0 corresponds exactly to an octave, the most fundamental interval in pitch perception.

$$ l(f) = \log_2(f) $$

The inverse transformation, used to derive synthesis parameters for the audio thread, is:

$$ f(l) = 2^l $$

The coordinate space is discretized into a grid defined by a resolution parameter, bins_per_oct ($B$). This parameter determines the granularity of the simulation. A typical value of $B=48$ or $B=96$ provides sub-semitone resolution sufficient for continuous pitch gliding and microtonal inflection. The step size $\Delta l$ is constant across the entire spectral range:

$$ \Delta l = \frac{1}{B} $$

2.1.2 Grid Construction and Indexing

The Log2Space structure pre-calculates the center frequencies for all bins spanning the configured range $[f_{min}, f_{max}]$. The number of bins $N$ is determined to ensure complete coverage:

$$ N = \lfloor \frac{\log_2(f_{max}) - \log_2(f_{min})}{\Delta l} \rfloor + 1 $$

The system maintains two parallel vectors for $O(1)$ access during DSP operations:

  • centers_log2: The logarithmic coordinates $l_i = \log_2(f_{min}) + i \cdot \Delta l$.
  • centers_hz: The pre-computed linear frequencies $f_i = 2^{l_i}$.

This pre-computation is vital for real-time performance, removing the need for costly log2 and pow calls inside the inner loops of the spectral kernels. The method index_of_freq(hz) provides the quantization logic, mapping an arbitrary float frequency to the nearest bin index.
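
To make the grid concrete, the following is a minimal Rust sketch of the structure described above. The struct and member names (Log2Space, bins_per_oct, centers_log2, centers_hz, index_of_freq) come from the text; the constructor shape and index clamping are illustrative assumptions.

```rust
/// Minimal sketch of the Log2Space grid: log2-spaced bin centers with O(1) lookup.
struct Log2Space {
    bins_per_oct: usize,    // B: grid resolution (e.g., 48 or 96)
    l_min: f32,             // log2(f_min)
    centers_log2: Vec<f32>, // l_i = log2(f_min) + i / B
    centers_hz: Vec<f32>,   // f_i = 2^{l_i}, precomputed to keep pow() out of DSP loops
}

impl Log2Space {
    fn new(f_min: f32, f_max: f32, bins_per_oct: usize) -> Self {
        let l_min = f_min.log2();
        let octaves = f_max.log2() - l_min;
        // N = floor(octaves * B) + 1 ensures the configured range is fully covered.
        let n = (octaves * bins_per_oct as f32).floor() as usize + 1;
        let centers_log2: Vec<f32> =
            (0..n).map(|i| l_min + i as f32 / bins_per_oct as f32).collect();
        let centers_hz: Vec<f32> = centers_log2.iter().map(|l| l.exp2()).collect();
        Self { bins_per_oct, l_min, centers_log2, centers_hz }
    }

    /// Quantize an arbitrary frequency to the nearest bin index.
    fn index_of_freq(&self, hz: f32) -> usize {
        let i = ((hz.log2() - self.l_min) * self.bins_per_oct as f32).round();
        (i.max(0.0) as usize).min(self.centers_hz.len() - 1)
    }
}

fn main() {
    // 20 Hz..20 kHz at 48 bins/octave: log2(1000) ≈ 9.966 octaves -> 479 bins.
    let space = Log2Space::new(20.0, 20_000.0, 48);
    assert_eq!(space.centers_hz.len(), 479);
    let i = space.index_of_freq(440.0);
    println!("440 Hz -> bin {} (l = {:.3}, center = {:.2} Hz)",
             i, space.centers_log2[i], space.centers_hz[i]);
}
```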

2.2 Constant-Q Bandwidth Characteristics

The Log2Space inherently enforces a Constant-Q (Constant Quality Factor) characteristic across the spectrum. In signal processing terms, $Q$ is defined as the ratio of the center frequency to the bandwidth: $Q = f / \Delta f$.

In a linear system (like a standard FFT), $\Delta f$ is constant, meaning $Q$ increases with frequency. In Log2Space, the bandwidth $\Delta f_i$ of the $i$-th bin scales proportionally with the center frequency $f_i$. This property mimics the frequency selectivity of the human auditory system, where the ear's ability to resolve frequencies diminishes (in absolute Hz terms) as frequency increases. This alignment allows Conchordal to allocate computational resources efficiently—using high temporal resolution at high frequencies and high spectral resolution at low frequencies—without manual multirate processing.

2.3 The Equivalent Rectangular Bandwidth (ERB) Scale

While Log2Space handles pitch relationships (octaves, harmonics), it does not perfectly model the critical bands of the ear, which are wider at low frequencies than a pure logarithmic mapping suggests. To accurately calculate sensory roughness (dissonance), Conchordal implements the Equivalent Rectangular Bandwidth (ERB) scale based on the Glasberg & Moore (1990) model.

The core/erb.rs module provides the transformation functions used by the Roughness Kernel. The conversion from frequency $f$ (Hz) to ERB-rate units $E$ is given by:

$$ E(f) = 21.4 \log_{10}(0.00437f + 1) $$

The bandwidth of a critical band at frequency $f$ is:

$$ BW_{ERB}(f) = 24.7(0.00437f + 1) $$

This scale is distinct from Log2Space. While Log2Space is the domain for pitch and harmonicity (where relationships are octave-invariant), the roughness calculation requires mapping spectral energy into the ERB domain to evaluate interference. The system effectively maintains a dual-view of the spectrum: one strictly logarithmic for harmonic templates, and one psychoacoustic for dissonance evaluation.
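
The two formulas transcribe directly into code. A plausible shape for the core/erb.rs helpers (the function names here are assumptions):

```rust
/// Frequency (Hz) -> ERB-rate units (Glasberg & Moore, 1990).
fn hz_to_erb_rate(f: f32) -> f32 {
    21.4 * (0.00437 * f + 1.0).log10()
}

/// Width (Hz) of one critical band centered at f (Hz).
fn erb_bandwidth(f: f32) -> f32 {
    24.7 * (0.00437 * f + 1.0)
}

fn main() {
    // At 1 kHz the critical band is ~132.6 Hz wide; at 100 Hz it is ~35.5 Hz,
    // relatively far wider than a pure logarithmic scale would predict.
    for f in [100.0_f32, 1_000.0, 10_000.0] {
        println!("{f:>7} Hz -> E = {:.2} ERB, BW = {:.1} Hz",
                 hz_to_erb_rate(f), erb_bandwidth(f));
    }
}
```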

3. The Auditory Landscape: Analyzing the Environment

The "Landscape" is the central data structure in Conchordal. It acts as the shared environment for all agents, a dynamic scalar field representing the psychoacoustic "potential" of every frequency bin. Agents do not interact directly with each other; they interact with the Landscape, which aggregates the spectral energy of the entire population. This decouples the complexity of the simulation from the number of agents ($O(N)$ vs $O(N^2)$).

The Landscape is updated every audio frame (or block) by the Analysis Workers. It synthesizes two primary metrics:

  • Roughness ($R$): The sensory dissonance caused by rapid beating between proximal partials.
  • Harmonicity ($H$): The measure of virtual pitch strength and spectral periodicity.

Both metrics are normalized to the $[0, 1]$ range before combination. The net Consonance ($C$) is computed as a signed difference, then rescaled:

$$ C_{signed} = \text{clip}(H_{01} - w_r \cdot R_{01},\; -1,\; 1) $$

$$ C_{01} = \frac{C_{signed} + 1}{2} $$

where $w_r$ is the roughness_weight parameter (default 1.0). Individual agents maintain their own perceptual context (PerceptualContext) which tracks per-agent boredom and familiarity, providing additional score adjustments during pitch selection.
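
The combination step itself is small enough to show in full. A sketch, assuming both fields arrive already normalized to $[0, 1]$:

```rust
/// Combine normalized harmonicity and roughness into a [0, 1] consonance score.
fn consonance_01(h01: f32, r01: f32, roughness_weight: f32) -> f32 {
    let c_signed = (h01 - roughness_weight * r01).clamp(-1.0, 1.0);
    (c_signed + 1.0) / 2.0
}

fn main() {
    // Pure harmonicity -> 1.0; pure roughness -> 0.0; a perfect balance -> 0.5.
    assert_eq!(consonance_01(1.0, 0.0, 1.0), 1.0);
    assert_eq!(consonance_01(0.0, 1.0, 1.0), 0.0);
    assert_eq!(consonance_01(0.6, 0.6, 1.0), 0.5);
}
```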

3.1 Non-Stationary Gabor Transform (NSGT)

To populate the Log2Space with spectral data, Conchordal uses a custom implementation of the Non-Stationary Gabor Transform (NSGT). Unlike the Short-Time Fourier Transform (STFT), which uses a fixed window size, the NSGT varies the window length $L$ inversely with frequency to maintain the Constant-Q property derived in Section 2.2.

3.1.1 Kernel-Based Spectral Analysis

The implementation in core/nsgt_kernel.rs utilizes a sparse kernel approach to perform this transform efficiently. For each log-frequency band $k$, a time-domain kernel $h_k$ is precomputed. This kernel combines a complex sinusoid at the band's center frequency $f_k$ with a periodic Hann window $w_k$ of length $L_k \approx Q \cdot f_s / f_k$.

$$ h_k[n] = w_k[n] \cdot e^{-j 2\pi f_k n / f_s} $$

These kernels are transformed into the frequency domain ($K_k[\nu]$) during initialization. To optimize performance, the system sparsifies these frequency kernels, storing only the bins with significant energy.

During runtime, the system performs a single FFT on the input audio buffer to obtain the spectrum $X[\nu]$. The complex coefficient $C_k$ for band $k$ is then computed via the inner product in the frequency domain:

$$ C_k = \frac{1}{N_{fft}} \sum_{\nu} X[\nu] \cdot K_k^*[\nu] $$

This "one FFT, many kernels" approach allows Conchordal to generate a high-resolution, logarithmically spaced spectrum covering 20Hz to 20kHz without the computational overhead of calculating separate DFTs for each band or using recursive filter banks.

3.1.2 Real-Time Temporal Smoothing

The raw spectral coefficients $C_k$ exhibit high variance due to the stochastic nature of the audio input (especially with noise-based agents). To create a stable field for agents to sample, the RtNsgtKernelLog2 struct wraps the NSGT with a temporal smoothing layer.

It implements a per-band leaky integrator (exponential smoothing). Crucially, the time constant $\tau$ is frequency-dependent. Low frequencies, which evolve slowly, are smoothed with a longer $\tau$, while high frequencies, which carry transient details, have a shorter $\tau$.

$$ y_k[t] = (1 - \alpha_k) \cdot |C_k[t]| + \alpha_k \cdot y_k[t-1] $$

where the smoothing factor $\alpha_k$ is derived from the frame interval $\Delta t$:

$$ \alpha_k = e^{-\Delta t / \tau(f_k)} $$

This models the "integration time" of the ear, ensuring that the Landscape reflects a psychoacoustic percept rather than instantaneous signal power.
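
A minimal sketch of this smoothing layer. The exact $\tau(f)$ mapping is not specified in this note, so the one below (longer at low frequencies, clamped to a plausible range) is illustrative:

```rust
/// Per-band leaky integrator with a frequency-dependent time constant.
struct SmoothedBands {
    y: Vec<f32>,   // smoothed magnitudes y_k
    tau: Vec<f32>, // integration time tau(f_k) in seconds, per band
}

impl SmoothedBands {
    /// Illustrative tau(f): long at low frequencies, short at high frequencies.
    fn new(centers_hz: &[f32]) -> Self {
        let tau = centers_hz.iter().map(|f| (25.0 / f).clamp(0.02, 0.25)).collect();
        Self { y: vec![0.0; centers_hz.len()], tau }
    }

    /// y_k[t] = (1 - a_k) * |C_k[t]| + a_k * y_k[t-1], with a_k = exp(-dt / tau_k).
    fn update(&mut self, magnitudes: &[f32], dt: f32) {
        for ((y, &m), &tau) in self.y.iter_mut().zip(magnitudes).zip(&self.tau) {
            let alpha = (-dt / tau).exp();
            *y = (1.0 - alpha) * m + alpha * *y;
        }
    }
}

fn main() {
    let mut bands = SmoothedBands::new(&[50.0, 500.0, 5_000.0]);
    // Feed a constant-magnitude frame repeatedly: high bands converge first.
    for _ in 0..10 {
        bands.update(&[1.0, 1.0, 1.0], 0.01);
    }
    println!("{:?}", bands.y); // the 5 kHz band is closest to 1.0
}
```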

3.2 Roughness ($R$) Calculation: The Plomp-Levelt Model

Roughness is the sensation of "harshness" or "buzzing" caused by the interference of spectral components that fall within the same critical band but are not sufficiently close to be perceived as a single tone (beating). Conchordal implements a variation of the Plomp-Levelt model via convolution in the ERB domain.

3.2.1 The Interference Kernel

The core of the calculation is the Roughness Kernel, defined in core/roughness_kernel.rs. This kernel $K_{rough}(\Delta z)$ models the interference curve between two partials separated by $\Delta z$ ERB. The penalty rises rapidly with initial separation, peaks at approximately 0.25 ERB (maximum roughness), and then decays as the partials drift further apart.

The implementation uses a parameterized function eval_kernel_delta_erb to generate this shape:

$$ g(\Delta z) = e^{-\frac{\Delta z^2}{2\sigma^2}} \cdot (1 - e^{-(\frac{\Delta z}{\sigma_{suppress}})^p}) $$

The second term is a suppression factor that ensures the kernel goes to zero as $\Delta z \to 0$, preventing a single pure tone from generating self-roughness.
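
A direct transcription of $g(\Delta z)$. The parameter values in main are illustrative picks that place the peak near 0.25 ERB; they are not the system's defaults:

```rust
/// Interference kernel over ERB distance: a Gaussian lobe times a suppression
/// term that forces the kernel to zero as dz -> 0 (no self-roughness).
fn eval_kernel_delta_erb(dz: f32, sigma: f32, sigma_suppress: f32, p: f32) -> f32 {
    let lobe = (-(dz * dz) / (2.0 * sigma * sigma)).exp();
    let suppress = 1.0 - (-(dz / sigma_suppress).powf(p)).exp();
    lobe * suppress
}

fn main() {
    // Zero at dz = 0, maximum near dz ≈ 0.25 ERB, slow decay beyond it.
    let (sigma, sigma_s, p) = (0.6, 0.12, 2.0);
    for i in 0..=8 {
        let dz = i as f32 * 0.125;
        println!("dz = {dz:.3} ERB -> g = {:.3}", eval_kernel_delta_erb(dz, sigma, sigma_s, p));
    }
}
```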

3.2.2 Convolutional Approach

Calculating roughness pairwise for all spectral bins ($N^2$ complexity) is computationally prohibitive for real-time applications. Conchordal solves this by treating the Roughness calculation as a linear convolution.

  1. Mapping: The log-spaced amplitude spectrum from the NSGT is mapped (or interpolated) onto a linear ERB grid.
  2. Convolution: This density $A(z)$ is convolved with the pre-calculated roughness kernel $K_{rough}$.

$$ R_{shape}(z) = (A * K_{rough})(z) = \int A(z-\tau) K_{rough}(\tau) d\tau $$

The result $R_{shape}(z)$ represents the raw "Roughness Shape" at frequency $z$. To convert this to a normalized fitness signal, Conchordal applies a physiological saturation mapping.

3.2.3 Physiological Saturation Mapping

Raw roughness values from the convolution have unbounded range. Rather than hard-clamping, Conchordal uses a saturation curve that models the compressive nonlinearity of auditory perception. This mapping converts reference-normalized roughness ratios to the $[0, 1]$ range.

Reference Normalization: The system maintains reference values $r_{ref,peak}$ and $r_{ref,total}$ representing "typical" roughness levels. The reference-normalized ratios are:

$$ x_{peak}(u) = \frac{R_{shape}(u)}{r_{ref,peak}} $$

$$ x_{total} = \frac{R_{shape,total}}{r_{ref,total}} $$

The Saturation Parameter: The parameter roughness_k ($k > 0$) controls the saturation curve's shoulder. The reference ratio $x = 1$ maps to:

$$ R_{ref} = \frac{1}{1+k} $$

Larger $k$ reduces $R_{01}$ for the same input ratio, making the system more tolerant of roughness.

Piecewise Saturation Mapping: The normalized roughness $R_{01}$ is computed from the reference-normalized ratio $x$ as:

$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \frac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \frac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$

This function is continuous at $x = 1$ (both branches yield $\frac{1}{1+k}$) and saturates asymptotically to 1 as $x \to \infty$. The piecewise structure preserves a linear, sensitive response for low roughness while compressing extreme values instead of hard-clipping them.

Numerical Safety: The implementation handles edge cases robustly:

  • $x = \text{NaN} \to 0$
  • $x = +\infty \to 1$
  • $x = -\infty \to 0$
  • Non-finite $k$ is treated as $10^{-6}$

Agents seeking consonance actively avoid peaks in the $R_{01}$ field.
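
The mapping and its documented edge cases transcribe as follows. A sketch; the guard against non-positive $k$ is an added assumption alongside the documented non-finite case:

```rust
/// Piecewise roughness saturation: linear below the reference point x = 1,
/// hyperbolic compression above it; continuous at x = 1 with value 1/(1+k).
fn roughness_01(x: f32, k: f32) -> f32 {
    // Non-finite k is replaced by a tiny positive value, as documented;
    // the same fallback for k <= 0 is an assumption of this sketch.
    let k = if k.is_finite() && k > 0.0 { k } else { 1e-6 };
    if x.is_nan() || x <= 0.0 {
        0.0 // NaN and -inf both map to 0
    } else if x < 1.0 {
        x / (1.0 + k)
    } else {
        1.0 - k / (x + k) // +inf maps to 1
    }
}

fn main() {
    let k = 0.4286; // default roughness_k (~3/7)
    assert!((roughness_01(1.0, k) - 1.0 / (1.0 + k)).abs() < 1e-6);
    assert_eq!(roughness_01(f32::NAN, k), 0.0);
    assert_eq!(roughness_01(f32::INFINITY, k), 1.0);
    assert_eq!(roughness_01(f32::NEG_INFINITY, k), 0.0);
    println!("R01(1) = {:.4}, R01(10) = {:.4}", roughness_01(1.0, k), roughness_01(10.0, k));
}
```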

3.3 Harmonicity ($H$): The Sibling Projection Algorithm

While Roughness drives agents away from dissonance (segregation), Harmonicity ($H$) drives them toward fusion—the creation of coherent chords and timbres. Conchordal introduces a novel algorithm termed "Sibling Projection" to compute this field. This algorithm approximates the brain's mechanism of "Common Root" detection (Virtual Pitch) entirely in the frequency domain.

3.3.1 Concept: Virtual Roots

The algorithm posits that any spectral peak at frequency $f$ implies the potential existence of a fundamental frequency (root) at its subharmonics ($f/2, f/3, f/4 \dots$). If multiple spectral peaks share a common subharmonic, that subharmonic represents a strong "Virtual Root".

3.3.2 The Two-Pass Projection

The algorithm operates on the Log2Space spectrum in two passes, utilizing the integer properties of the logarithmic grid:

  1. Downward Projection (Root Search): The current spectral envelope is "smeared" downward. For every bin $i$ with energy, the algorithm adds energy to bins $i - \log_2(k)$ for integers $k \in \{1, 2, \dots, N\}$.

    $$ Roots[i] = \sum_k A[i + \log_2(k)] \cdot w_k $$

    Here, $w_k$ is a weighting factor that decays with harmonic index $k$ (e.g., $k^{-\rho}$), reflecting that lower harmonics imply their roots more strongly than higher ones. The result Roots describes the strength of the virtual pitch at every frequency.

  2. Upward Projection (Harmonic Resonance): The system then projects the Roots spectrum back upwards. If a strong root exists at $f_r$, it implies stability for all its natural harmonics ($f_r, 2f_r, 3f_r \dots$).

    $$ H[i] = \sum_m Roots[i - \log_2(m)] \cdot w_m $$

Emergent Tonal Stability: Consider an environment with a single tone at 200 Hz.

  • Step 1 (Down): It projects roots at 100 Hz ($f/2$), 66.7 Hz ($f/3$), 50 Hz ($f/4$), etc.
  • Step 2 (Up): The 100 Hz root projects stability to 100, 200, 300, 400, 500... Hz.
    • 300 Hz is the Perfect 5th of the 100 Hz root.
    • 500 Hz is the Major 3rd of the 100 Hz root.

Thus, without any hardcoded knowledge of Western music theory, the system naturally generates stability peaks at the Major 3rd and Perfect 5th relationships, simply as a consequence of the physics of the harmonic series. An agent at 200 Hz creates a "gravity well" at 300 Hz and 500 Hz, inviting other agents to form a major triad.
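
A schematic implementation of the overtone path of this two-pass projection on a discrete log2 grid. Rounding $\log_2 k$ to whole bins and the $w_k = k^{-\rho}$ weighting follow the text; peak spreading (sigma_cents) and the undertone path of Section 3.3.3 are omitted for brevity:

```rust
/// Two-pass "Sibling Projection" (overtone path only) on a log2-spaced grid.
/// Harmonic shifts log2(k) are rounded to whole bins; off-grid shifts are skipped.
fn sibling_projection(a: &[f32], bins_per_oct: usize, n_harm: usize, rho: f32) -> Vec<f32> {
    let n = a.len();
    let shift = |k: usize| ((k as f32).log2() * bins_per_oct as f32).round() as usize;

    // Pass 1 (down): Roots[i] = sum_k w_k * A[i + log2(k)]
    let mut roots = vec![0.0f32; n];
    for i in 0..n {
        for k in 1..=n_harm {
            let j = i + shift(k);
            if j < n {
                roots[i] += (k as f32).powf(-rho) * a[j];
            }
        }
    }
    // Pass 2 (up): H[i] = sum_m w_m * Roots[i - log2(m)]
    let mut h = vec![0.0f32; n];
    for i in 0..n {
        for m in 1..=n_harm {
            let s = shift(m);
            if s <= i {
                h[i] += (m as f32).powf(-rho) * roots[i - s];
            }
        }
    }
    h
}

fn main() {
    // A single tone at bin 96 of a 48-bins/oct grid (two octaves above bin 0).
    let mut a = vec![0.0f32; 240];
    a[96] = 1.0;
    let h = sibling_projection(&a, 48, 8, 1.0);
    // Stability peaks appear at the tone and at harmonic relatives, e.g. a
    // perfect fifth above it (log2(3/2) * 48 ≈ 28 bins up).
    println!("H at tone = {:.3}, H a fifth above = {:.3}", h[96], h[96 + 28]);
}
```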

3.3.3 Mirror Dualism: Overtone vs. Undertone

The implementation in core/harmonicity_kernel.rs includes a profound parameter: mirror_weight ($\alpha$). This parameter blends two distinct projection paths:

  • Path A (Overtone/Major): The standard "Down-then-Up" projection described above. It creates gravity based on the Overtone Series, favoring Major tonalities.
  • Path B (Undertone/Minor): An inverted "Up-then-Down" projection. It finds common overtones and projects undertones. This is the theoretical dual of Path A and favors Minor or Phrygian tonalities (the Undertone Series).

$$ H_{final} = (1-\alpha)H_{overtone} + \alpha H_{undertone} $$

By modulating mirror_weight, a user can continuously morph the fundamental physics of the universe from Major-centric to Minor-centric, observing how the ecosystem reorganizes itself in response.

4. The Life Engine: Agents and Autonomy

The "Life Engine" is the agent-based simulation layer that runs atop the DSP landscape. It manages the population of "Individuals," handling their lifecycle, sensory processing, vocalization timing, and audio synthesis.

4.1 Overview: The Individual Architecture

The Individual struct (life/individual.rs) is the atomic unit of the ecosystem. Its internal structure comprises:

| Component | Type | Responsibility |
| --- | --- | --- |
| body | AnySoundBody | Sound generation (waveform synthesis, spectral projection) |
| articulation | ArticulationWrapper | Rhythm, gating, envelope dynamics |
| pitch | AnyPitchCore | Pitch target selection in log-frequency space |
| perceptual | PerceptualContext | Per-agent boredom/familiarity adaptation |
| phonation_engine | PhonationEngine | Note timing, clock sources, social coupling |
| phonation | PhonationConfig | Configuration for the phonation engine |

The Individual acts as an integration layer: it orchestrates lifecycle (metabolism, energy), coordinates cores via control-plane signals (PlannedPitch), and manages state transitions without coupling the cores directly.

4.2 SoundBody: The Actuator

The SoundBody trait (life/sound_body.rs) defines sound generation capabilities. Two implementations exist:

4.2.1 SineBody

A pure sine tone oscillator. Minimal parameters:

  • freq_hz: Fundamental frequency
  • amp: Amplitude
  • audio_phase: Current oscillator phase

4.2.2 HarmonicBody and TimbreGenotype

Synthesizes a complex tone with a fundamental and configurable partials. The TimbreGenotype struct encodes the timbre DNA:

| Parameter | Type | Description |
| --- | --- | --- |
| mode | HarmonicMode | Harmonic (integer multiples: $1, 2, 3...$) or Metallic (non-integer: $k^{1.4}$) |
| stiffness | f32 | Inharmonicity coefficient; stretches partial series via $f_k = k \cdot (1 + \text{stiffness} \cdot k^2)$ |
| brightness | f32 | Spectral slope; partial amplitude decays as $k^{-\text{brightness}}$ |
| comb | f32 | Even harmonic attenuation (0–1); creates hollow timbres |
| damping | f32 | Energy-dependent high-frequency decay; higher partials fade faster at low energy |
| vibrato_rate | f32 | LFO frequency (Hz) for pitch modulation |
| vibrato_depth | f32 | Vibrato extent (fraction of frequency) |
| jitter | f32 | $1/f$ pink noise FM strength; adds organic fluctuation |
| unison | f32 | Detune amount for chorus effect (0 = single voice) |

Spectral Projection: Both bodies implement project_spectral_body(), which writes their energy distribution back to the Log2Space grid for Landscape computation. This enables the system to "see" each agent's spectral footprint.

4.3 The Behavioral Core Stack

Behavior is split into two focused cores, each extensible with new strategies.

4.3.1 ArticulationCore (When/Gate)

Defined in life/articulation_core.rs. Manages rhythm, gating, and envelope dynamics. Three variants:

| Variant | Description | Key Parameters |
| --- | --- | --- |
| Entrain | Kuramoto-style coupling to NeuralRhythms | lifecycle, rhythm_freq, rhythm_sensitivity |
| Seq | Fixed-duration envelope | duration (seconds) |
| Drone | Continuous tone with slow amplitude sway | sway (modulation depth) |

ArticulationWrapper: Wraps the core with a PlannedGate struct that manages fade-in/fade-out transitions when pitch changes occur. The gate value (0–1) multiplies the amplitude, ensuring smooth transitions.

ArticulationSignal: The output of articulation processing:

  • amplitude: Current envelope level
  • is_active: Whether the agent is currently sounding
  • relaxation: Modulation signal for vibrato/unison expansion
  • tension: Modulation signal for jitter intensification

4.3.2 PitchCore (Where)

Defined in life/pitch_core.rs. Proposes pitch targets in log-frequency space.

PitchHillClimbPitchCore: The default hill-climbing implementation with parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| neighbor_step_cents | 200 | Step size for neighbor exploration |
| tessitura_gravity | 0.1 | Penalty for distance from initial pitch center |
| improvement_threshold | 0.1 | Minimum score gain to trigger movement |
| exploration | 0.0 | Probability of random exploration (0–1) |
| persistence | 0.5 | Resistance to movement when satisfied (0–1) |

Scoring: Each candidate is evaluated as:

$$ \text{score} = C_{01} - d_{\text{penalty}} - g_{\text{tessitura}} + \Delta s_{\text{perceptual}} $$

The TargetProposal output includes target_pitch_log2 and a salience score (0–1) reflecting improvement strength.
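
A sketch of this scoring step. The consonance lookup and tessitura term follow the formula above; the exact shape of $d_{\text{penalty}}$ is not specified in this note, so the proportional form below is an assumption:

```rust
/// Score one pitch candidate against the consonance field (see formula above).
fn score_candidate(
    c01_at: impl Fn(f32) -> f32, // consonance field lookup, by log2 pitch
    candidate_log2: f32,
    current_log2: f32,
    tessitura_center_log2: f32,
    tessitura_gravity: f32,
    perceptual_adjust: f32,      // Δs from the PerceptualContext
) -> f32 {
    // Illustrative distance penalty: proportional to the jump size in octaves.
    let d_penalty = 0.05 * (candidate_log2 - current_log2).abs();
    let g_tessitura = tessitura_gravity * (candidate_log2 - tessitura_center_log2).abs();
    c01_at(candidate_log2) - d_penalty - g_tessitura + perceptual_adjust
}

fn main() {
    // Toy field with a consonance bump one neighbor step (200 cents) upward.
    let field = |l: f32| if (l - (8.0 + 200.0 / 1200.0)).abs() < 0.05 { 0.9 } else { 0.5 };
    let here = score_candidate(&field, 8.0, 8.0, 8.0, 0.1, 0.0);
    let up = score_candidate(&field, 8.0 + 200.0 / 1200.0, 8.0, 8.0, 0.1, 0.0);
    // The gain (~0.37) exceeds improvement_threshold = 0.1, so the agent moves.
    println!("score(here) = {here:.3}, score(+200c) = {up:.3}");
}
```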

4.4 PerceptualContext: Subjective Adaptation

Defined in life/perceptual.rs. Models per-agent habituation and preference, preventing agents from "getting stuck" at locally optimal positions.

The context maintains two leaky integrators per frequency bin:

  • h_fast: Short-term exposure (boredom accumulator)
  • h_slow: Long-term exposure (familiarity accumulator)

| Parameter | Default | Description |
| --- | --- | --- |
| tau_fast | 0.5 s | Time constant for boredom decay |
| tau_slow | 20.0 s | Time constant for familiarity decay |
| w_boredom | 1.0 | Weight of boredom penalty |
| w_familiarity | 0.2 | Weight of familiarity bonus |
| rho_self | 0.15 | Self-injection ratio (how much the agent's own position contributes) |
| boredom_gamma | 0.5 | Curvature exponent for boredom ($h_{\text{fast}}^\gamma$) |
| self_smoothing_radius | 1 | Spatial smoothing radius for self-injection |
| silence_mass_epsilon | 1e-6 | Threshold for detecting silence |

Score Adjustment:

$$ \Delta s = w_{\text{familiarity}} \cdot h_{\text{slow}} - w_{\text{boredom}} \cdot h_{\text{fast}}^{\gamma} $$

This creates a dynamic where agents are drawn to familiar regions but pushed away from over-visited locations.
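
A sketch of the two integrators and the resulting adjustment, assuming plain exponential decay with self-injection at the agent's own bin; self_smoothing_radius and silence_mass_epsilon handling are omitted:

```rust
/// Per-bin boredom/familiarity traces (spatial smoothing omitted).
struct Perceptual {
    h_fast: Vec<f32>,
    h_slow: Vec<f32>,
}

impl Perceptual {
    fn new(n_bins: usize) -> Self {
        Self { h_fast: vec![0.0; n_bins], h_slow: vec![0.0; n_bins] }
    }

    /// Decay both traces, then inject exposure at the agent's own bin.
    fn update(&mut self, own_bin: usize, dt: f32, tau_fast: f32, tau_slow: f32, rho_self: f32) {
        let (df, ds) = ((-dt / tau_fast).exp(), (-dt / tau_slow).exp());
        for (f, s) in self.h_fast.iter_mut().zip(self.h_slow.iter_mut()) {
            *f *= df;
            *s *= ds;
        }
        self.h_fast[own_bin] += rho_self * dt;
        self.h_slow[own_bin] += rho_self * dt;
    }

    /// Δs = w_familiarity * h_slow - w_boredom * h_fast^gamma
    fn score_adjust(&self, bin: usize, w_bore: f32, w_fam: f32, gamma: f32) -> f32 {
        w_fam * self.h_slow[bin] - w_bore * self.h_fast[bin].powf(gamma)
    }
}

fn main() {
    let mut p = Perceptual::new(16);
    // Sit on bin 8 for 5 simulated seconds: the fast trace saturates quickly,
    // so boredom outweighs familiarity and Δs turns negative (time to move).
    for _ in 0..500 {
        p.update(8, 0.01, 0.5, 20.0, 0.15);
    }
    println!("Δs at bin 8 = {:.4}", p.score_adjust(8, 1.0, 0.2, 0.5));
}
```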

4.5 PhonationEngine: Timing and Vocalization

Defined in life/phonation_engine.rs. The PhonationEngine governs when an agent vocalizes, managing note onsets, durations, and coordination with other agents.

4.5.1 Clock Sources

The engine supports multiple clock sources for determining vocalization timing:

| Source | Description |
| --- | --- |
| ThetaGate | Aligns onsets to theta band zero-crossings from NeuralRhythms |
| Composite | Combines subdivision and internal phase sources |

Composite Clock Configuration:

  • SubdivisionClockConfig: Divides the theta gate into subdivisions (divisions: Vec<u32>)
  • InternalPhaseClockConfig: Internal oscillator with ratio (relative to theta) and phase0 (initial phase)

4.5.2 Interval and Connect Policies

PhonationIntervalConfig: Controls spacing between note onsets:

| Variant | Description |
| --- | --- |
| None | No automatic onset generation |
| Accumulator | Probabilistic onset based on rate with refractory period (gates) |

PhonationConnectConfig: Controls note duration (legato vs staccato):

| Variant | Description |
| --- | --- |
| FixedGate | Fixed duration of length_gates theta cycles |
| Field | Duration adapts to consonance field with sigmoid mapping |

Field parameters:

  • hold_min_theta, hold_max_theta: Duration range in theta cycles
  • curve_k, curve_x0: Sigmoid shape parameters
  • drop_gain: Amplitude reduction for short notes

4.5.3 SubThetaModulation

Amplitude modulation within the theta cycle:

| Variant | Description |
| --- | --- |
| None | No sub-theta modulation |
| Cosine | Cosine modulation with n (harmonic number), depth, phase0 |

4.5.4 Social Coupling

The SocialConfig enables agents to respond to the vocalization density of the population:

| Parameter | Description |
| --- | --- |
| coupling | Strength of social influence (0 = independent) |
| bin_ticks | Temporal resolution for density measurement |
| smooth | Smoothing factor for density trace |

SocialDensityTrace (life/social_density.rs): Tracks the recent onset density of the population, allowing agents to synchronize or avoid crowded moments.

4.6 Lifecycle and Metabolism

Agents are governed by energy dynamics modeled on biological metabolism. The LifecycleConfig (life/lifecycle.rs) defines two modes:

4.6.1 Decay Mode

Models transient sounds (plucks, percussion):

| Parameter | Default | Description |
| --- | --- | --- |
| initial_energy | 1.0 | Starting energy pool |
| half_life_sec | – | Exponential decay half-life |
| attack_sec | 0.01 | Attack ramp duration |

Energy evolves as: $E(t) = E_0 \cdot e^{\lambda t}$ where $\lambda = \ln(0.5) / t_{1/2}$

4.6.2 Sustain Mode

Models sustained sounds with metabolic feedback:

| Parameter | Default | Description |
| --- | --- | --- |
| initial_energy | 1.0 | Starting energy pool |
| metabolism_rate | – | Energy drain per second |
| recharge_rate | 0.5 | Energy gain rate (consonance-dependent) |
| action_cost | 0.02 | Energy cost per vocalization |
| envelope | – | ADSR config (attack_sec, decay_sec, sustain_level) |

Breath Gain Feedback: The breath_gain parameter (set at spawn via breath_gain_init) determines how much consonance contributes to energy recovery. An agent in a dissonant region "starves" while one in a consonant region "feeds."

This creates Darwinian pressure: Survival of the Consonant. Musical structure emerges because only agents that find harmonic relationships survive to be heard.
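
A sketch of one Sustain-mode energy tick under these parameters. The exact recharge law is not given in this note; the linear, consonance-scaled form below is an assumption:

```rust
/// One tick of Sustain-mode metabolism: constant drain, consonance-scaled
/// recharge (via breath_gain), and a per-vocalization action cost.
fn step_energy(energy: f32, consonance_01: f32, vocalized: bool, dt: f32) -> f32 {
    let (metabolism_rate, recharge_rate, action_cost, breath_gain) = (0.1, 0.5, 0.02, 1.0);
    let drain = metabolism_rate * dt;
    let gain = breath_gain * recharge_rate * consonance_01 * dt;
    let cost = if vocalized { action_cost } else { 0.0 };
    (energy - drain + gain - cost).max(0.0)
}

fn main() {
    // Over 30 s, an agent in a consonant region (C01 = 0.8) gains energy,
    // while one in a dissonant region (C01 = 0.1) starves to zero.
    let (mut fed, mut starved) = (1.0f32, 1.0f32);
    for _ in 0..300 {
        fed = step_energy(fed, 0.8, false, 0.1);
        starved = step_energy(starved, 0.1, false, 0.1);
    }
    println!("fed = {fed:.2}, starved = {starved:.2}");
}
```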

4.7 Pitch Retargeting and the Hop Policy

Agents move through frequency space to improve fitness. The execution flow:

  1. Retarget Gate: The Individual integrates time and fires when a theta zero-crossing aligns with the integration window (frequency-dependent: $w = 2 + 10/f$).

  2. Pitch Proposal: PitchCore evaluates candidates and returns a TargetProposal with target pitch and salience.

  3. Gate Coordination: The ArticulationWrapper manages the hop transition via PlannedPitch:

    • target_pitch_log2: Next intended pitch
    • jump_cents_abs: Distance to target
    • salience: Improvement strength (0–1)

4.7.1 The Hop Policy

Pitch movement uses discrete hops rather than continuous portamento:

  1. Fade-out: Gate closes when jump_cents_abs > 10 (movement threshold)
  2. Snap: When gate < 0.1, pitch updates to target
  3. Fade-in: Gate reopens, new pitch sounds

Ordering: On snap, pitch updates before consonance evaluation, ensuring the Landscape score reflects the actual sounding frequency. These timing-sensitive transitions are guarded by regression tests.
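
The three steps reduce to a small state machine. A sketch with an illustrative gate slew rate (the actual fade curves live in PlannedGate):

```rust
/// Minimal hop state: the gate fades out while a jump is pending, the pitch
/// snaps once the gate is nearly closed, then the gate fades back in.
struct Hop {
    pitch_log2: f32,
    target_log2: f32,
    gate: f32, // 0..1, multiplies the output amplitude
}

impl Hop {
    fn step(&mut self, dt: f32) {
        let jump_cents = (self.target_log2 - self.pitch_log2).abs() * 1200.0;
        let slew = 8.0 * dt; // illustrative gate speed
        if jump_cents > 10.0 {
            self.gate = (self.gate - slew).max(0.0); // 1. fade out
            if self.gate < 0.1 {
                // 2. snap: pitch updates before the next consonance evaluation,
                // so the Landscape score reflects the actual sounding frequency.
                self.pitch_log2 = self.target_log2;
            }
        } else {
            self.gate = (self.gate + slew).min(1.0); // 3. fade in
        }
    }
}

fn main() {
    let mut h = Hop { pitch_log2: 8.0, target_log2: 8.0 + 700.0 / 1200.0, gate: 1.0 };
    for i in 0..40 {
        h.step(0.005);
        if i % 10 == 9 {
            println!("t = {:.2}s  gate = {:.2}  pitch = {:.3}",
                     (i + 1) as f32 * 0.005, h.gate, h.pitch_log2);
        }
    }
}
```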

5. Temporal Dynamics: Neural Rhythms

Conchordal eschews the concept of a master clock or metronome. Instead, time is structured by a continuous modulation field inspired by Neural Oscillations (brainwaves). This is the "Time" equivalent of the "Space" landscape.

5.1 The Modulation Bank

The NeuralRhythms struct manages a bank of resonating filters tuned to physiological frequency bands:

  • Delta (0.5–4 Hz): The macroscopic "pulse" of the ecosystem. Agents locked to this band play long, phrase-level notes.
  • Theta (4–8 Hz): The "articulation" rate. Governs syllabic rhythms and medium-speed motifs.
  • Alpha (8–12 Hz): The "texture" rate. Used for tremolo, vibrato, and shimmering effects.
  • Beta (15–30 Hz): The "tension" rate. High-speed flutters associated with dissonance or excitement.

5.2 Vitality and Self-Oscillation

Each band is implemented as a Resonator, a damped harmonic oscillator. A key parameter is vitality.

  • Vitality = 0: The resonator acts as a passive filter. It only rings when excited by an event (e.g., a loud agent spawning) and then decays.
  • Vitality > 0: The resonator has active gain. It can self-oscillate, maintaining a rhythmic cycle even in the absence of input.

This creates a two-way interaction: The global rhythm drives the agents (entrainment), but the agents also drive the global rhythm (excitation). A loud "kick" agent spawning in the Delta band will "ring" the Delta resonator, causing other agents coupled to that band to synchronize.
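
A sketch of such a resonator, modeling vitality as negative damping: when vitality exceeds the passive decay rate, the oscillator sustains itself after a single impulse. The integration scheme and constants are illustrative:

```rust
/// Damped harmonic oscillator; vitality > 0 feeds energy back each step.
struct Resonator {
    freq_hz: f32,
    damping: f32,      // passive decay rate
    vitality: f32,     // 0 = passive filter, > 0 = active gain
    state: (f32, f32), // (position, velocity)
}

impl Resonator {
    fn step(&mut self, input: f32, dt: f32) -> f32 {
        let w = 2.0 * std::f32::consts::PI * self.freq_hz;
        let (x, v) = self.state;
        // Net damping shrinks as vitality rises; vitality > damping self-oscillates.
        let d = self.damping - self.vitality;
        let a = -w * w * x - 2.0 * d * v + input;
        self.state = (x + v * dt, v + a * dt);
        self.state.0
    }
}

fn main() {
    // A 2 Hz "Delta" resonator rung once by an impulse (e.g., a loud spawn).
    let mut r = Resonator { freq_hz: 2.0, damping: 0.5, vitality: 0.6, state: (0.0, 0.0) };
    let mut peak = 0.0f32;
    for i in 0..4000 {
        let input = if i == 0 { 100.0 } else { 0.0 };
        peak = peak.max(r.step(input, 0.001).abs());
    }
    println!("peak after 4 s = {peak:.4}"); // keeps growing: vitality > damping
}
```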

5.3 Kuramoto Entrainment

The entrain ArticulationCore uses a Kuramoto-style model of coupled oscillators.

$$ \frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^N \sin(\theta_j - \theta_i) $$

In Conchordal, the "coupling" $K$ is to the global NeuralRhythms rather than directly to every other agent (Mean Field approximation).

  • Sensitivity: Each agent has a sensitivity profile determining which bands (Delta, Theta, etc.) it listens to.
  • Phase Locking: Agents adjust their internal articulation phase to match the phase of the resonator.

This results in emergent synchronization. Agents spawned at random times will gradually align their attacks to the beat of the Delta or Theta bands, creating coherent rhythmic patterns without a central sequencer.
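
In the mean-field form the sum collapses to a single sine term against the band's phase. A sketch with illustrative constants:

```rust
use std::f32::consts::TAU;

/// Mean-field Kuramoto step: instead of summing over every other agent,
/// each agent couples to the phase of one global NeuralRhythms band.
fn entrain_step(theta: f32, omega: f32, band_phase: f32, sensitivity: f32, dt: f32) -> f32 {
    (theta + (omega + sensitivity * (band_phase - theta).sin()) * dt).rem_euclid(TAU)
}

fn main() {
    // Three agents with scattered initial phases lock to a 5 Hz theta band.
    let band_omega = TAU * 5.0;
    let mut band_phase = 0.0f32;
    let mut agents = [0.3f32, 2.1, 4.9];
    for _ in 0..5000 {
        band_phase = (band_phase + band_omega * 0.001).rem_euclid(TAU);
        for th in agents.iter_mut() {
            *th = entrain_step(*th, band_omega, band_phase, 8.0, 0.001);
        }
    }
    // After 5 simulated seconds every phase offset has collapsed toward zero.
    for th in agents {
        let offset = (th - band_phase + TAU / 2.0).rem_euclid(TAU) - TAU / 2.0;
        println!("offset = {offset:+.3} rad");
    }
}
```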

6. System Architecture and Implementation Details

Conchordal is implemented in Rust to satisfy the stringent requirements of real-time audio (latency < 10ms) alongside heavy numerical analysis (NSGT/Convolution). The architecture uses a concurrent, lock-free design pattern.

6.1 Threading Model

The application creates four primary thread contexts:

  1. Audio Thread (Real-Time Priority):

    • Managed by cpal in audio/output.rs.
    • Constraint: Must never block. No Mutexes, no memory allocation.
    • Responsibility: Iterates through the Population, calling render_wave on every active agent, mixing the output, and pushing to the hardware buffer. It reads from a read-only snapshot of the Landscape.
  2. Harmonicity Worker (Background Priority):

    • Defined in core/stream/harmonicity.rs.
    • Responsibility: Receives spectral data (log2 amplitude spectrum) and computes the Harmonicity field using the Sibling Projection algorithm.
    • Update Cycle: When analysis is complete, it sends the updated Harmonicity data back to the main loop via a lock-free SPSC channel.
  3. Roughness Worker (Background Priority):

    • Defined in core/stream/roughness.rs.
    • Responsibility: Receives audio chunks and computes the Roughness field via ERB-domain convolution.
    • Update Cycle: Sends updated Roughness data back to the main loop via a separate SPSC channel.
  4. App/GUI Thread (Main):

    • Runs the egui visualizer and the Rhai scripting engine.
    • Responsibility: Handles user input, visualizing the Landscape (ui/plots.rs), and executing the Scenario script. It sends command actions (e.g., SpawnAgent, SetGlobalParameter) to the Population.
    • DorsalStream: Rhythm analysis (core/stream/dorsal.rs) runs synchronously within the main loop, processing audio chunks to extract rhythmic energy metrics (e_low, e_mid, e_high, flux) for the NeuralRhythms modulation bank.

6.2 Data Flow and Double Buffering

To maintain data consistency without locking the audio thread, Conchordal uses a multi-channel update strategy for the Landscape.

  1. The Harmonicity Worker builds the Harmonicity field from the log2 spectrum in the background.
  2. The Roughness Worker builds the Roughness field from audio chunks in the background.
  3. The Main Loop receives updates from both workers via separate SPSC channels and merges them into the LandscapeFrame.
  4. The Population holds the "current" Landscape. When new data arrives from either worker, the main loop updates the corresponding field and recomputes the combined Consonance.
  5. The DorsalStream processes audio synchronously to update rhythm metrics, which are stored in landscape.rhythm.

This decoupled architecture ensures that the audio thread always sees a consistent snapshot of the physics, even if the analysis workers lag slightly behind real-time. Each worker can operate at its own pace without blocking the others.
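
A schematic of the merge loop. Here std::sync::mpsc stands in for the lock-free SPSC channels of the real system, and LandscapeFrame is reduced to its two fields:

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for LandscapeFrame: the merged fields the simulation reads.
#[derive(Default)]
struct LandscapeFrame {
    harmonicity: Vec<f32>,
    roughness: Vec<f32>,
}

fn main() {
    let (h_tx, h_rx) = mpsc::channel::<Vec<f32>>();
    let (r_tx, r_rx) = mpsc::channel::<Vec<f32>>();

    // Each worker publishes its field independently, at its own pace.
    thread::spawn(move || h_tx.send(vec![0.7; 4]).unwrap()); // Harmonicity worker
    thread::spawn(move || r_tx.send(vec![0.2; 4]).unwrap()); // Roughness worker

    // Main loop: merge whichever update arrives, keep the other field as-is,
    // then recompute the combined Consonance (omitted here).
    let mut frame = LandscapeFrame::default();
    if let Ok(h) = h_rx.recv() {
        frame.harmonicity = h;
    }
    if let Ok(r) = r_rx.recv() {
        frame.roughness = r;
    }
    println!("H = {:?}, R = {:?}", frame.harmonicity, frame.roughness);
}
```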

6.3 The Conductor: Scripting with Rhai

The Conductor module acts as the interface between the human artist and the ecosystem. It embeds the Rhai scripting language, exposing a high-level API for controlling the simulation.

The ScriptHost struct maps internal Rust functions to Rhai commands:

  • spawn_agents(tag, method, life, count, amp): Maps to Action::SpawnAgents. Allows defining probabilistic spawn clouds (e.g., "Spawn 5 agents in the 200-400Hz range using Harmonicity density").
  • add_agent(tag, freq, amp, life): Maps to Action::AddAgent. Spawns a single agent at a specific frequency.
  • set_harmonicity(map): Maps to Action::SetHarmonicity. Allows real-time modulation of physics parameters like mirror_weight and limit.
  • set_roughness_tolerance(value): Adjusts the Roughness penalty weight in the Consonance calculation.
  • set_rhythm_vitality(value): Controls the self-oscillation energy of the DorsalStream rhythm section.
  • wait(seconds): A non-blocking wait that yields control back to the event loop, allowing the script to govern the timeline.
  • scene(name): Marks the beginning of a named scene for visualization and debugging.
  • remove(target): Removes agents matching the target pattern (supports wildcards like "kick_*").

Scenario Parsing: Scenarios are loaded from .rhai files. This separation allows users to compose the "Macro-Structure" (the narrative arc, the changing laws of physics) while the "Micro-Structure" (the specific notes and rhythms) emerges from the agents' adaptation to those changes.

7. Case Studies: Analysis of Emergent Behavior

The following examples, derived from the samples/ directory, illustrate how specific parameter configurations lead to complex musical behaviors.

7.1 Case Study: Self-Organizing Rhythm (samples/02_mechanisms/rhythmic_sync.rhai)

This script demonstrates the emergent quantization of time.

  1. Phase 1 (The Seed): A single, high-energy agent "Kick" is spawned at 60 Hz. Its periodic articulation excites the Delta band resonator in the NeuralRhythms.
  2. Phase 2 (The Swarm): A cloud of agents is spawned with random phases.
  3. Emergence: Because the agents use entrain ArticulationCores coupled to the Delta band, they sense the rhythm established by the Kick. Over a period of seconds, their phases drift and lock into alignment with the Kick. The result is a synchronized pulse that was not explicitly programmed into the swarm—it arose from the physics of the coupled oscillators.

7.2 Case Study: Mirror Dualism (samples/04_ecosystems/mirror_dualism.rhai)

This script explores the structural role of the mirror_weight parameter.

  1. Setup: An anchor drone is established at C4 (261.63 Hz).
  2. State A (Major): set_harmonicity(#{ mirror: 0.0 }). The system uses the Common Root projection (Overtone Series). Agents seeking consonance cluster around E4 and G4, forming a C Major triad.
  3. State B (Minor): set_harmonicity(#{ mirror: 1.0 }). The system switches to Common Overtone projection (Undertone Series). The "gravity" of the landscape inverts. Agents now find stability at Ab3 and F3 (intervals of a minor sixth and perfect fourth relative to C), creating a Phrygian/Minor texture. This demonstrates that "Tonality" in Conchordal is a manipulable environmental variable, akin to temperature or gravity.

7.3 Case Study: Drift and Flow (samples/04_ecosystems/drift_flow.rhai)

This script validates the hop-based movement logic.

  1. Action: A strongly dissonant agent (C#3) is placed next to a strong anchor (C2).
  2. Observation: The C#3 agent makes discrete hops in pitch. It is "pulled" by the Harmonicity field, fading out and snapping to a nearby harmonic "well" (likely E3 or G3).
  3. Dynamics: If per-agent boredom is enabled, the agent will settle at E3 for a few seconds, then "get bored" (local consonance drops due to perceptual adaptation), and hop away again to find a new stable interval. This results in an endless, non-repeating melody generated by simple physical rules of attraction and repulsion.

8. Conclusion

Conchordal successfully establishes a proof-of-concept for Bio-Mimetic Computational Audio. By replacing the rigid abstractions of music theory (notes, grids, BPM) with continuous physiological models (Log2Space, ERB bands, neural oscillation), it creates a system where music is not constructed, but grown.

The technical architecture—anchored by the Log2Space coordinate system and the "Sibling Projection" algorithm—provides a robust mathematical foundation for this new paradigm. The use of Rust ensures that these complex biological simulations can run in real-time, bridging the gap between ALife research and performative musical instruments.

Future development of Conchordal will focus on spatialization (extending the landscape to 3D space) and evolutionary genetics (allowing successful agents to pass on their TimbreGenotype), further deepening the analogy between sound and life.

Appendix A: Key System Parameters

A.1 Core Analysis Parameters

| Parameter | Module | Unit | Description |
| --- | --- | --- | --- |
| bins_per_oct | Log2Space | Int | Resolution of the frequency grid (typ. 48–96). |
| sigma_cents | Harmonicity | Cents | Width of harmonic peaks. Lower = stricter intonation. |
| mirror_weight | Harmonicity | 0.0–1.0 | Balance between Overtone (Major) and Undertone (Minor) gravity. |
| roughness_k | Roughness | Float | Saturation parameter for roughness mapping. Default: $\approx 0.4286$. |
| roughness_weight | Landscape | Float | Weight of roughness penalty in consonance calculation. Default: 1.0. |
| vitality | DorsalStream | 0.0–1.0 | Self-oscillation energy of the rhythm section. |

A.2 Individual / Life Engine Parameters

| Parameter | Module | Default | Description |
| --- | --- | --- | --- |
| persistence | PitchCore | 0.5 | Resistance to movement when satisfied. |
| exploration | PitchCore | 0.0 | Random exploration probability. |
| neighbor_step_cents | PitchCore | 200 | Step size for pitch search. |
| tessitura_gravity | PitchCore | 0.1 | Penalty for distance from initial pitch. |
| tau_fast | PerceptualContext | 0.5 s | Boredom decay time constant. |
| tau_slow | PerceptualContext | 20.0 s | Familiarity decay time constant. |
| w_boredom | PerceptualContext | 1.0 | Weight of boredom penalty. |
| w_familiarity | PerceptualContext | 0.2 | Weight of familiarity bonus. |
| rho_self | PerceptualContext | 0.15 | Self-injection ratio for perceptual update. |
| boredom_gamma | PerceptualContext | 0.5 | Curvature exponent for boredom. |

A.3 Timbre Parameters (TimbreGenotype)

| Parameter | Default | Description |
| --- | --- | --- |
| mode | Harmonic | Harmonic (integer multiples) or Metallic (non-integer). |
| stiffness | 0.0 | Inharmonicity coefficient. |
| brightness | 0.6 | Spectral slope (higher = brighter). |
| comb | 0.0 | Even harmonic attenuation. |
| damping | 0.5 | Energy-dependent high-frequency decay. |
| vibrato_rate | 5.0 Hz | Vibrato LFO frequency. |
| vibrato_depth | 0.0 | Vibrato extent. |
| jitter | 0.0 | Pink noise FM strength. |
| unison | 0.0 | Detune amount for chorus effect. |

A.4 Phonation Parameters

| Parameter | Module | Description |
| --- | --- | --- |
| interval.rate | PhonationEngine | Onset accumulation rate. |
| interval.refractory | PhonationEngine | Minimum gates between onsets. |
| connect.length_gates | PhonationEngine | Fixed note duration in theta cycles. |
| clock | PhonationEngine | ThetaGate or Composite clock source. |
| social.coupling | PhonationEngine | Strength of social density influence. |

Appendix B: Mathematical Summary

Consonance Fitness Function:

$$ C_{signed} = \text{clip}(H_{01} - w_r \cdot R_{01},\; -1,\; 1) $$

$$ C_{01} = \frac{C_{signed} + 1}{2} $$

Roughness Saturation Mapping (from reference-normalized ratio $x$ to $R_{01} \in [0,1]$):

$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \frac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \frac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$

where $k$ is roughness_k (default $\approx 0.4286$). The function is continuous at $x=1$ and saturates to 1 as $x \to \infty$.

Harmonicity Projection (Sibling Algorithm):

$$ Roots[i] = \sum_k w_k \, A[i + \log_2(k)], \qquad H_{overtone}[i] = \sum_m w_m \, Roots[i - \log_2(m)] $$

$$ H[i] = (1-\alpha) \, H_{overtone}[i] + \alpha \, H_{undertone}[i] $$

where $H_{undertone}$ is the mirrored "Up-then-Down" projection and $\alpha$ is mirror_weight.

Roughness Convolution: $$ R_{shape}(z) = \int A(\tau) \cdot K_{plomp}(|z-\tau|_{ERB}) d\tau $$