Technical Note: The Physics of Conchordal

source commit: 50e5ba4

author: Koichi Takahashi

last updated: 2026-02-23

source version: 0.2.0

source snapshot: 2026-02-23T16:35:20+09:00

auto-generated by: claude-opus-4-6

1. Introduction: The Bio-Acoustic Paradigm

Conchordal represents a fundamental divergence from established norms in generative music and computational audio. Where traditional systems rely on symbolic manipulation—operating on grids of quantized pitch (MIDI, Equal Temperament) and discretized time (BPM, measures)—Conchordal functions as a continuous, biologically grounded simulation of auditory perception. It posits that musical structure is not an artifact of abstract composition but an emergent property of acoustic survival.

This technical note serves as an exhaustive reference for the system's architecture, signal processing algorithms, and artificial life strategies. It details how Conchordal synthesizes the principles of psychoacoustics—specifically critical band theory, virtual pitch perception, and neural entrainment—with the dynamics of an autonomous ecosystem. In this environment, sound is treated as a living organism, an "Individual" possessing metabolism, sensory processing capabilities, and the autonomy to navigate a hostile spectral terrain.

The emergent behavior of the system is driven by a unified fitness function: the pursuit of Consonance. Agents within the Conchordal ecosystem do not follow a pre-written score. Instead, they continuously analyze their environment to maximize their "Spectral Comfort"—defined as the minimization of sensory roughness—and their "Harmonic Stability," or the maximization of virtual root strength. The result is a self-organizing soundscape where harmony, rhythm, and timbre evolve organically through the interactions of physical laws rather than deterministic sequencing.

This document explores the three foundational pillars of the Conchordal architecture:

The Psychoacoustic Coordinate System: The mathematical framework of Log2Space and ERB scales that replaces linear Hertz and integer MIDI notes.
The Cognitive Landscape: The real-time DSP pipeline that computes Roughness ($R$) and Harmonicity ($H$) fields from the raw audio stream.
The Life Engine: The agent-based model governing the metabolism, movement, and neural entrainment of the audio entities.

2. The Psychoacoustic Coordinate System

A critical innovation in Conchordal is the rejection of the linear frequency scale ($f$) for internal processing. Human auditory perception is inherently logarithmic; our perception of pitch interval is based on frequency ratios rather than differences. To model this accurately and efficiently, Conchordal establishes a custom coordinate system, Log2Space, which aligns the computational grid with the tonotopic map of the cochlea.

2.1 The Log2 Space Foundation

The Log2Space struct serves as the backbone for all spectral analysis, kernel convolution, and agent positioning within the system. It maps the physical frequency domain ($f$ in Hz) to a perceptual logarithmic domain ($l$).

2.1.1 Mathematical Definition

The transformation from Hertz to the internal log-coordinate is defined as the base-2 logarithm of the frequency. This choice is deliberate: in base-2, an increment of 1.0 corresponds exactly to an octave, the most fundamental interval in pitch perception.

$$ l(f) = \log_2(f) $$

The inverse transformation, used to derive synthesis parameters for the audio thread, is:

$$ f(l) = 2^l $$

The coordinate space is discretized into a grid defined by a resolution parameter, bins_per_oct ($B$), which determines the granularity of the simulation. Typical values of $B=48$ or $B=96$ give bins of 25 or 12.5 cents respectively, a sub-semitone resolution sufficient for continuous pitch gliding and microtonal inflection. The step size $\Delta l$ is constant across the entire spectral range:

$$ \Delta l = \frac{1}{B} $$

2.1.2 Grid Construction and Indexing

The Log2Space structure pre-calculates the center frequencies for all bins spanning the configured range $[f_{min}, f_{max}]$. The number of bins $N$ is determined to ensure complete coverage:

$$ N = \lfloor \frac{\log_2(f_{max}) - \log_2(f_{min})}{\Delta l} \rfloor + 1 $$

The system maintains two parallel vectors for $O(1)$ access during DSP operations:

centers_log2: The logarithmic coordinates $l_i = \log_2(f_{min}) + i \cdot \Delta l$.
centers_hz: The pre-computed linear frequencies $f_i = 2^{l_i}$.

This pre-computation is vital for real-time performance, removing the need for costly log2 and pow calls inside the inner loops of the spectral kernels. The method index_of_freq(hz) provides the quantization logic, mapping an arbitrary float frequency to the nearest bin index.
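The grid construction and quantization described above can be sketched as follows. This is a minimal illustration, not the project's Log2Space code: the type name Log2Grid, the Option return value for out-of-range frequencies, and the constructor signature are assumptions made for the example.

```rust
/// Illustrative log2 frequency grid (hypothetical sketch, not the project's code).
pub struct Log2Grid {
    pub bins_per_oct: u32,
    pub centers_log2: Vec<f64>,
    pub centers_hz: Vec<f64>,
}

impl Log2Grid {
    /// Builds the grid over [f_min, f_max] with step 1 / bins_per_oct.
    pub fn new(f_min: f64, f_max: f64, bins_per_oct: u32) -> Self {
        assert!(f_min > 0.0 && f_max > f_min && bins_per_oct > 0);
        let step = 1.0 / bins_per_oct as f64;
        let l_min = f_min.log2();
        // N = floor((log2(f_max) - log2(f_min)) / step) + 1
        let n = ((f_max.log2() - l_min) / step).floor() as usize + 1;
        let centers_log2: Vec<f64> = (0..n).map(|i| l_min + i as f64 * step).collect();
        // Pre-computed once so inner loops never call log2 or exp2.
        let centers_hz: Vec<f64> = centers_log2.iter().map(|l| l.exp2()).collect();
        Self { bins_per_oct, centers_log2, centers_hz }
    }

    /// Maps a frequency to the nearest bin index, or None if it lies outside the grid.
    pub fn index_of_freq(&self, hz: f64) -> Option<usize> {
        if !hz.is_finite() || hz <= 0.0 {
            return None;
        }
        let pos = (hz.log2() - self.centers_log2[0]) * self.bins_per_oct as f64;
        let idx = pos.round();
        if idx < 0.0 || idx >= self.centers_log2.len() as f64 {
            None
        } else {
            Some(idx as usize)
        }
    }
}
```

With $f_{min} = 20$ Hz, $f_{max} = 20$ kHz and $B = 48$, this yields 479 bins, and 40 Hz (one octave above $f_{min}$) lands on bin 48.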

2.2 Constant-Q Bandwidth Characteristics

The Log2Space inherently enforces a Constant-Q (Constant Quality Factor) characteristic across the spectrum. In signal processing terms, $Q$ is defined as the ratio of the center frequency to the bandwidth: $Q = f / \Delta f$.

In a linear system (like a standard FFT), $\Delta f$ is constant, meaning $Q$ increases with frequency. In Log2Space, the bandwidth $\Delta f_i$ of the $i$-th bin scales proportionally with the center frequency $f_i$. This property mimics the frequency selectivity of the human auditory system, where the ear's ability to resolve frequencies diminishes (in absolute Hz terms) as frequency increases. This alignment allows Conchordal to allocate computational resources efficiently—using high temporal resolution at high frequencies and high spectral resolution at low frequencies—without manual multirate processing.
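A short numerical check makes the Constant-Q property concrete. The sketch below assumes that each bin's edges sit half a bin below and above its center on the log2 axis; that edge convention and the function names are assumptions of this example, not something the source specifies.

```rust
/// Bandwidth of a log2 bin, assuming its edges sit half a bin below and above the center.
pub fn bin_bandwidth_hz(center_hz: f64, bins_per_oct: u32) -> f64 {
    let half_step = 0.5 / bins_per_oct as f64;
    center_hz * (half_step.exp2() - (-half_step).exp2())
}

/// Quality factor Q = f / delta_f for a bin centered at `center_hz`.
pub fn bin_q(center_hz: f64, bins_per_oct: u32) -> f64 {
    center_hz / bin_bandwidth_hz(center_hz, bins_per_oct)
}
```

Under this convention the bandwidth grows in proportion to the center frequency, so bin_q returns the same value (roughly 69 for $B = 48$) at 100 Hz and at 10 kHz.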

2.3 The Equivalent Rectangular Bandwidth (ERB) Scale

While Log2Space handles pitch relationships (octaves, harmonics), it does not perfectly model the critical bands of the ear, which are wider at low frequencies than a pure logarithmic mapping suggests. To accurately calculate sensory roughness (dissonance), Conchordal implements the Equivalent Rectangular Bandwidth (ERB) scale based on the Glasberg & Moore (1990) model.

The core/erb.rs module provides the transformation functions used by the Roughness Kernel. The conversion from frequency $f$ (Hz) to ERB-rate units $E$ is given by:

$$ E(f) = 21.4 \log_{10}(0.00437f + 1) $$

The bandwidth of a critical band at frequency $f$ is:

$$ BW_{ERB}(f) = 24.7(0.00437f + 1) $$

This scale is distinct from Log2Space. While Log2Space is the domain for pitch and harmonicity (where relationships are octave-invariant), the roughness calculation requires mapping spectral energy into the ERB domain to evaluate interference. The system effectively maintains a dual-view of the spectrum: one strictly logarithmic for harmonic templates, and one psychoacoustic for dissonance evaluation.
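The two ERB formulas translate directly into code. The following is a minimal sketch with hypothetical function names; the inverse mapping is obtained by solving the ERB-rate formula for $f$ and is not taken from core/erb.rs.

```rust
/// ERB-rate (in ERB units) for a frequency in Hz, per the formula above.
pub fn hz_to_erb_rate(hz: f64) -> f64 {
    21.4 * (0.00437 * hz + 1.0).log10()
}

/// Inverse mapping: frequency in Hz for an ERB-rate value.
pub fn erb_rate_to_hz(erb: f64) -> f64 {
    (10f64.powf(erb / 21.4) - 1.0) / 0.00437
}

/// Equivalent rectangular bandwidth in Hz at frequency `hz`.
pub fn erb_bandwidth_hz(hz: f64) -> f64 {
    24.7 * (0.00437 * hz + 1.0)
}
```

At 1 kHz these give an ERB-rate of about 15.6 and a bandwidth of about 132.6 Hz.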

3. The Auditory Landscape: Analyzing the Environment

The "Landscape" is the central data structure in Conchordal. It acts as the shared environment for all agents, a dynamic scalar field representing the psychoacoustic "potential" of every frequency bin. Agents do not interact directly with each other; they interact with the Landscape, which aggregates the spectral energy of the entire population. As a result, the interaction cost grows linearly with the number of agents ($O(N)$) instead of quadratically ($O(N^2)$), as it would with pairwise agent-to-agent comparisons.

The Landscape is updated every audio frame (or block) by the Analysis Worker. It synthesizes two primary metrics:

Roughness ($R$): The sensory dissonance caused by rapid beating between proximal partials.
Harmonicity ($H$): The measure of virtual pitch strength and spectral periodicity.

Both metrics are normalized to the $[0, 1]$ range. Consonance is then derived in two layers: a Consonance Kernel that fuses the observables into a single fitness score, and a set of Representation transforms that reshape that score for different downstream consumers.

Layer 1 — Consonance Kernel (bilinear family):

$$ C_{score} = a \cdot H_{01} + b \cdot R_{01} + c \cdot H_{01} R_{01} + d $$

Default coefficients: $a = 1.0$, $b = -1.35$, $c = 1.0$, $d = 0.0$. Because $b < 0$, roughness acts as a penalty; because $c > 0$, high harmonicity attenuates that penalty (the interaction term $c \cdot H_{01} R_{01}$ partially cancels $b \cdot R_{01}$ when $H_{01}$ is large). The bilinear family subsumes the earlier $\alpha H - wR$ formulation as the special case $c = 0$.

Layer 2 — Representations:

Name	Formula	Range	Meaning
$C_{score}$	$aH + bR + cHR + d$	$(-\infty,+\infty)$	raw fitness from the kernel
$C_{level01}$	$\sigma(\beta(C_{score} - \theta))$	$[0,1]$	metabolism gate (sigmoid)
$C_{density\_mass}$	$\max(0,\; H_{01}(1 - \rho R_{01}))$	$[0,+\infty)$	raw density mass ($\rho$-kernel)
$C_{density\_pmf}$	$\text{normalize}(C_{density\_mass})$	$[0,1],\; \Sigma=1$	pitch-selection PMF
$C_{energy}$	$-C_{score}$	$(-\infty,+\infty)$	energy for minimization

where $\sigma(x) = 1/(1+e^{-x})$, $\beta$ controls sigmoid steepness (default 2.0), and $\theta$ is the sigmoid threshold (default 0.0). The density mass uses a separate $\rho$-kernel with coefficients $a{=}1, b{=}0, c{=}{-}\rho, d{=}0$, so that $C_{density\_mass} = H_{01}(1 - \rho R_{01})$ clamped to $\geq 0$; the parameter $\rho$ (consonance_density_roughness_gain, default 1.0) controls how strongly roughness suppresses spawn probability.
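The kernel and its representations can be written as a few small functions. This is a sketch under stated assumptions: the type and function names are hypothetical, and returning an all-zero vector when the total density mass is zero is a choice made for the example.

```rust
/// Coefficients of the bilinear kernel C = a*H + b*R + c*H*R + d.
#[derive(Clone, Copy)]
pub struct BilinearKernel {
    pub a: f64,
    pub b: f64,
    pub c: f64,
    pub d: f64,
}

impl BilinearKernel {
    /// Default coefficients quoted in the text.
    pub const DEFAULT: Self = Self { a: 1.0, b: -1.35, c: 1.0, d: 0.0 };

    /// Raw fitness score from normalized harmonicity and roughness.
    pub fn score(&self, h01: f64, r01: f64) -> f64 {
        self.a * h01 + self.b * r01 + self.c * h01 * r01 + self.d
    }
}

/// Sigmoid gate: sigma(beta * (score - theta)).
pub fn level01(score: f64, beta: f64, theta: f64) -> f64 {
    1.0 / (1.0 + (-(beta * (score - theta))).exp())
}

/// Density mass H * (1 - rho * R), clamped to be non-negative.
pub fn density_mass(h01: f64, r01: f64, rho: f64) -> f64 {
    (h01 * (1.0 - rho * r01)).max(0.0)
}

/// Normalizes masses to a PMF; returns all zeros if the total mass is zero (assumption).
pub fn density_pmf(mass: &[f64]) -> Vec<f64> {
    let total: f64 = mass.iter().sum();
    if total > 0.0 {
        mass.iter().map(|m| m / total).collect()
    } else {
        vec![0.0; mass.len()]
    }
}
```

With the default coefficients, full roughness costs $-1.35$ at $H_{01} = 0$ but the score is still $0.65$ at $H_{01} = 1$, which shows how the interaction term softens the roughness penalty for strongly harmonic positions.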

Individual agents maintain their own perceptual context (PerceptualContext) which tracks per-agent boredom and familiarity, providing additional score adjustments during pitch selection.

3.1 Non-Stationary Gabor Transform (NSGT)

To populate the Log2Space with spectral data, Conchordal uses a custom implementation of the Non-Stationary Gabor Transform (NSGT). Unlike the Short-Time Fourier Transform (STFT), which uses a fixed window size, the NSGT varies the window length $L$ inversely with frequency to maintain the Constant-Q property derived in Section 2.2.

3.1.1 Kernel-Based Spectral Analysis

The implementation in core/nsgt_kernel.rs utilizes a sparse kernel approach to perform this transform efficiently. For each log-frequency band $k$, a time-domain kernel $h_k$ is precomputed. This kernel combines a complex sinusoid at the band's center frequency $f_k$ with a periodic Hann window $w_k$ of length $L_k \approx Q \cdot f_s / f_k$.

$$ h_k[n] = w_k[n] \cdot e^{-j 2\pi f_k n / f_s} $$

These kernels are transformed into the frequency domain ($K_k[\nu]$) during initialization. To optimize performance, the system sparsifies these frequency kernels, storing only the bins with significant energy.

During runtime, the system performs a single FFT on the input audio buffer to obtain the spectrum $X[\nu]$. The complex coefficient $C_k$ for band $k$ is then computed via the inner product in the frequency domain:

$$ C_k = \frac{1}{N_{fft}} \sum_{\nu} X[\nu] \cdot K_k^*[\nu] $$

This "one FFT, many kernels" approach allows Conchordal to generate a high-resolution, logarithmically spaced spectrum covering 20 Hz to 20 kHz without the computational overhead of calculating separate DFTs for each band or using recursive filter banks.
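To illustrate what a single band coefficient measures, the sketch below evaluates the kernel $h_k$ directly in the time domain. This is deliberately not the sparse frequency-domain method described above: it is the slower, per-band inner product with the same kernel, which gives the same result up to normalization. The function names, the normalization by the window sum, and the choice to start the window at the first sample are assumptions of the example.

```rust
use std::f64::consts::PI;

/// Magnitude of one constant-Q band coefficient, computed directly in the time domain.
/// `q` sets the window length L = round(q * fs / fk); the result is normalized by the window sum.
pub fn band_magnitude(signal: &[f64], fs: f64, fk: f64, q: f64) -> f64 {
    let len = ((q * fs / fk).round() as usize).min(signal.len());
    let (mut re, mut im, mut wsum) = (0.0_f64, 0.0_f64, 0.0_f64);
    for n in 0..len {
        // Periodic Hann window of length `len`.
        let w = 0.5 - 0.5 * (2.0 * PI * n as f64 / len as f64).cos();
        let phase = 2.0 * PI * fk * n as f64 / fs;
        // Accumulate signal[n] * w[n] * exp(-j * phase).
        re += signal[n] * w * phase.cos();
        im -= signal[n] * w * phase.sin();
        wsum += w;
    }
    if wsum > 0.0 {
        (re * re + im * im).sqrt() / wsum
    } else {
        0.0
    }
}

/// Test-tone helper: `n` samples of a unit-amplitude sine at `freq` Hz.
pub fn sine(freq: f64, fs: f64, n: usize) -> Vec<f64> {
    (0..n).map(|i| (2.0 * PI * freq * i as f64 / fs).sin()).collect()
}
```

For a unit-amplitude 440 Hz sine, the band centered at 440 Hz reports a magnitude close to 0.5 (half the amplitude, as expected for a real sinusoid), while the band an octave higher stays near zero.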

3.1.2 Real-Time Temporal Smoothing

The raw spectral coefficients $C_k$ exhibit high variance due to the stochastic nature of the audio input (especially with noise-based agents). To create a stable field for agents to sample, the RtNsgtKernelLog2 struct wraps the NSGT with a temporal smoothing layer.

It implements a per-band leaky integrator (exponential smoothing). Crucially, the time constant $\tau$ is frequency-dependent. Low frequencies, which evolve slowly, are smoothed with a longer $\tau$, while high frequencies, which carry transient details, have a shorter $\tau$.

$$ y_k[t] = (1 - \alpha_k) \cdot |C_k[t]| + \alpha_k \cdot y_k[t-1] $$

where the smoothing factor $\alpha_k$ is derived from the frame interval $\Delta t$:

$$ \alpha_k = e^{-\Delta t / \tau(f_k)} $$

This models the "integration time" of the ear, ensuring that the Landscape reflects a psychoacoustic percept rather than instantaneous signal power.
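The smoothing recurrence is small enough to show in full. In this sketch the struct name LeakyBand is hypothetical, and the time constant is passed in per band because the source does not give a formula for $\tau(f_k)$.

```rust
/// Per-band exponential smoother with its own time constant (hypothetical sketch).
pub struct LeakyBand {
    /// Time constant in seconds, assumed to be chosen per band.
    pub tau: f64,
    /// Current smoothed magnitude.
    pub y: f64,
}

impl LeakyBand {
    pub fn new(tau: f64) -> Self {
        Self { tau, y: 0.0 }
    }

    /// Applies y[t] = (1 - alpha) * |C| + alpha * y[t-1], with alpha = exp(-dt / tau).
    pub fn update(&mut self, magnitude: f64, dt: f64) -> f64 {
        let alpha = (-dt / self.tau).exp();
        self.y = (1.0 - alpha) * magnitude + alpha * self.y;
        self.y
    }
}
```

After one frame of length $\Delta t = \tau$, a band that starts at zero and receives a unit input reaches $1 - e^{-1} \approx 0.632$; a band with a longer $\tau$ moves less in the same frame.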

3.2 Roughness ($R$) Calculation: The Plomp-Levelt Model

Roughness is the sensation of "harshness" or "buzzing" caused by the interference of spectral components that fall within the same critical band but are not sufficiently close to be perceived as a single tone (beating). Conchordal implements a variation of the Plomp-Levelt model via convolution in the ERB domain.

3.2.1 The Interference Kernel

The core of the calculation is the Roughness Kernel, defined in core/roughness_kernel.rs. This kernel $K_{rough}(\Delta z)$ models the interference curve between two partials separated by $\Delta z$ ERB. The curve creates a penalty that rises rapidly as partials separate, peaks at approximately 0.25 ERB (maximum roughness), and then decays as they separate further.

The implementation uses a parameterized function eval_kernel_delta_erb to generate this shape:

$$ g(\Delta z) = e^{-\frac{\Delta z^2}{2\sigma^2}} \cdot (1 - e^{-(\frac{\Delta z}{\sigma_{suppress}})^p}) $$

The second term is a suppression factor that ensures the kernel goes to zero as $\Delta z \to 0$, preventing a single pure tone from generating self-roughness.
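The kernel shape can be evaluated with a one-line function per factor. This sketch is not the project's eval_kernel_delta_erb: the function name and parameter order are hypothetical, and taking the absolute value of $\Delta z$ (so the kernel is symmetric and well defined for non-integer $p$) is an assumption.

```rust
/// Kernel shape g(dz) = exp(-dz^2 / (2 sigma^2)) * (1 - exp(-(|dz| / sigma_suppress)^p)).
/// Using |dz| makes the kernel symmetric; that symmetry is an assumption of this sketch.
pub fn roughness_kernel(dz: f64, sigma: f64, sigma_suppress: f64, p: f64) -> f64 {
    let d = dz.abs();
    // Gaussian envelope: decays as the partials move far apart.
    let envelope = (-(d * d) / (2.0 * sigma * sigma)).exp();
    // Suppression factor: forces the kernel to zero as the separation vanishes.
    let suppress = 1.0 - (-(d / sigma_suppress).powf(p)).exp();
    envelope * suppress
}
```

For any positive parameters the function returns exactly zero at $\Delta z = 0$, rises for small separations, and decays again once the Gaussian envelope dominates.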

3.2.2 Convolutional Approach

Calculating roughness pairwise for all spectral bins ($O(N^2)$ complexity) is computationally prohibitive for real-time applications. Conchordal solves this by treating the Roughness calculation as a linear convolution.

Mapping: The log-spaced amplitude spectrum from the NSGT is mapped (or interpolated) onto a linear ERB grid.
Convolution: This density $A(z)$ is convolved with the pre-calculated roughness kernel $K_{rough}$.

$$ R_{shape}(z) = (A * K_{rough})(z) = \int A(z-\tau) K_{rough}(\tau) d\tau $$

The result $R_{shape}(z)$ represents the raw "Roughness Shape" at frequency $z$. To convert this to a normalized fitness signal, Conchordal applies a physiological saturation mapping.
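A discrete version of this convolution on a uniform ERB grid might look like the sketch below. It uses the direct form with zero padding at the edges; whether the project uses a direct or FFT-based convolution is not stated in the source, and the function name is hypothetical.

```rust
/// "Same"-size discrete convolution: out[i] = sum_j kernel[j] * a[i + half - j],
/// where half = kernel.len() / 2 and samples outside `a` count as zero.
pub fn convolve_same(a: &[f64], kernel: &[f64]) -> Vec<f64> {
    let half = kernel.len() / 2;
    let mut out = vec![0.0; a.len()];
    for i in 0..a.len() {
        for (j, &k) in kernel.iter().enumerate() {
            // Source index i + half - j, skipped when it falls outside the input.
            if let Some(src) = (i + half).checked_sub(j) {
                if src < a.len() {
                    out[i] += k * a[src];
                }
            }
        }
    }
    out
}
```

A single unit impulse in the input reproduces the kernel centered on that position, which is the roughness "halo" that one isolated partial casts on its neighbors.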

3.2.3 Physiological Saturation Mapping

Raw roughness values from the convolution have unbounded range. Rather than hard-clamping, Conchordal uses a saturation curve that models the compressive nonlinearity of auditory perception. This mapping converts reference-normalized roughness ratios to the $[0, 1]$ range.

Reference Normalization: The system maintains reference values $r_{ref,peak}$ and $r_{ref,total}$ representing "typical" roughness levels. The reference-normalized ratios are:

$$ x_{peak}(u) = \frac{R_{shape}(u)}{r_{ref,peak}} $$

$$ x_{total} = \frac{R_{shape,total}}{r_{ref,total}} $$

The Saturation Parameter: The parameter roughness_k ($k > 0$) controls the saturation curve's shoulder. The reference ratio $x = 1$ maps to:

$$ R_{ref} = \frac{1}{1+k} $$

Larger $k$ reduces $R_{01}$ for the same input ratio, making the system more tolerant of roughness.

Piecewise Saturation Mapping: The normalized roughness $R_{01}$ is computed from the reference-normalized ratio $x$ as:

$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \frac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \frac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$

This function is continuous at $x = 1$ (both branches yield $\frac{1}{1+k}$) and saturates asymptotically to 1 as $x \to \infty$. The piecewise structure ensures linear response for low roughness (preserving sensitivity) while compressing extreme values smoothly (avoiding a hard clamp).

Numerical Safety: The implementation handles edge cases robustly:

$x = \text{NaN} \to 0$
$x = +\infty \to 1$
$x = -\infty \to 0$
Non-finite $k$ is treated as $10^{-6}$
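The piecewise mapping and its edge cases fit in one function. This is a minimal sketch with a hypothetical function name; it covers only the cases listed above and does not guess at how the implementation treats a finite but non-positive $k$.

```rust
/// Maps a reference-normalized roughness ratio `x` to [0, 1] with shoulder parameter `k`.
pub fn roughness01(x: f64, k: f64) -> f64 {
    // Non-finite k falls back to 1e-6, as listed above.
    let k = if k.is_finite() { k } else { 1e-6 };
    if x.is_nan() || x <= 0.0 {
        0.0 // covers NaN, -inf, and non-positive ratios
    } else if x == f64::INFINITY {
        1.0
    } else if x < 1.0 {
        x / (1.0 + k) // linear region below the reference ratio
    } else {
        1.0 - k / (x + k) // compressive region at and above the reference ratio
    }
}
```

With $k = 1$, the ratios 0.5, 1 and 3 map to 0.25, 0.5 and 0.75 respectively.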

Agents seeking consonance actively avoid peaks in the $R_{01}$ field.

3.3 Harmonicity ($H$): The Sibling Projection Algorithm

While Roughness drives agents away from dissonance (segregation), Harmonicity ($H$) drives them toward fusion—the creation of coherent chords and timbres. Conchordal introduces a novel algorithm termed "Sibling Projection" to compute this field. This algorithm approximates the brain's mechanism of "Common Root" detection (Virtual Pitch) entirely in the frequency domain.

3.3.1 Concept: Virtual Roots

The algorithm posits that any spectral peak at frequency $f$ implies the potential existence of a fundamental frequency (root) at its subharmonics ($f/2, f/3, f/4 \dots$). If multiple spectral peaks share a common subharmonic, that subharmonic represents a strong "Virtual Root".

3.3.2 The Two-Pass Projection

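The algorithm operates on the Log2Space spectrum in two passes. It relies on the fact that multiplying a frequency by an integer $k$ is a fixed shift of $\log_2(k)$ on the logarithmic axis:

Downward Projection (Root Search): The current spectral envelope is "smeared" downward. For every position $i$ with energy, the algorithm adds energy at $i - \log_2(k)$ for integers $k \in \{1, 2, \dots, N\}$. Here $i$ is read as a log2 coordinate; on the discrete grid the shift corresponds to about $B \log_2(k)$ bins.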

$$ Roots[i] = \sum_k A[i + \log_2(k)] \cdot w_k $$

Here, $w_k$ is a weighting factor that decays with harmonic index $k$ (e.g., $k^{-\rho}$), reflecting that lower harmonics imply their roots more strongly than higher ones. The result Roots describes the strength of the virtual pitch at every frequency.
Upward Projection (Harmonic Resonance): The system then projects the Roots spectrum back upwards. If a strong root exists at $f_r$, it implies stability for all its natural harmonics ($f_r, 2f_r, 3f_r \dots$).

$$ H[i] = \sum_m Roots[i - \log_2(m)] \cdot w_m $$

Emergent Tonal Stability: Consider an environment with a single tone at 200 Hz.

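Step 1 (Down): It projects roots at 100 Hz ($f/2$), 66.7 Hz ($f/3$), 50 Hz ($f/4$), etc.
Step 2 (Up): The 100 Hz root projects stability to 100, 200, 300, 400, 500... Hz.
- 300 Hz (the 3rd harmonic) lies an octave plus a Perfect 5th above the 100 Hz root, and a Perfect 5th (3:2) above the 200 Hz tone.
- 500 Hz (the 5th harmonic) lies two octaves plus a Major 3rd above the 100 Hz root, and an octave plus a Major 3rd (5:2) above the 200 Hz tone.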

Thus, without any hardcoded knowledge of Western music theory, the system naturally generates stability peaks at the Major 3rd and Perfect 5th relationships, simply as a consequence of the physics of the harmonic series. An agent at 200 Hz creates a "gravity well" at 300 Hz and 500 Hz, inviting other agents to form a major triad.
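The 200 Hz example can be reproduced with a compact version of the two passes. This sketch is not the code in core/harmonicity_kernel.rs: the function names, the rounding of each harmonic shift to the nearest bin, the sharing of one weight table between both passes, and the weight exponent used in the usage note are all assumptions.

```rust
/// Two-pass "down then up" projection on a log2 grid with `bins_per_oct` bins per octave.
/// Harmonic k is a shift of round(bins_per_oct * log2(k)) bins; weights are k^(-rho).
pub fn sibling_projection(amp: &[f64], bins_per_oct: u32, max_harmonic: u32, rho: f64) -> Vec<f64> {
    let n = amp.len();
    let shifts: Vec<(usize, f64)> = (1..=max_harmonic)
        .map(|k| {
            let shift = (bins_per_oct as f64 * (k as f64).log2()).round() as usize;
            (shift, (k as f64).powf(-rho))
        })
        .collect();

    // Pass 1 (down): roots[i] = sum_k amp[i + shift_k] * w_k
    let mut roots = vec![0.0; n];
    for i in 0..n {
        for &(shift, w) in &shifts {
            if i + shift < n {
                roots[i] += amp[i + shift] * w;
            }
        }
    }

    // Pass 2 (up): h[i] = sum_m roots[i - shift_m] * w_m
    let mut h = vec![0.0; n];
    for i in 0..n {
        for &(shift, w) in &shifts {
            if i >= shift {
                h[i] += roots[i - shift] * w;
            }
        }
    }
    h
}

/// Nearest bin index of `hz` on a grid that starts at `f_min`.
pub fn bin_of(hz: f64, f_min: f64, bins_per_oct: u32) -> usize {
    (bins_per_oct as f64 * (hz / f_min).log2()).round() as usize
}
```

On a grid starting at 50 Hz with 48 bins per octave, 8 harmonics and a weight exponent of 1, a single unit peak at 200 Hz produces nonzero harmonicity at the 300 Hz and 500 Hz bins and none at the 280 Hz bin, matching the description above.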

3.3.3 Mirror Dualism: Overtone vs. Undertone

The implementation in core/harmonicity_kernel.rs includes a key parameter: mirror_weight ($\alpha$). This parameter blends two distinct projection paths:

Path A (Overtone/Major): The standard "Down-then-Up" projection described above. It creates gravity based on the Overtone Series, favoring Major tonalities.
Path B (Undertone/Minor): An inverted "Up-then-Down" projection. It finds common overtones and projects undertones. This is the theoretical dual of Path A and favors Minor or Phrygian tonalities (the Undertone Series).

$$ H_{final} = (1-\alpha)H_{overtone} + \alpha H_{undertone} $$

By modulating mirror_weight, a user can continuously morph the fundamental physics of the universe from Major-centric to Minor-centric, observing how the ecosystem reorganizes itself in response.

4. The Life Engine: Agents and Autonomy

The "Life Engine" is the agent-based simulation layer that runs atop the DSP landscape. It manages the population of "Individuals," handling their lifecycle, sensory processing, and actuation (audio synthesis).

4.1 The Individual Architecture

The Individual struct (life/individual.rs) is the atomic unit of the ecosystem. It is composed of several components: an AnySoundBody actuator, an ArticulationWrapper (wrapping an ArticulationCore), a PitchController (wrapping a PitchCore), and a PhonationEngine that manages note-level timing. The Individual itself acts as an integration layer, managing lifecycle (metabolism, energy), perceptual context, and the control-plane signals that coordinate the components without coupling them directly.

4.1.1 The SoundBody (Actuator)

The SoundBody trait defines the sound generation capabilities of an agent. It is responsible for rendering the waveform and projecting its spectral footprint back to the system (for the Landscape update).

SineBody: Synthesizes a pure sine tone.
HarmonicBody: Synthesizes a complex tone consisting of a fundamental and a series of partials. This body introduces the concept of a TimbreGenotype, which encodes parameters such as:
- stiffness: The inharmonicity coefficient (stretching the partial series).
- brightness: The spectral slope (decay of higher partials).
- comb: Even harmonic attenuation.
- damping: Frequency-dependent decay rates.
- vibrato_rate / vibrato_depth: LFO-based pitch modulation.
- jitter: 1/f pink noise FM strength for organic fluctuation.
- unison: Detuned copy amount for chorus-like thickening.
- mode: Harmonic (integer multiples) vs. Metallic (non-integer ratios).

The HarmonicBody allows for the evolution of timbre. An agent with high stiffness might find survival difficult in a purely harmonic landscape, forcing it to seek out unique "spectral niches" where its inharmonic partials do not clash with the population.

4.1.2 The Core Stack (Articulation, Pitch)

Behavior is split into focused cores, each defined in a separate file to allow easy extension with new strategies:

ArticulationCore (When/Gate) — life/articulation_core.rs: Manages rhythm, gating, and envelope dynamics. Variants include KuramotoCore (Kuramoto-like synchronization to NeuralRhythms), SequencedCore (fixed-duration envelopes), and DroneCore (slow sway). The ArticulationCore receives control-plane signals from the Individual and decides when to open or close the gate.
PitchCore (Where) — life/pitch_core.rs: Proposes the next target in log-frequency space based on consonance, distance penalties, tessitura gravity, and per-agent perceptual adjustments. Two implementations exist:
- PitchHillClimbPitchCore: Evaluates a discrete set of candidates around the current target, scoring each with consonance minus penalties (distance, tessitura gravity, persistence bias, and perceptual adjustments from PerceptualContext).
- PitchPeakSamplerCore: Samples from consonance peaks in the landscape, offering a more exploratory strategy.
The PitchController wraps the PitchCore with retargeting logic and integration window management.

4.1.3 Control-Plane Signals: Planned and Error

The Individual coordinates its cores through two orthogonal signals rather than direct coupling:

Planned: The PitchCore proposes a target (TargetProposal), and the Individual maintains the "planned" state—next target frequency, expected jump distance, and salience. This represents the agent's intention.
Error: The Individual computes the discrepancy between the SoundBody's current pitch and the planned target (signed cents, absolute cents). This represents the result of prior actions and is available for observation or future extensions (e.g., adaptive articulation). Importantly, the PitchCore does not read the error signal—search remains decoupled from feedback.

This separation keeps each core focused: PitchCore explores the landscape, ArticulationCore shapes the envelope, and the Individual orchestrates timing and state transitions.

4.2 Lifecycle and Metabolism

Agents in Conchordal are governed by energy dynamics modeled on biological metabolism. The LifecycleConfig defines two modes of existence:

Decay: The agent is born with a fixed initial_energy pool. It expends this energy over time (half-life) and dies when it reaches zero. This models transient sounds like plucks or percussion.
Sustain: The agent has a metabolism_rate (energy loss per second) and can gain energy via consonance-dependent recharge.
- Recharge: This is the critical feedback loop. The energy gained per phonation attack is scaled by the agent's $C_{level01}$ (the sigmoid-mapped consonance) via the MetabolismPolicy.
- Survival: An agent in a dissonant (low $C_{level01}$) region "starves"—its energy depletes, its amplitude fades, and it eventually dies. An agent in a consonant (high $C_{level01}$) region "feeds"—it maintains or gains energy, allowing it to sing louder and live longer.
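The feedback loop for the Sustain mode can be sketched as a small energy account. All names here (SustainEnergy, recharge_per_attack, tick, on_attack) are hypothetical, and the linear scaling of the recharge by $C_{level01}$ is an assumption about what the MetabolismPolicy does.

```rust
/// Illustrative energy model for a "Sustain" agent (field names are hypothetical).
pub struct SustainEnergy {
    pub energy: f64,
    /// Energy lost per second.
    pub metabolism_rate: f64,
    /// Maximum energy gained per phonation attack.
    pub recharge_per_attack: f64,
}

impl SustainEnergy {
    /// Continuous cost of staying alive over `dt` seconds.
    pub fn tick(&mut self, dt: f64) {
        self.energy = (self.energy - self.metabolism_rate * dt).max(0.0);
    }

    /// Recharge on a phonation attack, scaled by the consonance gate in [0, 1].
    pub fn on_attack(&mut self, c_level01: f64) {
        self.energy += self.recharge_per_attack * c_level01.clamp(0.0, 1.0);
    }

    pub fn is_alive(&self) -> bool {
        self.energy > 0.0
    }
}
```

Two agents with identical parameters diverge purely because of where they sit: one that attacks once per second at $C_{level01} = 1$ breaks even, while one at $C_{level01} = 0$ runs out of energy.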

This mechanic creates a Darwinian pressure: Survival of the Consonant. The musical structure emerges because only the agents that find harmonic relationships survive to be heard.

4.3 Pitch Retargeting Logic

Agents are not static; they move through frequency space to improve their fitness. The execution layer applies a retarget gate (theta zero-crossing plus an integration window) and then asks the PitchCore to propose the next target.

Retarget Gate: The Individual integrates time based on current frequency and fires only when a theta crossing aligns with the window. This keeps retargeting rhythmic and scale-sensitive.
Pitch Proposal: The PitchCore (e.g., PitchHillClimbPitchCore) evaluates a discrete set of candidates around the current target. It scores each candidate with consonance minus penalties (distance, tessitura gravity, persistence bias, and per-agent perceptual adjustments from PerceptualContext). The proposal includes a salience score (0..1) reflecting improvement strength.

4.3.1 The Hop Policy

Pitch movement uses a hop policy rather than continuous portamento:

Fade-out: The ArticulationCore closes the gate, fading amplitude to silence.
Snap: The Individual updates the SoundBody's pitch to the new target (discrete jump).
Fade-in: The gate reopens, and the new pitch sounds.

Ordering matters: On the sample where the snap occurs, the pitch is updated before consonance is evaluated, ensuring the Landscape score reflects the agent's actual sounding frequency. The error signal is computed from the pre-snap current pitch to maintain consistency; if post-snap error is needed in the future, it can be added as a separate signal.

These timing-sensitive transitions are guarded by regression tests to prevent subtle breakage.

5. Temporal Dynamics: Neural Rhythms

Conchordal eschews the concept of a master clock or metronome. Instead, time is structured by a continuous modulation field inspired by Neural Oscillations (brainwaves). This is the "Time" equivalent of the "Space" landscape.

5.1 The Modulation Bank

The NeuralRhythms struct manages a bank of resonating filters tuned to physiological frequency bands:

Delta (0.5–4 Hz): The macroscopic "pulse" of the ecosystem. Agents locked to this band play long, phrase-level notes.
Theta (4–8 Hz): The "articulation" rate. Governs syllabic rhythms and medium-speed motifs.
Alpha (8–12 Hz): The "texture" rate. Used for tremolo, vibrato, and shimmering effects.
Beta (15–30 Hz): The "tension" rate. High-speed flutters associated with dissonance or excitement.

5.2 Vitality and Self-Oscillation

Each band is implemented as a Resonator, a damped harmonic oscillator. A key parameter is vitality.

Vitality = 0: The resonator acts as a passive filter. It only rings when excited by an event (e.g., a loud agent spawning) and then decays.
Vitality > 0: The resonator has active gain. It can self-oscillate, maintaining a rhythmic cycle even in the absence of input.

This creates a two-way interaction: The global rhythm drives the agents (entrainment), but the agents also drive the global rhythm (excitation). A loud "kick" agent spawning in the Delta band will "ring" the Delta resonator, causing other agents coupled to that band to synchronize.

5.3 Kuramoto Entrainment

The KuramotoCore ArticulationCore uses a Kuramoto-style model of coupled oscillators.

$$ \frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^N \sin(\theta_j - \theta_i) $$

In Conchordal, each agent couples to the global NeuralRhythms rather than directly to every other agent (a mean-field approximation), so the pairwise sum is replaced by coupling to a single shared phase with strength $K$.

Sensitivity: Each agent has a sensitivity profile determining which bands (Delta, Theta, etc.) it listens to.
Phase Locking: Agents adjust their internal articulation phase to match the phase of the resonator.
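The phase-locking behavior can be demonstrated with a single agent and a single shared rhythm. This sketch assumes the mean-field form $\dot{\theta} = \omega + K \sin(\psi - \theta)$, where $\psi$ is the rhythm phase, an agent whose natural frequency equals the rhythm frequency, and simple Euler integration; none of the function names come from KuramotoCore.

```rust
use std::f64::consts::PI;

/// One Euler step of d(theta)/dt = omega + k * sin(psi - theta),
/// where `psi` is the phase of the shared rhythm the agent listens to.
pub fn kuramoto_step(theta: f64, omega: f64, k: f64, psi: f64, dt: f64) -> f64 {
    (theta + (omega + k * (psi - theta).sin()) * dt).rem_euclid(2.0 * PI)
}

/// Signed phase difference a - b wrapped to (-pi, pi].
pub fn phase_diff(a: f64, b: f64) -> f64 {
    let d = (a - b).rem_euclid(2.0 * PI);
    if d > PI {
        d - 2.0 * PI
    } else {
        d
    }
}

/// Simulates an agent entraining to a rhythm at `rhythm_hz`; returns the final phase error.
pub fn entrain(rhythm_hz: f64, k: f64, initial_offset: f64, seconds: f64, dt: f64) -> f64 {
    let omega = 2.0 * PI * rhythm_hz;
    let (mut psi, mut theta) = (0.0_f64, initial_offset);
    let steps = (seconds / dt).round() as usize;
    for _ in 0..steps {
        theta = kuramoto_step(theta, omega, k, psi, dt);
        psi = (psi + omega * dt).rem_euclid(2.0 * PI);
    }
    phase_diff(theta, psi)
}
```

With a coupling of $K = 2$, an agent that starts 1 radian off a 2 Hz rhythm is effectively locked after ten seconds; with $K = 0$ the offset never changes.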

This results in emergent synchronization. Agents spawned at random times will gradually align their attacks to the beat of the Delta or Theta bands, creating coherent rhythmic patterns without a central sequencer.

6. System Architecture and Implementation Details

Conchordal is implemented in Rust to satisfy the stringent requirements of real-time audio (latency < 10 ms) alongside heavy numerical analysis (NSGT/Convolution). The architecture uses a concurrent, lock-free design pattern.

6.1 Threading Model

The application creates three primary thread contexts, plus the GUI event loop:

Audio Thread (Real-Time Priority):
- Managed by cpal in audio/output.rs.
- Constraint: Must never block. No Mutexes, no memory allocation.
- Responsibility: Pops mono samples from a lock-free ring buffer and copies them to all output channels. A Limiter (soft-clip or peak-limiter) is applied in-place on the interleaved output.
Analysis Thread (Background Priority):
- Defined in core/analysis_worker.rs, running AnalysisStream from core/stream/analysis.rs.
- Responsibility: Receives audio hops (time-domain chunks), runs the NSGT to produce a log2 power spectrum, then computes both the Harmonicity field (Sibling Projection) and the Roughness field (ERB-domain convolution) in a single pipeline.
- Update Cycle: When analysis is complete, it sends the updated Landscape snapshot back to the worker thread via a bounded SPSC channel.
Worker Thread (Simulation Loop):
- Named "worker" in app.rs.
- Responsibility: Runs the main simulation loop. Each iteration: merges analysis results into the current Landscape, dispatches Conductor events, advances the Population (pitch retargeting, articulation, metabolism), renders audio via ScheduleRenderer, feeds the DorsalStream (rhythm extraction), and pushes mono samples into the ring buffer for the audio thread.
- DorsalStream: Rhythm analysis (core/stream/dorsal.rs) runs synchronously within this loop, processing audio chunks to extract rhythmic energy metrics (e_low, e_mid, e_high, flux) for the NeuralRhythms modulation bank.
App/GUI Thread (Main):
- Runs the eframe/egui visualizer.
- Responsibility: Handles user input, visualizing the Landscape (ui/plots.rs), and displaying simulation metadata. It receives UiFrame snapshots from the worker thread via a bounded channel.

6.2 Data Flow

To maintain data consistency without locking the audio thread, Conchordal uses a multi-channel update strategy for the Landscape.

The Worker Thread renders audio and sends each hop to the Analysis Thread via a bounded channel.
The Analysis Thread runs the full NSGT + Roughness + Harmonicity pipeline and sends the resulting Landscape snapshot back.
The Worker Thread merges the analysis result into the current LandscapeFrame, recomputing the combined Consonance field.
The Population evaluates the current Landscape for pitch selection, metabolism, and agent lifecycle.
The DorsalStream processes audio synchronously within the worker loop to update rhythm metrics, which are stored in landscape.rhythm.
Rendered mono audio is pushed into a lock-free ring buffer consumed by the Audio Thread.

This decoupled architecture ensures that the audio thread always sees a consistent stream of samples, even if the analysis thread lags slightly behind real-time. The analysis thread processes all hops in-order to maintain NSGT time continuity.
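The worker-to-analysis round trip can be sketched with the standard library alone. This is an illustration under assumptions: std::sync::mpsc::sync_channel stands in for whatever bounded channel the project uses, the "analysis" is reduced to the mean power of a hop, and the audio ring buffer is left out entirely.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stand-in for the analysis result: here just the mean power of a hop.
fn analyze(hop: &[f32]) -> f32 {
    if hop.is_empty() {
        0.0
    } else {
        hop.iter().map(|s| s * s).sum::<f32>() / hop.len() as f32
    }
}

/// Sends hops to an analysis thread over a bounded channel and merges the
/// resulting snapshots as they become available, preserving hop order.
pub fn run_pipeline(hops: Vec<Vec<f32>>) -> Vec<f32> {
    let (hop_tx, hop_rx) = sync_channel::<Vec<f32>>(4);
    let (snap_tx, snap_rx) = sync_channel::<f32>(4);

    let analysis = thread::spawn(move || {
        // Hops are processed strictly in arrival order.
        for hop in hop_rx {
            if snap_tx.send(analyze(&hop)).is_err() {
                break;
            }
        }
    });

    let mut merged = Vec::new();
    for hop in hops {
        hop_tx.send(hop).expect("analysis thread alive");
        // Merge whatever snapshots are ready; do not wait for the rest.
        while let Ok(snap) = snap_rx.try_recv() {
            merged.push(snap);
        }
    }
    drop(hop_tx); // closing the channel lets the analysis loop end
    for snap in snap_rx {
        merged.push(snap); // drain the snapshots that were still in flight
    }
    analysis.join().expect("analysis thread panicked");
    merged
}
```

Because each channel has a single producer and a single consumer, snapshots arrive in the same order as the hops that produced them, even when the analysis side falls behind by a few hops.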

6.3 The Conductor: Scripting with Rhai

The Conductor module acts as the interface between the human artist and the ecosystem. It embeds the Rhai scripting language, exposing a high-level API for controlling the simulation.

The ScriptHost struct maps internal Rust functions to Rhai commands:

derive(species): Creates a new species handle from a preset (sine, harmonic, saw, square, noise), allowing method-chaining to configure amp, freq, brain, phonation, timbre, metabolism, adsr, pitch_mode, and pitch_core.
create(species, count): Creates a group of agents from a species handle. Returns a GroupHandle for further configuration.
.place(strategy): Assigns a spawn strategy to a group. Strategies include consonance(root_freq), consonance_density_pmf(min, max), random_log(min, max), and linear(start, end).
wait(seconds): Commits pending groups, then advances the timeline cursor. This is the primary mechanism for shaping temporal structure.
flush(): Commits pending groups without advancing the timeline.
release(group): Marks a group for fade-out release.
scene(name, callback): Marks a named scene boundary. Groups created within the callback are automatically released when the scene ends.
play(callback): Executes a scoped block—groups created inside are released on exit.
parallel([callbacks]): Runs multiple blocks concurrently (timeline branches), advancing the cursor to the latest endpoint.
set_harmonicity_mirror_weight(value): Modulates the mirror_weight parameter in real-time.
set_roughness_k(value): Adjusts the roughness saturation parameter $k$.
set_global_coupling(value): Controls the Kuramoto coupling strength.
seed(value): Sets the random seed for reproducible runs.
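The cursor semantics of wait, flush, and parallel listed above can be modeled in a few lines. The sketch below is an assumption-laden toy, not the actual ScriptHost: the `Timeline` type, its fields, and `demo` are hypothetical, groups are reduced to names, and committing a branch's leftover groups at the end of that branch is a guess the source does not confirm.

```rust
/// Hypothetical model of the Conductor's timeline cursor.
#[derive(Debug, Default)]
struct Timeline {
    cursor: f64,                   // current timeline position in seconds
    pending: Vec<String>,          // groups created but not yet committed
    committed: Vec<(f64, String)>, // (commit time, group name)
}

impl Timeline {
    /// Queue a group; it is not scheduled until the next commit.
    fn create(&mut self, name: &str) {
        self.pending.push(name.to_string());
    }

    /// Commit pending groups at the current cursor without advancing it.
    fn flush(&mut self) {
        let t = self.cursor;
        self.committed.extend(self.pending.drain(..).map(|g| (t, g)));
    }

    /// Commit pending groups, then advance the cursor.
    fn wait(&mut self, seconds: f64) {
        self.flush();
        self.cursor += seconds;
    }

    /// Run each branch from the same start time, then move the cursor
    /// to the latest endpoint reached by any branch.
    fn parallel(&mut self, branches: Vec<Box<dyn FnOnce(&mut Timeline)>>) {
        self.flush();
        let start = self.cursor;
        let mut latest = start;
        for branch in branches {
            self.cursor = start;
            branch(&mut *self);
            self.flush(); // assumption: leftovers commit where the branch ended
            latest = latest.max(self.cursor);
        }
        self.cursor = latest;
    }
}

/// A small scenario exercising all three operations.
fn demo() -> Timeline {
    let mut tl = Timeline::default();
    tl.create("drone");
    tl.wait(4.0); // "drone" commits at t = 0, cursor moves to 4
    tl.create("swarm");
    tl.flush(); // "swarm" commits at t = 4, cursor stays at 4
    let branches: Vec<Box<dyn FnOnce(&mut Timeline)>> = vec![
        Box::new(|t: &mut Timeline| {
            t.create("melody");
            t.wait(2.0); // this branch ends at t = 6
        }),
        Box::new(|t: &mut Timeline| t.wait(5.0)), // this branch ends at t = 9
    ];
    tl.parallel(branches);
    tl
}

fn main() {
    let tl = demo();
    println!("cursor = {}", tl.cursor);
    for (t, name) in &tl.committed {
        println!("t = {t}: {name}");
    }
}
```

In this model the script never schedules absolute times; structure comes entirely from where the cursor sits when a commit happens.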

Scenario Parsing: Scenarios are loaded from .rhai files. This separation allows users to compose the "Macro-Structure" (the narrative arc, the changing laws of physics) while the "Micro-Structure" (the specific notes and rhythms) emerges from the agents' adaptation to those changes.

7. Case Studies: Analysis of Emergent Behavior

The following examples, derived from the samples/ directory, illustrate how specific parameter configurations lead to complex musical behaviors.

7.1 Case Study: Self-Organizing Rhythm (`samples/02_mechanisms/rhythmic_sync.rhai`)

This script demonstrates the emergent quantization of time.

Phase 1 (The Seed): A single, high-energy agent "Kick" is spawned at 60 Hz. Its periodic articulation excites the Delta band resonator in the NeuralRhythms.
Phase 2 (The Swarm): A cloud of agents is spawned with random phases.
Emergence: Because the agents use KuramotoCore ArticulationCores coupled to the Delta band, they sense the rhythm established by the Kick. Over a period of seconds, their phases drift and lock into alignment with the Kick. The result is a synchronized pulse that was not explicitly programmed into the swarm—it arose from the physics of the coupled oscillators.
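The phase-locking described in this case study can be reproduced with a driven Kuramoto model. The sketch below is not the KuramotoCore implementation: the functions `order_parameter` and `entrain`, the 2 Hz pulse rate (an assumed value in the delta range), the coupling strength, and the forward-Euler integration are all illustrative assumptions. Each swarm phase $\theta_i$ follows $\dot{\theta}_i = \omega + K \sin(\theta_{drv} - \theta_i)$, where the driver plays the role of the Kick.

```rust
use std::f64::consts::TAU;

/// Kuramoto order parameter r in [0, 1]; r near 1 means the phases coincide.
fn order_parameter(phases: &[f64]) -> f64 {
    let n = phases.len() as f64;
    let (re, im) = phases
        .iter()
        .fold((0.0, 0.0), |(re, im), p| (re + p.cos(), im + p.sin()));
    (re * re + im * im).sqrt() / n
}

/// Advance the swarm by `steps` forward-Euler steps of size `dt` seconds.
/// The driver starts at phase 0 and runs at `rate_hz`; every agent shares
/// that natural rate and is pulled toward the driver by `coupling`.
fn entrain(phases: &mut [f64], rate_hz: f64, coupling: f64, dt: f64, steps: usize) {
    let omega = TAU * rate_hz;
    let mut driver = 0.0_f64;
    for _ in 0..steps {
        for p in phases.iter_mut() {
            let pull = coupling * (driver - *p).sin();
            *p = (*p + dt * (omega + pull)).rem_euclid(TAU);
        }
        driver = (driver + dt * omega).rem_euclid(TAU);
    }
}

fn main() {
    // Eight agents with spread-out initial phases (deterministic for the demo).
    let mut phases: Vec<f64> = (0..8).map(|i| 0.7 * i as f64).collect();
    println!("coherence before: {:.3}", order_parameter(&phases));
    // Ten simulated seconds at 1 ms resolution.
    entrain(&mut phases, 2.0, 2.0, 0.001, 10_000);
    println!("coherence after:  {:.3}", order_parameter(&phases));
}
```

Because every agent shares the driver's rate, the phase difference obeys $\dot{\varphi} = -K \sin\varphi$, which decays to zero from any starting point except the unstable one at exactly $\pi$. That is why the swarm locks without any explicit scheduling.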

7.2 Case Study: Mirror Dualism (`samples/04_ecosystems/mirror_dualism.rhai`)

This script explores the structural role of the mirror_weight parameter.

Setup: An anchor drone is established at C4 (261.63 Hz).
State A (Major): set_harmonicity_mirror_weight(0.0). The system uses the Common Root projection (Overtone Series). Agents seeking consonance cluster around E4 and G4, forming a C Major triad.
State B (Minor): set_harmonicity_mirror_weight(1.0). The system switches to Common Overtone projection (Undertone Series). The "gravity" of the landscape inverts. Agents now find stability at Ab3 and F3 (a major third and a perfect fifth below C4; as pitch classes, a minor sixth and a perfect fourth above C), which together with the anchor form an F minor sonority and create a Phrygian/Minor texture. This demonstrates that "Tonality" in Conchordal is a manipulable environmental variable, akin to temperature or gravity.
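The mirror relationship between the two states can be checked with simple ratio arithmetic. The sketch below assumes just-intonation ratios of 5/4 and 3/2 for the third and fifth, which the source does not state, and uses hypothetical helpers `overtone`, `undertone`, and `note_name`. Multiplying the anchor by a ratio gives the overtone-side pitch; dividing gives its undertone mirror.

```rust
/// Pitch above the anchor at the given frequency ratio (overtone side).
fn overtone(anchor_hz: f64, ratio: f64) -> f64 {
    anchor_hz * ratio
}

/// Mirror image of the same ratio below the anchor (undertone side).
fn undertone(anchor_hz: f64, ratio: f64) -> f64 {
    anchor_hz / ratio
}

/// Nearest 12-TET note name (A4 = 440 Hz), used only for labeling.
fn note_name(freq_hz: f64) -> String {
    const NAMES: [&str; 12] = [
        "C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B",
    ];
    let midi = (69.0 + 12.0 * (freq_hz / 440.0).log2()).round() as i32;
    let octave = midi.div_euclid(12) - 1;
    format!("{}{}", NAMES[midi.rem_euclid(12) as usize], octave)
}

fn main() {
    let c4 = 261.63;
    // Assumed just ratios: major third 5/4, perfect fifth 3/2.
    for ratio in [5.0 / 4.0, 3.0 / 2.0] {
        let up = overtone(c4, ratio);
        let down = undertone(c4, ratio);
        println!(
            "ratio {:.3}: up {:.2} Hz ({}), down {:.2} Hz ({})",
            ratio,
            up,
            note_name(up),
            down,
            note_name(down)
        );
    }
}
```

Under these assumptions the upward ratios land on E4 and G4 and the mirrored ratios land on Ab3 and F3, matching the two states described above.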

7.3 Case Study: Drift and Flow (`samples/04_ecosystems/drift_flow.rhai`)

This script validates the hop-based movement logic.

Action: A strongly dissonant agent (C#3) is placed next to a strong anchor (C2).
Observation: The C#3 agent makes discrete hops in pitch. It is "pulled" by the Harmonicity field, fading out and snapping to a nearby harmonic "well" (likely E3 or G3).
Dynamics: If per-agent boredom is enabled, the agent will settle at E3 for a few seconds, then "get bored" (local consonance drops due to perceptual adaptation), and hop away again to find a new stable interval. This results in an endless, non-repeating melody generated by simple physical rules of attraction and repulsion.

8. Conclusion

Conchordal successfully establishes a proof-of-concept for Bio-Mimetic Computational Audio. By replacing the rigid abstractions of music theory (notes, grids, BPM) with continuous physiological models (Log2Space, ERB bands, neural oscillation), it creates a system where music is not constructed, but grown.

The technical architecture, anchored by the Log2Space coordinate system and the "Sibling Projection" algorithm, provides the mathematical foundation for this paradigm. Implementing it in Rust allows these biological simulations to run in real time, bridging the gap between ALife research and performative musical instruments.

Future development of Conchordal will focus on spatialization (extending the landscape to 3D space) and evolutionary genetics (allowing successful agents to pass on their TimbreGenotype), further deepening the analogy between sound and life.

Appendix A: Key System Parameters

Parameter	Module	Unit	Description
`bins_per_oct`	`Log2Space`	Int	Resolution of the frequency grid (typ. 48-96).
`sigma_cents`	`HarmonicityParams`	Cents	Width of harmonic peaks. Lower = stricter intonation.
`mirror_weight`	`HarmonicityParams`	0.0-1.0	Balance between Overtone (Major) and Undertone (Minor) gravity.
`roughness_k`	`LandscapeParams`	Float	Saturation parameter for roughness mapping. Default: $(1/0.7) - 1 \approx 0.4286$ (so $x=1$ maps to $\approx 0.7$).
`kernel.a`	`ConsonanceKernel`	Float	Harmonicity coefficient (default 1.0).
`kernel.b`	`ConsonanceKernel`	Float	Roughness coefficient (default −1.35; negative penalizes roughness).
`kernel.c`	`ConsonanceKernel`	Float	Interaction coefficient (default 1.0; positive attenuates roughness penalty at high harmonicity).
`kernel.d`	`ConsonanceKernel`	Float	Bias term (default 0.0).
`beta`	`ConsonanceRepresentationParams`	Float	Sigmoid steepness for $C_{level01}$ (default 2.0).
`theta`	`ConsonanceRepresentationParams`	Float	Sigmoid threshold for $C_{level01}$ (default 0.0).
`consonance_density_roughness_gain`	`LandscapeParams`	Float	$\rho$ in density kernel $H(1-\rho R)$ (default 1.0).
`vitality`	`DorsalStream`	0.0-1.0	Self-oscillation energy of the rhythm section.
`persistence`	`PitchHillClimbPitchCore`	0.0-1.0	Resistance to movement/change of an agent (policy bias within pitch selection).

Appendix B: Mathematical Summary

Consonance Kernel (bilinear):

$$ C_{score} = a \cdot H_{01} + b \cdot R_{01} + c \cdot H_{01} R_{01} + d $$

Consonance Level (sigmoid representation):

$$ C_{level01} = \frac{1}{1 + e^{-\beta(C_{score} - \theta)}} $$

Consonance Density Mass ($\rho$-kernel):

$$ C_{density\_mass} = \max(0,\; H_{01}(1 - \rho R_{01})) $$

Roughness Saturation Mapping (from reference-normalized ratio $x$ to $R_{01} \in [0,1]$):

$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \frac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \frac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$

where $k$ is roughness_k (default $\approx 0.4286$). The function is continuous at $x=1$ and saturates to 1 as $x \to \infty$.
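The four scalar formulas above translate directly into code. The sketch below follows the equations and the Appendix A defaults term by term, but the struct layout, function names, and the use of `f64` are assumptions rather than the project's actual types.

```rust
/// Coefficients of the bilinear consonance kernel (defaults from Appendix A).
#[derive(Clone, Copy, Debug)]
struct ConsonanceKernel {
    a: f64, // harmonicity coefficient
    b: f64, // roughness coefficient (negative penalizes roughness)
    c: f64, // interaction coefficient
    d: f64, // bias term
}

impl Default for ConsonanceKernel {
    fn default() -> Self {
        Self { a: 1.0, b: -1.35, c: 1.0, d: 0.0 }
    }
}

impl ConsonanceKernel {
    /// C_score = a*H01 + b*R01 + c*H01*R01 + d
    fn score(&self, h01: f64, r01: f64) -> f64 {
        self.a * h01 + self.b * r01 + self.c * h01 * r01 + self.d
    }
}

/// C_level01: logistic squashing of the score with steepness beta, threshold theta.
fn level01(score: f64, beta: f64, theta: f64) -> f64 {
    1.0 / (1.0 + (-beta * (score - theta)).exp())
}

/// C_density_mass = max(0, H01 * (1 - rho * R01))
fn density_mass(h01: f64, r01: f64, rho: f64) -> f64 {
    (h01 * (1.0 - rho * r01)).max(0.0)
}

/// Default roughness_k, chosen so that x = 1 maps to about 0.7.
fn default_roughness_k() -> f64 {
    1.0 / 0.7 - 1.0
}

/// Piecewise saturation from the reference-normalized ratio x to R01 in [0, 1].
fn roughness01(x: f64, k: f64) -> f64 {
    if x <= 0.0 {
        0.0
    } else if x < 1.0 {
        x / (1.0 + k)
    } else {
        1.0 - k / (x + k)
    }
}

fn main() {
    let kernel = ConsonanceKernel::default();
    let k = default_roughness_k();
    // Sweep the roughness ratio at a fixed, fairly harmonic spot (H01 = 0.8).
    for x in [0.0, 0.5, 1.0, 2.0, 10.0] {
        let r01 = roughness01(x, k);
        let score = kernel.score(0.8, r01);
        println!(
            "x = {:>4}: R01 = {:.3}, score = {:.3}, level = {:.3}, density = {:.3}",
            x,
            r01,
            score,
            level01(score, 2.0, 0.0),
            density_mass(0.8, r01, 1.0)
        );
    }
}
```

With the defaults, both branches of the roughness mapping meet at $1/(1+k) = 0.7$ for $x = 1$, which is the continuity property stated above.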

Harmonicity Projection (Sibling Algorithm):

$$ H[i] = (1-\alpha)\sum_m \left( \sum_k A[i+\log_2(k)] \right)[i-\log_2(m)] + \alpha \sum_m \left( \sum_k A[i-\log_2(k)] \right)[i+\log_2(m)] $$

Roughness Convolution:

$$ R_{shape}(z) = \int A(\tau) \cdot K_{plomp}(|z-\tau|_{ERB}) \, d\tau $$

Table of Contents

1. Introduction: The Bio-Acoustic Paradigm

2. The Psychoacoustic Coordinate System

2.1 The Log2 Space Foundation

2.1.1 Mathematical Definition

2.1.2 Grid Construction and Indexing

2.2 Constant-Q Bandwidth Characteristics

2.3 The Equivalent Rectangular Bandwidth (ERB) Scale

3. The Auditory Landscape: Analyzing the Environment

3.1 Non-Stationary Gabor Transform (NSGT)

3.1.1 Kernel-Based Spectral Analysis

3.1.2 Real-Time Temporal Smoothing

3.2 Roughness ($R$) Calculation: The Plomp-Levelt Model

3.2.1 The Interference Kernel

3.2.2 Convolutional Approach

3.2.3 Physiological Saturation Mapping

3.3 Harmonicity ($H$): The Sibling Projection Algorithm

3.3.1 Concept: Virtual Roots

3.3.2 The Two-Pass Projection

3.3.3 Mirror Dualism: Overtone vs. Undertone

4. The Life Engine: Agents and Autonomy

4.1 The Individual Architecture

4.1.1 The SoundBody (Actuator)

4.1.2 The Core Stack (Articulation, Pitch)

4.1.3 Control-Plane Signals: Planned and Error

4.2 Lifecycle and Metabolism

4.3 Pitch Retargeting Logic

4.3.1 The Hop Policy

5. Temporal Dynamics: Neural Rhythms

5.1 The Modulation Bank

5.2 Vitality and Self-Oscillation

5.3 Kuramoto Entrainment

6. System Architecture and Implementation Details

6.1 Threading Model

6.2 Data Flow

6.3 The Conductor: Scripting with Rhai

7. Case Studies: Analysis of Emergent Behavior

7.1 Case Study: Self-Organizing Rhythm (samples/02_mechanisms/rhythmic_sync.rhai)

7.2 Case Study: Mirror Dualism (samples/04_ecosystems/mirror_dualism.rhai)

7.3 Case Study: Drift and Flow (samples/04_ecosystems/drift_flow.rhai)

8. Conclusion

Appendix A: Key System Parameters

Appendix B: Mathematical Summary
