Table of Contents
- 1. Introduction: The Bio-Acoustic Paradigm
- 2. The Psychoacoustic Coordinate System
- 3. The Auditory Landscape: Analyzing the Environment
- 4. The Life Engine: Agents and Autonomy
- 5. Temporal Dynamics: Neural Rhythms
- 6. System Architecture and Implementation Details
- 7. Case Studies: Analysis of Emergent Behavior
- 8. Conclusion
- Appendix A: Key System Parameters
- Appendix B: Mathematical Summary
1. Introduction: The Bio-Acoustic Paradigm
Conchordal represents a fundamental divergence from established norms in generative music and computational audio. Where traditional systems rely on symbolic manipulation—operating on grids of quantized pitch (MIDI, Equal Temperament) and discretized time (BPM, measures)—Conchordal functions as a continuous, biologically grounded simulation of auditory perception. It posits that musical structure is not an artifact of abstract composition but an emergent property of acoustic survival.
This technical note serves as an exhaustive reference for the system's architecture, signal processing algorithms, and artificial life strategies. It details how Conchordal synthesizes the principles of psychoacoustics—specifically critical band theory, virtual pitch perception, and neural entrainment—with the dynamics of an autonomous ecosystem. In this environment, sound is treated as a living organism, an "Individual" possessing metabolism, sensory processing capabilities, and the autonomy to navigate a hostile spectral terrain.
The emergent behavior of the system is driven by a unified fitness function: the pursuit of Consonance. Agents within the Conchordal ecosystem do not follow a pre-written score. Instead, they continuously analyze their environment to maximize their "Spectral Comfort"—defined as the minimization of sensory roughness—and their "Harmonic Stability," or the maximization of virtual root strength. The result is a self-organizing soundscape where harmony, rhythm, and timbre evolve organically through the interactions of physical laws rather than deterministic sequencing.
This document explores the three foundational pillars of the Conchordal architecture:
- The Psychoacoustic Coordinate System: The mathematical framework of Log2Space and ERB scales that replaces linear Hertz and integer MIDI notes.
- The Cognitive Landscape: The real-time DSP pipeline that computes Roughness ($R$) and Harmonicity ($H$) fields from the raw audio stream.
- The Life Engine: The agent-based model governing the metabolism, movement, and neural entrainment of the audio entities.
2. The Psychoacoustic Coordinate System
A critical innovation in Conchordal is the rejection of the linear frequency scale ($f$) for internal processing. Human auditory perception is inherently logarithmic; our perception of pitch interval is based on frequency ratios rather than differences. To model this accurately and efficiently, Conchordal establishes a custom coordinate system, Log2Space, which aligns the computational grid with the tonotopic map of the cochlea.
2.1 The Log2 Space Foundation
The Log2Space struct serves as the backbone for all spectral analysis, kernel convolution, and agent positioning within the system. It maps the physical frequency domain ($f$ in Hz) to a perceptual logarithmic domain ($l$).
2.1.1 Mathematical Definition
The transformation from Hertz to the internal log-coordinate is defined as the base-2 logarithm of the frequency. This choice is deliberate: in base-2, an increment of 1.0 corresponds exactly to an octave, the most fundamental interval in pitch perception.
$$ l(f) = \log_2(f) $$
The inverse transformation, used to derive synthesis parameters for the audio thread, is:
$$ f(l) = 2^l $$
The coordinate space is discretized into a grid defined by a resolution parameter, bins_per_oct ($B$). This parameter determines the granularity of the simulation. A typical value of $B=48$ or $B=96$ provides sub-semitone resolution sufficient for continuous pitch gliding and microtonal inflection. The step size $\Delta l$ is constant across the entire spectral range:
$$ \Delta l = \frac{1}{B} $$
2.1.2 Grid Construction and Indexing
The Log2Space structure pre-calculates the center frequencies for all bins spanning the configured range $[f_{min}, f_{max}]$. The number of bins $N$ is determined to ensure complete coverage:
$$ N = \lfloor \frac{\log_2(f_{max}) - \log_2(f_{min})}{\Delta l} \rfloor + 1 $$
The system maintains two parallel vectors for $O(1)$ access during DSP operations:
- centers_log2: The logarithmic coordinates $l_i = \log_2(f_{min}) + i \cdot \Delta l$.
- centers_hz: The pre-computed linear frequencies $f_i = 2^{l_i}$.
This pre-computation is vital for real-time performance, removing the need for costly log2 and pow calls inside the inner loops of the spectral kernels. The method index_of_freq(hz) provides the quantization logic, mapping an arbitrary float frequency to the nearest bin index.
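As a concrete sketch, the grid construction and nearest-bin quantization described above might look like the following. This is an illustrative reconstruction, not the actual implementation: the struct and method names follow the text, but the fields and bounds handling are assumptions.

```rust
// Illustrative sketch of Log2Space (names from the text; internals assumed).
struct Log2Space {
    f_min: f64,
    bins_per_oct: usize,
    centers_log2: Vec<f64>, // l_i = log2(f_min) + i * (1/B)
    centers_hz: Vec<f64>,   // f_i = 2^{l_i}, pre-computed once at startup
}

impl Log2Space {
    fn new(f_min: f64, f_max: f64, bins_per_oct: usize) -> Self {
        let dl = 1.0 / bins_per_oct as f64;
        // N = floor((log2(f_max) - log2(f_min)) / dl) + 1, as in the text
        let n = ((f_max.log2() - f_min.log2()) / dl).floor() as usize + 1;
        let centers_log2: Vec<f64> =
            (0..n).map(|i| f_min.log2() + i as f64 * dl).collect();
        let centers_hz = centers_log2.iter().map(|l| l.exp2()).collect();
        Log2Space { f_min, bins_per_oct, centers_log2, centers_hz }
    }

    /// Quantize an arbitrary frequency to the nearest bin index (clamped).
    fn index_of_freq(&self, hz: f64) -> usize {
        let i = ((hz.log2() - self.f_min.log2()) * self.bins_per_oct as f64).round();
        (i.max(0.0) as usize).min(self.centers_hz.len() - 1)
    }
}
```

With $B = 48$, a frequency one octave above $f_{min}$ lands exactly on bin 48, reflecting the "1.0 per octave" convention of the coordinate.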
2.2 Constant-Q Bandwidth Characteristics
The Log2Space inherently enforces a Constant-Q (Constant Quality Factor) characteristic across the spectrum. In signal processing terms, $Q$ is defined as the ratio of the center frequency to the bandwidth: $Q = f / \Delta f$.
In a linear system (like a standard FFT), $\Delta f$ is constant, meaning $Q$ increases with frequency. In Log2Space, the bandwidth $\Delta f_i$ of the $i$-th bin scales proportionally with the center frequency $f_i$. This property mimics the frequency selectivity of the human auditory system, where the ear's ability to resolve frequencies diminishes (in absolute Hz terms) as frequency increases. This alignment allows Conchordal to allocate computational resources efficiently—using high temporal resolution at high frequencies and high spectral resolution at low frequencies—without manual multirate processing.
2.3 The Equivalent Rectangular Bandwidth (ERB) Scale
While Log2Space handles pitch relationships (octaves, harmonics), it does not perfectly model the critical bands of the ear, which are wider at low frequencies than a pure logarithmic mapping suggests. To accurately calculate sensory roughness (dissonance), Conchordal implements the Equivalent Rectangular Bandwidth (ERB) scale based on the Glasberg & Moore (1990) model.
The core/erb.rs module provides the transformation functions used by the Roughness Kernel. The conversion from frequency $f$ (Hz) to ERB-rate units $E$ is given by:
$$ E(f) = 21.4 \log_{10}(0.00437f + 1) $$
The bandwidth of a critical band at frequency $f$ is:
$$ BW_{ERB}(f) = 24.7(0.00437f + 1) $$
This scale is distinct from Log2Space. While Log2Space is the domain for pitch and harmonicity (where relationships are octave-invariant), the roughness calculation requires mapping spectral energy into the ERB domain to evaluate interference. The system effectively maintains a dual-view of the spectrum: one strictly logarithmic for harmonic templates, and one psychoacoustic for dissonance evaluation.
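The two ERB formulas translate directly into code. The function names below are illustrative, not necessarily those exported by core/erb.rs:

```rust
// Glasberg & Moore (1990): frequency (Hz) -> ERB-rate units.
fn hz_to_erb_rate(f: f64) -> f64 {
    21.4 * (0.00437 * f + 1.0).log10()
}

// Glasberg & Moore (1990): critical-band width (Hz) at frequency f.
fn erb_bandwidth_hz(f: f64) -> f64 {
    24.7 * (0.00437 * f + 1.0)
}
```

For example, at 1 kHz the ERB-rate is about 15.6 units and the critical band is roughly 133 Hz wide, noticeably wider than the sub-semitone spacing of a $B = 48$ log grid at that frequency.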
3. The Auditory Landscape: Analyzing the Environment
The "Landscape" is the central data structure in Conchordal. It acts as the shared environment for all agents, a dynamic scalar field representing the psychoacoustic "potential" of every frequency bin. Agents do not interact directly with each other; they interact with the Landscape, which aggregates the spectral energy of the entire population. This decouples the complexity of the simulation from the number of agents ($O(N)$ vs $O(N^2)$).
The Landscape is updated every audio frame (or block) by the Analysis Worker. It synthesizes two primary metrics:
- Roughness ($R$): The sensory dissonance caused by rapid beating between proximal partials.
- Harmonicity ($H$): The measure of virtual pitch strength and spectral periodicity.
Both metrics are normalized to the $[0, 1]$ range. Consonance is then derived in two layers: a Consonance Kernel that fuses the observables into a single fitness score, and a set of Representation transforms that reshape that score for different downstream consumers.
Layer 1 — Consonance Kernel (bilinear family):
$$ C_{score} = a \cdot H_{01} + b \cdot R_{01} + c \cdot H_{01} R_{01} + d $$
Default coefficients: $a = 1.0$, $b = -1.35$, $c = 1.0$, $d = 0.0$. Because $b < 0$, roughness acts as a penalty; because $c > 0$, high harmonicity attenuates that penalty (the interaction term $c \cdot H_{01} R_{01}$ partially cancels $b \cdot R_{01}$ when $H_{01}$ is large). The bilinear family subsumes the earlier $\alpha H - wR$ formulation as the special case $c = 0$.
Layer 2 — Representations:
| Name | Formula | Range | Meaning |
|---|---|---|---|
| $C_{score}$ | $aH + bR + cHR + d$ | $(-\infty,+\infty)$ | raw fitness from the kernel |
| $C_{level01}$ | $\sigma(\beta(C_{score} - \theta))$ | $[0,1]$ | metabolism gate (sigmoid) |
| $C_{density\_mass}$ | $\max(0,\ H_{01}(1 - \rho R_{01}))$ | $[0,+\infty)$ | raw density mass ($\rho$-kernel) |
| $C_{density\_pmf}$ | $\text{normalize}(C_{density\_mass})$ | $[0,1],\ \Sigma=1$ | pitch-selection PMF |
| $C_{energy}$ | $-C_{score}$ | $(-\infty,+\infty)$ | energy for minimization |
where $\sigma(x) = 1/(1+e^{-x})$, $\beta$ controls sigmoid steepness (default 2.0), and $\theta$ is the sigmoid threshold (default 0.0). The density mass uses a separate $\rho$-kernel with coefficients $a{=}1, b{=}0, c{=}{-}\rho, d{=}0$, so that $C_{density\_mass} = H_{01}(1 - \rho R_{01})$ clamped to $\geq 0$; the parameter $\rho$ (consonance_density_roughness_gain, default 1.0) controls how strongly roughness suppresses spawn probability.
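The two-layer computation can be sketched compactly. The coefficient defaults below ($a = 1.0$, $b = -1.35$, $c = 1.0$, $d = 0.0$, $\beta = 2.0$, $\theta = 0.0$, $\rho = 1.0$) are the ones listed above; the struct itself is illustrative:

```rust
// Layer 1: bilinear consonance kernel (defaults from the text).
struct ConsonanceKernel { a: f64, b: f64, c: f64, d: f64 }

impl ConsonanceKernel {
    fn score(&self, h01: f64, r01: f64) -> f64 {
        self.a * h01 + self.b * r01 + self.c * h01 * r01 + self.d
    }
}

/// Layer 2: sigmoid metabolism gate, C_level01 = sigma(beta * (score - theta)).
fn level01(score: f64, beta: f64, theta: f64) -> f64 {
    1.0 / (1.0 + (-beta * (score - theta)).exp())
}

/// Layer 2: density mass via the rho-kernel, clamped to >= 0.
fn density_mass(h01: f64, r01: f64, rho: f64) -> f64 {
    (h01 * (1.0 - rho * r01)).max(0.0)
}
```

Note how the interaction term plays out with the defaults: a fully rough but fully harmonic bin still scores $1 - 1.35 + 1 = 0.65$, whereas a rough, inharmonic bin scores $-1.35$.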
Individual agents maintain their own perceptual context (PerceptualContext) which tracks per-agent boredom and familiarity, providing additional score adjustments during pitch selection.
3.1 Non-Stationary Gabor Transform (NSGT)
To populate the Log2Space with spectral data, Conchordal uses a custom implementation of the Non-Stationary Gabor Transform (NSGT). Unlike the Short-Time Fourier Transform (STFT), which uses a fixed window size, the NSGT varies the window length $L$ inversely with frequency to maintain the Constant-Q property derived in Section 2.2.
3.1.1 Kernel-Based Spectral Analysis
The implementation in core/nsgt_kernel.rs utilizes a sparse kernel approach to perform this transform efficiently. For each log-frequency band $k$, a time-domain kernel $h_k$ is precomputed. This kernel combines a complex sinusoid at the band's center frequency $f_k$ with a periodic Hann window $w_k$ of length $L_k \approx Q \cdot f_s / f_k$.
$$ h_k[n] = w_k[n] \cdot e^{-j 2\pi f_k n / f_s} $$
These kernels are transformed into the frequency domain ($K_k[\nu]$) during initialization. To optimize performance, the system sparsifies these frequency kernels, storing only the bins with significant energy.
During runtime, the system performs a single FFT on the input audio buffer to obtain the spectrum $X[\nu]$. The complex coefficient $C_k$ for band $k$ is then computed via the inner product in the frequency domain:
$$ C_k = \frac{1}{N_{fft}} \sum_{\nu} X[\nu] \cdot K_k^*[\nu] $$
This "one FFT, many kernels" approach allows Conchordal to generate a high-resolution, logarithmically spaced spectrum covering 20 Hz to 20 kHz without the computational overhead of calculating separate DFTs for each band or using recursive filter banks.
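The frequency-domain inner product can be sketched without any FFT library, since the spectrum is assumed to be given. Complex values are plain `(re, im)` pairs to keep the example dependency-free, and `SparseKernel` is an assumed representation: only the significant bins of $K_k$, stored pre-conjugated.

```rust
// Assumed sparse representation of one band's frequency-domain kernel.
struct SparseKernel {
    bins: Vec<usize>,             // nu indices with significant energy
    conj_coeffs: Vec<(f64, f64)>, // K_k^*[nu] at those indices
}

/// C_k = (1/N_fft) * sum_nu X[nu] * K_k^*[nu], summed over the sparse support.
fn band_coefficient(x: &[(f64, f64)], k: &SparseKernel) -> (f64, f64) {
    let (mut re, mut im) = (0.0, 0.0);
    for (&nu, &(kr, ki)) in k.bins.iter().zip(&k.conj_coeffs) {
        let (xr, xi) = x[nu];
        re += xr * kr - xi * ki; // complex multiply-accumulate
        im += xr * ki + xi * kr;
    }
    let n = x.len() as f64; // N_fft (full spectrum length assumed)
    (re / n, im / n)
}
```

Because each kernel touches only its sparse support, the per-band cost is proportional to the kernel's bandwidth rather than to $N_{fft}$.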
3.1.2 Real-Time Temporal Smoothing
The raw spectral coefficients $C_k$ exhibit high variance due to the stochastic nature of the audio input (especially with noise-based agents). To create a stable field for agents to sample, the RtNsgtKernelLog2 struct wraps the NSGT with a temporal smoothing layer.
It implements a per-band leaky integrator (exponential smoothing). Crucially, the time constant $\tau$ is frequency-dependent. Low frequencies, which evolve slowly, are smoothed with a longer $\tau$, while high frequencies, which carry transient details, have a shorter $\tau$.
$$ y_k[t] = (1 - \alpha_k) \cdot |C_k[t]| + \alpha_k \cdot y_k[t-1] $$
where the smoothing factor $\alpha_k$ is derived from the frame interval $\Delta t$:
$$ \alpha_k = e^{-\Delta t / \tau(f_k)} $$
This models the "integration time" of the ear, ensuring that the Landscape reflects a psychoacoustic percept rather than instantaneous signal power.
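The smoothing recurrence is a one-liner per band. The $\tau$ schedule below (long at low frequencies, shorter at high frequencies, floored at 20 ms) is an illustrative assumption; only the update equations come from the text.

```rust
// Assumed frequency-dependent time constant: longer tau at low frequencies.
fn tau_seconds(f_hz: f64) -> f64 {
    (0.2 / (f_hz / 20.0).sqrt()).max(0.02)
}

/// y_k[t] = (1 - alpha) * |C_k[t]| + alpha * y_k[t-1],
/// with alpha = exp(-dt / tau(f_k)) as in the text.
fn smooth_band(prev_y: f64, mag: f64, f_hz: f64, dt: f64) -> f64 {
    let alpha = (-dt / tau_seconds(f_hz)).exp();
    (1.0 - alpha) * mag + alpha * prev_y
}
```

For a very large frame interval the output converges to the instantaneous magnitude, while for small intervals it moves only a fraction of the way, which is exactly the leaky-integrator behavior described above.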
3.2 Roughness ($R$) Calculation: The Plomp-Levelt Model
Roughness is the sensation of "harshness" or "buzzing" caused by the interference of spectral components that fall within the same critical band but are not sufficiently close to be perceived as a single tone (beating). Conchordal implements a variation of the Plomp-Levelt model via convolution in the ERB domain.
3.2.1 The Interference Kernel
The core of the calculation is the Roughness Kernel, defined in core/roughness_kernel.rs. This kernel $K_{rough}(\Delta z)$ models the interference curve between two partials separated by $\Delta z$ ERB. The curve creates a penalty that rises rapidly as partials separate, peaks at approximately 0.25 ERB (maximum roughness), and then decays as they separate further.
The implementation uses a parameterized function eval_kernel_delta_erb to generate this shape:
$$ g(\Delta z) = e^{-\frac{\Delta z^2}{2\sigma^2}} \cdot (1 - e^{-(\frac{\Delta z}{\sigma_{suppress}})^p}) $$
The second term is a suppression factor that ensures the kernel goes to zero as $\Delta z \to 0$, preventing a single pure tone from generating self-roughness.
3.2.2 Convolutional Approach
Calculating roughness pairwise for all spectral bins ($N^2$ complexity) is computationally prohibitive for real-time applications. Conchordal solves this by treating the Roughness calculation as a linear convolution.
- Mapping: The log-spaced amplitude spectrum from the NSGT is mapped (or interpolated) onto a linear ERB grid.
- Convolution: This density $A(z)$ is convolved with the pre-calculated roughness kernel $K_{rough}$.
$$ R_{shape}(z) = (A * K_{rough})(z) = \int A(z-\tau) K_{rough}(\tau) d\tau $$
The result $R_{shape}(z)$ represents the raw "Roughness Shape" at frequency $z$. To convert this to a normalized fitness signal, Conchordal applies a physiological saturation mapping.
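In discrete form the convolution is a straightforward kernel sweep over the ERB grid. Both arrays are assumed to share the same uniform ERB-grid spacing, with `center` marking the $\Delta z = 0$ index inside the tabulated kernel; this is a sketch of the approach, not the production code.

```rust
/// R_shape[z] = sum_tau A[z - tau] * K[tau], with tau = t - center.
fn roughness_shape(a_erb: &[f64], kernel: &[f64], center: usize) -> Vec<f64> {
    let n = a_erb.len() as isize;
    let mut out = vec![0.0; a_erb.len()];
    for z in 0..n {
        for (t, &k) in kernel.iter().enumerate() {
            let j = z - (t as isize - center as isize); // index of A(z - tau)
            if j >= 0 && j < n {
                out[z as usize] += a_erb[j as usize] * k;
            }
        }
    }
    out
}
```

A real implementation would typically use FFT-based convolution once the kernel support grows, but the $O(N \cdot |K|)$ direct form above makes the $A * K_{rough}$ structure explicit.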
3.2.3 Physiological Saturation Mapping
Raw roughness values from the convolution have unbounded range. Rather than hard-clamping, Conchordal uses a saturation curve that models the compressive nonlinearity of auditory perception. This mapping converts reference-normalized roughness ratios to the $[0, 1]$ range.
Reference Normalization: The system maintains reference values $r_{ref,peak}$ and $r_{ref,total}$ representing "typical" roughness levels. The reference-normalized ratios are:
$$ x_{peak}(u) = \frac{R_{shape}(u)}{r_{ref,peak}} $$
$$ x_{total} = \frac{R_{shape,total}}{r_{ref,total}} $$
The Saturation Parameter: The parameter roughness_k ($k > 0$) controls the saturation curve's shoulder. The reference ratio $x = 1$ maps to:
$$ R_{ref} = \frac{1}{1+k} $$
Larger $k$ reduces $R_{01}$ for the same input ratio, making the system more tolerant of roughness.
Piecewise Saturation Mapping: The normalized roughness $R_{01}$ is computed from the reference-normalized ratio $x$ as:
$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \frac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \frac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$
This function is continuous at $x = 1$ (both branches yield $\frac{1}{1+k}$) and saturates asymptotically to 1 as $x \to \infty$. The piecewise structure ensures linear response for low roughness (preserving sensitivity) while compressing extreme values (preventing saturation).
Numerical Safety: The implementation handles edge cases robustly:
- $x = \text{NaN} \to 0$
- $x = +\infty \to 1$
- $x = -\infty \to 0$
- Non-finite $k$ is treated as $10^{-6}$
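The piecewise mapping and its edge-case handling fit in a single function; this sketch follows the definitions above directly (assuming, per the text, that $k > 0$ is a precondition once non-finite values are replaced):

```rust
/// Piecewise saturation mapping from reference-normalized roughness ratio x
/// to R_01 in [0, 1], with the documented edge-case handling.
fn roughness01(x: f64, k: f64) -> f64 {
    let k = if k.is_finite() { k } else { 1e-6 }; // non-finite k -> 1e-6
    if x.is_nan() || x <= 0.0 {
        0.0 // NaN and non-positive inputs (including -inf) map to 0
    } else if x < 1.0 {
        x / (1.0 + k) // linear region: preserves low-roughness sensitivity
    } else {
        1.0 - k / (x + k) // compressive region: +inf saturates to 1
    }
}
```

Both branches agree at $x = 1$ (yielding $1/(1+k)$), so the curve is continuous across the shoulder.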
Agents seeking consonance actively avoid peaks in the $R_{01}$ field.
3.3 Harmonicity ($H$): The Sibling Projection Algorithm
While Roughness drives agents away from dissonance (segregation), Harmonicity ($H$) drives them toward fusion—the creation of coherent chords and timbres. Conchordal introduces a novel algorithm termed "Sibling Projection" to compute this field. This algorithm approximates the brain's mechanism of "Common Root" detection (Virtual Pitch) entirely in the frequency domain.
3.3.1 Concept: Virtual Roots
The algorithm posits that any spectral peak at frequency $f$ implies the potential existence of a fundamental frequency (root) at its subharmonics ($f/2, f/3, f/4 \dots$). If multiple spectral peaks share a common subharmonic, that subharmonic represents a strong "Virtual Root".
3.3.2 The Two-Pass Projection
The algorithm operates on the Log2Space spectrum in two passes, utilizing the integer properties of the logarithmic grid:
1. Downward Projection (Root Search): The current spectral envelope is "smeared" downward. For every bin $i$ with energy, the algorithm adds energy to bins $i - \log_2(k)$ for integers $k \in \{1, 2, \dots, N\}$.
$$ Roots[i] = \sum_k A[i + \log_2(k)] \cdot w_k $$
Here, $w_k$ is a weighting factor that decays with harmonic index $k$ (e.g., $k^{-\rho}$), reflecting that lower harmonics imply their roots more strongly than higher ones. The result Roots describes the strength of the virtual pitch at every frequency.
2. Upward Projection (Harmonic Resonance): The system then projects the Roots spectrum back upwards. If a strong root exists at $f_r$, it implies stability for all its natural harmonics ($f_r, 2f_r, 3f_r \dots$).
$$ H[i] = \sum_m Roots[i - \log_2(m)] \cdot w_m $$
Emergent Tonal Stability: Consider an environment with a single tone at 200 Hz.
- Step 1 (Down): It projects roots at 100 Hz ($f/2$), 66.7 Hz ($f/3$), 50 Hz ($f/4$), etc.
- Step 2 (Up): The 100 Hz root projects stability to 100, 200, 300, 400, 500... Hz.
- 300 Hz is the Perfect 5th of the 100 Hz root.
- 500 Hz is the Major 3rd of the 100 Hz root.
Thus, without any hardcoded knowledge of Western music theory, the system naturally generates stability peaks at the Major 3rd and Perfect 5th relationships, simply as a consequence of the physics of the harmonic series. An agent at 200 Hz creates a "gravity well" at 300 Hz and 500 Hz, inviting other agents to form a major triad.
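A toy version of the two-pass projection makes the mechanism concrete. On a log2 grid with $B$ bins per octave, harmonic $k$ corresponds to an offset of $\mathrm{round}(B \log_2 k)$ bins; the weight $w_k = k^{-\rho}$ follows the decay described above. The grid size, harmonic count, and $\rho$ here are illustrative choices, not the system's defaults.

```rust
/// Two-pass Sibling Projection on a log2 grid (toy sketch).
fn sibling_projection(a: &[f64], bins_per_oct: usize, n_harm: usize, rho: f64) -> Vec<f64> {
    let b = bins_per_oct as f64;
    // Offset (in bins) and weight w_k = k^{-rho} for each harmonic index k.
    let offsets: Vec<(isize, f64)> = (1..=n_harm)
        .map(|k| ((b * (k as f64).log2()).round() as isize, (k as f64).powf(-rho)))
        .collect();

    let n = a.len() as isize;
    // Pass 1 (down): Roots[i] = sum_k A[i + off_k] * w_k
    let mut roots = vec![0.0; a.len()];
    for i in 0..n {
        for &(off, w) in &offsets {
            let j = i + off;
            if j < n { roots[i as usize] += a[j as usize] * w; }
        }
    }
    // Pass 2 (up): H[i] = sum_m Roots[i - off_m] * w_m
    let mut h = vec![0.0; a.len()];
    for i in 0..n {
        for &(off, w) in &offsets {
            let j = i - off;
            if j >= 0 { h[i as usize] += roots[j as usize] * w; }
        }
    }
    h
}
```

Running this with a single impulse on the grid produces secondary stability peaks at the bins a fifth and a major third (plus octaves) above the implied roots, the "gravity wells" described above.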
3.3.3 Mirror Dualism: Overtone vs. Undertone
The implementation in core/harmonicity_kernel.rs includes a profound parameter: mirror_weight ($\alpha$). This parameter blends two distinct projection paths:
- Path A (Overtone/Major): The standard "Down-then-Up" projection described above. It creates gravity based on the Overtone Series, favoring Major tonalities.
- Path B (Undertone/Minor): An inverted "Up-then-Down" projection. It finds common overtones and projects undertones. This is the theoretical dual of Path A and favors Minor or Phrygian tonalities (the Undertone Series).
$$ H_{final} = (1-\alpha)H_{overtone} + \alpha H_{undertone} $$
By modulating mirror_weight, a user can continuously morph the fundamental physics of the universe from Major-centric to Minor-centric, observing how the ecosystem reorganizes itself in response.
4. The Life Engine: Agents and Autonomy
The "Life Engine" is the agent-based simulation layer that runs atop the DSP landscape. It manages the population of "Individuals," handling their lifecycle, sensory processing, and actuation (audio synthesis).
4.1 The Individual Architecture
The Individual struct (life/individual.rs) is the atomic unit of the ecosystem. It is composed of several components:
- An AnySoundBody actuator (synthesis backend).
- An ArticulationWrapper (wrapping an ArticulationCore).
- A PitchController (wrapping a PitchCore).
- A PhonationEngine that manages note-level timing and command queuing.
- An optional VoiceAdsr envelope for attack-decay-sustain-release shaping.
- Lifecycle and metabolism tracking (energy, age, perceptual context).
The Individual itself acts as an integration layer, managing the control-plane signals that coordinate the components without coupling them directly.
4.1.1 The SoundBody (Actuator)
The BodyMethod enum defines three synthesis body types, each projecting a distinct spectral footprint onto the Landscape:
- Sine: A pure sine tone via a single oscillator. Minimal spectral interference; useful as anchors and calibration probes.
- Harmonic: A complex tone with a TimbreGenotype governing its partial structure. Parameters include:
  - stiffness: Inharmonicity coefficient (stretching the partial series).
  - brightness: Spectral slope (decay of higher partials).
  - comb: Even harmonic attenuation.
  - damping: Frequency-dependent decay rates.
  - vibrato_rate / vibrato_depth: LFO-based pitch modulation.
  - jitter: 1/f pink noise FM strength for organic fluctuation.
  - unison: Detuned copy amount for chorus-like thickening.
  - mode: Harmonic (integer multiples) vs. Metallic (non-integer ratios).
- Modal: Resonator-based synthesis via ModalEngine, supporting arbitrary mode frequency ratios and decay times. Mode patterns can be specified as harmonic, odd harmonics, power-law, stiff string, or custom ratios.
Sound generation is dispatched through the AnyBackend enum:
- Oscillator (OscillatorBank): A struct-of-arrays layout for cache-efficient additive synthesis. Handles Sine and Harmonic bodies. Pitch refresh occurs every 64 samples; motion/vibrato refresh every 8 samples.
- Resonator (ModalEngine): A Damped Modified Coupled Form resonator bank. Handles Modal bodies. Mode coefficients are rebuilt every 64 samples on pitch change.
The HarmonicBody allows for the evolution of timbre. An agent with high stiffness might find survival difficult in a purely harmonic landscape, forcing it to seek out unique "spectral niches" where its inharmonic partials do not clash with the population.
4.1.2 The Core Stack
Behavior is split into three focused cores plus the PhonationEngine, each defined in a separate file:
- ArticulationCore (When/Gate) — life/articulation_core.rs: Manages rhythm, gating, and envelope dynamics. Three variants exist:
  - KuramotoCore: Coupled oscillator with an energy/vitality model, rhythm coupling modes (TemporalOnly, TemporalTimesVitality), rhythm reward (metabolism bonus for phase match), and autonomous attack capability. Fields include energy, energy_cap, vitality_level, vitality_exponent, sensitivity (delta/theta/alpha/beta), and k_omega coupling strength.
  - SequencedCore: Fixed-duration gate patterns.
  - DroneCore: Sustained output with optional sway modulation.
- PitchCore (Where) — life/pitch_core.rs: Proposes the next target in log-frequency space. Two implementations:
  - PitchHillClimbPitchCore: Local search with crowding penalties. Parameters: neighbor_step_log2, tessitura_gravity, landscape_weight, move_cost_coeff, move_cost_exp, improvement_threshold, exploration, persistence, anneal_temp. Crowding: crowding_strength, crowding_sigma_cents, crowding_sigma_from_roughness (derives sigma from the roughness kernel's critical band width). Leave-self-out analysis supports ApproxHarmonics and ExactScan modes.
  - PitchPeakSamplerPitchCore: Probabilistic peak sampling with window_cents, top_k, temperature, sigma_cents.
- PhonationEngine — life/phonation_engine.rs: Manages note-level command scheduling. Issues NoteCmd (NoteOn, NoteOff, Update) to the ScheduleRenderer. Uses ThetaGrid for gate-synchronized onset timing. Configuration is via PhonationSpec:
  - When: Once (single trigger), Pulse { rate_hz, sync, social } (repeated triggers).
  - Duration: WhileAlive, Gates(n), Field { hold_min_theta, hold_max_theta, curve_k, curve_x0, drop_gain }.
4.1.3 The Sound Pipeline
Audio rendering is handled by ScheduleRenderer (life/schedule_renderer.rs), which maintains a HashMap<VoiceKey, Voice> of active voices.
The Voice struct (life/sound/voice.rs) combines:
- A backend (AnyBackend: OscillatorBank or ModalEngine).
- An optional RenderModulator for articulation envelope shaping.
- An ADSR envelope: linear attack ramp, exponential decay to sustain level, constant sustain, linear release ramp.
- Smoothed pitch and amplitude transitions with configurable time constants.
- Continuous drive for sustained excitation.
The processing flow proceeds as follows: the PhonationEngine emits NoteCmd commands; the ScheduleRenderer creates, updates, or releases Voice instances accordingly; each Voice renders through its backend with ADSR shaping; the results are mixed to mono output.
4.1.4 Control-Plane Signals: Planned and Error
The Individual coordinates its cores through two orthogonal signals rather than direct coupling:
- Planned: The PitchCore proposes a target (TargetProposal), and the Individual maintains the "planned" state—next target frequency, expected jump distance, and salience. This represents the agent's intention.
- Error: The Individual computes the discrepancy between the SoundBody's current pitch and the planned target (signed cents, absolute cents). This represents the result of prior actions and is available for observation or future extensions (e.g., adaptive articulation). Importantly, the PitchCore does not read the error signal—search remains decoupled from feedback.
This separation keeps each core focused: PitchCore explores the landscape, ArticulationCore shapes the envelope, and the Individual orchestrates timing and state transitions.
4.2 Lifecycle and Metabolism
Agents in Conchordal are governed by energy dynamics modeled on biological metabolism. The LifecycleConfig defines two modes of existence:
- Decay: The agent is born with a fixed initial_energy pool. It expends this energy over time (half-life) and dies when it reaches zero. This models transient sounds like plucks or percussion.
- Sustain: The agent has a metabolism_rate (energy loss per second) and can gain energy via consonance-dependent recharge through MetabolismPolicy.
  - Recharge: Energy gained per phonation attack is scaled by $C_{level01}$.
  - Action Cost: An optional cost for pitch movement, penalizing excessive frequency hopping.
  - Rhythm Reward: An optional MetabolismRhythmReward provides a metabolic bonus for phase-matched attacks, configured via rho_t and AttackPhaseMatch metric.
This mechanic creates a Darwinian pressure: Survival of the Consonant. Agents in dissonant (low $C_{level01}$) regions starve—energy depletes, amplitude fades, and they die. Agents in consonant (high $C_{level01}$) regions thrive—they maintain or gain energy, allowing them to sing louder and live longer. The musical structure emerges because only agents that find harmonic relationships survive to be heard.
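A minimal sketch of Sustain-mode energy dynamics, assuming constant metabolic drain, consonance-gated recharge on each attack, and death at zero energy. The struct and numeric values are illustrative; only the qualitative rules come from the text.

```rust
/// Toy Sustain-mode metabolism: drain per second, recharge gated by C_level01.
struct Metabolism {
    energy: f64,
    metabolism_rate: f64,     // energy lost per second
    recharge_per_attack: f64, // maximum energy gained per phonation attack
}

impl Metabolism {
    /// Advance by dt seconds. `attacked` marks a phonation onset this step;
    /// `c_level01` is the sigmoid-gated consonance at the agent's frequency.
    /// Returns false when the agent starves.
    fn step(&mut self, dt: f64, attacked: bool, c_level01: f64) -> bool {
        self.energy -= self.metabolism_rate * dt;
        if attacked {
            self.energy += self.recharge_per_attack * c_level01;
        }
        self.energy = self.energy.max(0.0);
        self.energy > 0.0
    }
}
```

Under these rules an agent singing into a dissonant region ($C_{level01} \approx 0$) drains monotonically, while one in a consonant region can sustain itself indefinitely—the "Survival of the Consonant" pressure in miniature.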
4.3 Pitch Retargeting Logic
Agents are not static; they move through frequency space to improve their fitness. The execution layer applies a retarget gate (theta zero-crossing plus an integration window) and then asks the PitchCore to propose the next target.
4.3.1 Pitch Application Modes
Two modes govern how a new pitch target is applied:
- GateSnap (default): Discrete hop at gate boundaries. The ArticulationCore closes the gate, fading amplitude to silence; the Individual updates the SoundBody's pitch to the new target (discrete jump); the gate reopens and the new pitch sounds. Ordering matters: on the sample where the snap occurs, the pitch is updated before consonance is evaluated, ensuring the Landscape score reflects the agent's actual sounding frequency.
- Glide: Smooth continuous pitch transition with a configurable time constant $\tau$. The SoundBody interpolates exponentially toward the target frequency, producing portamento effects. Suited for drone-like species or slow melodic movement.
4.3.2 Crowding and Leave-Self-Out
The crowding system prevents agents from collapsing to identical frequencies. Rather than using ad-hoc constants, it employs an analytical roughness complement: a Gaussian penalty centered on each occupied frequency, with width $\sigma$ that can be derived from the roughness kernel's critical band width (crowding_sigma_from_roughness). A pairwise split bias further prevents frequency degeneracy.
When evaluating landscape fitness, an agent can subtract its own spectral contribution via leave-self-out analysis. Two modes are supported:
- ApproxHarmonics: Fast approximation using ~24 cent Gaussian subtraction.
- ExactScan: Full ERB grid scan for precise spectral subtraction.
These timing-sensitive transitions and crowding evaluations are guarded by regression tests to prevent subtle breakage.
5. Temporal Dynamics: Neural Rhythms
Conchordal eschews the concept of a master clock or metronome. Instead, time is structured by a continuous modulation field inspired by Neural Oscillations (brainwaves). This is the "Time" equivalent of the "Space" landscape.
5.1 The Modulation Bank
The NeuralRhythms struct manages a bank of resonating filters tuned to physiological frequency bands:
- Delta (0.5--4 Hz): The macroscopic "pulse" of the ecosystem. Agents locked to this band play long, phrase-level notes.
- Theta (4--8 Hz): The "articulation" rate. Governs syllabic rhythms and medium-speed motifs.
- Alpha (8--12 Hz): The "texture" rate. Used for tremolo, vibrato, and shimmering effects.
- Beta (15--30 Hz): The "tension" rate. High-speed flutters associated with dissonance or excitement.
5.2 DorsalStream and Rhythm Extraction
The DorsalStream (core/stream/dorsal.rs) performs real-time rhythm analysis using a 3-Band Crossover Flux architecture:
- Low band (~200 Hz crossover): Captures bass and kick energy.
- Mid band (~3 kHz crossover): Captures vocal and melodic energy.
- High band: Captures high-frequency transients.
The DorsalMetrics struct provides per-band energy (e_low, e_mid, e_high) and spectral flux values, which feed into the NeuralRhythms modulation bank. This creates a closed loop: agents produce audio, the DorsalStream extracts rhythmic structure from that audio, and the NeuralRhythms modulate agent behavior in return.
5.3 Vitality and Self-Oscillation
Each band is implemented as a Resonator, a damped harmonic oscillator. A key parameter is vitality.
- Vitality = 0: The resonator acts as a passive filter. It only rings when excited by an event (e.g., a loud agent spawning) and then decays.
- Vitality > 0: The resonator has active gain. It can self-oscillate, maintaining a rhythmic cycle even in the absence of input.
This creates a two-way interaction: The global rhythm drives the agents (entrainment), but the agents also drive the global rhythm (excitation). A loud "kick" agent spawning in the Delta band will "ring" the Delta resonator, causing other agents coupled to that band to synchronize.
5.4 Kuramoto Entrainment
The KuramotoCore ArticulationCore uses a Kuramoto-style model of coupled oscillators:
$$ \frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^N \sin(\theta_j - \theta_i) $$
In Conchordal, the "coupling" $K$ is to the global NeuralRhythms rather than directly to every other agent (Mean Field approximation).
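Under the mean-field approximation, the per-agent update collapses to coupling against a single global phase. The sketch below is an illustrative Euler step, not the library's kuramoto_phase_step() itself:

```rust
/// One Euler step of a mean-field Kuramoto oscillator:
/// dtheta/dt = omega + K_eff * sin(phi_global - theta).
fn kuramoto_step(theta: f64, omega: f64, k_eff: f64, global_phase: f64, dt: f64) -> f64 {
    let dtheta = omega + k_eff * (global_phase - theta).sin();
    (theta + dtheta * dt).rem_euclid(2.0 * std::f64::consts::PI)
}
```

With positive coupling the oscillator's phase converges toward the global rhythm's phase, which is the entrainment behavior the KuramotoCore relies on.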
The KuramotoCore implements a full energy/vitality model with several interacting subsystems:
- Energy Pool: Bounded by energy_cap. Energy is consumed by attacks and regenerated through metabolism.
- Vitality: vitality_level and vitality_exponent control self-oscillation strength. Higher vitality allows the oscillator to maintain phase coherence independently.
- Rhythm Coupling Modes:
  - TemporalOnly: Phase coupling only—the agent locks to the NeuralRhythm phase regardless of its internal state.
  - TemporalTimesVitality { lambda_v, v_floor }: Coupling strength is modulated by vitality, creating a feedback loop where healthy agents synchronize more strongly.
- Rhythm Reward: An optional MetabolismRhythmReward with parameter rho_t and AttackPhaseMatch metric provides a metabolic bonus for phase-matched onsets, linking rhythmic conformity to survival.
- Autonomous Attack: Self-triggered attacks when phase conditions align with thresholds (env_open, magnitude, alpha), enabling the oscillator to initiate sound events without external commands.
- Effective Coupling: The actual coupling strength is computed as:
$$ K_{eff} = \omega_{target} \cdot K_{global} \cdot s_\theta \cdot |\theta_{mag}| \cdot \theta_\alpha \cdot g_{env} \cdot a_{env} $$
where $s_\theta$ is the agent's theta sensitivity, $|\theta_{mag}|$ and $\theta_\alpha$ are the oscillator magnitude and alpha, and $g_{env}$, $a_{env}$ are the envelope gate and amplitude. Public helper functions kuramoto_k_eff() and kuramoto_phase_step() are exposed for external simulation use (e.g., paper experiments).
6. System Architecture and Implementation Details
Conchordal is implemented in Rust to satisfy the stringent requirements of real-time audio (latency < 10ms) alongside heavy numerical analysis (NSGT/Convolution). The architecture uses a concurrent, lock-free design pattern.
6.1 Threading Model
The application creates three primary thread contexts, plus the GUI event loop:
- Audio Thread (Real-Time Priority):
  - Managed by `cpal` in `audio/output.rs`.
  - Constraint: Must never block: no mutexes, no memory allocation.
  - Responsibility: Pops mono samples from a lock-free ring buffer and copies them to all output channels. A `Limiter` (soft-clip or peak-limiter) is applied in-place on the interleaved output.
- Analysis Thread (Background Priority):
  - Defined in `core/analysis_worker.rs`, running `AnalysisStream` from `core/stream/analysis.rs`.
  - Responsibility: Receives audio hops (time-domain chunks), runs the NSGT to produce a log2 power spectrum, then computes both the Harmonicity field (Sibling Projection) and the Roughness field (ERB-domain convolution) in a single pipeline.
  - Update Cycle: When analysis is complete, it sends the updated Landscape snapshot back to the worker thread via a bounded SPSC channel.
- Worker Thread (Simulation Loop):
  - Named `"worker"` in `app.rs`.
  - Responsibility: Runs the main simulation loop. Each iteration: merges analysis results into the current Landscape, dispatches Conductor events, advances the Population (pitch retargeting, articulation, metabolism), renders audio via `ScheduleRenderer` (which processes `PhonationBatch` vectors of `NoteCmd` and maintains the `Voice` pool), feeds the `DorsalStream` for rhythm extraction, and pushes mono samples into the ring buffer for the audio thread.
- App/GUI Thread (Main):
  - Runs the `eframe`/`egui` visualizer.
  - Responsibility: Handles user input, visualizes the Landscape (`ui/plots.rs`), and displays simulation metadata. It receives `UiFrame` snapshots from the worker thread via a bounded channel.
6.2 Data Flow
To maintain data consistency without locking the audio thread, Conchordal uses a multi-channel update strategy for the Landscape:
- The Worker Thread renders audio and sends each hop to the Analysis Thread via a bounded channel.
- The Analysis Thread runs the full NSGT + Roughness + Harmonicity pipeline and sends the resulting `Landscape` snapshot back.
- The Worker Thread merges the analysis result into the current `LandscapeFrame`, recomputing the combined Consonance field.
- The `Population` evaluates the current Landscape for pitch selection, metabolism, and agent lifecycle.
- The `PhonationEngine` emits `NoteCmd` batches; the `ScheduleRenderer` creates, updates, or releases `Voice` instances accordingly and renders audio through ADSR-shaped backends.
- The `DorsalStream` processes the rendered audio synchronously to update rhythm metrics (`DorsalMetrics`), stored in `landscape.rhythm`.
- Rendered mono audio is pushed into a lock-free ring buffer consumed by the Audio Thread.
This decoupled architecture ensures that the audio thread always sees a consistent stream of samples, even if the analysis thread lags slightly behind real-time. The analysis thread processes all hops in-order to maintain NSGT time continuity.
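The worker-analysis round trip can be miniaturized with the standard library's bounded channels standing in for the lock-free SPSC queues. All names below are illustrative, and the "analysis" is reduced to a single scalar per hop:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stand-in for the analysis pipeline: reduce one audio hop to a scalar
/// (the real system produces a full Landscape snapshot per hop).
fn hop_energy(hop: &[f32]) -> f32 {
    hop.iter().map(|s| s * s).sum()
}

fn main() {
    // Bounded channels as stand-ins for the lock-free SPSC queues.
    let (hop_tx, hop_rx) = sync_channel::<Vec<f32>>(4); // worker -> analysis
    let (land_tx, land_rx) = sync_channel::<f32>(4);    // analysis -> worker

    // "Analysis thread": consumes hops strictly in-order (preserving
    // NSGT time continuity) and sends a result back for each one.
    let analysis = thread::spawn(move || {
        for hop in hop_rx {
            if land_tx.send(hop_energy(&hop)).is_err() {
                break;
            }
        }
    });

    // "Worker thread" role: render hops, then merge snapshots as they arrive.
    for i in 1..=3 {
        hop_tx.send(vec![0.1 * i as f32; 256]).unwrap();
    }
    drop(hop_tx); // closing the channel ends the analysis loop

    let snapshots: Vec<f32> = land_rx.iter().collect();
    analysis.join().unwrap();
    println!("merged {} snapshots, first = {:.2}", snapshots.len(), snapshots[0]);
}
```

The bounded capacity is the important design choice: if analysis lags, the worker blocks on the hop channel instead of growing an unbounded queue, while the audio thread keeps draining its own ring buffer untouched.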
6.3 The Conductor: Scripting with Rhai
The Conductor module acts as the interface between the human artist and the ecosystem. It embeds the Rhai scripting language, exposing a tiered API for controlling the simulation.
6.3.1 Species Configuration
Species are configured via a SpeciesHandle builder pattern. A species begins with a preset and is refined through method chaining:
- Presets: `sine`, `harmonic`, `saw`, `square`, `noise`, `modal`.
- Derivation: `derive(parent)` clones an existing species for modification, enabling inheritance-style composition.
- Body: `amp(v)`, `freq(v)`, `brightness(v)`, `spread(v)`, `voices(n)`, `modes(pattern)`.
- Pitch: `pitch_mode("free"|"lock")`, `pitch_core("hill_climb"|"peak_sampler")`, `pitch_apply("gate_snap"|"glide")`, `pitch_glide(tau)`, `landscape_weight(v)`, `neighbor_step_cents(v)`, `tessitura_gravity(v)`, `exploration(v)`, `persistence(v)`, `anneal_temp(v)`, `move_cost(v)`, `improvement_threshold(v)`, `proposal_interval(sec)`, `window_cents(v)`, `top_k(n)`, `temperature(v)`, `sigma_cents(v)`, `random_candidates(n)`, `global_peaks(n)`, `ratio_candidates(n)`.
- Crowding: `crowding(strength)` (auto-sigma from roughness kernel), `crowding(strength, sigma_cents)`, `crowding_target(same, other)`, `leave_self_out(bool)`, `leave_self_out_mode("approx"|"exact")`, `leave_self_out_harmonics(n)`.
- Brain/Phonation: `brain("entrain"|"seq"|"drone")`, `sustain()`, `repeat()`, `once()`, `pulse(rate)`, `while_alive()`, `gates(n)`, `field()`, `sync(depth)`, `social(coupling)`, `field_window(min,max)`, `field_curve(k,x0)`, `field_drop(gain)`.
- Lifecycle: `metabolism(rate)`, `adsr(a,d,s,r)`.
- Rhythm: `rhythm_coupling("temporal")`, `rhythm_coupling_vitality(lambda_v, v_floor)`, `rhythm_reward(rho_t, "attack_phase_match")`.
- Respawn: `respawn_random()`, `respawn_hereditary(sigma_oct)`.
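As a rough sketch of how such a builder reduces to plain Rust: the struct fields, defaults, and method subset below are assumptions for illustration, not the actual `SpeciesHandle` internals. The pattern to note is that `derive(parent)` is just clone-and-continue-chaining.

```rust
/// Hypothetical species configuration (fields are illustrative).
#[derive(Clone, Debug)]
struct SpeciesConfig {
    preset: String,
    amp: f64,
    freq: f64,
    metabolism: f64,
}

/// Minimal builder mirroring the Rhai method-chaining API above.
struct SpeciesHandle {
    cfg: SpeciesConfig,
}

impl SpeciesHandle {
    fn preset(name: &str) -> Self {
        Self {
            cfg: SpeciesConfig { preset: name.into(), amp: 1.0, freq: 220.0, metabolism: 0.1 },
        }
    }
    /// derive(parent): clone an existing species for modification,
    /// the inheritance-style composition described above.
    fn derive(parent: &SpeciesHandle) -> Self {
        Self { cfg: parent.cfg.clone() }
    }
    fn amp(mut self, v: f64) -> Self { self.cfg.amp = v; self }
    fn freq(mut self, v: f64) -> Self { self.cfg.freq = v; self }
    fn metabolism(mut self, v: f64) -> Self { self.cfg.metabolism = v; self }
}

fn main() {
    let base = SpeciesHandle::preset("harmonic").freq(261.63).amp(0.5);
    let child = SpeciesHandle::derive(&base).metabolism(0.2); // inherits freq/amp
    println!("{:?}", child.cfg);
}
```

The by-value `self` in each setter is what makes the chained `preset(...).freq(...).amp(...)` style read the same in Rust as in the Rhai scripts.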
6.3.2 Mode Patterns
Modal synthesis mode patterns are specified via constructor functions with optional modifiers:
Constructors: `harmonic_modes()`, `odd_modes()`, `power_modes(beta)`, `stiff_string_modes(stiffness)`, `custom_modes(ratios)`, `modal_table(name)`, `landscape_density_modes()`, `landscape_peaks_modes()`.
Modifiers: `.count(n)`, `.range(min, max)`, `.jitter(cents)`, `.seed(s)`.
6.3.3 Spawn Strategies
Spawn strategies determine initial frequency placement: `consonance(root)`, `consonance_density_pmf(min, max)`, `random_log(min, max)`, `linear(start, end)`. Modifiers: `.range(min, max)`, `.min_dist(d)`.
6.3.4 Group Operations
- `create(species, count)`: Instantiates a group of agents. Returns a `GroupHandle`.
- `.place(strategy)`: Assigns a spawn strategy to a group.
- `release(group)`: Marks a group for fade-out release.
Groups support live-patching of pitch parameters, amplitude, and timbre during execution.
6.3.5 Control Flow
- `wait(sec)`: Commits pending groups, then advances the timeline cursor.
- `flush()`: Commits pending groups without advancing the timeline.
- `seed(n)`: Sets the random seed for reproducible runs.
- `scene(name, callback)`: Marks a named scene boundary; groups created within the callback are automatically released when the scene ends.
- `play(callback)`: Executes a scoped block; groups created inside are released on exit.
- `parallel([callbacks])`: Runs multiple blocks concurrently (timeline branches), advancing the cursor to the latest endpoint.
6.3.6 Global Parameters
- `set_harmonicity_mirror_weight(v)`: Modulates the `mirror_weight` parameter in real time.
- `set_roughness_k(v)`: Adjusts the roughness saturation parameter $k$.
- `set_global_coupling(v)`: Controls the Kuramoto coupling strength.
- `set_pitch_objective("consonance"|"dissonance")`: Inverts the fitness function for adversarial experiments.
Scenario Parsing: Scenarios are loaded from .rhai files. This separation allows users to compose the "Macro-Structure" (the narrative arc, the changing laws of physics) while the "Micro-Structure" (the specific notes and rhythms) emerges from the agents' adaptation to those changes.
7. Case Studies: Analysis of Emergent Behavior
The following examples, derived from the samples/ directory, illustrate how specific parameter configurations lead to complex musical behaviors.
7.1 Case Study: Self-Organizing Rhythm (samples/02_mechanisms/rhythmic_sync.rhai)
This script demonstrates the emergent quantization of time.
- Phase 1 (The Seed): A single high-energy agent, "Kick", is spawned at 60 Hz. Its periodic articulation excites the Delta band resonator in the `NeuralRhythms`.
- Phase 2 (The Swarm): A cloud of agents is spawned with random phases.
- Emergence: Because the agents use `KuramotoCore` ArticulationCores coupled to the Delta band, they sense the rhythm established by the Kick. Over a period of seconds, their phases drift and lock into alignment with the Kick. The result is a synchronized pulse that was not explicitly programmed into the swarm; it arose from the physics of the coupled oscillators.
7.2 Case Study: Mirror Dualism (samples/04_ecosystems/mirror_dualism.rhai)
This script explores the structural role of the `mirror_weight` parameter.
- Setup: An anchor drone is established at C4 (261.63 Hz).
- State A (Major): `set_harmonicity_mirror_weight(0.0)`. The system uses the Common Root projection (Overtone Series). Agents seeking consonance cluster around E4 and G4, forming a C Major triad.
- State B (Minor): `set_harmonicity_mirror_weight(1.0)`. The system switches to the Common Overtone projection (Undertone Series). The "gravity" of the landscape inverts: agents now find stability at Ab3 and F3 (intervals of a minor sixth and perfect fourth relative to C), creating a Phrygian/Minor texture. This demonstrates that "Tonality" in Conchordal is a manipulable environmental variable, akin to temperature or gravity.
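The inversion can be sanity-checked with elementary ratio arithmetic: mirroring the overtone ratios 5/4 and 3/2 below the C4 anchor lands close to equal-tempered Ab3 and F3, matching the pitches reported above. A quick sketch (the `cents_between` helper is ours, not part of Conchordal):

```rust
/// Cents offset from reference frequency `a` to frequency `b`.
fn cents_between(a: f64, b: f64) -> f64 {
    1200.0 * (b / a).log2()
}

fn main() {
    let c4 = 261.63; // anchor from the case study
    // Undertone mirror of the major third: 5/4 above becomes 4/5 below.
    let under_third = c4 * 4.0 / 5.0; // ~209.3 Hz, near Ab3 (207.65 Hz ET)
    // Undertone mirror of the perfect fifth: 3/2 above becomes 2/3 below.
    let under_fifth = c4 * 2.0 / 3.0; // ~174.4 Hz, near F3 (174.61 Hz ET)
    println!("{:.1} Hz ({:+.0} cents from Ab3)", under_third, cents_between(207.65, under_third));
    println!("{:.1} Hz ({:+.0} cents from F3)", under_fifth, cents_between(174.61, under_fifth));
}
```

The small residual offsets (on the order of a dozen cents for the third) are just the usual gap between just-intonation ratios and equal temperament, which is why agents settle "near" the named pitches rather than exactly on them.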
7.3 Case Study: Drift and Flow (samples/04_ecosystems/drift_flow.rhai)
This script validates the hop-based movement logic.
- Action: A strongly dissonant agent (C#3) is placed next to a strong anchor (C2).
- Observation: The C#3 agent makes discrete hops in pitch. It is "pulled" by the Harmonicity field, fading out and snapping to a nearby harmonic "well" (likely E3 or G3).
- Dynamics: If per-agent boredom is enabled, the agent will settle at E3 for a few seconds, then "get bored" (local consonance drops due to perceptual adaptation), and hop away again to find a new stable interval. This results in an endless, non-repeating melody generated by simple physical rules of attraction and repulsion.
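The hop dynamic can be caricatured as hill-climbing on a synthetic consonance curve. The toy functions below illustrate the attract-and-settle behavior only; they are not the engine's `PitchHillClimbPitchCore`, and the parabolic "well" stands in for a real Harmonicity peak:

```rust
/// Toy consonance landscape over log2-frequency: a single parabolic
/// "well" standing in for a harmonic peak in the Harmonicity field.
fn consonance(x: f64, well: f64) -> f64 {
    -(x - well).powi(2)
}

/// One discrete hop: sample candidates inside a window and move to the
/// best one only if it beats the current position by a threshold
/// (analogous to improvement_threshold in the pitch API).
fn hop(x: f64, well: f64, window: f64, threshold: f64) -> f64 {
    let candidates = [x - window, x - window / 2.0, x + window / 2.0, x + window];
    let mut best = x;
    let mut best_c = consonance(x, well);
    for &c in &candidates {
        let v = consonance(c, well);
        if v > best_c + threshold {
            best = c;
            best_c = v;
        }
    }
    best
}

fn main() {
    let well = 0.32; // ~log2(5/4): a major-third "well" above the anchor
    let mut x = 0.08; // start at a dissonant position
    for _ in 0..20 {
        x = hop(x, well, 0.05, 1e-6);
    }
    println!("settled at {:.3} (well at {:.3})", x, well);
}
```

Boredom would correspond to slowly flattening the well under the settled agent (perceptual adaptation) until some other candidate clears the improvement threshold again, restarting the walk.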
8. Conclusion
Conchordal establishes a foundation for Bio-Mimetic Computational Audio. By replacing the rigid abstractions of music theory (notes, grids, BPM) with continuous physiological models (Log2Space, ERB bands, neural oscillation), it creates a system where music is not constructed, but grown.
The paper "Conchordal: Emergent Harmony via Direct Cognitive Coupling in a Psychoacoustic Landscape" (arXiv:2603.25637) validated the psychoacoustic landscape as an effective ALife terrain through controlled experiments demonstrating self-organization, selection, synchronization, and hereditary accumulation. These results confirm that the Roughness-Harmonicity-Consonance pipeline and the Kuramoto entrainment model produce musically coherent emergent behavior under a range of initial conditions.
Version 0.3.0 extends the architecture with modal synthesis (damped resonator banks), ADSR envelopes for per-voice amplitude shaping, the PhonationEngine note scheduling system for decoupled timing control, and expanded crowding/pitch control (leave-self-out analysis, roughness-derived sigma, simulated annealing). The Rhai scripting API has been substantially expanded to expose these capabilities through a tiered builder-pattern interface.
The technical architecture—anchored by the Log2Space coordinate system and the "Sibling Projection" algorithm—provides a robust mathematical foundation for this paradigm. The use of Rust ensures that these complex biological simulations can run in real-time, bridging the gap between ALife research and performative musical instruments.
Future development will focus on integrating paper findings into the main binary (v0.4.0), spatialization (extending the landscape to 3D space), and evolutionary genetics (allowing successful agents to pass on their TimbreGenotype), further deepening the analogy between sound and life.
Appendix A: Key System Parameters
| Parameter | Module | Unit | Description |
|---|---|---|---|
| `bins_per_oct` | Log2Space | Int | Resolution of the frequency grid (typ. 48-96). |
| `sigma_cents` | HarmonicityParams | Cents | Width of harmonic peaks. Lower = stricter intonation. |
| `mirror_weight` | HarmonicityParams | 0.0-1.0 | Balance between Overtone (Major) and Undertone (Minor) gravity. |
| `roughness_k` | LandscapeParams | Float | Saturation parameter for roughness mapping. Default: $(1/0.7) - 1 \approx 0.4286$ (so $x=1$ maps to $\approx 0.7$). |
| `kernel.a` | ConsonanceKernel | Float | Harmonicity coefficient (default 1.0). |
| `kernel.b` | ConsonanceKernel | Float | Roughness coefficient (default -1.35; negative penalizes roughness). |
| `kernel.c` | ConsonanceKernel | Float | Interaction coefficient (default 1.0; positive attenuates the roughness penalty at high harmonicity). |
| `kernel.d` | ConsonanceKernel | Float | Bias term (default 0.0). |
| `beta` | ConsonanceRepresentationParams | Float | Sigmoid steepness for $C_{level01}$ (default 2.0). |
| `theta` | ConsonanceRepresentationParams | Float | Sigmoid threshold for $C_{level01}$ (default 0.0). |
| `consonance_density_roughness_gain` | LandscapeParams | Float | $\rho$ in the density kernel $H(1-\rho R)$ (default 1.0). |
| `vitality` | DorsalStream | 0.0-1.0 | Self-oscillation energy of the rhythm section. |
| `persistence` | PitchHillClimbPitchCore | 0.0-1.0 | Resistance to movement/change (policy bias within pitch selection). |
| `crowding_strength` | PitchHillClimbPitchCore | Float | Strength of frequency-space crowding avoidance. |
| `crowding_sigma_cents` | PitchHillClimbPitchCore | Cents | Width of the crowding penalty Gaussian (default 60). |
| `leave_self_out` | PitchHillClimbPitchCore | Bool | Whether to subtract the agent's own spectral contribution during evaluation. |
| `anneal_temp` | PitchHillClimbPitchCore | Float | Simulated annealing temperature for pitch proposals. |
| `attack_step` | KuramotoCore | Float | Envelope attack step size. |
| `decay_rate` | KuramotoCore | Float | Envelope decay rate. |
| `k_omega` | KuramotoCore | Float | Coupling strength scaling for the Kuramoto phase step. |
Appendix B: Mathematical Summary
Consonance Kernel (bilinear):
$$ C_{score} = a \cdot H_{01} + b \cdot R_{01} + c \cdot H_{01} R_{01} + d $$
Consonance Level (sigmoid representation):
$$ C_{level01} = \frac{1}{1 + e^{-\beta(C_{score} - \theta)}} $$
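Plugging in the Appendix A defaults ($a=1.0$, $b=-1.35$, $c=1.0$, $d=0.0$, $\beta=2.0$, $\theta=0.0$) gives a quick numeric feel for how harmonicity attenuates the roughness penalty. A small worked example with the defaults hard-coded:

```rust
/// Bilinear consonance kernel with the Appendix A defaults baked in:
/// C_score = a*H + b*R + c*H*R + d, with a=1.0, b=-1.35, c=1.0, d=0.0.
fn c_score(h: f64, r: f64) -> f64 {
    1.0 * h + (-1.35) * r + 1.0 * h * r + 0.0
}

/// Sigmoid representation with beta = 2.0, theta = 0.0.
fn c_level01(score: f64) -> f64 {
    1.0 / (1.0 + (-2.0 * score).exp())
}

fn main() {
    // Same roughness (R = 0.5), different harmonicity: the +c*H*R term
    // softens the roughness penalty when harmonicity is high.
    let low = c_score(0.2, 0.5);  // 0.2 - 0.675 + 0.1 = -0.375
    let high = c_score(0.9, 0.5); // 0.9 - 0.675 + 0.45 = 0.675
    println!("H=0.2: score {:.3}, level {:.3}", low, c_level01(low));
    println!("H=0.9: score {:.3}, level {:.3}", high, c_level01(high));
}
```

At fixed roughness, raising $H_{01}$ from 0.2 to 0.9 flips the score from negative to positive, which the sigmoid then maps from below 0.5 to above 0.5: the interaction term is what lets strongly harmonic spectra tolerate roughness.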
Consonance Density Mass ($\rho$-kernel):
$$ C_{density\_mass} = \max\left(0,\; H_{01}\,(1 - \rho R_{01})\right) $$
Roughness Saturation Mapping (from reference-normalized ratio $x$ to $R_{01} \in [0,1]$):
$$ R_{01}(x; k) = \begin{cases} 0 & \text{if } x \leq 0 \\ x \cdot \dfrac{1}{1+k} & \text{if } 0 < x < 1 \\ 1 - \dfrac{k}{x+k} & \text{if } x \geq 1 \end{cases} $$
where $k$ is roughness_k (default $\approx 0.4286$). The function is continuous at $x=1$ and saturates to 1 as $x \to \infty$.
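The piecewise mapping is easy to verify numerically. The sketch below checks the continuity claim at $x=1$ and the default calibration $R_{01}(1) = 0.7$:

```rust
/// Roughness saturation mapping R01(x; k) from Appendix B: linear in
/// the low branch, hyperbolic saturation toward 1 in the high branch.
fn r01(x: f64, k: f64) -> f64 {
    if x <= 0.0 {
        0.0
    } else if x < 1.0 {
        x / (1.0 + k)
    } else {
        1.0 - k / (x + k)
    }
}

fn main() {
    let k = 1.0 / 0.7 - 1.0; // default roughness_k, ~0.4286
    // At the reference point x = 1 both branches give 1/(1+k) = 0.7.
    println!("R01(1.0) = {:.3}", r01(1.0, k));
    // The high branch saturates toward 1 as x grows.
    println!("R01(1e6) = {:.6}", r01(1e6, k));
}
```

Continuity follows because the left branch at $x=1$ gives $1/(1+k)$ and the right branch gives $1 - k/(1+k) = 1/(1+k)$, so choosing $k = (1/0.7) - 1$ pins the reference point to exactly 0.7.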
Harmonicity Projection (Sibling Algorithm): $$ H[i] = (1-\alpha)\sum_m \sum_k A[i - \log_2(m) + \log_2(k)] + \alpha \sum_m \sum_k A[i + \log_2(m) - \log_2(k)] $$
Roughness Convolution: $$ R_{shape}(z) = \int A(\tau) \cdot K_{plomp}(|z-\tau|_{ERB}) d\tau $$