Lean Rust wav2small local inference service

turnercore 872cf29930 Add concurrency backpressure and ONNX HTTP tests		2026-06-13 10:37:20 +02:00
models	Build wav2small Rust inference service	2026-06-11 01:14:56 +02:00
scripts	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
src	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
tests	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
tools/export_onnx	Verify wav2small ONNX export path	2026-06-11 01:53:40 +02:00
.dockerignore	Build wav2small Rust inference service	2026-06-11 01:14:56 +02:00
.gitattributes	Initial commit	2026-06-11 01:06:09 +02:00
.gitignore	Build wav2small Rust inference service	2026-06-11 01:14:56 +02:00
Cargo.lock	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
Cargo.toml	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
docker-compose.yml	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
Dockerfile	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00
README.md	Add concurrency backpressure and ONNX HTTP tests	2026-06-13 10:37:20 +02:00

README.md

wav2small-rs

Lean local HTTP inference service for perceived vocal affect scoring with audeering/wav2small.

The service accepts raw 16 kHz mono signed 16-bit little-endian PCM, keeps an ONNX Runtime session loaded from startup, and returns arousal, dominance, and valence scores. The scores describe how the voice is perceived; they are not a measurement of the speaker's actual emotional state.
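As a rough illustration of the input format described above, the following sketch encodes mono samples as s16le bytes and derives the clip duration from the body length. The function names are hypothetical and are not taken from this repository.

```rust
/// Sample rate the service expects, per the README.
const SAMPLE_RATE_HZ: u32 = 16_000;

/// Encode mono i16 samples as signed 16-bit little-endian bytes.
/// (Hypothetical helper, not part of the service's API.)
fn encode_s16le(samples: &[i16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

/// Duration in seconds of a raw mono s16le body.
/// Returns None when the byte count is odd, i.e. the last sample is truncated.
fn duration_seconds(byte_len: usize) -> Option<f64> {
    if byte_len % 2 != 0 {
        return None;
    }
    Some((byte_len / 2) as f64 / SAMPLE_RATE_HZ as f64)
}
```

A 40,000-byte body holds 20,000 samples, which is 1.25 s at 16 kHz, matching the duration_seconds field in the example response further down.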

Model

The Hugging Face repository audeering/wav2small currently publishes model.safetensors and does not expose an official .onnx or quantized .onnx file through the model API. Its reference implementation documents output order as 0=arousal, 1=dominance, 2=valence. Generate models/wav2small.onnx once with:

cd tools/export_onnx
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python export.py --output ../../models/wav2small.onnx

Python, PyTorch, Transformers, and Librosa are used only for the export step. The production service does not include them.

The exporter validates the generated ONNX with ONNX Runtime and expects input input_values as float32 [batch, samples] and output scores as float32 [batch, 3].
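Since the model input is float32 while requests carry 16-bit PCM, some conversion has to happen before inference. A minimal sketch follows, assuming the common convention of scaling by 1/32768 into [-1.0, 1.0); the README does not state which scaling the service actually applies, and the function name is hypothetical.

```rust
/// Convert a raw s16le byte buffer into float32 samples for one batch row.
/// ASSUMPTION: samples are scaled by 1/32768; the service may normalize differently.
/// Returns None when the byte count is odd (truncated final sample).
fn s16le_to_f32(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / 32768.0)
            .collect(),
    )
}
```

The resulting vector would correspond to one row of input_values, giving a tensor shape of [1, samples] for a single clip.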

No Hugging Face API key is required for the public audeering/wav2small model. If Hugging Face changes access controls or you use a private mirror, set the standard HF_TOKEN or HUGGINGFACE_HUB_TOKEN environment variable before running the exporter.

Licensing note: the upstream model card declares cc-by-nc-sa-4.0 and states that the model is for research purposes only. Commercial use requires appropriate licensing from audEERING.

Run

cargo run --release

Defaults:

BIND_ADDR=127.0.0.1:8715
MODEL_PATH=/models/wav2small.onnx
MAX_AUDIO_SECONDS=10
ORT_NUM_THREADS=1
SESSION_POOL_SIZE=2
MAX_CONCURRENT_REQUESTS=2
RUST_LOG=wav2small_rs=info
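One way to read these settings with the defaults listed above is sketched below. The Config struct, the field types, and the lookup-closure design are assumptions made for illustration; the actual code in src may differ. RUST_LOG is left out because it is normally consumed by the logging setup rather than parsed by hand.

```rust
use std::net::SocketAddr;
use std::str::FromStr;

/// Hypothetical settings struct mirroring the documented environment variables.
#[derive(Debug, PartialEq)]
struct Config {
    bind_addr: SocketAddr,
    model_path: String,
    max_audio_seconds: u32, // ASSUMPTION: whole seconds
    ort_num_threads: usize,
    session_pool_size: usize,
    max_concurrent_requests: usize,
}

/// Look up `key`, fall back to `default`, and parse the result into `T`.
fn parse_or<T: FromStr>(
    get: &dyn Fn(&str) -> Option<String>,
    key: &str,
    default: &str,
) -> Result<T, String> {
    let raw = get(key).unwrap_or_else(|| default.to_string());
    raw.parse().map_err(|_| format!("invalid value for {key}: {raw:?}"))
}

/// Build a Config from any key lookup, so it can be tested without touching
/// the process environment.
fn load_config(get: &dyn Fn(&str) -> Option<String>) -> Result<Config, String> {
    Ok(Config {
        bind_addr: parse_or(get, "BIND_ADDR", "127.0.0.1:8715")?,
        model_path: get("MODEL_PATH").unwrap_or_else(|| "/models/wav2small.onnx".to_string()),
        max_audio_seconds: parse_or(get, "MAX_AUDIO_SECONDS", "10")?,
        ort_num_threads: parse_or(get, "ORT_NUM_THREADS", "1")?,
        session_pool_size: parse_or(get, "SESSION_POOL_SIZE", "2")?,
        max_concurrent_requests: parse_or(get, "MAX_CONCURRENT_REQUESTS", "2")?,
    })
}
```

In a real binary the lookup would be the process environment, for example `load_config(&|key: &str| std::env::var(key).ok())`.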

For local development with a model in this checkout:

MODEL_PATH=models/wav2small.onnx cargo run --release

Docker

docker compose build
docker compose up -d

The compose file mounts ./models read-only at /models and binds the host port to 127.0.0.1:8715 by default. A commented LAN binding is included.

The runtime container runs as a non-root numeric user and contains no Python, PyTorch, or Rust toolchain.

Endpoints

curl -s http://127.0.0.1:8715/health
curl -s http://127.0.0.1:8715/ready
curl -s http://127.0.0.1:8715/metrics

Analyze a WAV by converting it to raw PCM on the fly:

scripts/curl_pcm.sh sample.wav

Raw PCM requests must use Content-Type: application/octet-stream.

Equivalent raw command:

ffmpeg -i sample.wav -ac 1 -ar 16000 -f s16le -acodec pcm_s16le - \
  | curl -s -X POST \
      -H "Content-Type: application/octet-stream" \
      --data-binary @- \
      http://127.0.0.1:8715/analyze_pcm_s16le
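For callers that prefer not to shell out to curl, the same request can be sketched in Rust with only the standard library. The helper names below are hypothetical. The sketch returns the raw HTTP response (status line, headers, and body) without parsing it, and a real client would more likely use an HTTP crate.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.1 POST for /analyze_pcm_s16le with the required
/// Content-Type. `host` is the value of the Host header, e.g. "127.0.0.1:8715".
fn build_request(host: &str, pcm: &[u8]) -> Vec<u8> {
    let head = format!(
        "POST /analyze_pcm_s16le HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/octet-stream\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\r\n",
        pcm.len()
    );
    let mut request = head.into_bytes();
    request.extend_from_slice(pcm);
    request
}

/// Send the PCM body and return the unparsed response text.
/// `Connection: close` lets us read until the server closes the socket.
fn post_pcm(addr: &str, pcm: &[u8]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(&build_request(addr, pcm))?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(String::from_utf8_lossy(&response).into_owned())
}
```

With the service running on the default address, `post_pcm("127.0.0.1:8715", &pcm_bytes)` would send one clip.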

Example response:

{
  "ok": true,
  "model": "audeering/wav2small",
  "sample_rate": 16000,
  "duration_seconds": 1.25,
  "inference_ms": 2.8,
  "total_ms": 3.5,
  "arousal": 0.72,
  "dominance": 0.58,
  "valence": 0.31
}

Benchmark

With the service running:

scripts/bench_pcm.sh

The script sends 0.5s, 1s, 3s, and 10s deterministic speech-like PCM clips with harmonics, envelope changes, breath noise, and short pauses. It runs 100 requests per duration by default and prints p50/p95 total latency, p50/p95 inference latency, realtime factor, and Docker RSS when available. Treat it as a repeatable service benchmark, not a substitute for latency tests on real deployment audio.
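To make the reported statistics concrete, here is a sketch of how p50/p95 and a realtime factor could be computed from collected latencies. It assumes the nearest-rank percentile method and defines the realtime factor as processing time divided by audio duration; scripts/bench_pcm.sh may use different definitions.

```rust
/// Nearest-rank percentile of `samples` for `p` in 0..=100.
/// Returns None for an empty slice or an out-of-range `p`.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank is 1-based: ceil(p * n / 100), clamped to at least 1.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// ASSUMPTION: realtime factor = processing time / audio duration,
/// so values below 1.0 mean faster than realtime.
fn realtime_factor(inference_ms: f64, audio_seconds: f64) -> f64 {
    inference_ms / 1000.0 / audio_seconds
}
```

Under this definition, a 1 s clip processed in 500 ms has a realtime factor of 0.5.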

Audio Daemon Integration

Configure the daemon to downmix and resample microphone windows to 16 kHz mono s16le. POST each bounded PCM window directly to /analyze_pcm_s16le as application/octet-stream. Keep windows between 0.25s and MAX_AUDIO_SECONDS, and reuse the local service instead of launching a model process per clip.
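The windowing rule above can be sketched as a simple splitter over already-resampled 16 kHz mono samples. The function name is hypothetical, and dropping a trailing fragment shorter than 0.25 s is one possible policy; a daemon might instead carry it over into the next window.

```rust
const SAMPLE_RATE_HZ: usize = 16_000;
/// 0.25 s at 16 kHz, the lower bound suggested above.
const MIN_WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ / 4;

/// Split 16 kHz mono samples into windows of at most `max_audio_seconds`,
/// discarding any window shorter than 0.25 s.
fn split_windows(samples: &[i16], max_audio_seconds: usize) -> Vec<&[i16]> {
    let max_samples = max_audio_seconds * SAMPLE_RATE_HZ;
    if max_samples < MIN_WINDOW_SAMPLES {
        // No valid window size exists (this also avoids chunks(0), which panics).
        return Vec::new();
    }
    samples
        .chunks(max_samples)
        .filter(|window| window.len() >= MIN_WINDOW_SAMPLES)
        .collect()
}
```

Each returned window can then be encoded as s16le and POSTed to /analyze_pcm_s16le on the already-running service.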