Lean Rust wav2small local inference service

wav2small-rs

Lean local HTTP inference service for perceived vocal affect scoring with audeering/wav2small.

The service accepts raw 16 kHz mono signed 16-bit little-endian PCM. It loads its ONNX Runtime session pool (SESSION_POOL_SIZE) once at startup and returns arousal, dominance, and valence scores. The scores are perceived vocal affect signals, not measurements of a speaker's actual emotional state.
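The wire format is simple enough to decode by hand. Below is a minimal std-only Rust sketch of how a handler might validate and convert a request body. It assumes three things:

- samples are scaled to f32 by 1/32768;
- empty bodies are rejected;
- odd-length and over-long bodies are rejected.

The function and error names are hypothetical and are not taken from the service's source.

```rust
/// Sample rate the service expects for raw PCM input.
const SAMPLE_RATE: usize = 16_000;

/// Hypothetical validation errors for a raw PCM request body.
#[derive(Debug, PartialEq)]
enum PcmError {
    Empty,
    OddLength(usize),
    TooLong { seconds: f64, max_seconds: f64 },
}

/// Decode mono s16le bytes into f32 samples in [-1.0, 1.0).
/// Assumption: scaling by 1/32768; the service's real normalization may differ.
fn decode_pcm_s16le(body: &[u8], max_audio_seconds: f64) -> Result<Vec<f32>, PcmError> {
    if body.is_empty() {
        return Err(PcmError::Empty);
    }
    // Each sample is exactly two bytes, so an odd length means a truncated sample.
    if body.len() % 2 != 0 {
        return Err(PcmError::OddLength(body.len()));
    }
    let seconds = duration_seconds(body.len() / 2);
    if seconds > max_audio_seconds {
        return Err(PcmError::TooLong { seconds, max_seconds: max_audio_seconds });
    }
    Ok(body
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect())
}

/// Clip duration in seconds for a given number of 16 kHz mono samples.
fn duration_seconds(n_samples: usize) -> f64 {
    n_samples as f64 / SAMPLE_RATE as f64
}
```

At the default MAX_AUDIO_SECONDS=10, the largest body this sketch accepts is 10 × 16000 × 2 = 320,000 bytes. A 20,000-sample body corresponds to the 1.25 s duration shown in the example response.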

Model

The Hugging Face repository audeering/wav2small currently publishes model.safetensors and does not expose an official .onnx or quantized .onnx file through the model API. Its reference implementation documents output order as 0=arousal, 1=dominance, 2=valence. Generate models/wav2small.onnx once with:

cd tools/export_onnx
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python export.py --output ../../models/wav2small.onnx

Python, PyTorch, Transformers, and Librosa are only used for export. The production service does not include them.

The exporter validates the generated ONNX file with ONNX Runtime. It expects an input named input_values (float32, shape [batch, samples]) and an output named scores (float32, shape [batch, 3]).
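The documented output order and tensor shapes determine how a caller should read the model's result. The sketch below models only the data layout: a row-major [batch, 3] f32 buffer, mapped using the upstream order 0=arousal, 1=dominance, 2=valence. It does not call ONNX Runtime. The AffectScores type and the helper names are hypothetical.

```rust
/// Scores for one clip, in the upstream order 0=arousal, 1=dominance, 2=valence.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AffectScores {
    arousal: f32,
    dominance: f32,
    valence: f32,
}

/// Split a flat row-major `scores` buffer of shape [batch, 3] into per-clip structs.
/// Returns None if the buffer length is not exactly batch * 3.
fn scores_from_output(flat: &[f32], batch: usize) -> Option<Vec<AffectScores>> {
    if flat.len() != batch.checked_mul(3)? {
        return None;
    }
    Some(
        flat.chunks_exact(3)
            .map(|row| AffectScores { arousal: row[0], dominance: row[1], valence: row[2] })
            .collect(),
    )
}

/// Shape of the `input_values` tensor for a single clip: [batch = 1, samples].
fn input_shape(n_samples: usize) -> [usize; 2] {
    [1, n_samples]
}
```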

No Hugging Face API key is required for the public audeering/wav2small model. If Hugging Face changes access controls or you use a private mirror, set the standard HF_TOKEN or HUGGINGFACE_HUB_TOKEN environment variable before running the exporter.

Licensing note: the upstream model card declares cc-by-nc-sa-4.0 and states that the model is for research purposes only. Commercial use requires appropriate licensing from audEERING.

Run

cargo run --release

Defaults:

BIND_ADDR=127.0.0.1:8715
MODEL_PATH=/models/wav2small.onnx
MAX_AUDIO_SECONDS=10
ORT_NUM_THREADS=1
SESSION_POOL_SIZE=2
MAX_CONCURRENT_REQUESTS=2
RUST_LOG=wav2small_rs=info
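The sketch below shows one way to read these defaults in Rust. It assumes each variable is optional and falls back to the value listed above. This is a std-only sketch with hypothetical names, and the service's actual configuration code may differ. RUST_LOG is left to whatever logging crate the service uses.

```rust
use std::str::FromStr;

/// Runtime settings mirroring the environment variables listed above.
#[derive(Debug, PartialEq)]
struct Config {
    bind_addr: String,
    model_path: String,
    max_audio_seconds: f64,
    ort_num_threads: usize,
    session_pool_size: usize,
    max_concurrent_requests: usize,
}

/// Parse an optional raw value, using `default` when the variable is unset.
fn parse_or<T: FromStr>(key: &str, raw: Option<String>, default: T) -> Result<T, String> {
    match raw {
        None => Ok(default),
        Some(v) => v.trim().parse().map_err(|_| format!("invalid {key}: {v:?}")),
    }
}

impl Config {
    /// Build a config from any key lookup (env vars in production, closures in tests).
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let cfg = Config {
            bind_addr: get("BIND_ADDR").unwrap_or_else(|| "127.0.0.1:8715".to_string()),
            model_path: get("MODEL_PATH").unwrap_or_else(|| "/models/wav2small.onnx".to_string()),
            max_audio_seconds: parse_or("MAX_AUDIO_SECONDS", get("MAX_AUDIO_SECONDS"), 10.0)?,
            ort_num_threads: parse_or("ORT_NUM_THREADS", get("ORT_NUM_THREADS"), 1)?,
            session_pool_size: parse_or("SESSION_POOL_SIZE", get("SESSION_POOL_SIZE"), 2)?,
            max_concurrent_requests: parse_or(
                "MAX_CONCURRENT_REQUESTS",
                get("MAX_CONCURRENT_REQUESTS"),
                2,
            )?,
        };
        // A pool or concurrency limit of zero could never serve a request.
        if cfg.session_pool_size == 0 || cfg.max_concurrent_requests == 0 {
            return Err("SESSION_POOL_SIZE and MAX_CONCURRENT_REQUESTS must be at least 1".into());
        }
        Ok(cfg)
    }

    /// Read the real process environment.
    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking a lookup closure instead of reading std::env directly keeps the parsing testable without mutating the process environment.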

For local development with a model in this checkout:

MODEL_PATH=models/wav2small.onnx cargo run --release

Docker

docker compose build
docker compose up -d

The compose file mounts ./models read-only at /models and binds the host port to 127.0.0.1:8715 by default. A commented-out LAN binding is also included.

The runtime container runs as a non-root numeric user and contains no Python, PyTorch, or Rust toolchain.

Endpoints

curl -s http://127.0.0.1:8715/health
curl -s http://127.0.0.1:8715/ready
curl -s http://127.0.0.1:8715/metrics

Analyze a WAV file by converting it to raw PCM on the fly:

scripts/curl_pcm.sh sample.wav

Raw PCM requests must use Content-Type: application/octet-stream.

Equivalent raw command:

ffmpeg -i sample.wav -ac 1 -ar 16000 -f s16le -acodec pcm_s16le - \
  | curl -s -X POST \
      -H "Content-Type: application/octet-stream" \
      --data-binary @- \
      http://127.0.0.1:8715/analyze_pcm_s16le
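For callers that cannot shell out to curl, the sketch below is a std-only Rust equivalent of the pipeline above. It makes these assumptions:

- the server honors Connection: close, so reading to end-of-stream terminates;
- the response is returned as raw HTTP text, and chunked transfer encoding is not decoded.

The function names are hypothetical. A production client would normally use an HTTP library instead.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build an HTTP/1.1 POST carrying raw s16le PCM as application/octet-stream.
fn build_analyze_request(host: &str, pcm: &[u8]) -> Vec<u8> {
    let head = format!(
        "POST /analyze_pcm_s16le HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/octet-stream\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n",
        pcm.len()
    );
    let mut req = head.into_bytes();
    req.extend_from_slice(pcm);
    req
}

/// Send one PCM window and return the raw response (status line, headers, body).
fn post_pcm(addr: &str, pcm: &[u8]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(&build_analyze_request(addr, pcm))?;
    let mut response = String::new();
    // Relies on `Connection: close`: the server ends the stream after replying.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the service running, a call like post_pcm("127.0.0.1:8715", &bytes) sends the same request as the curl command.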

Example response:

{
  "ok": true,
  "model": "audeering/wav2small",
  "sample_rate": 16000,
  "duration_seconds": 1.25,
  "inference_ms": 2.8,
  "total_ms": 3.5,
  "arousal": 0.72,
  "dominance": 0.58,
  "valence": 0.31
}

Benchmark

With the service running:

scripts/bench_pcm.sh

The script sends 0.5s, 1s, 3s, and 10s deterministic speech-like PCM clips with harmonics, envelope changes, breath noise, and short pauses. It runs 100 requests per duration by default and prints p50/p95 total latency, p50/p95 inference latency, realtime factor, and Docker RSS when available. Treat it as a repeatable service benchmark, not a substitute for latency tests on real deployment audio.
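This README does not document the script's exact percentile method or its realtime-factor definition. As a hedged sketch, the code below assumes nearest-rank percentiles. It defines realtime factor as processing time divided by audio duration, so values below 1.0 mean faster than realtime. Some tools use the inverse. The function names are hypothetical.

```rust
/// Nearest-rank percentile (p in (0, 100]) of latency samples in milliseconds.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is 1-based: ceil(p/100 * n), clamped to at least 1.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Realtime factor, assumed here as processing time / audio duration.
fn realtime_factor(processing_ms: f64, audio_seconds: f64) -> f64 {
    processing_ms / (audio_seconds * 1000.0)
}
```

Under this definition, the example response (total_ms 3.5 for 1.25 s of audio) gives a realtime factor of 0.0028.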

Audio Daemon Integration

Configure the daemon to downmix and resample microphone windows to 16 kHz mono s16le. POST each bounded PCM window directly to /analyze_pcm_s16le as application/octet-stream. Keep windows between 0.25s and MAX_AUDIO_SECONDS, and reuse the local service instead of launching a model process per clip.
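Below is a daemon-side sketch of the downmix and windowing steps. It assumes the capture layer already delivers 16 kHz interleaved stereo i16. Resampling from other rates is not shown; ffmpeg or a dedicated resampler would typically handle it. The sketch:

1. averages the two channels to mono;
2. cuts the stream into windows no longer than MAX_AUDIO_SECONDS;
3. drops a tail shorter than 0.25 s;
4. encodes each window as s16le bytes.

The function names and the drop-the-tail policy are assumptions.

```rust
const SAMPLE_RATE: usize = 16_000;
const MIN_WINDOW_SECONDS: f64 = 0.25;

/// Average interleaved stereo i16 frames into mono (i32 math avoids overflow).
fn downmix_stereo(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(2)
        .map(|f| ((f[0] as i32 + f[1] as i32) / 2) as i16)
        .collect()
}

/// Split mono 16 kHz samples into windows of at most `max_audio_seconds`,
/// dropping any trailing remainder shorter than 0.25 s.
fn split_windows(samples: &[i16], max_audio_seconds: f64) -> Vec<&[i16]> {
    let max_len = (max_audio_seconds * SAMPLE_RATE as f64) as usize;
    let min_len = (MIN_WINDOW_SECONDS * SAMPLE_RATE as f64) as usize;
    if max_len == 0 {
        return Vec::new();
    }
    samples.chunks(max_len).filter(|w| w.len() >= min_len).collect()
}

/// Encode one window as the s16le request body.
fn to_s16le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}
```

At 16 kHz, the 0.25 s floor is 4,000 samples and the default 10 s ceiling is 160,000 samples per window.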