wav2small-rs
Lean local HTTP inference service for perceived vocal affect scoring with audeering/wav2small.
The service accepts raw 16 kHz mono signed 16-bit little-endian PCM, keeps one ONNX Runtime session loaded at startup, and returns arousal, dominance, and valence scores. Scores are perceived vocal affect signals, not factual emotional state.
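To make the input format concrete, here is a minimal sketch of decoding a request body into float samples. The function names are hypothetical, and the division by 32768 is a common convention assumed for illustration; this README does not state which scaling the service applies before inference.

```rust
/// Decode raw signed 16-bit little-endian PCM into f32 samples in [-1.0, 1.0).
/// Assumption: scaling by 1/32768 is illustrative, not taken from the service.
pub fn pcm_s16le_to_f32(bytes: &[u8]) -> Result<Vec<f32>, String> {
    // Every sample is exactly two bytes, so an odd length means a truncated body.
    if bytes.len() % 2 != 0 {
        return Err(format!("odd byte count: {}", bytes.len()));
    }
    Ok(bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / 32768.0)
        .collect())
}

/// Duration of a 16 kHz mono s16le buffer: 2 bytes per sample, 16000 samples per second.
pub fn duration_seconds(byte_len: usize) -> f64 {
    byte_len as f64 / (2.0 * 16_000.0)
}
```

Under these assumptions, a 40000-byte body corresponds to 1.25 seconds of audio.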
Model
The Hugging Face repository audeering/wav2small currently publishes model.safetensors and does not expose an official .onnx or quantized .onnx file through the model API. Its reference implementation documents output order as 0=arousal, 1=dominance, 2=valence. Generate models/wav2small.onnx once with:
cd tools/export_onnx
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python export.py --output ../../models/wav2small.onnx
Python, PyTorch, Transformers, and Librosa are only used for export. The production service does not include them.
The exporter validates the generated ONNX with ONNX Runtime and expects input input_values as float32 [batch, samples] and output scores as float32 [batch, 3].
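The output contract can be illustrated with a small sketch that maps one row of the `scores` tensor to named fields, using the 0=arousal, 1=dominance, 2=valence order documented above. `AffectScores` and `scores_from_row` are hypothetical names, not taken from the service source.

```rust
/// One row of the `[batch, 3]` output, with fields in the documented order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AffectScores {
    pub arousal: f32,
    pub dominance: f32,
    pub valence: f32,
}

/// Map a single output row to named scores, rejecting rows of the wrong width.
pub fn scores_from_row(row: &[f32]) -> Result<AffectScores, String> {
    match row {
        // Index 0 = arousal, 1 = dominance, 2 = valence.
        [arousal, dominance, valence] => Ok(AffectScores {
            arousal: *arousal,
            dominance: *dominance,
            valence: *valence,
        }),
        _ => Err(format!("expected 3 scores, got {}", row.len())),
    }
}
```

Checking the row width explicitly makes a mismatched export fail loudly instead of silently mislabeling scores.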
No Hugging Face API key is required for the public audeering/wav2small model. If Hugging Face changes access controls or you use a private mirror, set the standard HF_TOKEN or HUGGINGFACE_HUB_TOKEN environment variable before running the exporter.
Licensing note: the upstream model card declares cc-by-nc-sa-4.0 and says the model is for research purpose only. Use commercially only with appropriate licensing from audEERING.
Run
cargo run --release
Defaults:
BIND_ADDR=127.0.0.1:8715
MODEL_PATH=/models/wav2small.onnx
MAX_AUDIO_SECONDS=10
ORT_NUM_THREADS=1
SESSION_POOL_SIZE=2
MAX_CONCURRENT_REQUESTS=2
RUST_LOG=wav2small_rs=info
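As an illustration of how these defaults could be resolved, the sketch below reads each variable through a lookup function and falls back to the value listed above. The `Config` struct, its field types (for example `MAX_AUDIO_SECONDS` as a float), and the method names are assumptions; the service's actual configuration code may differ. `RUST_LOG` is left out because it is normally consumed by the logging library.

```rust
use std::net::SocketAddr;

/// Hypothetical configuration holder mirroring the documented defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub bind_addr: SocketAddr,
    pub model_path: String,
    pub max_audio_seconds: f64,
    pub ort_num_threads: usize,
    pub session_pool_size: usize,
    pub max_concurrent_requests: usize,
}

impl Config {
    /// Build a config from any name -> value lookup, which keeps it testable.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        fn parse<T: std::str::FromStr>(name: &str, raw: String) -> Result<T, String> {
            raw.parse()
                .map_err(|_| format!("invalid value for {name}: {raw:?}"))
        }
        // Use the documented default when the variable is unset.
        let get = |name: &str, default: &str| lookup(name).unwrap_or_else(|| default.to_string());
        Ok(Self {
            bind_addr: parse("BIND_ADDR", get("BIND_ADDR", "127.0.0.1:8715"))?,
            model_path: get("MODEL_PATH", "/models/wav2small.onnx"),
            max_audio_seconds: parse("MAX_AUDIO_SECONDS", get("MAX_AUDIO_SECONDS", "10"))?,
            ort_num_threads: parse("ORT_NUM_THREADS", get("ORT_NUM_THREADS", "1"))?,
            session_pool_size: parse("SESSION_POOL_SIZE", get("SESSION_POOL_SIZE", "2"))?,
            max_concurrent_requests: parse(
                "MAX_CONCURRENT_REQUESTS",
                get("MAX_CONCURRENT_REQUESTS", "2"),
            )?,
        })
    }

    /// Read the same settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

Routing everything through `from_lookup` lets tests supply overrides without mutating the real environment.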
For local development with a model in this checkout:
MODEL_PATH=models/wav2small.onnx cargo run --release
Docker
docker compose build
docker compose up -d
The compose file mounts ./models read-only at /models and binds the host port to 127.0.0.1:8715 by default. A commented-out LAN binding is included.
The runtime container runs as a non-root numeric user and contains no Python, PyTorch, or Rust toolchain.
Endpoints
curl -s http://127.0.0.1:8715/health
curl -s http://127.0.0.1:8715/ready
curl -s http://127.0.0.1:8715/metrics
Analyze a WAV by converting it to raw PCM on the fly:
scripts/curl_pcm.sh sample.wav
Raw PCM requests must use Content-Type: application/octet-stream.
Equivalent raw command:
ffmpeg -i sample.wav -ac 1 -ar 16000 -f s16le -acodec pcm_s16le - \
  | curl -s -X POST \
      -H "Content-Type: application/octet-stream" \
      --data-binary @- \
      http://127.0.0.1:8715/analyze_pcm_s16le
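For callers that prefer not to shell out to curl, the same request can be sketched with only the Rust standard library. `build_request_head` and `post_pcm` are hypothetical helpers; the sketch assumes a plain HTTP/1.1 exchange with `Connection: close` and returns the raw response (status line, headers, and body) without parsing it.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build the request line and headers for a raw PCM upload.
pub fn build_request_head(host: &str, body_len: usize) -> String {
    format!(
        "POST /analyze_pcm_s16le HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/octet-stream\r\n\
         Content-Length: {body_len}\r\n\
         Connection: close\r\n\r\n"
    )
}

/// Send one PCM buffer to `addr` (for example "127.0.0.1:8715") and return
/// the unparsed HTTP response text.
pub fn post_pcm(addr: &str, pcm: &[u8]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_request_head(addr, pcm.len()).as_bytes())?;
    stream.write_all(pcm)?;
    // The server closes the connection after responding, which ends the read.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A real client would more likely use an HTTP library with connection reuse and timeouts; this version only shows which headers the endpoint needs.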
Example response:
{
  "ok": true,
  "model": "audeering/wav2small",
  "sample_rate": 16000,
  "duration_seconds": 1.25,
  "inference_ms": 2.8,
  "total_ms": 3.5,
  "arousal": 0.72,
  "dominance": 0.58,
  "valence": 0.31
}
Benchmark
With the service running:
scripts/bench_pcm.sh
The script sends 0.5s, 1s, 3s, and 10s deterministic speech-like PCM clips with harmonics, envelope changes, breath noise, and short pauses. It runs 100 requests per duration by default and prints p50/p95 total latency, p50/p95 inference latency, realtime factor, and Docker RSS when available. Treat it as a repeatable service benchmark, not a substitute for latency tests on real deployment audio.
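The reported statistics can be reproduced from collected latencies with a short sketch. It assumes nearest-rank percentiles and defines realtime factor as inference time divided by audio duration; the script's exact definitions are not stated here, so treat both as assumptions.

```rust
/// Nearest-rank percentile of `samples` for `p` in [0, 100].
/// Returns None for an empty slice or an out-of-range `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank is ceil(p/100 * n), with rank 0 mapped to the smallest value.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Processing time relative to audio length; values below 1.0 are faster than realtime.
pub fn realtime_factor(inference_ms: f64, duration_seconds: f64) -> f64 {
    inference_ms / (duration_seconds * 1000.0)
}
```

With the example response above (2.8 ms of inference for 1.25 s of audio), this definition gives a realtime factor of about 0.00224.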
Audio Daemon Integration
Configure the daemon to downmix and resample microphone windows to 16 kHz mono s16le. POST each bounded PCM window directly to /analyze_pcm_s16le as application/octet-stream. Keep windows between 0.25s and MAX_AUDIO_SECONDS, and reuse the local service instead of launching a model process per clip.
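The windowing rule can be sketched as a helper that cuts a captured buffer into bounded windows before posting them. The function names are hypothetical, and dropping a trailing fragment shorter than 0.25 s is one possible policy, not something the service prescribes.

```rust
const SAMPLE_RATE_HZ: usize = 16_000;
const BYTES_PER_SAMPLE: usize = 2;
const MIN_WINDOW_SECONDS: f64 = 0.25;

/// Convert a duration to a byte count, rounded to whole samples.
pub fn seconds_to_bytes(seconds: f64) -> usize {
    (seconds * SAMPLE_RATE_HZ as f64).round() as usize * BYTES_PER_SAMPLE
}

/// Split 16 kHz mono s16le audio into windows of `window_seconds`, clamped to
/// [0.25 s, max_audio_seconds]. A trailing fragment shorter than 0.25 s is dropped.
pub fn split_windows(pcm: &[u8], window_seconds: f64, max_audio_seconds: f64) -> Vec<&[u8]> {
    // Guard against a configured maximum below the minimum window length.
    let upper = max_audio_seconds.max(MIN_WINDOW_SECONDS);
    let window_bytes = seconds_to_bytes(window_seconds.clamp(MIN_WINDOW_SECONDS, upper));
    let min_bytes = seconds_to_bytes(MIN_WINDOW_SECONDS);
    // Ignore a dangling odd byte so every window holds whole samples.
    let even_len = pcm.len() - pcm.len() % BYTES_PER_SAMPLE;
    pcm[..even_len]
        .chunks(window_bytes)
        .filter(|window| window.len() >= min_bytes)
        .collect()
}
```

With the default `MAX_AUDIO_SECONDS=10`, valid windows are between 8000 and 320000 bytes, and each returned slice can be sent as one request body.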