zephyraoss/chromaforge

Chromaforge

Chromaforge is a Go CLI that reconstructs the AcoustID fingerprint SQLite database used by Chromakopia. It is designed for the one-time initial build on an Azure L16s_v3 VM with local NVMe scratch space and a managed disk for the finished database.

Incremental updates are handled by Chromakopia, not this repository.

Commands

chromaforge build

  • Replays the AcoustID daily JSON update archive from https://data.acoustid.org/
  • Builds a fresh SQLite database with the libSQL driver
  • Uses a local cache directory beside --db unless --cache-dir is set
  • Places SQLite temp files under the cache path by default; --temp-dir overrides it
  • Prefetches upcoming AcoustID archive files in background download workers while replay/index work is running
  • --download-workers controls that background download concurrency
  • --gomaxprocs, --decode-workers, and --workers let you tune CPU/core usage explicitly
  • --cache-size and --mmap-size tune replay/write memory, while --index-cache-size and --index-mmap-size tune the later index-build phase
  • On first Ctrl+C, finishes the current day, saves resume progress beside --db, and exits cleanly; a second Ctrl+C aborts immediately
  • Supports --soft-heap-limit to set SQLite's soft (advisory) heap limit for the process
  • Uses unsafe bulk-load mode with journaling disabled during replay/index builds, then finalizes the database back to WAL
  • Defers the final acoustid unique index and idx_hash until bulk inserts complete
  • Supports --skip-validate so build completion is not blocked on validation
  • Optionally rsyncs the final .db to the configured output path
  • Optionally triggers Azure VM self-deallocation

chromaforge validate

  • Verifies the final tables and indexes exist
  • Performs sampled acoustid and hash spot checks without ORDER BY RANDOM()
  • Skips PRAGMA quick_check by default for speed
  • Supports --quick-check when you want the slower SQLite consistency pass
  • Supports --full-integrity-check when you want the slowest full PRAGMA integrity_check
  • Supports --count-rows when you want exact COUNT(*) scans instead of the fast default

chromaforge backfill-metadata

  • Replays archive metadata into an existing database without rebuilding sub_fingerprints
  • Fills missing mb_id and duration values in place
  • Uses --decode-workers to parallelize fingerprint JSON decode/filter work while keeping SQLite writes sequential
  • Uses a separate resume file beside --db so interrupted backfills can continue later
  • Leaves existing fingerprint hashes and indexes intact
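The backfill shape described above (parallel decode, one sequential writer, fill only what is missing) might look something like the following sketch. The record fields and JSON keys are hypothetical and may not match the real AcoustID archive format, and an in-memory map stands in for the SQLite database.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// record is a hypothetical shape for one archive line.
type record struct {
	ID       int64  `json:"id"`
	MBID     string `json:"mb_id"`
	Duration int    `json:"duration"`
}

// fillMissing copies only the fields that are still empty in dst.
func fillMissing(dst *record, src record) {
	if dst.MBID == "" {
		dst.MBID = src.MBID
	}
	if dst.Duration == 0 {
		dst.Duration = src.Duration
	}
}

// backfill decodes lines on several goroutines but applies them from the
// calling goroutine only, so the store (standing in for SQLite) has one writer.
func backfill(lines []string, workers int, store map[int64]*record) {
	if workers < 1 {
		workers = 1
	}
	in := make(chan string)
	out := make(chan record)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range in {
				var r record
				if err := json.Unmarshal([]byte(line), &r); err != nil {
					continue // this sketch simply skips malformed lines
				}
				out <- r
			}
		}()
	}
	go func() {
		for _, l := range lines {
			in <- l
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	for r := range out { // the single sequential writer
		if cur, ok := store[r.ID]; ok { // only existing rows are touched
			fillMissing(cur, r)
		}
	}
}

func main() {
	store := map[int64]*record{1: {ID: 1, Duration: 200}}
	backfill([]string{`{"id":1,"mb_id":"example-mbid","duration":999}`}, 4, store)
	fmt.Printf("%+v\n", *store[1])
}
```

With several decode workers the order in which records reach the writer is not deterministic, so a real implementation would need a rule for two archive entries that disagree about the same row.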

chromaforge match

  • Accepts a raw Chromaprint fingerprint with --fingerprint or --fingerprint-file
  • Accepts fpcalc -raw output directly, including DURATION=...
  • Uses the same sampled sub-fingerprint hashing the builder stored in SQLite
  • Applies a small duration filter by default when query duration is known
  • Returns the top local candidate matches ranked by aligned hash hits
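The README does not define "aligned hash hits". One common reading is offset voting: each query hash that also appears in a stored track votes for the offset between the two positions, and a track's score is its largest vote count at any single offset. The sketch below implements that reading with hypothetical types (posting, candidate) and an in-memory index; it does not model the sampled sub-fingerprint hashing or the actual SQLite schema.

```go
package main

import (
	"fmt"
	"sort"
)

// posting says that a hash occurs in Track at position Pos.
type posting struct {
	Track int64
	Pos   int
}

// candidate is one ranked result.
type candidate struct {
	Track int64
	Hits  int
}

// rank scores each track by the largest number of query hashes that match
// at one consistent offset, then returns up to top results, best first.
func rank(query []uint32, index map[uint32][]posting, top int) []candidate {
	type key struct {
		track  int64
		offset int
	}
	votes := make(map[key]int)
	for qi, h := range query {
		for _, p := range index[h] {
			votes[key{p.Track, p.Pos - qi}]++
		}
	}
	best := make(map[int64]int)
	for k, n := range votes {
		if n > best[k.track] {
			best[k.track] = n
		}
	}
	out := make([]candidate, 0, len(best))
	for t, n := range best {
		out = append(out, candidate{Track: t, Hits: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Hits != out[j].Hits {
			return out[i].Hits > out[j].Hits
		}
		return out[i].Track < out[j].Track // stable tie-break by track id
	})
	if top > 0 && len(out) > top {
		out = out[:top]
	}
	return out
}

func main() {
	index := map[uint32][]posting{
		10: {{Track: 1, Pos: 5}, {Track: 2, Pos: 0}},
		20: {{Track: 1, Pos: 6}, {Track: 2, Pos: 9}},
		30: {{Track: 1, Pos: 7}},
	}
	for _, c := range rank([]uint32{10, 20, 30}, index, 5) {
		fmt.Printf("track %d: %d aligned hits\n", c.Track, c.Hits)
	}
}
```

In this toy index, track 1 matches all three hashes at the same offset, while track 2 matches two hashes at different offsets and so scores only one aligned hit.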

chromaforge version

  • Prints version metadata injected at build time

Requirements

  • Go 1.24+
  • Network access to https://data.acoustid.org/
  • CGO-enabled builds

rsync is only required when using --output.

Build

go build ./cmd/chromaforge

Example build run:

chromaforge build \
  --db /mnt/nvme/chromakopia.db \
  --gomaxprocs 12 \
  --download-workers 12 \
  --temp-dir /mnt/nvme/.chromaforge-tmp \
  --cache-size 4294967296 \
  --mmap-size 4294967296 \
  --index-cache-size 2147483648 \
  --index-mmap-size 2147483648 \
  --workers 16 \
  --decode-workers 16 \
  --batch-size 500 \
  --skip-validate \
  --soft-heap-limit 2147483648

Azure VM example with copy + self-deallocate:

chromaforge build \
  --db /mnt/nvme/chromakopia.db \
  --output /mnt/disk/chromakopia.db \
  --gomaxprocs 12 \
  --download-workers 12 \
  --temp-dir /mnt/nvme/.chromaforge-tmp \
  --cache-size 4294967296 \
  --mmap-size 4294967296 \
  --index-cache-size 2147483648 \
  --index-mmap-size 2147483648 \
  --workers 16 \
  --decode-workers 16 \
  --batch-size 500 \
  --soft-heap-limit 2147483648 \
  --self-deallocate
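For orientation, the bulk-load mode and size flags used above could translate into SQLite PRAGMA statements along the following lines. This is a sketch under assumptions: it treats the flag values as byte counts, assumes synchronous is switched off during bulk load and set to NORMAL afterwards, and uses hypothetical helper names. The README confirms only that journaling is disabled during replay/index builds and that the database is finalized back to WAL.

```go
package main

import "fmt"

// bulkLoadPragmas returns statements for an unsafe, fast bulk-load phase.
// SQLite reads a negative cache_size as a size in KiB, so bytes are
// divided by 1024. The mapping from CLI flags to PRAGMAs is an assumption.
func bulkLoadPragmas(cacheBytes, mmapBytes int64) []string {
	return []string{
		"PRAGMA journal_mode = OFF", // no rollback journal during bulk load
		"PRAGMA synchronous = OFF",  // assumed; trades durability for speed
		fmt.Sprintf("PRAGMA cache_size = -%d", cacheBytes/1024),
		fmt.Sprintf("PRAGMA mmap_size = %d", mmapBytes),
	}
}

// finalizePragmas returns statements that put the finished database back
// into WAL mode. synchronous = NORMAL is an assumed companion setting.
func finalizePragmas() []string {
	return []string{
		"PRAGMA journal_mode = WAL",
		"PRAGMA synchronous = NORMAL",
	}
}

func main() {
	// 4294967296 bytes is 4 GiB, matching the example flag values above.
	for _, stmt := range bulkLoadPragmas(4294967296, 4294967296) {
		fmt.Println(stmt)
	}
	for _, stmt := range finalizePragmas() {
		fmt.Println(stmt)
	}
}
```

With journaling off, a crash mid-build can leave the file unusable, which is acceptable here only because the build can be rerun or resumed from the archive.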

Validation

chromaforge validate --db /mnt/disk/chromakopia.db

Quick check example:

chromaforge validate \
  --db /mnt/disk/chromakopia.db \
  --quick-check

Full validation example:

chromaforge validate \
  --db /mnt/disk/chromakopia.db \
  --full-integrity-check \
  --count-rows \
  --timeout 0

Metadata Backfill

chromaforge backfill-metadata \
  --db /mnt/disk/chromakopia.db \
  --gomaxprocs 32 \
  --decode-workers 32 \
  --download-workers 16

Matching

Raw fingerprint example:

chromaforge match \
  --db /mnt/disk/chromakopia.db \
  --fingerprint '123,456,789,101112'

fpcalc -raw example:

fpcalc -raw song.mp3 | chromaforge match \
  --db /mnt/disk/chromakopia.db \
  --fingerprint-file -

Disable duration filtering:

fpcalc -raw song.mp3 | chromaforge match \
  --db /mnt/disk/chromakopia.db \
  --fingerprint-file - \
  --duration-window 0
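As a rough picture of what the duration window does, the check below keeps a candidate only when its duration is within the window of the query duration, and keeps everything when the window is 0 or the query duration is unknown. The function name, the use of seconds, and the symmetric comparison are assumptions, not details taken from chromaforge.

```go
package main

import "fmt"

// withinWindow reports whether a candidate passes the duration filter.
// A window of 0 (or less) disables filtering, as does an unknown (0) query
// duration. Units are assumed to be seconds.
func withinWindow(queryDur, candidateDur, window int) bool {
	if window <= 0 || queryDur <= 0 {
		return true
	}
	diff := queryDur - candidateDur
	if diff < 0 {
		diff = -diff
	}
	return diff <= window
}

func main() {
	fmt.Println(withinWindow(215, 218, 5)) // inside the window
	fmt.Println(withinWindow(215, 240, 5)) // outside the window
	fmt.Println(withinWindow(215, 240, 0)) // filtering disabled
}
```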

Azure Build VM

Deploy only the build path from this repo:

  1. Create the resource group.
  2. Create the managed disk that will persist chromakopia.db.
  3. Create a user-assigned managed identity for the build VM.
  4. Grant that identity Virtual Machine Contributor scoped to the VM or an appropriate parent scope.
  5. Create the L16s_v3 VM.
  6. Attach the managed disk.
  7. Paste deploy/cloud-init.yaml into the VM Custom data field.

The build VM downloads the latest chromaforge binary, mounts the managed disk and local NVMe, runs the build, copies the resulting database with rsync, and then asks Azure to deallocate the VM.

Docker

The included Dockerfile provides a reproducible build image:

docker build -t chromaforge:latest .

Notes

  • The final database contains only the fingerprints and sub_fingerprints tables, plus the acoustid unique index and idx_hash.
  • Build-time replay state is held outside the final schema.
  • track_meta-update files are ignored because title and artist are no longer stored in the database.
  • Metadata backfill and duplicate-acoustid ingest only fill missing metadata fields; they do not overwrite existing non-empty values.

License

Apache License 2.0. See LICENSE.

About

A simple way to build an AcoustID database.
