LinTO-diarization

LinTO-diarization is an API for Speaker Diarization (segmenting an audio stream into homogeneous segments according to the speaker identity), with some capabilities for Speaker Identification when audio samples of known speakers are provided.

LinTO-diarization can work with several diarization backends. The following are currently supported (please refer to their respective documentation for more details): pyannote (PyAnnote), simple (simple_diarizer), and pybk.

LinTO-diarization can either be used as a standalone diarization service or deployed within a micro-services infrastructure using a message broker connector.

Benchmark

The speaker-diarization-benchmark repository benchmarks the pyannote and simple integrations in terms of accuracy (Diarization Error Rate), memory usage, and processing time.
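
For reference, Diarization Error Rate is conventionally the sum of missed speech, false alarm, and speaker confusion durations, divided by the total duration of reference speech. The sketch below only illustrates that arithmetic; the function name and the sample durations are hypothetical and are not taken from the benchmark repository.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_reference: float) -> float:
    """Return DER as a fraction; all inputs are durations in seconds."""
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Illustrative numbers only: 6 s missed, 3 s false alarm, 9 s confusion,
# measured against 120 s of reference speech.
der = diarization_error_rate(6.0, 3.0, 9.0, 120.0)
print(f"DER = {der:.1%}")
```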

Speaker identification

Speaker identification matches diarized speakers against reference voiceprints stored in a Qdrant vector database. It is enabled as soon as QDRANT_HOST is set (see .envdefault for related variables: QDRANT_PORT, QDRANT_API_KEY, SPEAKER_ID_MIN_SIMILARITY, SPEAKER_ID_MAX_ENROLL_DURATION, SPEAKER_ID_MIN_ENROLL_DURATION).
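
As a rough illustration of the rule above, a client or deployment script could check the same variables before requesting identification. This is a sketch, not the service's own code: the helper names are hypothetical, and treating an empty QDRANT_HOST as "not set" is an assumption. The 0.5 fallback is the SPEAKER_ID_MIN_SIMILARITY default mentioned in this README.

```python
import os
from typing import Mapping, Optional


def identification_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when QDRANT_HOST is set (assumption: empty values count as unset)."""
    env = os.environ if env is None else env
    return bool(env.get("QDRANT_HOST", "").strip())


def min_similarity(env: Optional[Mapping[str, str]] = None) -> float:
    """Read SPEAKER_ID_MIN_SIMILARITY, falling back to the documented 0.5."""
    env = os.environ if env is None else env
    return float(env.get("SPEAKER_ID_MIN_SIMILARITY", "0.5"))


print("identification enabled:", identification_enabled())
```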

Multi-collection mode (recommended)

Speakers are enrolled at runtime through Celery tasks, into per-organization Qdrant collections named spkid_{organizationId}_{collectionId}:

| Task | Arguments | Result |
|---|---|---|
| `voiceprint_compute_task` | `audio_files` (paths relative to `/opt/audio`) | `{vector, model_id, dim, duration_used, files_used}` |
| `speaker_upsert_task` | `collection, speaker_id, name, vector, model_id` | `{status, point_id, created_collection}` |
| `speaker_delete_task` | `collection, speaker_ids` | `{status, deleted}` |
| `collection_drop_task` | `collection` | `{status, existed}` |
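
A minimal sketch of enrolling one speaker by chaining `voiceprint_compute_task` and `speaker_upsert_task`. Several things here are assumptions rather than documented behavior: that the tasks accept keyword arguments named as in the table, that they are served on the queue named after SERVICE_NAME, and the helper names themselves. The identifiers `org1`, `colA`, and `user:1` are placeholders.

```python
from typing import Any, Dict, List


def collection_name(organization_id: str, collection_id: str) -> str:
    """Build a per-organization collection name: spkid_{organizationId}_{collectionId}."""
    return f"spkid_{organization_id}_{collection_id}"


def upsert_kwargs(collection: str, speaker_id: str, name: str,
                  voiceprint: Dict[str, Any]) -> Dict[str, Any]:
    """Map a voiceprint_compute_task result onto speaker_upsert_task arguments."""
    return {
        "collection": collection,
        "speaker_id": speaker_id,
        "name": name,
        "vector": voiceprint["vector"],
        "model_id": voiceprint["model_id"],
    }


def enroll(app, queue: str, collection: str, speaker_id: str, name: str,
           audio_files: List[str], timeout: float = 300.0) -> Dict[str, Any]:
    """Compute a voiceprint, then upsert it. `app` is a celery.Celery instance.

    Assumption: both tasks accept keyword arguments; adapt if they are positional.
    """
    voiceprint = app.send_task(
        "voiceprint_compute_task",
        kwargs={"audio_files": audio_files},  # paths relative to /opt/audio
        queue=queue,
    ).get(timeout=timeout)
    return app.send_task(
        "speaker_upsert_task",
        kwargs=upsert_kwargs(collection, speaker_id, name, voiceprint),
        queue=queue,
    ).get(timeout=timeout)


# Usage (requires a running broker, result backend, and worker):
#   import celery
#   app = celery.Celery(broker="redis://localhost:6379/0",
#                       backend="redis://localhost:6379/1")
#   enroll(app, "diarization", collection_name("org1", "colA"),
#          "user:1", "Alice", ["alice/sample1.wav"])
```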

Identification is then requested per diarization, by passing a JSON object as speaker_names (4th argument of diarization_task, or form field of POST /diarization in HTTP mode):

{
  "collections": ["spkid_64ff…_65aa…", "spkid_64ff…_65bb…"],
  "speakers": "*",
  "minSimilarity": 0.5
}

- collections (required): Qdrant collections to search;
- speakers (optional, default "*"): restrict to a list of enrolled speaker ids (e.g. ["label:65cc…", "user:64dd…"]);
- minSimilarity (optional): similarity threshold; defaults to SPEAKER_ID_MIN_SIMILARITY (0.5).
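
The fields above can be assembled programmatically. The helper below is a hypothetical client-side sketch (its name and validation are not part of the service): it leaves out minSimilarity when no threshold is given, so the server-side default applies.

```python
import json
from typing import List, Optional, Sequence, Union


def build_speaker_names(collections: Sequence[str],
                        speakers: Union[str, List[str]] = "*",
                        min_similarity: Optional[float] = None) -> str:
    """Serialize the speaker_names object for multi-collection mode."""
    if not collections:
        raise ValueError("collections is required and must not be empty")
    payload = {"collections": list(collections), "speakers": speakers}
    if min_similarity is not None:
        # Omitted otherwise, so SPEAKER_ID_MIN_SIMILARITY applies.
        payload["minSimilarity"] = float(min_similarity)
    return json.dumps(payload)


# Placeholder collection name, for illustration only.
print(build_speaker_names(["spkid_org1_colA"], min_similarity=0.6))
```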

Identified speakers have their spk_id replaced by the enrolled name (with a spk_id_score field in speakers); unidentified speakers keep their original tag (spk1, spk2, ...).
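
To make the distinction concrete, here is a sketch that separates identified from unidentified speakers in a diarization result. The result layout is an assumption: it supposes that `speakers` is a list of objects carrying `spk_id`, and that `spk_id_score` is present only for identified speakers. Check an actual response before relying on it.

```python
from typing import Any, Dict, List, Tuple


def split_speakers(result: Dict[str, Any]) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return ([(name, score), ...], [original_tag, ...]) from a result dict."""
    identified: List[Tuple[str, float]] = []
    unidentified: List[str] = []
    for speaker in result.get("speakers", []):
        if speaker.get("spk_id_score") is not None:
            identified.append((speaker["spk_id"], speaker["spk_id_score"]))
        else:
            unidentified.append(speaker["spk_id"])
    return identified, unidentified


# Hypothetical result: one enrolled speaker matched, one left with its tag.
example = {"speakers": [{"spk_id": "Alice", "spk_id_score": 0.83},
                        {"spk_id": "spk2"}]}
print(split_speakers(example))
```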

Filesystem mode (deprecated)

The legacy enrollment mode, where reference speaker audio samples are mounted under /opt/speaker_samples (SPEAKER_SAMPLES_FOLDER) and loaded into a single collection (QDRANT_COLLECTION_NAME) at startup, is still supported but deprecated. It requires both SPEAKER_SAMPLES_FOLDER to exist and QDRANT_COLLECTION_NAME to be set. In this mode, speaker_names is a string: "*" (all enrolled speakers), "speaker1|speaker2", or a JSON list of names.
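
The three accepted string forms can be illustrated with a small normalizer. This is a client-side sketch of how the forms relate, not the service's parsing code; the function name and the `enrolled` parameter are hypothetical.

```python
import json
from typing import List, Sequence


def parse_legacy_speaker_names(value: str, enrolled: Sequence[str]) -> List[str]:
    """Normalize a filesystem-mode speaker_names string into a list of names."""
    value = value.strip()
    if value == "*":
        # Wildcard: every enrolled speaker.
        return list(enrolled)
    if value.startswith("["):
        # JSON list of names.
        names = json.loads(value)
        if not isinstance(names, list):
            raise ValueError("expected a JSON list of names")
        return [str(name) for name in names]
    # Pipe-separated names, e.g. "speaker1|speaker2".
    return [name.strip() for name in value.split("|") if name.strip()]


print(parse_legacy_speaker_names("speaker1|speaker2", ["speaker1", "speaker2", "speaker3"]))
```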

Quick test

Below are examples of how to test diarization on a Linux system with Docker installed.

"PyAnnote" is the recommended diarization method. In what follows, you can replace "pyannote" with "simple" or "pybk" to try other methods.

HTTP Server

If you want to use speaker identification, make sure Qdrant is running. First, create a custom bridge network so that the diarization container can communicate with Qdrant:

docker network create diarization_network

You can start Qdrant using the following Docker command:

# 6333 is Qdrant's default port
docker run \
    --name qdrant \
    --network diarization_network \
    -p 6333:6333 \
    -v ./qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

If needed, build docker image

docker build . -t linto-diarization-pyannote:latest -f pyannote/Dockerfile

Launch docker container (and keep it running)

If you want to enable speaker identification, make sure to mount reference speaker audio samples to /opt/speaker_samples.

# The -v mount (reference speaker samples) and the QDRANT_* variables
# are only needed when enabling speaker identification
docker run -it --rm \
    --name linto-diarization \
    --network diarization_network \
    -p 8080:80 \
    -v ./data/speakers_samples:/opt/speaker_samples \
    --shm-size=1gb --tmpfs /run/user/0 \
    --env SERVICE_MODE=http \
    --env QDRANT_HOST=qdrant \
    --env QDRANT_PORT=6333 \
    --env QDRANT_COLLECTION_NAME=speaker_embeddings \
    --env QDRANT_RECREATE_COLLECTION=true \
    linto-diarization-pyannote:latest

Alternatively, you can use Docker Compose:

services:
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "6333:6333"  # Qdrant default port
    volumes:
      - ./qdrant_storage:/qdrant/storage:z

  diarization_app:
    build:
      context: .
      dockerfile: pyannote/Dockerfile
    container_name: diarization_app
    shm_size: '1gb'
    stdin_open: true
    tty: true
    ports:
      - 8080:80
    environment:
      - QDRANT_HOST
      - QDRANT_PORT
      - QDRANT_COLLECTION_NAME
      - QDRANT_RECREATE_COLLECTION
      - SERVICE_MODE
      - SERVICE_NAME
      - SERVICES_BROKER
      - CONCURRENCY
    volumes:
      - ./data/speakers_samples:/opt/speaker_samples # Reference Speaker samples : This enables speaker identification
    depends_on:
      - qdrant  # Ensure Qdrant starts before the app
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Run it using this command:

docker compose up

Open the Swagger UI in a browser: http://localhost:8080/docs. Unfold the /diarization route and click "Try it out". Then:
- Choose a file
- Specify either speaker_count (fixed number of speakers) or max_speaker (maximum number of speakers)
- Click Execute
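
The same request can be sent from a script. The sketch below uses the third-party `requests` package and makes some assumptions: the multipart field for the audio is assumed to be named `file` (check the Swagger page for the actual name), and the helper names are hypothetical. The form fields speaker_count, max_speaker, and speaker_names are the ones described in this README.

```python
import json
from typing import Any, Dict, Optional


def build_form(speaker_count: Optional[int] = None,
               max_speaker: Optional[int] = None,
               speaker_names: Any = None) -> Dict[str, str]:
    """Build the form fields for POST /diarization, skipping unset options."""
    form: Dict[str, str] = {}
    if speaker_count is not None:
        form["speaker_count"] = str(speaker_count)
    if max_speaker is not None:
        form["max_speaker"] = str(max_speaker)
    if speaker_names is not None:
        # Strings are passed through; objects are JSON-encoded.
        form["speaker_names"] = (speaker_names if isinstance(speaker_names, str)
                                 else json.dumps(speaker_names))
    return form


def diarize(path: str, url: str = "http://localhost:8080/diarization",
            **options: Any) -> str:
    """Upload an audio file and return the raw response body."""
    import requests  # third-party: pip3 install requests

    with open(path, "rb") as audio:
        response = requests.post(
            url,
            files={"file": audio},  # assumption: upload field is named "file"
            data=build_form(**options),
            timeout=600,
        )
    response.raise_for_status()
    return response.text


# Usage (requires the HTTP server from this section to be running):
#   print(diarize("test.wav", speaker_count=2))
```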

Celery worker

In the following, we assume that the audio file to test is $HOME/test.wav.

If needed, build docker image

docker build . -t linto-diarization-pyannote:latest -f pyannote/Dockerfile

If you want to use speaker identification, make sure Qdrant is running. You can start Qdrant using the following Docker command:

# 6333 is Qdrant's default port
docker run \
    -p 6333:6333 \
    -v ./qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

Run Redis server

docker run -it --rm \
    -p 6379:6379 \
    redis/redis-stack-server:latest \
    redis-server /etc/redis-stack.conf --protected-mode no --bind 0.0.0.0 --loglevel debug

Launch the Docker container, mounting the volume that contains the audio file you will test on:

docker run -it --rm \
    -v $HOME:$HOME \
    --env SERVICE_MODE=task \
    --env SERVICE_NAME=diarization \
    --env SERVICES_BROKER=redis://172.17.0.1:6379 \
    --env BROKER_PASS= \
    --env CONCURRENCY=2 \
    --env QDRANT_HOST=172.17.0.1 \
    --env QDRANT_PORT=6333 \
    --env QDRANT_COLLECTION_NAME=speaker_embeddings \
    --env QDRANT_RECREATE_COLLECTION=true \
    linto-diarization-pyannote:latest

Testing with a given audio file can be done using python3 (with the celery and redis packages installed). For example, the following commands process the file $HOME/test.wav with 2 speakers:

pip3 install redis celery # if not installed yet

python3 -c "\
import celery; \
import os; \
worker = celery.Celery(broker='redis://localhost:6379/0', backend='redis://localhost:6379/1'); \
print(worker.send_task('diarization_task', (os.environ['HOME']+'/test.wav', 2, None), queue='diarization').get());\
"

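To also request speaker identification through Celery, speaker_names goes in as the 4th argument of diarization_task. The sketch below reuses the broker, backend, and queue from the command above. It assumes the argument is sent as a JSON-encoded string (the worker may also accept a plain object), and the helper names and the collection name are placeholders.

```python
import json
import os
from typing import Any, Optional, Tuple


def diarization_args(audio_path: str, speaker_count: Optional[int] = None,
                     max_speaker: Optional[int] = None,
                     speaker_names: Any = None) -> Tuple[Any, ...]:
    """Build the positional arguments of diarization_task."""
    if speaker_names is not None and not isinstance(speaker_names, str):
        # Assumption: the worker expects a JSON-encoded string.
        speaker_names = json.dumps(speaker_names)
    return (audio_path, speaker_count, max_speaker, speaker_names)


def run_diarization(audio_path: str, **options: Any) -> Any:
    """Send diarization_task to the worker and wait for its result."""
    import celery  # third-party: pip3 install redis celery

    app = celery.Celery(broker="redis://localhost:6379/0",
                        backend="redis://localhost:6379/1")
    task = app.send_task("diarization_task",
                         diarization_args(audio_path, **options),
                         queue="diarization")
    return task.get()


# Usage (requires Redis and the worker from this section to be running):
#   wav = os.path.join(os.environ["HOME"], "test.wav")
#   print(run_diarization(wav, speaker_count=2,
#                         speaker_names={"collections": ["spkid_org1_colA"]}))
```
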
License

This project is developed under the AGPLv3 License (see LICENSE).

The diarization backends bundle third-party pretrained models distributed under their own licenses. In particular, the PyAnnote backend uses pyannote/speaker-diarization-community-1 (licensed under CC BY 4.0); see pyannote/README.md for attribution details.
