Verbal flashcards API implementation #544
Conversation
This commit includes all implementation details of the verbal flashcards feature. If issues arise, it could be because some local-config was left out. Co-Authored-By: Spon <[email protected]>
mircealungu
left a comment
Review from a walkthrough of the branch. Most findings are inline below; the main blocker and a few minor items are summarized here.
🚨 Blocker — endpoint routes are never registered
zeeguu/api/endpoints/__init__.py is the module that imports every endpoint file so the @api.route(...) decorators execute. verbal_flashcards isn't imported there, so in production none of /verbal_flashcards, /verbal_flashcards/transcribe, /verbal_flashcards/submit, /verbal_flashcards/check_pronunciation exist on the running Flask app.
The tests pass only because test_verbal_flashcards.py's autouse fixture does monkeypatch.setattr("zeeguu.api.endpoints.verbal_flashcards...", ...), which force-imports the module as a side effect and runs the decorators. In production that side-effect import never happens.
Fix: add from . import verbal_flashcards to zeeguu/api/endpoints/__init__.py.
Minor items
- `transcribe_audio_endpoint`'s generic `except Exception` echoes `str(e)` back to the client, which can leak internal details (paths, stack-ish info). Log internally and return a generic `"Transcription failed"`.
- `default.env` — the trailing blank-line deletions are unrelated churn; worth isolating or dropping.
- `VERBAL_FLASHCARD_EXERCISE_SOURCE = "Verbal Flashcards"` — check the existing rows in `exercise_source`; other sources may be snake_case and this would create a slightly inconsistent entry.
```
de_core_news_md @ https://github.com/explosion/spacy-models/releases/download/de_core_news_md-3.7.0/de_core_news_md-3.7.0-py3-none-any.whl
da_core_news_md @ https://github.com/explosion/spacy-models/releases/download/da_core_news_md-3.7.0/da_core_news_md-3.7.0-py3-none-any.whl
nltk
nemo_toolkit[asr]
```
This pulls torch and several GB of model-tooling dependencies into the main API image. The dedicated asr_service/ already lists nemo_toolkit[asr] in its own requirements.txt — that's where it belongs. The main API only proxies audio to the worker; it just needs requests. Please remove this line.
```diff
 # Database
-mysqlclient
+mysqlclient==2.2.7
```
This pin is unrelated to verbal flashcards. Either split it into its own PR or add a line in the PR description explaining why it's needed.
```python
logger = logging.getLogger(__name__)

DEFAULT_ASR_SERVICE_TIMEOUT = float(os.environ.get("ASR_SERVICE_TIMEOUT", "30"))
LOCAL_DEV_ASR_SERVICE_URLS = "da=http://127.0.0.1:5002"
```
Falling back silently to http://127.0.0.1:5002 when ASR_SERVICE_URLS is unset means a misconfigured production deploy quietly tries localhost and then fails with a 502 on connection refused. Prefer to only use this fallback in a dev context (e.g. when FLASK_ENV == 'development') and raise ASRServiceNotConfigured otherwise.
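A minimal sketch of that guard, reusing this module's `LOCAL_DEV_ASR_SERVICE_URLS` and `parse_asr_service_urls`, and assuming an `ASRServiceNotConfigured` exception is available to raise:

```python
import os

def configured_asr_service_urls():
    raw_value = os.environ.get("ASR_SERVICE_URLS", "")
    if not raw_value:
        if os.environ.get("FLASK_ENV") == "development":
            # dev-only convenience: point at the local Danish worker
            raw_value = LOCAL_DEV_ASR_SERVICE_URLS
        else:
            # fail loudly instead of silently dialing localhost in production
            raise ASRServiceNotConfigured("ASR_SERVICE_URLS is not set")
    return parse_asr_service_urls(raw_value)
```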
```diff
@@ -0,0 +1,756 @@
+import traceback
```
This file is 756 lines and mixes four concerns: routes, flashcard-from-scheduled-words mapping, Danish text normalization, and fuzzy matching. Consider splitting into:
- `zeeguu/core/verbal_flashcards/text_normalization.py` (canonical + ASR-tolerant)
- `zeeguu/core/verbal_flashcards/fuzzy_match.py` (DL + JW + `score_word_match` + `calculate_accuracy`)
- `zeeguu/core/verbal_flashcards/flashcard_selection.py` (collection + `_ensure_schedule_for_verbal_flashcard`)
and keep this file as a thin route layer.
```python
FUZZY_ACCEPTANCE_BUFFER = 0.08


def canonical_danish_form(word):
```
These normalizers are Danish-specific but live behind generic-sounding names — sanitize_spoken_text, score_word_match, fuzzy_match_threshold all implicitly assume Danish. When this is extended to de/fr you'll need a per-language normalizer registry. Worth setting up the abstraction now (e.g. normalizer_for(language_code)) even while only Danish is implemented, so future languages don't require refactoring every call site.
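One possible shape for that registry — `normalizer_for` is only the suggested name here, not an existing helper:

```python
# Registry sketch: canonical_danish_form is this PR's Danish normalizer;
# future languages add an entry instead of touching every call site.
_NORMALIZERS = {
    "da": canonical_danish_form,
}

def normalizer_for(language_code):
    normalizer = _NORMALIZERS.get(language_code)
    if normalizer is None:
        raise NotImplementedError(
            f"No verbal-flashcards normalizer for language '{language_code}'"
        )
    return normalizer
```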
```python
schedule = FourLevelsPerWord(user_word=user_word)
schedule.next_practice_time = datetime.now()
schedule.consecutive_correct_answers = 0
```
This commits mid-request, and then report_exercise_outcome commits again a few lines later in the submit flow. Two commits per submit leaves a window where partial state lands on exception between them. Prefer db_session.flush() here and let the downstream report_exercise_outcome commit the aggregate.
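Sketch of that change, keeping the PR's names and assuming `db_session` is the session already in scope:

```python
schedule = FourLevelsPerWord(user_word=user_word)
schedule.next_practice_time = datetime.now()
schedule.consecutive_correct_answers = 0
db_session.add(schedule)
# flush() assigns IDs but keeps the transaction open, so the later
# report_exercise_outcome performs the single commit for the whole submit.
db_session.flush()
```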
```python
    Transcribe audio by routing the request to the dedicated ASR worker that
    owns the model for the user's learned language.
    """
    audio_bytes = audio_file.read()
```
claude says: no file size limit on the upload — audio_file.read() reads the full body into memory unconditionally. A large or malicious POST can exhaust worker memory. Either set MAX_CONTENT_LENGTH on the Flask app or validate Content-Length before reading. please check.
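Both options, sketched; the 10 MB cap is illustrative, not a value from this PR. Option B mirrors the `_ensure_request_audio_size_is_allowed` helper that appears later in the diff:

```python
# Option A: app-wide cap — Flask rejects larger bodies with 413 on its own.
app.config["MAX_CONTENT_LENGTH"] = 10 * 1024 * 1024

# Option B: per-endpoint check before reading the body into memory.
MAX_AUDIO_BYTES = 10 * 1024 * 1024  # illustrative limit

def _ensure_request_audio_size_is_allowed():
    if request.content_length and request.content_length > MAX_AUDIO_BYTES:
        raise ValueError("Audio upload too large")
```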
```python
user, feature_gate = _current_verbal_flashcards_user()
if feature_gate:
    return feature_gate
flashcard = _find_flashcard_for_user(user, flashcard_id)
```
claude says: the flashcard id is str(bookmark.id) and _find_flashcard_for_user re-runs get_flashcard_collection(user) — which calls BasicSRSchedule.user_words_to_study(user) — on every submit. If the schedule state changes between the initial GET and submit (e.g. a sibling exercise moves the word out of the 'to study' set), submit returns 404 for a card the user just practiced. Resolve by user_word_id + ownership check directly instead of rescanning the session collection. That also fixes the perf concern of re-running the scheduler query once per submit. please check.
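A sketch of the direct resolution — the `user_id` ownership column and the Flask-SQLAlchemy-style `query.get` are assumptions about the model layer:

```python
# Resolve the submitted card directly; no rescan of the session collection.
user_word = UserWord.query.get(int(flashcard_id))
if user_word is None or user_word.user_id != user.id:
    return json_result({"error": "Flashcard not found"}), 404
```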
```python
)
except ValueError as exc:
    return jsonify({"error": str(exc)}), 400
except Exception as exc:
```
Status codes between /health and /transcribe disagree: when the model fails to load, /health returns 200 with 'status': 'degraded', but /transcribe raises RuntimeError which hits this generic except Exception → 500. Return 503 when ASR_AVAILABLE is false (or asr_model is None) so orchestrators and load balancers can route around a bad pod.
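Sketch, assuming the worker module's `ASR_AVAILABLE` flag and `asr_model` global:

```python
# At the top of the worker's /transcribe handler:
if not ASR_AVAILABLE or asr_model is None:
    # 503 mirrors the "degraded" signal /health already exposes, so
    # orchestrators and load balancers can route around this pod.
    return jsonify({"error": "ASR model not loaded"}), 503
```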
mircealungu
left a comment
Two follow-up inline comments: worker model-load semantics, and the Flask-config fallback in the ASR client (context: comparing against readability vs stanza conventions).
```python
bind = os.environ.get("GUNICORN_BIND", "0.0.0.0:5002")
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))
```
preload_app = False + workers = 1 is fine today, but the model is loaded at module import time in app.py (asr_model = ASRModel.from_pretrained(...)). With preload_app = True, gunicorn imports the app once in the master process and then forks workers — on Linux the forked children share the loaded model weights via copy-on-write, so one load serves N workers. With preload_app = False, each worker loads its own copy. If you ever bump workers above 1 for throughput, memory doubles needlessly. Flipping to preload_app = True now makes that a config change, not a code change.
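The resulting config is a one-line change to the file shown above:

```python
import os

bind = os.environ.get("GUNICORN_BIND", "0.0.0.0:5002")
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))
# Load the model once in the master; forked workers then share the
# weights via copy-on-write instead of each loading a private copy.
preload_app = True
```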
| """Return configured worker URLs, falling back to the local Danish worker.""" | ||
| raw_value = os.environ.get("ASR_SERVICE_URLS", "") | ||
|
|
||
| if not raw_value and has_app_context(): |
This has_app_context() / current_app.config.get(...) branch is a third config source that matches neither existing convention in Zeeguu:
- `readability_server` is hardcoded in `zeeguu/core/content_retriever/parse_with_readability_server.py:12`
- `stanza` is env-only (`STANZA_SERVICE_URL` in docker-compose.yml)
Hardcoding doesn't fit here because ASR is language-sharded and the map will grow. The env-var map (ASR_SERVICE_URLS) you already use matches the stanza pattern and is the right shape. Suggest dropping this Flask-config branch so env is the single source of truth.
mircealungu
left a comment
Follow-up on #14 — concrete suggestion you can apply directly.
```python
def configured_asr_service_urls():
    """Return configured worker URLs, falling back to the local Danish worker."""
    raw_value = os.environ.get("ASR_SERVICE_URLS", "")

    if not raw_value and has_app_context():
        raw_value = current_app.config.get("ASR_SERVICE_URLS", "")

    if not raw_value:
        raw_value = LOCAL_DEV_ASR_SERVICE_URLS

    return parse_asr_service_urls(raw_value)
```
Matches the existing convention used by the stanza service in this codebase: it reads its URL from a single env var (STANZA_SERVICE_URL, set in docker-compose.yml) with no Flask-config fallback. The other external service — readability_server — hardcodes its URL in zeeguu/core/content_retriever/parse_with_readability_server.py and reads no env var at all. Neither of them uses current_app.config, so the branch you added here is a third pattern that nothing else in the codebase follows.
Hardcoding (readability's approach) won't work for ASR because the map has to grow per language (da → asr_da, de → asr_de, …), so env-var-as-map is the right shape. Just drop the Flask-config branch so env is the single source of truth, which mirrors stanza.
Suggested replacement for this function:
```diff
 def configured_asr_service_urls():
     """Return configured worker URLs, falling back to the local Danish worker."""
-    raw_value = os.environ.get("ASR_SERVICE_URLS", "")
-    if not raw_value and has_app_context():
-        raw_value = current_app.config.get("ASR_SERVICE_URLS", "")
-    if not raw_value:
-        raw_value = LOCAL_DEV_ASR_SERVICE_URLS
+    raw_value = os.environ.get("ASR_SERVICE_URLS", "") or LOCAL_DEV_ASR_SERVICE_URLS
     return parse_asr_service_urls(raw_value)
```
Once applied, the `from flask import has_app_context, current_app` line at the top of the file becomes unused — remove it in the same commit.
Co-Authored-By: Spon <[email protected]>
…u-api-verbal-flashcards into verbal-flashcards
Co-Authored-By: Spon <[email protected]>
Co-Authored-By: Spon <[email protected]>
Co-Authored-By: Spon <[email protected]>
Functionality has been split into four modules, each with a distinct responsibility. The endpoint module now functions as a slimmer routing layer.
…u-api-verbal-flashcards into verbal-flashcards
…u-api-verbal-flashcards into verbal-flashcards
```yaml
ZEEGUU_DATA_FOLDER: /zeeguu-data/
ZEEGUU_RESOURCES_FOLDER: /zeeguu-data/
STANZA_SERVICE_URL: http://stanza:5001
ASR_SERVICE_URLS: "${ASR_SERVICE_URLS:-da=http://asr_da:5002}"
```
The current shape `da=http://asr_da:5002` is the worst of both worlds — language is in the host name (`asr_da`) AND in the env-var key (`da=`) AND there's an explicit non-default port. Pick a direction:

- Option 1: one ASR container, multiple languages. Container is named `asr`. Language is just a request parameter. URL becomes `da=http://asr` (worker listens on 80 inside the container). Adding German later is just `de=http://asr`.
- Option 2: one container per language. Container `asr_da`, with `asr_de` joining later — fine. But drop the explicit port: let each worker listen on 80, so the URL is just `da=http://asr_da`.

Either way, `:5002` is leaking an internal implementation detail into config. Same shape also appears in `default.env:31` — fix both. Not a hard blocker, but worth deciding now while there's only one entry.
```python
def _current_verbal_flashcards_user():
    user = User.find_by_id(flask.g.user_id)
    return user, _ensure_verbal_flashcards_enabled(user)
```
`_current_verbal_flashcards_user()` returns `(user, None)` when the user passes the gate and `(user, response_tuple)` when they don't, leading every endpoint to do `if feature_gate: return feature_gate`. The function name promises "current user" but it's actually computing a 404 response on the side. Cleaner to split:

```python
def _current_verbal_flashcards_user():
    return User.find_by_id(flask.g.user_id)
```

…and call `_ensure_verbal_flashcards_enabled(user)` explicitly at the top of each endpoint. Less clever, easier to read.
```python
try:
    _ensure_request_audio_size_is_allowed()

    if "file" not in request.files:
        return json_result({"error": "No audio file provided"}), 400

    audio_file = request.files["file"]
    if audio_file.filename == "":
        return json_result({"error": "Empty filename"}), 400

    user, feature_gate = _current_verbal_flashcards_user()
    if feature_gate:
        return feature_gate
```
The feature gate runs after audio-size enforcement, file-presence check, and filename check. Cheap to flip — gate first, then validate. Two reasons:

- Saves reading a possibly-large audio body for a non-allowlisted user.
- Leaks "this feature exists" less to outsiders.

Same pattern in `submit_answer` (gate after JSON parse, line 247) and `check_pronunciation` (gate after JSON parse, line 306). Please move all three gates to the top of the endpoint, as in the sketch below.
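Sketch of the reordered prologue for transcribe, using the handler's existing helpers:

```python
# Gate first: no validation work for non-allowlisted users,
# and less "this feature exists" leakage.
user, feature_gate = _current_verbal_flashcards_user()
if feature_gate:
    return feature_gate

_ensure_request_audio_size_is_allowed()

if "file" not in request.files:
    return json_result({"error": "No audio file provided"}), 400
```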
```python
except Exception as e:
    log(f"Get flashcards error: {e}")
    traceback.print_exc()
    return json_result({"error": str(e)}), 500
```
Echoing `str(e)` to the client can leak ORM internals, file paths, and other implementation details. The transcribe endpoint already does this right (returns a static "Transcription endpoint error" on line 162). Please align the other three the same way: log with detail, respond with a generic message.

Same issue in:

- `submit_answer` (lines 282-284)
- `check_pronunciation` (lines 329-331)
```python
flashcard_id = str(data.get("flashcard_id")) if data.get("flashcard_id") is not None else None
user_answer = data.get("user_answer", "")
is_correct = data.get("is_correct")
answer_source = data.get("answer_source", "unknown")
response_time = data.get("response_time_ms", 0)
session_id = data.get("session_id")

if not flashcard_id or is_correct is None:
    return json_result({"error": "flashcard_id and is_correct are required"}), 400
```
Two small things on the `flashcard_id` handling here:

- `data.get("flashcard_id")` is called twice.
- The truthiness check `not flashcard_id` would reject `flashcard_id == "0"` (zero is falsy) — never an issue today since IDs are positive, but it's a footgun.

Cleaner:

```python
flashcard_id = data.get("flashcard_id")
if flashcard_id is None or is_correct is None:
    return json_result({"error": "flashcard_id and is_correct are required"}), 400
flashcard_id = str(flashcard_id)
```
Forward-compatibility for a likely Whisper switch

The per-language container architecture works for today's Danish-only experiment, but it's worth noting that scaling beyond Danish probably won't mean adding more Parakeet workers — there's no …

The good news: most of this PR's design layers are forward-compatible. The contract …

What does leak Parakeet/per-language assumptions and is worth tightening:

1. Default to single URL; per-language overrides are the exception.
2. Worker's …
3. README in …

None of these are PR-blockers. The current design ships fine for the Danish experiment. They're the kind of thing where doing them now costs ~30 minutes and doing them later (after another language is added with the wrong shape) costs hours of cleanup.
JW is still maintained as part of the diagnostics, since it is an interesting metric to keep an eye on in the future, but it no longer measures correctness.
Verbal Flashcards Change Summary

- Flashcard Response Shape
- Endpoint Safety And Error Handling
- Flashcard Selection And Submission Flow
- Text Normalization
- Fuzzy Matching And Pronunciation Scoring
- ASR Client Configuration
- ASR Worker Architecture
- ASR Worker Dependencies And NeMo Output
- Docker And Environment
- Tests And Verification
- Notes
Verbal flashcards now also use Meaning pairs as possible answers. This can in some cases fix translation errors such as "Ball" being translated to "bolden" instead of "bold", where the bookmark variant contains "bold".
Co-Authored-By: Spon <[email protected]>
```python
    texts.append(cleaned_text)


def answer_variants_for_bookmark(bookmark):
```
Nice idea for the inflection case — bold / bolden sharing the cue "ball" is exactly where strict matching frustrates learners.
But the current rule (any non-INVALID Meaning with same origin-language, translation-language, and lowercased translation text) can't distinguish inflectional variants from genuine homonyms. Concrete Danish example, for an English speaker:
- `spring → forår` (the season)
- `spring → fjeder` (a mechanical spring)
- `spring → kilde` (a water source)
All three would now be accepted as correct answers for the cue "spring", regardless of which sense the learner actually scheduled. If they specifically picked spring → fjeder to drill engineering vocabulary, getting credit for kilde undermines the study.
To do this reliably, I think we need a meaning-family / inflection-group relationship in the data model — something that marks bold and bolden as forms of the same lemma, while keeping fjeder and kilde as unrelated meanings that just happen to share an English cue. Matching on cue text alone over-accepts on homonyms.
Until that data exists, one safer scoping option: limit variants to the same user_word.meaning.origin.lemma (if available), or only accept variants when the edit distance between origin contents is small (which would catch bold ↔ bolden but reject fjeder ↔ kilde). Worth thinking about before this lands as the default scoring policy.
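A sketch of that last option, assuming the fuzzy-match module's Damerau-Levenshtein helper (name illustrative):

```python
MAX_VARIANT_EDIT_DISTANCE = 2  # bold <-> bolden passes; fjeder <-> kilde does not

def is_acceptable_variant(bookmark_origin, candidate_origin):
    # Accept a same-cue meaning only when its origin-language form is a
    # small edit away from the scheduled bookmark's form.
    distance = damerau_levenshtein_distance(
        bookmark_origin.lower(), candidate_origin.lower()
    )
    return distance <= MAX_VARIANT_EDIT_DISTANCE
```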
Good catch! This should be fixed now. The interim decision is to accept variants only if they are 2 or fewer edits from the original bookmark :)
Add backend support for Verbal Flashcards, pronunciation checking, and language-specific ASR workers
Why this Pull Request exists
This PR adds the backend required for a new spoken vocabulary exercise called Verbal Flashcards.
The new backend work does four jobs: serve flashcards, transcribe learner speech, check pronunciation, and record exercise outcomes.
The result is not just “speech-to-text attached to flashcards.” It is a full backend workflow for spoken practice.
High-level architecture
The implementation is split into three parts:
1. Main Zeeguu API endpoint layer
This is the part that serves flashcards, receives answers, and integrates results into the existing scheduling model.
2. ASR service client layer
This decides which ASR worker should handle a transcription request based on language.
3. Dedicated ASR worker microservice
Each worker loads a single speech-recognition model and exposes transcription over HTTP.
This separation is a strong design choice because it keeps the core application free from model-specific dependencies and makes multilingual expansion much easier.
What is included in this PR
1. A new verbal flashcards endpoint module
The main new backend module introduces the feature's business logic, API routes, matching logic, and persistence behavior.
2. Feature-flag support
The feature is gated behind a dedicated feature toggle so it can be rolled out safely.
3. An ASR client for routing by language
The API can forward audio to different workers depending on the learner's studied language.
4. A dedicated ASR worker service
A small Flask microservice loads one ASR model and handles transcription requests.
5. Gunicorn config for the worker
The worker is packaged with deployment settings for production-style serving.
6. Tests
The feature includes focused tests for feature availability, flashcard generation, text normalization and matching, endpoint behavior, and answer submission.
File-by-file explanation
`verbal_flashcards.py`

This is the main backend implementation.
Core responsibilities
Feature access control
The module checks whether the current user is allowed to use verbal flashcards. If not, it returns a feature-disabled response.
Flashcard generation
The function `get_flashcard_collection(user)` converts the learner's scheduled study words into spoken flashcards.

This is a smart choice because verbal practice is being layered onto Zeeguu's existing learning model instead of duplicating vocabulary state.
Schedule recovery for higher-level words
The helper `_ensure_schedule_for_verbal_flashcard(user_word)` makes sure a word has a schedule row even if it is not currently in the normal exercise pipeline.

This matters because verbal flashcards can target mature words. The feature therefore needs a way to write outcomes back into the existing scheduling system without resetting the learner's progress.
New API endpoints in `verbal_flashcards.py`

`GET /verbal_flashcards`

Returns the learner's currently available verbal flashcards.

What it supports:

- `limit`
- `offset`

Why it matters
This is the entry point for the frontend session. It turns scheduled Zeeguu study words into a frontend-ready spoken exercise list.
`POST /verbal_flashcards/transcribe`

Accepts an uploaded audio file and returns a transcription.
Main behavior: it forwards the audio to the ASR worker that owns the model for the learner's language.

Error handling

It returns specific status codes for different failure modes:

- `400` for missing or invalid file input,
- `503` when no ASR worker is configured for the language,
- `502` when the worker request fails,
- `500` for unexpected internal errors.

`POST /verbal_flashcards/check_pronunciation`

Accepts:

- `user_speech`
- `expected_text`

and returns structured pronunciation analysis without storing progress.
Why this endpoint is useful
It separates evaluation from persistence.
That gives the frontend a safe way to score an attempt (for retries or live feedback, for example) before anything is persisted.
This separation is one of the strongest architectural decisions in the PR.
`POST /verbal_flashcards/submit`

Accepts the final attempt result and records it as an exercise outcome.
Expected payload includes:

- `flashcard_id`
- `user_answer`
- `is_correct`
- `answer_source`
- `response_time_ms`
- `session_id`

Important behavior: the outcome is recorded from `is_correct`, under the exercise source `Verbal Flashcards`.

This endpoint is what makes the feature part of Zeeguu's real learning pipeline rather than just a standalone pronunciation demo.
Matching and evaluation logic
ASR output is noisy, especially for foreign-language learners. If the system demanded exact string equality, the feature would feel unfair and would reject many answers that are close enough to be pedagogically useful.
This PR solves that problem by combining several layers of normalization and fuzzy matching.
`sanitize_spoken_text`

Normalizes casing, punctuation, and spacing while preserving Danish characters.

`canonical_danish_form`

Converts common alternative spellings into stable Danish written forms, such as:

- `aa` -> `å`
- `ae` -> `æ`
- `oe` -> `ø`

`asr_tolerant_danish_form`

Applies even more permissive transformations for ASR comparison, such as:

- `æ` -> `e`
- `ø` -> `o`
- `å` -> `a`

This is the feature's "grace for language learners" layer.
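A condensed sketch of the two layers (the PR's real implementations handle more cases than these bare replacements):

```python
def canonical_danish_form(word):
    # Stable written form: fold alternative spellings into Danish letters.
    return word.replace("aa", "å").replace("ae", "æ").replace("oe", "ø")

def asr_tolerant_danish_form(word):
    # Permissive comparison form: collapse Danish letters toward ASCII
    # so ASR near-misses on æ/ø/å still line up.
    return word.replace("æ", "e").replace("ø", "o").replace("å", "a")
```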
Fuzzy algorithms included
Damerau-Levenshtein distance
Used to measure edit distance with support for insertion, deletion, substitution, and transposition.
This is useful for small ASR spelling mistakes and near-miss transcriptions.
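For reference, an illustrative restricted Damerau-Levenshtein (optimal string alignment) implementation — a sketch of the metric, not necessarily the PR's exact code:

```python
def damerau_levenshtein(a, b):
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]
```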
Jaro and Jaro-Winkler similarity
Used to capture similarity in a more flexible way, especially for short strings.
Boundary-aware Jaro-Winkler
The implementation also compares reversed strings so that dropped initial sounds are not punished too harshly.
Length-aware acceptance threshold
Short words use stricter thresholds, while longer words can tolerate more variation.
That is a very sensible design choice because one-letter differences matter much more in very short words.
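An illustrative shape for that policy (the cutoff values here are made up, not the PR's constants):

```python
def fuzzy_match_threshold(word):
    # A single wrong letter changes a short word entirely, so demand near-exact.
    if len(word) <= 3:
        return 0.95
    if len(word) <= 6:
        return 0.88
    # Longer words tolerate more ASR noise before rejection.
    return 0.82
```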
Word-level accuracy analysis
The evaluation does not just compare full phrases.
Instead, it scores each expected word against the learner's transcribed speech individually. The response includes a per-word breakdown of those scores.
This is what makes the frontend's word breakdown possible.
`user_feature_toggles.py`

This file adds the feature toggle for `verbal_flashcards`.

How the toggle works

The feature is enabled only for users whose stored invitation code appears in the environment variable `VERBAL_FLASHCARDS_INVITE_CODES`.

Why this matters

This is a good rollout strategy for an experimental feature because it allows a controlled, invitation-gated rollout.
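A sketch of the gate; the `invitation_code` attribute name on `User` is an assumption:

```python
import os

def verbal_flashcards_enabled(user):
    # Comma-separated allowlist; an unset variable disables the feature for all.
    raw_codes = os.environ.get("VERBAL_FLASHCARDS_INVITE_CODES", "")
    allowed = {code.strip() for code in raw_codes.split(",") if code.strip()}
    return user.invitation_code in allowed  # attribute name assumed
```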
`asr_service_client.py`

This file is the bridge between the main API and the ASR workers.
Main responsibilities
Parse worker mappings
It accepts configuration strings such as:
- `da=http://asr-da:5002`
- `de=http://asr-de:5002`
- `fr=http://asr-fr:5002`

and converts them into a language-to-URL map.
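The parsing itself is small; a sketch of the shape, assuming a comma-separated list (the PR's `parse_asr_service_urls` may differ in detail):

```python
def parse_asr_service_urls(raw_value):
    # "da=http://asr-da:5002,de=http://asr-de:5002" -> {"da": ..., "de": ...}
    mapping = {}
    for entry in raw_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        language_code, _, url = entry.partition("=")
        if language_code and url:
            mapping[language_code.strip()] = url.strip()
    return mapping
```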
Resolve the correct worker
Given a learner language code, it selects the correct worker URL.
Fallback local development setup
If nothing is configured, it falls back to a local Danish worker mapping.
Perform the transcription request
It sends the audio file plus `language_code` to the selected worker and normalizes request failures into specific exceptions.

Why this matters
This file is what turns the system from “one speech service” into a language-aware worker architecture.
`app.py` (ASR worker microservice)

This file implements the dedicated ASR worker.
Purpose
Each worker instance owns exactly one language model.
That means the main API does not run speech recognition directly. Instead, it forwards the request to the worker that owns the model for the learner's language.
Main environment variables

- `ASR_LANGUAGE_CODE`
- `ASR_MODEL_NAME`
- `ASR_WORKER_NAME`
- `ASR_SERVICE_PORT`

Model loading
At startup, the worker tries to import:

- `nemo.collections.asr`
- `pydub`

and then loads the configured model using `ASRModel.from_pretrained(...)`.

Audio preprocessing
Before transcription, the worker re-encodes the uploaded audio (via pydub) into the format the model expects.
This is important because speech models are much more reliable when audio is normalized to the format they expect.
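A sketch of that step using pydub's documented API; the mono 16 kHz WAV target is a typical ASR input format, assumed here:

```python
from pydub import AudioSegment

def normalize_for_asr(input_path, output_path):
    # Decode whatever the browser uploaded, then resample to the
    # shape speech models commonly expect.
    audio = AudioSegment.from_file(input_path)
    audio = audio.set_frame_rate(16000).set_channels(1)
    audio.export(output_path, format="wav")
    return output_path
```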
Worker endpoints

`GET /health`

Returns worker status, language, name, and whether the model is loaded.

`POST /transcribe`

Accepts a file upload plus optional `language_code`, validates that the request matches the worker's configured language, and returns a transcription.

Why this matters
This worker is small, focused, and operationally clear. It isolates model dependencies and makes future multilingual scaling straightforward.
`gunicorn.conf.py`

This file provides deployment config for the ASR worker.

What it sets

The bind address and worker count, both read from environment variables (`GUNICORN_BIND`, `GUNICORN_WORKERS`).

Why it matters
This turns the worker from a dev-only Flask app into something that is ready to run as a standalone service.
`test_verbal_flashcards.py`

This file provides targeted coverage for the feature.

What the tests verify

- Feature availability — `404` when the feature is disabled.
- Flashcard generation.
- Text normalization and matching.
- Endpoint behavior — the `503` and `400` paths.
- Answer submission — including the `404` case.