NiXTheDev/regexYbot

regexYbot

A fast, feature-rich Telegram bot built with grammY, Bun, and SQLite. It provides sed-style regex substitution with a focus on performance, scalability, and robust error handling.

Features

  • Sed-Style Substitution: Use s/pattern/replacement/flags commands to perform regex substitutions on messages within the chat history or on specific replies.
  • Edit Support: Edit your s/.../.../ commands, and the bot will automatically update its corresponding reply with the new substitution result.
  • High-Performance Worker Pool: Regex operations are offloaded to a pool of Bun Worker threads, ensuring the bot remains responsive even under heavy load or with complex patterns.
  • Dynamic Worker Pool V2: Optional advanced worker pool with dynamic scaling, health monitoring, and automatic idle worker termination.
  • Performance Timing: Use the p flag (e.g., s/pattern/repl/p) to measure and display the execution time of the substitution chain.
  • Regex Pattern Caching: LRU cache with TTL for compiled regex patterns, significantly improving performance for repeated patterns.
  • Per-User Rate Limiting: Configurable rate limiting to prevent spam and abuse (default: 30 commands/minute per user).
  • Health Monitoring: Real-time health metrics with automatic status detection (healthy/degraded/unhealthy).
  • Configurable Logging: Features a custom, module-based logger with configurable levels (none, debug, info, warn, error, fatal) and a customizable output template.
  • Target Protection: Prevents s/.../.../ commands from operating on other s/.../.../ command messages, avoiding unintended behavior.
  • Runtime Safety: Includes a configurable timeout (default 60 seconds) for regex execution to prevent hanging on potentially malicious or extremely slow patterns.
  • Opportunistic Cleanup: Automatically removes message history and bot reply mappings older than 48 hours on every bot update for efficiency.
  • Error Resilience: Handles Telegram API errors gracefully (e.g., "message is not modified", flood control) and avoids resending identical messages unnecessarily.
  • Custom Error Hierarchy: Granular error types with user-friendly messages (RegexError, RateLimitError, WorkerError, etc.).
  • Circuit Breaker Pattern: Prevents cascading failures by stopping requests to failing services.
  • Multi-Language Support: Full i18n with 11 languages including English, German, Spanish, Italian, Polish, Swedish, Russian, Ukrainian, Japanese, Korean, and Chinese (Simplified).
  • Grouping Support: Fully supports regex capture groups, such as (\w+), and references to them in the replacement string using $1 (modern syntax) or \1 (legacy syntax from the old regexbot). Both syntaxes can be mixed in one replacement.
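
As an illustration of the pattern caching feature listed above, here is a minimal sketch of an LRU cache with a TTL for compiled patterns. It is not taken from the regexYbot source: the RegexCache class, its get method, and the cache key format are hypothetical, and only the two default values mirror CACHE_MAX_SIZE and CACHE_TTL_MS.

```typescript
// Hypothetical sketch: an LRU cache with a per-entry TTL for compiled RegExp objects.
// Names are illustrative and not taken from the regexYbot source.
interface CacheEntry {
  regex: RegExp;
  expiresAt: number; // epoch milliseconds
}

class RegexCache {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private entries = new Map<string, CacheEntry>();
  private maxSize: number;
  private ttlMs: number;

  constructor(maxSize: number = 1000, ttlMs: number = 300000) {
    this.maxSize = maxSize; // mirrors the CACHE_MAX_SIZE default
    this.ttlMs = ttlMs; // mirrors the CACHE_TTL_MS default
  }

  get(pattern: string, flags: string, now: number = Date.now()): RegExp {
    // Flags never contain "/", so this key is unambiguous.
    const key = `${flags}/${pattern}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      hit.regex.lastIndex = 0; // reset state left behind by the g or y flag
      return hit.regex;
    }
    // Miss or expired entry: compile again (throws SyntaxError on an invalid pattern).
    const regex = new RegExp(pattern, flags);
    this.entries.delete(key);
    this.entries.set(key, { regex, expiresAt: now + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return regex;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The now parameter exists only to make expiry easy to test. A real implementation would probably also sweep expired entries periodically instead of waiting for them to be requested again.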

Internationalization (i18n)

regexYbot supports multiple languages with automatic detection based on Telegram user settings:

Supported Languages:

  • 🇺🇸 English (default)
  • 🇩🇪 German (Deutsch)
  • 🇪🇸 Spanish (Español)
  • 🇮🇹 Italian (Italiano)
  • 🇵🇱 Polish (Polski)
  • 🇸🇪 Swedish (Svenska)
  • 🇷🇺 Russian (Русский)
  • 🇺🇦 Ukrainian (Українська)
  • 🇯🇵 Japanese (日本語)
  • 🇰🇷 Korean (한국어)
  • 🇨🇳 Chinese Simplified (简体中文)

Language Commands:

  • /language - Show your current language
  • /language list - List all available languages
  • /language set <code> - Change language (e.g., /language set de)

The bot automatically detects your language from Telegram settings. If your language isn't supported, it falls back to English. Translations are stored in the locales/ directory using the Fluent format. Contributions for new languages or improvements are welcome!
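
The fallback described above could look roughly like the sketch below. The resolveLanguage function is hypothetical, and the language codes are assumed ISO 639-1 values (only de appears in this README), so the actual codes in locales/ may differ. It also assumes that a language chosen with /language set takes priority over the Telegram language_code.

```typescript
// Hypothetical sketch of language resolution with an English fallback.
// The code list is an assumption; only "de" is confirmed by the README.
const SUPPORTED = ["en", "de", "es", "it", "pl", "sv", "ru", "uk", "ja", "ko", "zh"] as const;
type Lang = (typeof SUPPORTED)[number];

function resolveLanguage(stored: string | undefined, telegramCode: string | undefined): Lang {
  // Assumption: an explicit user choice wins over the detected Telegram setting.
  for (const candidate of [stored, telegramCode]) {
    if (!candidate) continue;
    // Reduce an IETF tag such as "pt-BR" to its primary subtag.
    const primary = candidate.toLowerCase().split("-")[0];
    const match = SUPPORTED.find((code) => code === primary);
    if (match !== undefined) return match;
  }
  return "en"; // unsupported or missing language
}
```

With grammY, the second argument would typically come from ctx.from?.language_code, which Telegram fills in when the client reports a language.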

Commands

  • /start: Get a greeting message and a brief guide on how to use the bot.
  • /privacy: Displays the bot's privacy policy.
  • s/find/replace/flags: Performs a regex substitution.
    • Example: s/old/new/gi replaces all occurrences of "old" (case-insensitive) with "new".
    • Example with Groups: s/(\w+) (\w+)/$2 $1/ (modern syntax) or s/(\w+) (\w+)/\2 \1/ (legacy syntax from the old regexbot) swaps the first two words in a message. regexYbot supports both syntaxes at the same time, and mixing them (/$2 \1/) works too.
    • Example with Performance: s/complex_pattern/replacement/gip performs a global, case-insensitive substitution and prints the execution time.
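
To make the command syntax concrete, here is a simplified sketch of how an s/find/replace/flags command could be parsed and applied, including the legacy \1 backreferences and the p timing flag. It is an assumption-laden illustration, not the parser in sed.ts: parseSed, normalizeReplacement, and applySed are hypothetical names, and chaining, flag validation, and the worker pool are left out.

```typescript
// Hypothetical sketch; the real parser lives in sed.ts and may behave differently.
interface SedCommand {
  pattern: string;
  replacement: string;
  flags: string; // regex flags with the "p" timing flag removed
  timing: boolean; // true when the "p" flag was present
}

function parseSed(input: string): SedCommand | null {
  if (!input.startsWith("s/")) return null;
  const parts: string[] = [];
  let current = "";
  for (let i = 2; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "\\" && i + 1 < input.length) {
      // Keep escape sequences intact, except "\/", which becomes a literal "/".
      const next = input.charAt(i + 1);
      current += next === "/" ? "/" : ch + next;
      i++;
    } else if (ch === "/" && parts.length < 2) {
      parts.push(current); // end of the pattern or of the replacement
      current = "";
    } else {
      current += ch;
    }
  }
  if (parts.length === 0) return null; // no delimiter after the pattern
  if (parts.length === 1) {
    // Assumption: a missing trailing "/" is tolerated, as in "s/old/new".
    parts.push(current);
    current = "";
  }
  return {
    pattern: parts[0] ?? "",
    replacement: parts[1] ?? "",
    flags: current.replace(/p/g, ""),
    timing: current.includes("p"),
  };
}

function normalizeReplacement(replacement: string): string {
  // Rewrite legacy \1..\99 as $1..$99 so both syntaxes can be mixed.
  // Simplification: an escaped backslash followed by a digit is not special-cased.
  return replacement.replace(/\\(\d{1,2})/g, "$$$1");
}

function applySed(cmd: SedCommand, text: string): string {
  // new RegExp throws a SyntaxError on an invalid pattern or unknown flag.
  const regex = new RegExp(cmd.pattern, cmd.flags);
  return text.replace(regex, normalizeReplacement(cmd.replacement));
}
```

For example, parsing s/(\w+) (\w+)/$2 \1/ and applying it to "hello world" yields "world hello", because the legacy \1 is rewritten to $1 before the native String.prototype.replace runs.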

Environment Variables

Configure the bot's behavior with the following environment variables:

| Variable | Required | Description | Default Value |
| --- | --- | --- | --- |
| TOKEN | Yes | Your Telegram bot token. | |
| BASE_URL | No | Base URL for the Telegram Bot API, useful for local testing. | https://api.telegram.org |
| LOG_LEVEL | No | Sets the minimum log level. Available levels: none, debug, info, warn, error, fatal. | debug (development), info (production) |
| LOG_TEMPLATE | No | Customizes the log output format. | [{level}: {module}]: {message} |
| NODE_ENV | No | Set to production to default the log level to info. | |
| WORKER_TIMEOUT_MS | No | Maximum time a regex operation can run before being terminated (milliseconds). | 60000 |
| WORKER_POOL_MIN_WORKERS | No | Minimum number of workers to maintain. | 0 |
| WORKER_POOL_MAX_WORKERS | No | Maximum number of workers allowed. | 8 |
| WORKER_POOL_INITIAL_WORKERS | No | Number of workers to spawn at startup. | 1 |
| WORKER_POOL_IDLE_TIMEOUT_MS | No | Time before idle workers are terminated (milliseconds). | 300000 |
| WORKER_POOL_IDLE_CHECK_INTERVAL_MS | No | How often to check for idle workers (milliseconds). | 60000 |
| GRACEFUL_DRAIN | No | Enable graceful drain on shutdown. Processes pending tasks instead of rejecting them. | false |
| GRACEFUL_DRAIN_TIMEOUT_MS | No | Maximum time to spend draining the queue during shutdown (milliseconds). Max 9500ms for Docker compatibility. | 8000 |
| MAX_CHAIN_LENGTH | No | Maximum number of sed commands that can be chained together. | 5 |
| MAX_MESSAGE_LENGTH | No | Maximum length of the bot's response message. | 4096 |
| CLEANUP_INTERVAL_MS | No | How often to clean up old message history (milliseconds). | 172800000 (48 hours) |
| MAX_HISTORY_PER_CHAT | No | Maximum number of messages to keep in history per chat. | 20 |
| HISTORY_QUERY_LIMIT | No | Maximum number of messages to search when finding a target. | 10 |
| RETRY_MAX_RETRIES | No | Maximum number of retries for Telegram API calls. | 3 |
| RETRY_MAX_DELAY_MS | No | Maximum delay between retries for Telegram API calls (milliseconds). | 30000 |
| RATE_LIMIT_ENABLED | No | Enable per-user rate limiting to prevent spam. | true |
| RATE_LIMIT_COMMANDS_PER_MINUTE | No | Maximum number of commands a user can send per minute. | 30 |
| CACHE_ENABLED | No | Enable LRU caching for compiled regex patterns. | true |
| CACHE_MAX_SIZE | No | Maximum number of entries in the regex pattern cache. | 1000 |
| CACHE_TTL_MS | No | Time-to-live for cached patterns (milliseconds). | 300000 (5 min) |
| ENABLE_FILE_HEALTHCHECK | No | Enable file-based healthcheck for Docker environments. | false |
| LIVENESS_FILE | No | Path to the liveness file when healthcheck is enabled. | /tmp/bot-alive |
| LIVENESS_INTERVAL_MS | No | How often to update the liveness file (milliseconds). | 30000 |
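
For readers curious how typed loading and validation of such variables might work, here is a small sketch covering a handful of them. It is not the contents of config.ts: loadConfig, intFromEnv, and boolFromEnv are hypothetical helpers, and accepting 1 and 0 as booleans is an assumption.

```typescript
// Hypothetical sketch of typed env loading; config.ts may be structured differently.
type Env = Record<string, string | undefined>;

function intFromEnv(env: Env, name: string, fallback: number, min: number = 0): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

function boolFromEnv(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name]?.trim().toLowerCase();
  if (raw === undefined || raw === "") return fallback;
  // Assumption: "1" and "0" are accepted alongside "true" and "false".
  if (raw === "true" || raw === "1") return true;
  if (raw === "false" || raw === "0") return false;
  throw new Error(`${name} must be true or false, got "${raw}"`);
}

function loadConfig(env: Env) {
  const token = env["TOKEN"];
  if (!token) throw new Error("TOKEN is required");
  const production = env["NODE_ENV"] === "production";
  return {
    token,
    baseUrl: env["BASE_URL"] ?? "https://api.telegram.org",
    // The default log level depends on NODE_ENV, as described in the table.
    logLevel: env["LOG_LEVEL"] ?? (production ? "info" : "debug"),
    workerTimeoutMs: intFromEnv(env, "WORKER_TIMEOUT_MS", 60000, 1),
    maxChainLength: intFromEnv(env, "MAX_CHAIN_LENGTH", 5, 1),
    rateLimitEnabled: boolFromEnv(env, "RATE_LIMIT_ENABLED", true),
    commandsPerMinute: intFromEnv(env, "RATE_LIMIT_COMMANDS_PER_MINUTE", 30, 1),
    gracefulDrain: boolFromEnv(env, "GRACEFUL_DRAIN", false),
  };
}
```

Passing the environment in as a parameter (for example loadConfig(process.env) at startup) keeps the function easy to test, and failing fast on a malformed value surfaces configuration mistakes before the bot connects to Telegram.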

Quick Start (Binary)

Pre-built binaries are available for Linux and Windows:

  1. Download the latest release from GitHub Releases

    • Linux: regexybot-linux-x64.tar.gz
    • Windows: regexybot-windows-x64.zip
  2. Extract the binary:

    # Linux
    tar -xzf regexybot-linux-x64.tar.gz
    
    # Windows (PowerShell)
    Expand-Archive regexybot-windows-x64.zip
  3. Run with your bot token:

    # Linux
    TOKEN=your_telegram_bot_token ./regexybot-linux-x64
    
    # Windows (Command Prompt)
    set TOKEN=your_telegram_bot_token
    regexybot-windows-x64.exe
    
    # Windows (PowerShell)
    $env:TOKEN="your_telegram_bot_token"
    .\regexybot-windows-x64.exe

Note: Binaries are self-contained and don't require Bun or Node.js to be installed.

Setup & Run (From Source)

  1. Ensure you have Bun installed.
  2. Clone this repository.
  3. Set your Telegram bot token in an environment variable:
    export TOKEN="YOUR_TELEGRAM_BOT_TOKEN"
  4. Run the bot from the project's root directory:
    bun index.ts

Data Persistence

Important: This bot uses an in-memory SQLite database (sqlite://:memory:) by default. This means:

  • All message history and reply mappings are ephemeral - they are lost when the bot restarts
  • The retention window (48 hours by default) is designed to support Telegram's edit window and reply-less sed behavior
  • No persistent storage is required or used
  • For production deployments, this design is intentional - the bot does not store any data permanently
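
The retention rules above can be modeled in a few lines. The sketch below works on a plain in-memory array instead of SQLite, so it only illustrates the policy (a 48-hour window plus a per-chat cap like MAX_HISTORY_PER_CHAT), not the actual queries in database.ts. HistoryRow and pruneHistory are hypothetical names.

```typescript
// Hypothetical model of the retention policy; the bot itself stores history in SQLite.
interface HistoryRow {
  chatId: number;
  messageId: number;
  text: string;
  createdAt: number; // epoch milliseconds
}

const RETENTION_MS = 48 * 60 * 60 * 1000; // 48 hours

function pruneHistory(
  rows: HistoryRow[],
  now: number,
  retentionMs: number = RETENTION_MS,
  maxPerChat: number = 20, // mirrors the MAX_HISTORY_PER_CHAT default
): HistoryRow[] {
  // Drop everything older than the retention window.
  const fresh = rows.filter((row) => now - row.createdAt <= retentionMs);
  // Newest first, so the per-chat cap keeps the most recent messages.
  fresh.sort((a, b) => b.createdAt - a.createdAt);
  const keptPerChat = new Map<number, number>();
  return fresh.filter((row) => {
    const kept = keptPerChat.get(row.chatId) ?? 0;
    keptPerChat.set(row.chatId, kept + 1);
    return kept < maxPerChat;
  });
}
```

The function returns a new array, newest first, and leaves its input untouched.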

If you need persistent storage (not recommended for this use case), you would need to modify index.ts to use a file-based SQLite database instead.

Project Structure

The project is organized into several modules for clarity and maintainability:

  • index.ts: The main application entry point and bot wiring. Thin composition root that orchestrates other modules.
  • config.ts: Centralized configuration with typed env var loading and validation.
  • database.ts: Database service layer with DatabaseService class for message history and reply tracking.
  • workerPool.ts: Worker pool management for concurrent regex processing.
  • sed.ts: Sed command parsing and handling logic (parseSedCommands, SedHandler).
  • hellspawn.ts: The worker script that performs the actual regex substitution in separate threads.
  • logger.ts: A custom, configurable logging utility.
  • types.ts: Contains shared TypeScript types and interfaces.
  • utils.ts: Houses shared helper functions (regex patterns, escaping, flag normalization).

Tech Stack

Branching Strategy & Releases

This project uses a two-branch workflow:

main branch (Stable)

  • Contains production-ready code
  • Merges happen from dev via pull requests
  • Docker images are tagged with:
    • release - stable release marker
    • latest - floats to most recent build (becomes stable after merge)
    • Version numbers from package.json (e.g., 0.1.7.1, 0.1.7, 0.1)
    • Git commit hash

dev branch (Development)

  • Active development happens here
  • Feature branches merge into dev
  • Docker images are tagged with:
    • dev - latest development build
    • next - upcoming release preview
    • latest - floats to most recent build (overwritten by dev activity)
    • dev-<version> - version-specific dev build (e.g., dev-0.1.7.1)

Workflow

  1. Create feature branches from dev
  2. Open PRs targeting dev
  3. When ready for release, open PR from dev to main
  4. After merging to main, Docker images are built with release tags

Docker Deployment

Graceful Shutdown

The bot supports graceful shutdown for Docker deployments:

Default Behavior (Immediate Shutdown):

  • On SIGTERM/SIGINT, immediately stops accepting updates
  • Queued tasks are rejected
  • Fast shutdown suitable for most use cases

Graceful Drain Mode (Optional): Enable with GRACEFUL_DRAIN=true to process pending tasks before shutting down:

environment:
  - GRACEFUL_DRAIN=true
  - GRACEFUL_DRAIN_TIMEOUT_MS=8000

Important considerations:

  • Graceful drain must complete within Docker's stop grace period (default: 10s)
  • Default drain timeout is 8000ms (8s) to fit within Docker's grace period
  • Maximum recommended: 9500ms (9.5s) to avoid SIGKILL
  • If queue is too large to drain in time, remaining tasks are lost
  • Useful for deployments where you don't want to lose pending operations
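
The drain-or-give-up behavior described above boils down to racing the pending work against a timer. The following is a minimal sketch under that assumption. drainWithTimeout is a hypothetical helper, and the real shutdown path in the bot may be organized differently.

```typescript
// Hypothetical sketch of a bounded drain; not the bot's actual shutdown code.
async function drainWithTimeout(
  pending: Promise<unknown>[],
  timeoutMs: number, // e.g. the GRACEFUL_DRAIN_TIMEOUT_MS value
): Promise<"drained" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), timeoutMs);
  });
  // A failed task still counts as finished, so rejections are swallowed here.
  const settled = pending.map((task) => task.then(() => undefined, () => undefined));
  const drained = Promise.all(settled).then((): "drained" => "drained");
  try {
    // Whichever finishes first decides the outcome.
    return await Promise.race([drained, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the process alive after an early drain
  }
}
```

A SIGTERM handler could await this function and then exit regardless of the result. Keeping the timeout below Docker's stop grace period leaves time for the exit itself before SIGKILL arrives.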

Adjusting Docker Grace Period: If you need more time for graceful drain, increase the container's grace period:

services:
  regexybot:
    stop_grace_period: 20s # Increase from default 10s
    environment:
      - GRACEFUL_DRAIN=true
      - GRACEFUL_DRAIN_TIMEOUT_MS=18000 # 18s (under 20s grace period)

Testing Graceful Shutdown

A test script is provided to verify graceful shutdown behavior:

cd docker
./test-graceful-shutdown.sh

This tests:

  • Immediate shutdown behavior
  • Graceful drain with pending tasks
  • Docker Compose stop/restart scenarios
  • SIGINT vs SIGTERM handling

License

MIT

About

regexbot, but in TS with grammY (vibecoded from scratch btw)
