regexYbot

A fast, efficient, and feature-rich Telegram bot built with grammY, Bun, and SQLite. It provides powerful regex-based substitution (sed style) with a focus on performance, scalability, and robust error handling.

Features

Sed-Style Substitution: Use s/pattern/replacement/flags commands to perform regex substitutions on messages within the chat history or on specific replies.
Edit Support: Edit your s/.../.../ commands, and the bot will automatically update its corresponding reply with the new substitution result.
High-Performance Worker Pool: Regex operations are offloaded to a pool of Bun Worker threads, ensuring the bot remains responsive even under heavy load or with complex patterns.
Dynamic Worker Pool V2: Optional advanced worker pool with dynamic scaling, health monitoring, and automatic idle worker termination.
Performance Timing: Use the p flag (e.g., s/pattern/repl/p) to measure and display the execution time of the substitution chain.
Regex Pattern Caching: LRU cache with TTL for compiled regex patterns, significantly improving performance for repeated patterns.
Per-User Rate Limiting: Configurable rate limiting to prevent spam and abuse (default: 30 commands/minute per user).
Health Monitoring: Real-time health metrics with automatic status detection (healthy/degraded/unhealthy).
Configurable Logging: Features a custom, module-based logger with configurable levels (none, debug, info, warn, error, fatal) and a customizable output template.
Target Protection: Prevents s/.../.../ commands from operating on other s/.../.../ command messages, avoiding unintended behavior.
Runtime Safety: Includes a configurable timeout (default 60 seconds) for regex execution to prevent hanging on potentially malicious or extremely slow patterns.
Opportunistic Cleanup: Automatically removes message history and bot reply mappings older than 48 hours on every bot update for efficiency.
Error Resilience: Handles Telegram API errors gracefully (e.g., "message is not modified", flood control) and avoids resending identical messages unnecessarily.
Custom Error Hierarchy: Granular error types with user-friendly messages (RegexError, RateLimitError, WorkerError, etc.).
Circuit Breaker Pattern: Prevents cascading failures by stopping requests to failing services.
Multi-Language Support: Full i18n with 11 languages including English, German, Spanish, Italian, Polish, Swedish, Russian, Ukrainian, Japanese, Korean, and Chinese (Simplified).
Grouping Support: Fully supports regex capture groups, e.g. (\w+), and references to them in the replacement string using either $1 (modern syntax) or \1 (legacy syntax from the old regexbot). Both styles can be mixed in one replacement.
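
To make the Regex Pattern Caching feature above more concrete, here is a minimal sketch of an LRU cache with TTL for compiled patterns. It is an illustration only: the class name `RegexCache`, its method names, and the injectable clock are assumptions made for this example and are not taken from the bot's source. The default sizes mirror the documented `CACHE_MAX_SIZE` and `CACHE_TTL_MS` defaults.

```typescript
// Hypothetical sketch: an LRU cache with TTL for compiled RegExp objects.
// Names are illustrative, not the bot's real API.
interface CacheEntry {
  regex: RegExp;
  expiresAt: number; // epoch milliseconds
}

class RegexCache {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private entries = new Map<string, CacheEntry>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize = 1000, ttlMs = 300_000, now: () => number = Date.now) {
    this.maxSize = maxSize; // mirrors the CACHE_MAX_SIZE default
    this.ttlMs = ttlMs; // mirrors the CACHE_TTL_MS default
    this.now = now; // injectable clock, handy for tests
  }

  get(pattern: string, flags: string): RegExp {
    const key = `${flags}/${pattern}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      hit.regex.lastIndex = 0; // clear state left by an earlier g/y match
      return hit.regex;
    }
    // Miss or expired: compile again (throws SyntaxError on an invalid pattern).
    const regex = new RegExp(pattern, flags);
    this.entries.delete(key);
    this.entries.set(key, { regex, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return regex;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Relying on `Map` insertion order keeps both lookup and eviction at constant time without a separate linked list. Expired entries are only replaced lazily on access here; a real implementation might also sweep them periodically.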

Internationalization (i18n)

regexYbot supports multiple languages with automatic detection based on Telegram user settings:

Supported Languages:

🇺🇸 English (default)
🇩🇪 German (Deutsch)
🇪🇸 Spanish (Español)
🇮🇹 Italian (Italiano)
🇵🇱 Polish (Polski)
🇸🇪 Swedish (Svenska)
🇷🇺 Russian (Русский)
🇺🇦 Ukrainian (Українська)
🇯🇵 Japanese (日本語)
🇰🇷 Korean (한국어)
🇨🇳 Chinese Simplified (简体中文)

Language Commands:

/language - Show your current language
/language list - List all available languages
/language set <code> - Change language (e.g., /language set de)

The bot automatically detects your language from Telegram settings. If your language isn't supported, it falls back to English. Translations are stored in the locales/ directory using the Fluent format. Contributions for new languages or improvements are welcome!
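
The detection and fallback described above could look roughly like the sketch below. The locale codes, the function name `resolveLocale`, and the idea that a stored choice from `/language set` takes priority over the Telegram setting are all assumptions for illustration; the actual file names in locales/ may differ. Telegram reports the user's language as an IETF language tag such as `de` or `pt-br`, so the sketch compares only the primary subtag.

```typescript
// Assumed locale codes for the 11 supported languages (hypothetical).
const SUPPORTED_LOCALES = [
  "en", "de", "es", "it", "pl", "sv", "ru", "uk", "ja", "ko", "zh",
] as const;
type Locale = (typeof SUPPORTED_LOCALES)[number];

// Pick the stored preference if valid, then the Telegram language, then English.
function resolveLocale(languageCode?: string, storedPreference?: string): Locale {
  for (const candidate of [storedPreference, languageCode]) {
    if (!candidate) continue;
    // "pt-BR" -> "pt": only the primary language subtag is compared.
    const base = candidate.toLowerCase().split("-")[0];
    const match = SUPPORTED_LOCALES.find((locale) => locale === base);
    if (match !== undefined) return match;
  }
  return "en"; // documented fallback
}
```

For example, `resolveLocale("pt-BR")` falls back to `"en"`, while `resolveLocale("en", "ja")` returns `"ja"` because the stored preference wins in this sketch.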

Commands

/start: Get a greeting message and a brief guide on how to use the bot.
/privacy: Displays the bot's privacy policy.
s/find/replace/flags: Performs a regex substitution.
- Example: s/old/new/gi replaces all occurrences of "old" (case-insensitive) with "new".
- Example with Groups: s/(\w+) (\w+)/$2 $1/ (modern syntax) or s/(\w+) (\w+)/\2 \1/ (legacy syntax from the old regexbot) swaps the first two words in a message. Both syntaxes work at the same time, so a mixed replacement such as s/(\w+) (\w+)/$2 \1/ is supported too.
- Example with Performance: s/complex_pattern/replacement/gip performs a global, case-insensitive substitution and prints the execution time.
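
As a rough illustration of how such a command can be handled, the sketch below parses a single `s/find/replace/flags` string, rewrites legacy `\1` references to `$1`, strips the bot-specific `p` flag, and applies the result with JavaScript's `String.prototype.replace`. The names `parseSed` and `runSed` are hypothetical and the logic is simplified (one command, `/` as the only delimiter); the bot's real parser in sed.ts may work differently.

```typescript
interface SedCommand {
  pattern: string;
  replacement: string;
  flags: string; // regex flags with the bot-specific "p" removed
  timing: boolean; // true when the "p" flag was present
}

// Hypothetical parser for a single "s/pattern/replacement/flags" command.
function parseSed(input: string): SedCommand | null {
  if (!input.startsWith("s/")) return null;
  const parts: string[] = [];
  let current = "";
  for (let i = 2; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "\\" && i + 1 < input.length) {
      const next = input.charAt(i + 1);
      // "\/" is a literal slash; any other escape is kept verbatim.
      current += next === "/" ? "/" : ch + next;
      i++;
    } else if (ch === "/" && parts.length < 2) {
      parts.push(current); // an unescaped slash ends the pattern or replacement
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  const [pattern, replacement, rawFlags = ""] = parts;
  if (pattern === undefined || replacement === undefined) return null;
  if (!/^[a-z]*$/.test(rawFlags)) return null;
  return {
    pattern,
    // Legacy "\1" references become "$1"; existing "$1" references pass through.
    replacement: replacement.replace(/\\(\d)/g, (_m, digit: string) => "$" + digit),
    flags: rawFlags.replace(/p/g, ""),
    timing: rawFlags.includes("p"),
  };
}

// Apply a command to a message; may throw on an invalid pattern or flag.
function runSed(text: string, command: string): string | null {
  const cmd = parseSed(command);
  if (cmd === null) return null;
  return text.replace(new RegExp(cmd.pattern, cmd.flags), cmd.replacement);
}
```

Normalizing `\1` to `$1` before calling `replace` is what lets modern, legacy, and mixed references share one code path.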

Environment Variables

Configure the bot's behavior with the following environment variables:

| Variable | Required | Description | Default Value |
| --- | --- | --- | --- |
| `TOKEN` | Yes | Your Telegram bot token. | (none) |
| `BASE_URL` | No | Base URL for the Telegram Bot API, useful for local testing. | `https://api.telegram.org` |
| `LOG_LEVEL` | No | Sets the minimum log level. Available levels: `none`, `debug`, `info`, `warn`, `error`, `fatal`. | `debug` (development), `info` (production) |
| `LOG_TEMPLATE` | No | Customizes the log output format. | `[{level}: {module}]: {message}` |
| `NODE_ENV` | No | Set to `production` to default the log level to `info`. | (none) |
| `WORKER_TIMEOUT_MS` | No | Maximum time a regex operation can run before being terminated (milliseconds). | 60000 |
| `WORKER_POOL_MIN_WORKERS` | No | Minimum number of workers to maintain. | 0 |
| `WORKER_POOL_MAX_WORKERS` | No | Maximum number of workers allowed. | 8 |
| `WORKER_POOL_INITIAL_WORKERS` | No | Number of workers to spawn at startup. | 1 |
| `WORKER_POOL_IDLE_TIMEOUT_MS` | No | Time before idle workers are terminated (milliseconds). | 300000 |
| `WORKER_POOL_IDLE_CHECK_INTERVAL_MS` | No | How often to check for idle workers (milliseconds). | 60000 |
| `GRACEFUL_DRAIN` | No | Enable graceful drain on shutdown. Processes pending tasks instead of rejecting them. | `false` |
| `GRACEFUL_DRAIN_TIMEOUT_MS` | No | Maximum time to spend draining the queue during shutdown (milliseconds). Maximum 9500 ms for Docker compatibility. | 8000 |
| `MAX_CHAIN_LENGTH` | No | Maximum number of sed commands that can be chained together. | 5 |
| `MAX_MESSAGE_LENGTH` | No | Maximum length of the bot's response message. | 4096 |
| `CLEANUP_INTERVAL_MS` | No | How often to clean up old message history (milliseconds). | 172800000 (48 hours) |
| `MAX_HISTORY_PER_CHAT` | No | Maximum number of messages to keep in history per chat. | 20 |
| `HISTORY_QUERY_LIMIT` | No | Maximum number of messages to search when finding a target. | 10 |
| `RETRY_MAX_RETRIES` | No | Maximum number of retries for Telegram API calls. | 3 |
| `RETRY_MAX_DELAY_MS` | No | Maximum delay between retries for Telegram API calls (milliseconds). | 30000 |
| `RATE_LIMIT_ENABLED` | No | Enable per-user rate limiting to prevent spam. | `true` |
| `RATE_LIMIT_COMMANDS_PER_MINUTE` | No | Maximum number of commands a user can send per minute. | 30 |
| `CACHE_ENABLED` | No | Enable LRU caching for compiled regex patterns. | `true` |
| `CACHE_MAX_SIZE` | No | Maximum number of entries in the regex pattern cache. | 1000 |
| `CACHE_TTL_MS` | No | Time-to-live for cached patterns (milliseconds). | 300000 (5 min) |
| `ENABLE_FILE_HEALTHCHECK` | No | Enable file-based healthcheck for Docker environments. | `false` |
| `LIVENESS_FILE` | No | Path to the liveness file when healthcheck is enabled. | `/tmp/bot-alive` |
| `LIVENESS_INTERVAL_MS` | No | How often to update the liveness file (milliseconds). | 30000 |
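
For readers who want to see how variables like these are typically consumed, here is a minimal sketch of typed loading with defaults and validation. It covers only a handful of the variables, and the helper names (`readInt`, `readBool`, `loadConfig`), the returned field names, and the accepted boolean spellings (`1`/`0` in addition to `true`/`false`) are assumptions for illustration, not the contents of config.ts.

```typescript
type Env = Record<string, string | undefined>;

// Read an integer variable, falling back to a default when unset or empty.
function readInt(env: Env, name: string, fallback: number, min = 0): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

// Read a boolean variable; accepting "1"/"0" is an assumption of this sketch.
function readBool(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name]?.trim().toLowerCase();
  if (raw === undefined || raw === "") return fallback;
  if (raw === "true" || raw === "1") return true;
  if (raw === "false" || raw === "0") return false;
  throw new Error(`${name} must be "true" or "false", got "${raw}"`);
}

// Hypothetical loader covering a subset of the documented variables.
function loadConfig(env: Env) {
  const token = env["TOKEN"];
  if (!token) throw new Error("TOKEN is required");
  const production = env["NODE_ENV"] === "production";
  return {
    token,
    baseUrl: env["BASE_URL"] ?? "https://api.telegram.org",
    logLevel: env["LOG_LEVEL"] ?? (production ? "info" : "debug"),
    workerTimeoutMs: readInt(env, "WORKER_TIMEOUT_MS", 60_000, 1),
    maxWorkers: readInt(env, "WORKER_POOL_MAX_WORKERS", 8, 1),
    rateLimitEnabled: readBool(env, "RATE_LIMIT_ENABLED", true),
    commandsPerMinute: readInt(env, "RATE_LIMIT_COMMANDS_PER_MINUTE", 30, 1),
  };
}
```

In a Bun or Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test and makes a misconfiguration fail at startup rather than at first use.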

Quick Start (Binary)

Pre-built binaries are available for Linux and Windows:

Download the latest release from GitHub Releases
- Linux: regexybot-linux-x64.tar.gz
- Windows: regexybot-windows-x64.zip

Extract the binary:

# Linux
tar -xzf regexybot-linux-x64.tar.gz

# Windows (PowerShell)
Expand-Archive regexybot-windows-x64.zip

Run with your bot token:

# Linux
TOKEN=your_telegram_bot_token ./regexybot-linux-x64

# Windows (Command Prompt)
set TOKEN=your_telegram_bot_token
regexybot-windows-x64.exe

# Windows (PowerShell)
$env:TOKEN="your_telegram_bot_token"
.\regexybot-windows-x64.exe

Note: Binaries are self-contained and don't require Bun or Node.js to be installed.

Setup & Run (From Source)

Ensure you have Bun installed.
Clone this repository.
Set your Telegram bot token in an environment variable:
```
export TOKEN="YOUR_TELEGRAM_BOT_TOKEN"
```
Run the bot from the project's root directory:
```
bun index.ts
```

Data Persistence

Important: This bot uses an in-memory SQLite database (sqlite://:memory:) by default. This means:

All message history and reply mappings are ephemeral - they are lost when the bot restarts
The retention window (48 hours by default) is designed to support Telegram's edit window and reply-less sed behavior
No persistent storage is required or used
For production deployments, this design is intentional - the bot does not store any data permanently

If you need persistent storage (not recommended for this use case), you would need to modify index.ts to use a file-based SQLite database instead.

Project Structure

The project is organized into several modules for clarity and maintainability:

index.ts: The main application entry point and bot wiring. Thin composition root that orchestrates other modules.
config.ts: Centralized configuration with typed env var loading and validation.
database.ts: Database service layer with DatabaseService class for message history and reply tracking.
workerPool.ts: Worker pool management for concurrent regex processing.
sed.ts: Sed command parsing and handling logic (parseSedCommands, SedHandler).
hellspawn.ts: The worker script that performs the actual regex substitution in separate threads.
logger.ts: A custom, configurable logging utility.
types.ts: Contains shared TypeScript types and interfaces.
utils.ts: Houses shared helper functions (regex patterns, escaping, flag normalization).

Tech Stack

grammY: Modern Telegram Bot Framework.
@grammyjs/runner: For concurrent update processing.
@grammyjs/commands: For structured command handling.
Bun: High-performance JavaScript runtime.
bun:sqlite: Bun's native, fast SQLite driver.
Bun Worker API: For parallel, non-blocking regex execution.

Branching Strategy & Releases

This project uses a two-branch workflow:

`main` branch (Stable)

Contains production-ready code
Merges happen from dev via pull requests
Docker images are tagged with:
- release - stable release marker
- latest - floats to most recent build (becomes stable after merge)
- Version numbers from package.json (e.g., 0.1.7.1, 0.1.7, 0.1)
- Git commit hash
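
The version tags in the example above follow a simple prefix pattern, which can be sketched as below. The function name `versionTags` is hypothetical, and stopping at two components (so that a bare `0` tag is never produced) is an assumption inferred from the `0.1.7.1, 0.1.7, 0.1` example rather than from the actual build workflow.

```typescript
// Derive progressively shorter version tags from a dotted version string.
function versionTags(version: string): string[] {
  const parts = version.split(".");
  if (parts.length < 2) return [version];
  const tags: string[] = [];
  // Stop at two components, matching the "0.1" lower bound in the example.
  for (let n = parts.length; n >= 2; n--) {
    tags.push(parts.slice(0, n).join("."));
  }
  return tags;
}
```

With this sketch, `versionTags("0.1.7.1")` yields `0.1.7.1`, `0.1.7`, and `0.1`, so a user pinning `0.1` keeps receiving newer patch builds.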

`dev` branch (Development)

Active development happens here
Feature branches merge into dev
Docker images are tagged with:
- dev - latest development build
- next - upcoming release preview
- latest - floats to most recent build (overwritten by dev activity)
- dev-<version> - version-specific dev build (e.g., dev-0.1.7.1)

Workflow

Create feature branches from dev
Open PRs targeting dev
When ready for release, open PR from dev to main
After merging to main, Docker images are built with release tags

Docker Deployment

Graceful Shutdown

The bot supports graceful shutdown for Docker deployments:

Default Behavior (Immediate Shutdown):

On SIGTERM/SIGINT, immediately stops accepting updates
Queued tasks are rejected
Fast shutdown suitable for most use cases

Graceful Drain Mode (Optional): Enable with GRACEFUL_DRAIN=true to process pending tasks before shutting down:

environment:
  - GRACEFUL_DRAIN=true
  - GRACEFUL_DRAIN_TIMEOUT_MS=8000

Important considerations:

Graceful drain must complete within Docker's stop grace period (default: 10s)
Default drain timeout is 8000ms (8s) to fit within Docker's grace period
Maximum recommended: 9500ms (9.5s) to avoid SIGKILL
If the queue is too large to drain in time, the remaining tasks are lost
Useful for deployments where you don't want to lose pending operations
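
The arithmetic behind these limits can be written down as a small helper. This is only a sketch: the function names are hypothetical, and the 500 ms safety margin is an assumption derived from the difference between Docker's default 10 s grace period and the 9500 ms recommended maximum. Whether the bot itself clamps the configured value this way is not stated here.

```typescript
// Assumed margin: 10 000 ms default grace period minus the 9 500 ms recommended maximum.
const SHUTDOWN_MARGIN_MS = 500;

// Largest drain timeout that still leaves the margin before Docker sends SIGKILL.
function maxDrainTimeoutMs(stopGracePeriodMs = 10_000): number {
  return Math.max(0, stopGracePeriodMs - SHUTDOWN_MARGIN_MS);
}

// Clamp a requested drain timeout so it fits inside the container's grace period.
function effectiveDrainTimeoutMs(requestedMs: number, stopGracePeriodMs = 10_000): number {
  return Math.min(requestedMs, maxDrainTimeoutMs(stopGracePeriodMs));
}
```

Under these assumptions the default 8000 ms passes through unchanged, a requested 12000 ms would be reduced to 9500 ms with the default grace period, and 18000 ms fits once the grace period is raised to 20 s.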

Adjusting Docker Grace Period: If you need more time for graceful drain, increase the container's grace period:

services:
  regexybot:
    stop_grace_period: 20s # Increase from default 10s
    environment:
      - GRACEFUL_DRAIN=true
      - GRACEFUL_DRAIN_TIMEOUT_MS=18000 # 18s (under 20s grace period)

Testing Graceful Shutdown

A test script is provided to verify graceful shutdown behavior:

cd docker
./test-graceful-shutdown.sh

This tests:

Immediate shutdown behavior
Graceful drain with pending tasks
Docker Compose stop/restart scenarios
SIGINT vs SIGTERM handling

License

MIT
