A robust, flexible, and lightweight open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry.
- Problem
- Solution
- Architecture
- Tech Stack
- Features
- Rule Canvas
- Model Library
- Model Execution Sandbox
- Authentication and Access Control
- LLM Assistant
- Getting Started
- Development
- Makefile Reference
- Testing
- White Paper
- Community
- License
Many UBA platforms take a "black box" approach to data science, which may suit security analysts who are not interested in the nuts and bolts of the underlying models used to generate anomalies, baselines, and cases. These platforms treat their models as proprietary IP.
OpenUBA takes an "open-model" approach and is designed for the small subset of security analysts with genuine curiosity about what models are doing and how they work under the hood. We believe in the scientific computing community and its contributions over the years (libraries, toolkits, etc.). In security, rule and model transparency is key for compliance, response and investigation, and decision making.
OpenUBA also provides a community-driven marketplace for models, similar to a plugin store where the plugins are security models. From this marketplace, users of OpenUBA can install security models for their own use cases. Model developers can also upload their models so other users can reuse them, whether for free or for compensation -- the choice is the model developer's to make.
OpenUBA v0.0.2 is a Kubernetes-native platform with a modular, cloud-native architecture. All components are containerized and deployable to a Kind cluster for development or a production Kubernetes cluster. The system is designed to remain lightweight -- no always-on per-model services, no heavy pipeline orchestrators, just the minimum infrastructure needed to run security analytics at scale.
| Layer | Description |
|---|---|
| Frontend | Next.js 14 React application with TailwindCSS, shadcn/ui components, and real-time GraphQL subscriptions |
| Backend API | FastAPI application exposing REST endpoints with JWT authentication, model orchestration, rule engine, and scheduling |
| GraphQL | PostGraphile auto-generates a full GraphQL API from the PostgreSQL schema, enabling subscriptions and efficient querying |
| Operator | Custom Kubernetes operator (Kopf) watches UBATraining and UBAInference CRDs and creates ephemeral Jobs |
| Data Layer | PostgreSQL (system of record), Elasticsearch (search/analytics), Apache Spark (distributed compute), backed by Persistent Volumes |
| Execution Plane | Ephemeral K8s Jobs using framework-specific Docker images (sklearn, pytorch, tensorflow, networkx) for JIT model training and inference |
| Component | Technology |
|---|---|
| Framework | Next.js 14.0.4 (App Router) |
| Language | TypeScript 5.3 |
| UI System | TailwindCSS 3.4, Radix UI primitives, class-variance-authority |
| Data Layer | Apollo Client 3.8 (GraphQL), Axios 1.6 (REST) |
| Real-time | GraphQL subscriptions via graphql-ws 5.14 |
| Charts | Recharts 3.5 |
| Rule Canvas | @xyflow/react 12.10 (flow-based node editor) |
| State | Zustand 4.5 (UI state), Apollo cache (server state) |
| Markdown | react-markdown 10.1, react-syntax-highlighter 16.1 |
| Command Palette | cmdk 0.2 |
| Icons | lucide-react 0.309 |
| Component | Technology |
|---|---|
| Framework | FastAPI 0.104 (Uvicorn 0.24 ASGI) |
| Language | Python 3.9 (typed, Pydantic 2.5) |
| ORM | SQLAlchemy 2.0.23 |
| Auth | JWT (python-jose 3.3), bcrypt via passlib 1.7 |
| Scheduling | APScheduler 3.10 |
| GraphQL | PostGraphile (auto-schema from PostgreSQL) |
| Data Engines | PySpark 3.5, Elasticsearch client 8.11 |
| Container Clients | docker-py 6.1, kubernetes-client 28.1 |
| Component | Technology |
|---|---|
| Database | PostgreSQL 15 (Alpine) |
| Search | Elasticsearch 8.11.0 |
| Compute | Apache Spark 3.5.0 (Master + Worker) |
| Orchestration | Kubernetes (Kind for dev, any cluster for prod) |
| Operator | Custom OpenUBA Operator (Kopf, Python) |
| Containers | Docker (framework-specific model runner images) |
| Node.js Runtime | Node 18 (Alpine, multi-stage frontend build) |
| Framework | Runner Image | Serialization |
|---|---|---|
| scikit-learn | `model-runner:sklearn` | `joblib` |
| PyTorch | `model-runner:pytorch` | `torch.save` |
| TensorFlow / Keras | `model-runner:tensorflow` | SavedModel |
| NetworkX | `model-runner:networkx` | `pickle` |
- Model management with full lifecycle (install, train, infer)
- Model library with community and internally driven models
- Multi-registry support (GitHub, OpenUBA Hub, HuggingFace, Kubeflow, local filesystem)
- Model version control and artifact tracking
- Feedback loop for continuous model training
- "Shadow mode" for model and risk score experimentation
- Cryptographic hash verification at install and before every execution
- Framework-agnostic: supports sklearn, PyTorch, TensorFlow, Keras, NetworkX, Spark MLlib, and more
- "White-box" model standard -- every model is inspectable and auditable
- Threshold-based and deviation-based detection rules
- Flow-graph rule logic with visual canvas for building complex rule circuits
- Rules compose model outputs with logical operators, serialized deterministically to the database
- Rule-triggered alerts linked to anomalies and cases
- Alerts can be enabled or disabled per-rule
- Modern Next.js + shadcn/ui interface with dark mode default
- Real-time updates via GraphQL subscriptions
- Global time range selector, command palette, and keyboard navigation
- Modular components with responsive layout
- Pages: Home, Models, Anomalies, Cases, Data, Entities, Rules, Alerts, Schedules, Settings, Users
- JWT authentication with role-based access control (admin, manager, triage, analyst)
- Per-page granular permissions (read/write) configurable by admins
- Persistent notifications system
- Audit logging for compliance
- Case management with anomaly linking and timeline
- Anomaly detection result browsing, filtering, and acknowledgment
- Entity management and risk tracking
- Data source management with ingestion status monitoring
- SIEM-agnostic architecture with flexible dataset support
- Integrated LLM assistant for contextual analysis
- Alerting and notification system
- Cron-based scheduling for automated model execution
OpenUBA includes a visual flow-based rule builder for creating detection logic. Rules compose model outputs with logical operators on an interactive canvas, similar to tools like n8n or Node-RED but purpose-built for security analytics. Analysts can wire together registered models, define threshold conditions, and chain logical gates to express complex detection criteria -- all without writing code.
Each rule's flow graph is serialized deterministically into the database as a structured JSON object, making rules fully reproducible, version-trackable, and auditable. When a rule's conditions are met, it fires an alert that can be linked to anomalies and cases.
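As a rough sketch (not the actual OpenUBA schema -- node types and field names here are illustrative assumptions), a rule graph and its deterministic serialization might look like this:

```python
import json

# Hypothetical rule graph: two model-output thresholds feed an AND gate that fires an alert.
# Node and field names are illustrative, not the real OpenUBA schema.
rule_graph = {
    "name": "high_risk_and_graph_anomaly",
    "nodes": [
        {"id": "n1", "type": "model_output", "model": "model_sklearn", "field": "risk_score"},
        {"id": "n2", "type": "threshold", "op": ">=", "value": 0.8},
        {"id": "n3", "type": "model_output", "model": "model_networkx", "field": "risk_score"},
        {"id": "n4", "type": "threshold", "op": ">=", "value": 0.6},
        {"id": "n5", "type": "logic_gate", "op": "AND"},
        {"id": "n6", "type": "alert", "severity": "high"},
    ],
    "edges": [["n1", "n2"], ["n3", "n4"], ["n2", "n5"], ["n4", "n5"], ["n5", "n6"]],
}

# Deterministic serialization: a stable key order and fixed separators make the
# stored JSON reproducible and diff-friendly across saves.
serialized = json.dumps(rule_graph, sort_keys=True, separators=(",", ":"))
print(serialized)
```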
OpenUBA implements a model library and marketplace for hosting "ready-to-use" security models, both developed by the core team and the community. The official model catalog is served from openuba.org/registry/models.json, backed by the openuba-model-hub repository. Developers can also host their own model registries or install models from any GitHub repository or local filesystem.
The library tab in the dashboard lets analysts browse, search, inspect, and install models with a single click. Clicking a model opens a detail modal showing its metadata, parameters, tags, dependencies, and full source code -- fetched directly from GitHub. Installation downloads the model files, verifies their integrity, writes them to the model library on disk, and registers them in PostgreSQL.
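The integrity check is conceptually simple. A minimal sketch of install-time hash verification, assuming a hypothetical manifest that maps file names to SHA-256 digests, might look like this:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model artifacts never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_package(model_dir: Path, expected: dict[str, str]) -> None:
    """Compare each file's digest against the expected values.

    `expected` maps relative file names to hex digests -- a hypothetical manifest shape,
    not the actual OpenUBA registry format.
    """
    for rel_name, want in expected.items():
        got = sha256_of(model_dir / rel_name)
        if got != want:
            raise ValueError(f"hash mismatch for {rel_name}: {got} != {want}")

# Example usage with made-up digests:
# verify_model_package(Path("model_library/model_sklearn"),
#                      {"MODEL.py": "ab3f...", "model.yaml": "9c1d..."})
```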
| Model | Framework | Description |
|---|---|---|
| `basic_model` | Python | Baseline example model for getting started |
| `model_sklearn` | scikit-learn | Isolation Forest anomaly detection |
| `model_pytorch` | PyTorch | Neural network-based behavior analysis |
| `model_tensorflow` | TensorFlow | Deep learning behavior model |
| `model_keras` | Keras | High-level API behavior model |
| `model_networkx` | NetworkX | Graph-based entity relationship analysis |
| `model_1` | Python | General-purpose analytics model |
Models follow a simple Python interface. No heavy SDKs or complex pipeline definitions required -- model authors write straightforward Python logic using familiar libraries:
```python
class Model:
    def train(self, ctx):
        # Train model, return summary
        ...

    def infer(self, ctx):
        # Run inference, return risk scores as DataFrame
        ...
```

Each model package is a directory containing a `MODEL.py`, an optional `model.yaml` manifest, and an optional `requirements.txt`. The runner handles all I/O, database access, and framework-specific serialization (`joblib` for sklearn, `torch.save` for PyTorch, SavedModel for TensorFlow).
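For illustration, a minimal sklearn model package following this interface might look like the sketch below. The `ctx` helpers (`ctx.load_dataframe`, `ctx.model_dir`) are assumptions for the example, not the documented runner API:

```python
# MODEL.py -- illustrative sketch of an Isolation Forest model package.
# The ctx helper methods used here are assumptions, not the actual runner API.
import joblib
import pandas as pd
from sklearn.ensemble import IsolationForest

class Model:
    def train(self, ctx):
        df = ctx.load_dataframe()  # assumed helper: training features as a DataFrame
        clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
        clf.fit(df.select_dtypes("number"))
        joblib.dump(clf, f"{ctx.model_dir}/model.joblib")  # sklearn runners serialize with joblib
        return {"rows_trained": len(df)}

    def infer(self, ctx):
        df = ctx.load_dataframe()
        clf = joblib.load(f"{ctx.model_dir}/model.joblib")
        # decision_function: higher = more normal, so negate and min-max scale to [0, 1]
        raw = -clf.decision_function(df.select_dtypes("number"))
        scores = (raw - raw.min()) / (raw.max() - raw.min() + 1e-9)
        return pd.DataFrame({"entity": df.get("entity", df.index), "risk_score": scores})
```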
The model registry uses a pluggable adapter pattern. Each adapter implements model discovery, listing, and downloading for its backend:
| Adapter | Source | Description |
|---|---|---|
| OpenUBA Hub | `openuba.org` | Official model catalog with cached JSON registry (5-min TTL) |
| GitHub | Any repo | Clone and install models from GitHub repositories |
| Local Filesystem | `model_library/` | Scan locally installed models |
| HuggingFace | HF Hub | Model hub API integration (planned) |
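A plausible shape for such an adapter, sketched with assumed method names (`list_models`, `get_model`, `download`) rather than the actual interface:

```python
from abc import ABC, abstractmethod
from pathlib import Path

class RegistryAdapter(ABC):
    """Illustrative sketch of the pluggable adapter pattern -- names are assumptions."""

    @abstractmethod
    def list_models(self) -> list[dict]:
        """Return catalog entries (name, version, framework, description, hashes)."""

    @abstractmethod
    def get_model(self, name: str) -> dict:
        """Return full metadata for a single model, including its source files."""

    @abstractmethod
    def download(self, name: str, dest: Path) -> Path:
        """Fetch the model package into the local model library and return its path."""

class LocalFilesystemAdapter(RegistryAdapter):
    def __init__(self, root: Path = Path("model_library")):
        self.root = root

    def list_models(self) -> list[dict]:
        return [{"name": p.name, "source": "local"} for p in self.root.iterdir() if p.is_dir()]

    def get_model(self, name: str) -> dict:
        return {"name": name, "path": str(self.root / name)}

    def download(self, name: str, dest: Path) -> Path:
        # Already on disk -- nothing to fetch for the local adapter.
        return self.root / name
```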
Models can consume data from multiple sources through built-in data loader modules:
| Loader | Module | Description |
|---|---|---|
| Local CSV | `local_pandas` | Reads CSV files via pandas |
| Elasticsearch | `es` | Queries Elasticsearch indices |
| Spark | `spark` | Distributed data via PySpark |
| Source Groups | `source_group` | Aggregated multi-source loading |
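As a sketch of what two of these loaders might boil down to (function names, parameters, and the example index are assumptions, not the actual loader modules):

```python
import pandas as pd
from elasticsearch import Elasticsearch

def load_local_pandas(path: str) -> pd.DataFrame:
    """local_pandas-style loader: read a CSV from the local dataset directory."""
    return pd.read_csv(path)

def load_es(index: str, query: dict,
            host: str = "http://localhost:9200", size: int = 10_000) -> pd.DataFrame:
    """es-style loader: pull documents from an Elasticsearch index into a DataFrame."""
    client = Elasticsearch(host)
    resp = client.search(index=index, query=query, size=size)
    return pd.DataFrame([hit["_source"] for hit in resp["hits"]["hits"]])

# Example (hypothetical index and query):
# df = load_es("openuba-proxy", {"range": {"@timestamp": {"gte": "now-24h"}}})
```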
Every model execution runs inside an isolated Docker container or Kubernetes Job, separate from the main API. This provides:
- Security -- untrusted model code cannot compromise the core system
- Isolation -- each model gets its own environment with the right dependencies
- Reliability -- a misbehaving model is contained; resource limits prevent it from exhausting system resources
- Scalability -- multiple models can run in parallel as separate K8s Jobs
No long-lived per-model services. Every training and inference run is an ephemeral Job that spins up, executes, writes results, and exits. The only long-lived pieces are the operator, the backend, and the database.
The custom OpenUBA operator watches for UBATraining and UBAInference custom resources and creates Kubernetes Jobs with the appropriate framework-specific runner image. Input and output data flows through shared Persistent Volumes.
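To make the operator's role concrete, a stripped-down Kopf handler in this spirit might look like the following. The CRD group, spec fields, and image-selection logic are assumptions for illustration, not the exact OpenUBA CRD schema:

```python
import kopf
from kubernetes import client

# Illustrative only: the framework-to-image mapping and spec fields are assumptions.
FRAMEWORK_IMAGES = {
    "sklearn": "model-runner:sklearn",
    "pytorch": "model-runner:pytorch",
    "tensorflow": "model-runner:tensorflow",
    "networkx": "model-runner:networkx",
}

@kopf.on.create("openuba.org", "v1", "ubatrainings")
def create_training_job(spec, name, namespace, **_):
    image = FRAMEWORK_IMAGES.get(spec.get("framework", "sklearn"), "model-runner:base")
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"train-{name}", namespace=namespace),
        spec=client.V1JobSpec(
            backoff_limit=0,  # ephemeral: run once, write results, exit
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="runner",
                        image=image,
                        args=["train", spec["model"]],
                    )],
                ),
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
    return {"job": job.metadata.name}
```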
OpenUBA v0.0.2 includes a complete authentication and role-based access control system:
| Role | Access |
|---|---|
| Admin | Full read/write access to all pages, user management, permission configuration |
| Manager | Read access to all pages |
| Triage | Home, rules, alerts, entities, cases only |
| Analyst | Home, data, models (read/write), rules (read/write), alerts, entities (read/write), anomalies (read/write) |
Default credentials: openuba / password (admin). Change immediately after first login.
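For illustration, role enforcement on a backend route might reduce to a small FastAPI dependency like the sketch below (claim names, secret handling, and the example endpoint are assumptions, not the actual backend code):

```python
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="auth/login")
SECRET_KEY = "change-me"  # illustrative; the real key comes from configuration
ALGORITHM = "HS256"

def require_role(*allowed: str):
    """Dependency factory: reject requests whose JWT role claim is not in `allowed`."""
    def checker(token: str = Depends(oauth2_scheme)) -> dict:
        try:
            payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        except JWTError:
            raise HTTPException(status_code=401, detail="Invalid token")
        if payload.get("role") not in allowed:
            raise HTTPException(status_code=403, detail="Insufficient role")
        return payload
    return checker

# Usage on a hypothetical route:
# @app.get("/users", dependencies=[Depends(require_role("admin"))])
# def list_users(): ...
```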
An always-available LLM chat overlay is built into the interface. It supports multiple providers:
| Provider | Type |
|---|---|
| Ollama | Local (default) |
| OpenAI | Cloud API |
| Claude | Cloud API |
| Gemini | Cloud API |
The assistant is context-aware -- it sees the current route, selected entities, and active filters. It can be toggled, dragged, and resized. Conversation history persists across page navigation. Configure providers under Settings > Integrations.
| Requirement | Version |
|---|---|
| Docker | 20.10+ |
| kubectl | 1.25+ |
| Kind | 0.20+ |
| Node.js | 18+ |
| Python | 3.10+ |
| Make | any |
The single command to build everything from scratch -- creates a Kind cluster, builds all Docker images, deploys all Kubernetes resources, initializes the database, ingests test data, and launches port-forwarding in separate terminal tabs:
```bash
make reset-dev
```

This is the go-to command for development. It tears down any existing cluster and stands up a clean environment end-to-end. Once complete, three terminal tabs will open automatically:
| Tab | Purpose | URL |
|---|---|---|
| Hybrid Networking | Port-forwards all K8s services to localhost | -- |
| Local Backend | Runs the FastAPI backend with hot-reload | http://localhost:8000 |
| Local Frontend | Runs the Next.js dev server with hot-reload | http://localhost:3000 |
Log in with openuba / password.
- Deletes any existing Kind cluster
- Cleans up old Docker images
- Creates a new Kind cluster from `configs/local.yaml`
- Builds all container images (backend, frontend, operator, base runner, sklearn, pytorch, tensorflow, networkx)
- Loads images into the Kind cluster
- Deploys all Kubernetes manifests (namespace, secrets, persistent volumes, Postgres, PostGraphile, Spark, Elasticsearch, backend, frontend, operator, CRDs, ingress)
- Waits for pods to become ready
- Triggers initial data ingestion (`toy_1` dataset into Spark and Elasticsearch)
- Opens three terminal tabs for port-forwarding, backend, and frontend
If you prefer running everything inside the cluster (no local backend/frontend):
```bash
make create-local-cluster
make k8s-deploy
make k8s-forward
```

Access the application at http://localhost:3000.
Run backend and frontend locally against a local Postgres (no Kubernetes):
```bash
make dev-postgres
make setup-backend
make dev-install-frontend
make dev
```

| Service | Local Port |
|---|---|
| Frontend | 3000 |
| Backend API | 8000 |
| PostgreSQL | 5432 |
| PostGraphile (GraphQL) | 5001 |
| Spark Master | 7077 (UI: 8080) |
| Elasticsearch | 9200 |
During development, you will frequently need to rebuild and restart individual services after code changes. These commands rebuild the Docker image, load it into the Kind cluster, and trigger a rolling restart:
```bash
# rebuild and restart the backend pod
make dev-restart-backend

# rebuild and restart the frontend pod
make dev-restart-frontend

# re-apply CRDs, RBAC, and restart the operator
make dev-restart-operator
```

For local (non-K8s) development:

```bash
make dev-restart-backend-local
make dev-restart-frontend-local
```

```bash
make build-backend # backend api server
make build-frontend # next.js frontend
make build-operator # kubernetes operator
make build-runner-base # model runner base image
make build-runner-sklearn # sklearn runner
make build-runner-torch # pytorch runner
make build-runner-tf # tensorflow runner
make build-runner-networkx # networkx runner
make build-containers # all of the above
```

```bash
make k8s-logs-backend
make k8s-logs-frontend
make k8s-logs-spark
make k8s-logs-elasticsearch
make k8s-logs-postgraphile
make k8s-logs-all # all services simultaneously
make watch-pods # live pod status
```

```bash
make init-db-k8s # initialize schema in the K8s Postgres pod
make init-db-local # initialize schema against local Postgres
make redeploy-db # full Postgres redeploy with fresh schema
```

```bash
make k8s-init-data # ingest toy_1 dataset into Spark and Elasticsearch
```

The `test_datasets/toy_1/` directory contains real-world subsets of SSH, DNS, DHCP, and proxy logs. This dataset is treated as immutable -- it should never be modified. In production, users connect OpenUBA to their existing Spark or Elasticsearch clusters that already contain their datasets.

```bash
make dev-hybrid # infrastructure only (Postgres, Spark, ES, PostGraphile)
make k8s-forward # everything including backend and frontend
```

```bash
make delete-local-cluster # delete the Kind cluster
make clean-docker # prune unused Docker resources
make clean-all # delete cluster + prune Docker
make clean-logs # remove local log files
make dev-stop # stop local Postgres container
```

Every command in OpenUBA is run through the Makefile. Below is the complete reference:
| Target | Description |
|---|---|
| `reset-dev` | Full reset -- deletes cluster, rebuilds everything, deploys, and launches dev tabs |
| `create-infra` | Runs `scripts/start-dev.sh` (cluster + build + deploy + tabs) |
| `create-local-cluster` | Creates the Kind cluster from `configs/local.yaml` |
| `delete-local-cluster` | Deletes the Kind cluster |
| Target | Description |
|---|---|
| `build-containers` | Builds all Docker images (backend, frontend, operator, all runners) |
| `build-backend` | Builds the backend image |
| `build-frontend` | Builds the frontend image |
| `build-operator` | Builds the operator image |
| `build-model-runner` | Builds base + all framework runner images |
| `build-runner-base` | Builds the model runner base image |
| `build-runner-sklearn` | Builds the sklearn runner image |
| `build-runner-torch` | Builds the PyTorch runner image |
| `build-runner-tf` | Builds the TensorFlow runner image |
| `build-runner-networkx` | Builds the NetworkX runner image |
| Target | Description |
|---|---|
| `k8s-deploy` | Builds, loads, and deploys all resources to K8s |
| `deploy-k8s` | Deploys K8s manifests (without building) |
| `deploy-operator` | Deploys CRDs, RBAC, and operator |
| `load-images` | Loads local Docker images into the Kind cluster |
| `k8s-delete` | Deletes all K8s resources |
| `k8s-init-data` | Triggers data ingestion via the backend API |
| `redeploy-db` | Redeploys Postgres with fresh schema |
| Target | Description |
|---|---|
| `dev` | Starts local backend + frontend against local Postgres |
| `dev-backend` | Starts the FastAPI backend locally with hot-reload |
| `dev-frontend` | Starts the Next.js frontend locally with hot-reload |
| `dev-hybrid` | Port-forwards infrastructure services for local dev |
| `dev-restart-backend` | Rebuilds and restarts the backend pod in K8s |
| `dev-restart-frontend` | Rebuilds and restarts the frontend pod in K8s |
| `dev-restart-operator` | Re-applies CRDs/RBAC and restarts the operator |
| `dev-restart-backend-local` | Restarts the local backend process |
| `dev-restart-frontend-local` | Restarts the local frontend process |
| `setup-backend` | Creates Python venv and installs dependencies |
| `dev-install-frontend` | Installs frontend npm dependencies |
| `dev-postgres` | Starts a local Postgres container |
| `dev-stop` | Stops the local Postgres container |
| `k8s-forward` | Port-forwards all services for demo/full K8s mode |
| Target | Description |
|---|---|
| `k8s-logs-backend` | Tail backend logs |
| `k8s-logs-frontend` | Tail frontend logs |
| `k8s-logs-spark` | Tail Spark logs |
| `k8s-logs-elasticsearch` | Tail Elasticsearch logs |
| `k8s-logs-postgraphile` | Tail PostGraphile logs |
| `k8s-logs-all` | Tail all service logs simultaneously |
| `watch-pods` | Live pod status watch |
| Target | Description |
|---|---|
| `test` | Runs unit and integration tests |
| `test-unit` | Runs unit tests only |
| `test-integration` | Runs integration tests only |
| `test-api` | Runs API router tests |
| `test-repositories` | Runs repository tests |
| `test-registry` | Runs registry adapter tests |
| `test-services` | Runs service tests |
| `e2e-full` | Full E2E suite (setup, deploy, test, cleanup) |
| `e2e-test` | Runs E2E tests (requires prior deploy) |
| `e2e-test-models` | E2E model management tests |
| `e2e-test-anomalies` | E2E anomaly tests |
| `e2e-test-cases` | E2E case management tests |
| `e2e-test-rules` | E2E rules tests |
| `e2e-test-display` | E2E dashboard tests |
| `test-all` | Runs all tests (unit + integration + E2E) |
| Target | Description |
|---|---|
| `clean-docker` | Prunes unused Docker resources |
| `clean-all` | Deletes cluster and prunes Docker |
| `clean-logs` | Removes local log and pid files |
| Target | Description |
|---|---|
| `get_pods` | Lists pods in the openuba namespace |
| `get_trainings` | Lists UBATraining custom resources |
| `init-db` | Initializes the database schema |
| `init-db-local` | Initializes schema against local Postgres |
| `init-db-k8s` | Initializes schema in the K8s Postgres pod |
| `deploy-dashboard` | Deploys the Kubernetes Dashboard |
| `k8s-proxy` | Starts kubectl proxy for the K8s Dashboard |
```bash
# run all unit and integration tests
make test
# unit tests only
make test-unit
# integration tests only
make test-integration
# API router tests
make test-api
# repository tests
make test-repositories
# registry adapter tests
make test-registry
# service tests
make test-services
# full end-to-end test suite (builds, deploys, tests, cleans up)
make e2e-full
# run everything
make test-all
```

- Twitter: http://twitter.com/OpenUBA
- Discord: https://discord.gg/Ps9p9Wy
- Telegram: https://t.me/GACWR