31 commits
a6bea18
feat: add unified Endpoint class replacing 8 resource config classes
KAJdev Feb 25, 2026
c605303
feat: wire Endpoint into scanner and flash run, add id= and client mode
KAJdev Feb 25, 2026
6be0140
feat: wire Endpoint into build pipeline and resource discovery
KAJdev Feb 25, 2026
021d512
feat: implement Endpoint client mode (run/runsync/status/HTTP methods)
KAJdev Feb 25, 2026
b272e72
feat: add EndpointJob with status()/wait()/cancel() and webhook support
KAJdev Feb 25, 2026
dc375db
feat: deprecate legacy resource classes and @remote, update skeleton …
KAJdev Feb 25, 2026
244c5d4
format
KAJdev Feb 25, 2026
191ceb9
chore: fix lint errors and formatting
KAJdev Feb 25, 2026
4c3ff83
fix: default to live provisioning when no explicit env signal is set
KAJdev Feb 26, 2026
3be2f23
fix: resolve Endpoint resource type in deploy provisioner
KAJdev Feb 26, 2026
5eb7d57
Merge branch 'main' into zeke/single-entrypoint
KAJdev Feb 26, 2026
335606e
Merge branch 'main' into zeke/single-entrypoint
KAJdev Feb 26, 2026
3787b4b
feat: add scaler_type, scaler_value, and template params to Endpoint
KAJdev Feb 27, 2026
ec3c7d7
fix: suppress warnings from internal calls
KAJdev Feb 27, 2026
a1c48eb
Merge branch 'main' into zeke/single-entrypoint
KAJdev Feb 27, 2026
f428071
chore: update docs
KAJdev Mar 2, 2026
2807254
Merge branch 'main' into zeke/single-entrypoint
KAJdev Mar 2, 2026
349e235
Merge branch 'main' into zeke/single-entrypoint
deanq Mar 3, 2026
2b7e838
fix: add input validation for Endpoint workers, routes, and decorator…
KAJdev Mar 4, 2026
3adabeb
Merge branch 'main' into zeke/single-entrypoint
KAJdev Mar 4, 2026
efa2412
Merge branch 'zeke/single-entrypoint' of https://github.com/runpod/fl…
KAJdev Mar 4, 2026
2382fcd
fix: default LB endpoints to REQUEST_COUNT scaler type
KAJdev Mar 4, 2026
19e1e56
fix: tolerate re-imported GpuType/GpuGroup enums in _normalize_gpu
KAJdev Mar 4, 2026
9a6ad0b
fix: detect cross-endpoint calls inside class methods
KAJdev Mar 4, 2026
c4cf791
fix: strip Authorization header from R2 presigned URL uploads
KAJdev Mar 4, 2026
9f1318a
fix: format endpoint.py and remove unused import
KAJdev Mar 4, 2026
9bf315c
fix: update skeleton template tests to assert Endpoint class
KAJdev Mar 4, 2026
2dd59b9
fix: mark login tests as serial to prevent parallel interference
KAJdev Mar 4, 2026
9c5732f
fix: use LB subdomain URLs for client requests and handle deployed wo…
KAJdev Mar 4, 2026
f29de9d
fix: reject Endpoint(id=) and Endpoint(image=) as decorators
KAJdev Mar 4, 2026
7cf0cf8
fix: wrap client HTTP calls in _ClientCoroutine for clear decorator e…
KAJdev Mar 4, 2026
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -191,3 +191,4 @@ cython_debug/
test_app/
pytest-results.xml
coverage.xml
/.pi
74 changes: 27 additions & 47 deletions docs/Cross_Endpoint_Routing.md
Expand Up @@ -2,18 +2,11 @@

## Overview

Cross-endpoint routing enables serverless functions to seamlessly call functions deployed on different endpoints. Functions can execute locally or remotely based on service discovery configuration, allowing developers to build distributed applications without manual routing logic.
Cross-endpoint routing enables `Endpoint`-decorated functions to seamlessly call functions deployed on different endpoints. Functions can execute locally or remotely based on service discovery configuration, allowing developers to build distributed applications without manual routing logic.

## Problem Statement

Previously, serverless functions were isolated to their deployment endpoint. Building distributed applications required:
- Manual HTTP calls to other endpoints with serialization boilerplate
- No unified function invocation pattern across endpoints
- Difficulty discovering which functions exist on which endpoints
- Complex error handling for remote failures
- No automatic argument serialization/deserialization

Cross-endpoint routing solves these problems by providing transparent function routing with manifest-based service discovery.
Previously, serverless functions were isolated to their deployment endpoint. Building distributed applications required manual HTTP calls, boilerplate serialization, and complex error handling. Cross-endpoint routing solves these problems by providing transparent function routing with manifest-based service discovery.

## User Guide

Expand Down Expand Up @@ -74,26 +67,19 @@ export RUNPOD_ENDPOINT_ID=gpu-endpoint-123

#### 3. Define Functions

Define functions normally. The routing system decides execution location:
Define functions using `Endpoint`. The routing system decides execution location at runtime:

```python
from runpod_flash import stub
from runpod_flash import Endpoint, GpuGroup

@stub.function()
@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
async def process_image(image_path: str) -> dict:
"""Process an image - may execute locally or remotely."""
# This function might route to 'image-processor' endpoint
# based on manifest configuration
"""process an image -- may execute locally or remotely."""
return {"processed": True}

@stub.function()
async def local_only_function(data: str) -> str:
"""Always executes locally (not in manifest)."""
return f"Processed: {data}"

@stub.function()
@Endpoint(name="report-generator", cpu="cpu3c-1-2")
async def generate_report(data: list) -> bytes:
"""May route to 'report-generator' endpoint."""
"""may route to 'report-generator' endpoint."""
return b"report data"
```

Expand All @@ -102,11 +88,9 @@ async def generate_report(data: list) -> bytes:
The routing system handles execution location transparently:

```python
# Local execution (not in manifest)
result = await local_only_function("hello")

# Remote or local execution (based on manifest)
# remote or local execution (based on manifest)
result = await process_image("path/to/image.jpg")
report = await generate_report([1, 2, 3])
```

### Configuration
Expand Down Expand Up @@ -194,17 +178,21 @@ Split functionality across endpoints using manifest:

**Functions**:
```python
@stub.function()
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
async def resize_image(path: str, size: int) -> str:
return process_image(path, size)

@stub.function()
@Endpoint(name="report-generator", cpu="cpu3c-1-2")
async def generate_metrics(data: list) -> dict:
return create_metrics(data)

@stub.function()
orchestrator = Endpoint(name="orchestrator", cpu="cpu3c-1-2")

@orchestrator.post("/workflow")
async def workflow():
# Transparently calls across endpoints
# transparently calls across endpoints
image = await resize_image("input.jpg", 512)
metrics = await generate_metrics([1, 2, 3])
return {"image": image, "metrics": metrics}
Expand Down Expand Up @@ -234,34 +222,26 @@ Configure some functions for remote execution, others local:

**Functions**:
```python
@stub.function()
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="gpu-cluster", gpu=GpuGroup.AMPERE_80)
async def heavy_computation(data: bytes) -> bytes:
# Routes to GPU cluster (in function_registry)
# routes to GPU cluster
return gpu_process(data)

@stub.function()
async def light_computation(value: int) -> int:
# Always local - not in function_registry
return value * 2
```

#### Pattern 3: Fallback to Local

Functions gracefully fall back to local execution if routing fails:

```python
@stub.function()
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="critical-service", gpu=GpuGroup.ANY)
async def critical_service(request: dict) -> dict:
# Routes to critical-endpoint if:
# - In function_registry
# - Manifest available
# Otherwise executes locally
# routes to critical endpoint if manifest available
# otherwise executes locally
return handle_critical(request)

@stub.function()
async def helper_function(x: int) -> int:
# Always local - not in manifest
return x + 1
```

### Error Handling
Expand Down
18 changes: 9 additions & 9 deletions docs/Deployment_Architecture.md
Expand Up @@ -7,7 +7,7 @@ A deployed Flash App consists of peer endpoints, where functions are partitioned

```mermaid
graph TD
A["📦 flash build"] -->|"Analyze App"| B["Scan remote functions"]
A["📦 flash build"] -->|"Analyze App"| B["Scan Endpoint patterns"]
B -->|"Write"| C["flash_manifest.json"]
B -->|"Archive"| D["artifact.tar.gz"]

Expand Down Expand Up @@ -114,8 +114,8 @@ graph LR
- **Single Codebase**: All endpoints run identical code, differentiation via manifest assignments
- **Manifest-Driven**: The manifest controls function distribution and routing
- **Smart Routing**: System automatically determines if execution is local (in-process) or remote (inter-endpoint)
- **Deployed Mode**: Unlike Live mode, endpoints are aware they're in distributed deployment with explicit role assignments
- **Transparent Execution**: Functions can call other functions without knowing deployment topology; manifest handles routing
- **Deployed Mode**: Unlike live mode, endpoints are aware they're in distributed deployment with explicit role assignments
- **Transparent Execution**: `Endpoint`-decorated functions can call other functions without knowing deployment topology; manifest handles routing
- **State Synchronization**: State Manager maintains the source of truth; endpoints sync via GraphQL
- **Reconciliation**: The CLI reconciles the manifest with persisted state during `flash deploy`
- **Peer-to-Peer Discovery**: All endpoints query State Manager GraphQL API directly for service discovery
Expand All @@ -139,7 +139,7 @@ Generated by `flash build` command:
},
"resources": {
"endpoint_1": {
"resource_type": "ServerlessResource",
"resource_type": "Endpoint",
"functions": [
{
"name": "funcA",
Expand All @@ -156,7 +156,7 @@ Generated by `flash build` command:
]
},
"endpoint_2": {
"resource_type": "LoadBalancerSlsResource",
"resource_type": "Endpoint",
"functions": [
{
"name": "funcC",
Expand Down Expand Up @@ -203,14 +203,14 @@ Stored in State Manager with deployment metadata:
},
"resources": {
"endpoint_1": {
"resource_type": "ServerlessResource",
"resource_type": "Endpoint",
"functions": [...],
"config_hash": "a1b2c3d4e5f6",
"endpoint_url": "https://ep1-abc123.api.runpod.ai",
"status": "deployed"
},
"endpoint_2": {
"resource_type": "LoadBalancerSlsResource",
"resource_type": "Endpoint",
"functions": [...],
"config_hash": "f6e5d4c3b2a1",
"endpoint_url": "https://ep2-def456.api.runpod.ai",
Expand Down Expand Up @@ -238,14 +238,14 @@ All endpoints query State Manager directly for manifest synchronization. There i
},
"resources": {
"endpoint_1": {
"resource_type": "ServerlessResource",
"resource_type": "Endpoint",
"functions": [...],
"config_hash": "a1b2c3d4e5f6",
"endpoint_url": "https://ep1-abc123.api.runpod.ai",
"status": "deployed"
},
"endpoint_2": {
"resource_type": "LoadBalancerSlsResource",
"resource_type": "Endpoint",
"functions": [...],
"config_hash": "f6e5d4c3b2a1",
"endpoint_url": "https://ep2-def456.api.runpod.ai",
Expand Down
48 changes: 24 additions & 24 deletions docs/Flash_Deploy_Guide.md
Expand Up @@ -7,14 +7,14 @@

## Overview

Flash Deploy is a distributed runtime system that enables scalable execution of `@remote` functions across dynamically provisioned Runpod serverless endpoints. It bridges the gap between local development and production cloud deployment through a unified interface.
Flash Deploy is a distributed runtime system that enables scalable execution of `Endpoint`-decorated functions across dynamically provisioned Runpod serverless endpoints. It bridges the gap between local development and production cloud deployment through a unified interface.

### System Goals

1. **Transparency**: Developers write local Python, deploy to cloud without code changes
2. **Scalability**: Functions execute on remote serverless endpoints with resource isolation
3. **Flexibility**: Support both queue-based and load-balanced execution models
4. **Reliability**: Automatic resource provisioning, state reconciliation, and drift detection
1. **Transparency**: developers write local Python, deploy to cloud without code changes
2. **Scalability**: functions execute on remote serverless endpoints with resource isolation
3. **Flexibility**: supports both queue-based and load-balanced execution models
4. **Reliability**: automatic resource provisioning, state reconciliation, and drift detection

### High-Level Architecture

Expand All @@ -23,16 +23,16 @@ graph TB
Developer["Developer Machine"]

subgraph Build["Build Phase"]
Scan["Scanner<br/>Find @remote"]
Scan["Scanner<br/>Find Endpoint"]
Manifest["ManifestBuilder<br/>flash_manifest.json"]
end

subgraph Cloud["Runpod Cloud"]
S3["S3 Storage<br/>artifact.tar.gz"]

subgraph Endpoints["Peer Endpoints<br/>(one per resource config)"]
Handler1["GPU Handler<br/>@remote functions"]
Handler2["CPU Handler<br/>@remote functions"]
Handler1["GPU Handler<br/>Endpoint functions"]
Handler2["CPU Handler<br/>Endpoint functions"]
StateQuery["Service Registry<br/>Query State Manager"]
end
end
Expand All @@ -44,7 +44,7 @@ graph TB
Developer -->|flash deploy --env| S3
CLI -->|provision all endpoints| Endpoints
Endpoints -->|query manifest<br/>peer-to-peer| Database
Developer -->|call @remote| Endpoints
Developer -->|call Endpoint functions| Endpoints

style Endpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
Expand All @@ -54,11 +54,11 @@ graph TB

**Endpoints**: All deployed endpoints are peers. The CLI provisions them upfront during `flash deploy`. Each endpoint loads the manifest from its `.flash/` directory and queries State Manager for peer discovery.

**Worker Endpoints**: Endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`).
**Worker Endpoints**: Endpoints that execute `Endpoint`-decorated functions. One per resource config (e.g., `gpu_worker`, `cpu_api`).

**Manifest**: JSON document describing all deployed functions, their resource configs, routing rules, and metadata. Built at compile-time, distributed to all endpoints.

**Resource Config**: A Python object that defines CloudResource specifications (GPU type, memory, image, etc.). Becomes a deployable endpoint.
**Resource Config**: Derived from `Endpoint(...)` parameters (GPU type, workers, scaling, etc.). `Endpoint` internally creates the appropriate resource config class for deployment.

**Service Registry**: Runtime component that maps function names to endpoint URLs and determines local vs remote execution.
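
As an illustration only (not part of this change set), the lookup a service registry performs might be sketched as below. The `Route` class, the `resolve` helper, and the `report` function name are hypothetical. The manifest shape borrows the `function_registry` and `endpoint_url` fields shown elsewhere in these docs. Treating unregistered functions as local is an assumption, not confirmed behavior of `runpod_flash`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Where a function call should go (hypothetical type)."""
    function: str
    resource: Optional[str]      # owning resource config, None if unregistered
    endpoint_url: Optional[str]  # set only when the call must go remote

    @property
    def is_local(self) -> bool:
        return self.endpoint_url is None


def resolve(manifest: dict, current_resource: str, function: str) -> Route:
    """Decide local vs remote execution for one function name."""
    owner = manifest.get("function_registry", {}).get(function)
    # Assumption: functions that are unregistered, or owned by the
    # current endpoint, run in-process.
    if owner is None or owner == current_resource:
        return Route(function, owner, None)
    url = manifest["resources"][owner].get("endpoint_url")
    if url is None:
        # A real implementation might fall back to local execution here.
        raise LookupError(f"no endpoint_url known for resource {owner!r}")
    return Route(function, owner, url)


manifest = {
    "resources": {
        "gpu_worker": {"endpoint_url": "https://ep1-abc123.api.runpod.ai"},
        "cpu_api": {"endpoint_url": "https://ep2-def456.api.runpod.ai"},
    },
    "function_registry": {"process": "gpu_worker", "report": "cpu_api"},
}

local = resolve(manifest, "gpu_worker", "process")
remote = resolve(manifest, "gpu_worker", "report")
print(local.is_local, remote.endpoint_url)
```

Seen from `gpu_worker`, `process` resolves to in-process execution, while `report` resolves to the URL of the `cpu_api` endpoint.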

Expand Down Expand Up @@ -226,8 +226,8 @@ This section walks through the entire journey from source code to executing remo
```mermaid
sequenceDiagram
Developer->>Build: flash build
Build->>Build: Scan files for @remote
Build->>Build: Find resource configs<br/>(e.g., gpu_config, cpu_config)
Build->>Build: Scan files for Endpoint patterns
Build->>Build: Find resource configs<br/>(QB decorators + LB route registrations)
Build->>Build: Scan functions per resource<br/>Build function registry
Build->>ManifestBuilder: Create manifest entry<br/>per resource config
ManifestBuilder->>ManifestBuilder: Validate routes<br/>(no conflicts)
Expand All @@ -239,7 +239,7 @@ sequenceDiagram
```

**Scanner** (`src/runpod_flash/cli/commands/build_utils/scanner.py`):
- Decorators scanned: `@remote`, `@load_balanced`, `@cluster`
- Patterns scanned: `@Endpoint(...)` (QB), `ep.get("/path")` / `ep.post("/path")` (LB), and legacy `@remote`
- Extracts: function name, module path, async status, HTTP routing info
- Groups functions by resource config
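
A minimal sketch of how decorator scanning of this kind could work, using only the standard-library `ast` module. It is not the code in `scanner.py`. The names `scan_source`, `_is_endpoint_call`, and `_name_kwarg` are hypothetical, and only the queue-based `@Endpoint(...)` pattern is handled. Load-balanced route registrations and legacy `@remote` are left out.

```python
import ast


def _is_endpoint_call(node: ast.expr) -> bool:
    """True for a decorator written literally as `Endpoint(...)`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "Endpoint"
    )


def _name_kwarg(call: ast.Call):
    """Return the string passed as `name=...`, or None if absent."""
    for kw in call.keywords:
        if kw.arg == "name" and isinstance(kw.value, ast.Constant):
            return kw.value.value
    return None


def scan_source(source: str, module: str) -> list[dict]:
    """Collect functions decorated with Endpoint(...) without importing them."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for deco in node.decorator_list:
            if _is_endpoint_call(deco):
                found.append({
                    "name": node.name,
                    "module": module,
                    "is_async": isinstance(node, ast.AsyncFunctionDef),
                    "resource": _name_kwarg(deco),
                })
    return found


SAMPLE = '''
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
async def process_image(image_path: str) -> dict:
    return {"processed": True}

def helper(x):
    return x
'''

print(scan_source(SAMPLE, "main"))
```

Because the source is parsed rather than imported, the sketch runs without `runpod_flash` installed. It reports `process_image` and skips the undecorated `helper`.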

Expand All @@ -251,13 +251,13 @@ sequenceDiagram
"generated_at": "2024-01-21T10:00:00Z",
"project_name": "my_project",
"resources": {
"gpu_config": {
"resource_type": "LiveServerless",
"gpu_worker": {
"resource_type": "Endpoint",
"functions": [{"name": "process", "module": "main", ...}],
"is_load_balanced": false
}
},
"function_registry": {"process": "gpu_config"},
"function_registry": {"process": "gpu_worker"},
"routes": {}
}
```
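
The `function_registry` in a manifest like the one above can be derived from the `resources` section. The sketch below is illustrative: `build_function_registry` is a hypothetical helper rather than the `ManifestBuilder` API, and rejecting a function name that appears under two resource configs is an assumption about how conflicts might be handled.

```python
def build_function_registry(resources: dict) -> dict:
    """Map each function name to the resource config that owns it."""
    registry = {}
    for resource_name, config in resources.items():
        for fn in config.get("functions", []):
            name = fn["name"]
            if name in registry:
                # Assumption: one function name may belong to one resource only.
                raise ValueError(
                    f"function {name!r} is defined on both "
                    f"{registry[name]!r} and {resource_name!r}"
                )
            registry[name] = resource_name
    return registry


resources = {
    "gpu_worker": {
        "resource_type": "Endpoint",
        "functions": [{"name": "process", "module": "main"}],
        "is_load_balanced": False,
    },
}

print(build_function_registry(resources))  # {'process': 'gpu_worker'}
```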
Expand Down Expand Up @@ -300,7 +300,7 @@ sequenceDiagram

### Phase 3: Endpoint Boot & Service Discovery

Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call `@remote` functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access.
Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access.

```mermaid
sequenceDiagram
Expand Down Expand Up @@ -354,7 +354,7 @@ sequenceDiagram

### Phase 4: Runtime Function Execution

When client calls `@remote function`:
When client calls an Endpoint-decorated function:

```mermaid
sequenceDiagram
Expand Down Expand Up @@ -416,7 +416,7 @@ def handler(job: Dict[str, Any]) -> Dict[str, Any]:

**Load-Balanced** (`src/runpod_flash/runtime/lb_handler.py`):
- FastAPI app with user-defined HTTP routes
- `/execute` endpoint for @remote execution (LiveLoadBalancer only)
- `/execute` endpoint for internal function execution (local dev only)
- User routes: HTTP methods + paths from manifest

**Key Files:**
Expand All @@ -438,7 +438,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
**Builder**: `ManifestBuilder` in `src/runpod_flash/cli/commands/build_utils/manifest.py`

**Input**:
- List of discovered `@remote` functions (from scanner)
- List of discovered Endpoint-decorated functions (from scanner)
- Each function has:
- Name, module, async status
- Resource config name
Expand All @@ -452,7 +452,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
"project_name": "my_app",
"resources": {
"gpu_config": {
"resource_type": "LiveServerless",
"resource_type": "Endpoint",
"functions": [
{
"name": "train",
Expand Down Expand Up @@ -648,7 +648,7 @@ await StateManagerClient.update_resource_state(flash_environment_id, resources)

## Remote Execution

When `@remote function` is called, the client determines whether to execute locally or remotely.
When an Endpoint-decorated function is called, the client determines whether to execute locally or remotely.

### Execution Modes

Expand Down Expand Up @@ -1105,7 +1105,7 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU

| File | Purpose |
|------|---------|
| `src/runpod_flash/cli/commands/build_utils/scanner.py` | Scans for @remote decorators |
| `src/runpod_flash/cli/commands/build_utils/scanner.py` | Scans for Endpoint patterns and legacy @remote |
| `src/runpod_flash/cli/commands/build_utils/manifest.py` | Manifest builder and validation |

### Resource Management
Expand Down