runpod · KAJdev · Feb 25, 2026
diff --git a/.gitignore b/.gitignore
@@ -191,3 +191,4 @@ cython_debug/
 test_app/
 pytest-results.xml
 coverage.xml
+/.pi
diff --git a/docs/Cross_Endpoint_Routing.md b/docs/Cross_Endpoint_Routing.md
@@ -2,18 +2,11 @@
 
 ## Overview
 
-Cross-endpoint routing enables serverless functions to seamlessly call functions deployed on different endpoints. Functions can execute locally or remotely based on service discovery configuration, allowing developers to build distributed applications without manual routing logic.
+Cross-endpoint routing enables `Endpoint`-decorated functions to seamlessly call functions deployed on different endpoints. Functions can execute locally or remotely based on service discovery configuration, allowing developers to build distributed applications without manual routing logic.
 
 ## Problem Statement
 
-Previously, serverless functions were isolated to their deployment endpoint. Building distributed applications required:
-- Manual HTTP calls to other endpoints with serialization boilerplate
-- No unified function invocation pattern across endpoints
-- Difficulty discovering which functions exist on which endpoints
-- Complex error handling for remote failures
-- No automatic argument serialization/deserialization
-
-Cross-endpoint routing solves these problems by providing transparent function routing with manifest-based service discovery.
+Previously, serverless functions were isolated to their deployment endpoint. Building distributed applications required manual HTTP calls, boilerplate serialization, and complex error handling. Cross-endpoint routing solves these problems by providing transparent function routing with manifest-based service discovery.
 
 ## User Guide
 
@@ -74,26 +67,19 @@ export RUNPOD_ENDPOINT_ID=gpu-endpoint-123
 
 #### 3. Define Functions
 
-Define functions normally. The routing system decides execution location:
+Define functions using `Endpoint`. The routing system decides execution location at runtime:
 
 ```python
-from runpod_flash import stub
+from runpod_flash import Endpoint, GpuGroup
 
-@stub.function()
+@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
 async def process_image(image_path: str) -> dict:
-    """Process an image - may execute locally or remotely."""
-    # This function might route to 'image-processor' endpoint
-    # based on manifest configuration
+    """process an image -- may execute locally or remotely."""
     return {"processed": True}
 
-@stub.function()
-async def local_only_function(data: str) -> str:
-    """Always executes locally (not in manifest)."""
-    return f"Processed: {data}"
-
-@stub.function()
+@Endpoint(name="report-generator", cpu="cpu3c-1-2")
 async def generate_report(data: list) -> bytes:
-    """May route to 'report-generator' endpoint."""
+    """may route to 'report-generator' endpoint."""
     return b"report data"
 ```
 
@@ -102,11 +88,9 @@ async def generate_report(data: list) -> bytes:
 The routing system handles execution location transparently:
 
 ```python
-# Local execution (not in manifest)
-result = await local_only_function("hello")
-
-# Remote or local execution (based on manifest)
+# remote or local execution (based on manifest)
 result = await process_image("path/to/image.jpg")
+report = await generate_report([1, 2, 3])
 ```
 
 ### Configuration
@@ -194,17 +178,21 @@ Split functionality across endpoints using manifest:
 
 **Functions**:
 ```python
-@stub.function()
+from runpod_flash import Endpoint, GpuGroup
+
+@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
 async def resize_image(path: str, size: int) -> str:
     return process_image(path, size)
 
-@stub.function()
+@Endpoint(name="report-generator", cpu="cpu3c-1-2")
 async def generate_metrics(data: list) -> dict:
     return create_metrics(data)
 
-@stub.function()
+orchestrator = Endpoint(name="orchestrator", cpu="cpu3c-1-2")
+
+@orchestrator.post("/workflow")
 async def workflow():
-    # Transparently calls across endpoints
+    # transparently calls across endpoints
     image = await resize_image("input.jpg", 512)
     metrics = await generate_metrics([1, 2, 3])
     return {"image": image, "metrics": metrics}
@@ -234,34 +222,26 @@ Configure some functions for remote execution, others local:
 
 **Functions**:
 ```python
-@stub.function()
+from runpod_flash import Endpoint, GpuGroup
+
+@Endpoint(name="gpu-cluster", gpu=GpuGroup.AMPERE_80)
 async def heavy_computation(data: bytes) -> bytes:
-    # Routes to GPU cluster (in function_registry)
+    # routes to GPU cluster
     return gpu_process(data)
-
-@stub.function()
-async def light_computation(value: int) -> int:
-    # Always local - not in function_registry
-    return value * 2
 ```
 
 #### Pattern 3: Fallback to Local
 
 Functions gracefully fall back to local execution if routing fails:
 
 ```python
-@stub.function()
+from runpod_flash import Endpoint, GpuGroup
+
+@Endpoint(name="critical-service", gpu=GpuGroup.ANY)
 async def critical_service(request: dict) -> dict:
-    # Routes to critical-endpoint if:
-    # - In function_registry
-    # - Manifest available
-    # Otherwise executes locally
+    # routes to critical endpoint if manifest available
+    # otherwise executes locally
     return handle_critical(request)
-
-@stub.function()
-async def helper_function(x: int) -> int:
-    # Always local - not in manifest
-    return x + 1
 ```
 
 ### Error Handling
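
The local-or-remote decision and the "Fallback to Local" pattern described in this file can be sketched in plain Python. This is an illustration under stated assumptions, not the `runpod_flash` implementation: `route_call`, `fake_remote`, `broken_remote`, and the `MANIFEST` dictionary are hypothetical names invented for the example, and the real client may detect routing failures differently.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Hypothetical manifest fragment; only "function_registry" mirrors the docs.
MANIFEST = {
    "function_registry": {
        "process_image": "image-processor",
        "generate_report": "report-generator",
    }
}


async def route_call(
    func: Callable[..., Awaitable[Any]],
    *args: Any,
    manifest: Optional[dict],
    current_endpoint: str,
    call_remote: Callable[..., Awaitable[Any]],
) -> Any:
    """Run func in-process or hand it to call_remote, based on the manifest."""
    if manifest is None:
        # No manifest available: execute locally.
        return await func(*args)
    target = manifest.get("function_registry", {}).get(func.__name__)
    if target is None or target == current_endpoint:
        # Unlisted, or assigned to this endpoint: execute locally.
        return await func(*args)
    try:
        return await call_remote(target, func.__name__, *args)
    except ConnectionError:
        # Routing failed: fall back to local execution.
        return await func(*args)


async def process_image(image_path: str) -> dict:
    return {"processed": True, "where": "local"}


async def fake_remote(endpoint: str, name: str, *args: Any) -> dict:
    # Stand-in for an HTTP call to a peer endpoint (assumption).
    return {"processed": True, "where": endpoint}


async def broken_remote(endpoint: str, name: str, *args: Any) -> dict:
    raise ConnectionError("peer unreachable")


if __name__ == "__main__":
    print(asyncio.run(route_call(
        process_image, "path/to/image.jpg",
        manifest=MANIFEST, current_endpoint="orchestrator",
        call_remote=fake_remote,
    )))
```

Swapping `fake_remote` for `broken_remote` in the call above exercises the fallback branch, and passing `manifest=None` exercises the no-manifest branch; both run `process_image` in-process.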

diff --git a/docs/Deployment_Architecture.md b/docs/Deployment_Architecture.md
@@ -7,7 +7,7 @@ A deployed Flash App consists of peer endpoints, where functions are partitioned
 
 ```mermaid
 graph TD
-    A["📦 flash build"] -->|"Analyze App"| B["Scan remote functions"]
+    A["📦 flash build"] -->|"Analyze App"| B["Scan Endpoint patterns"]
     B -->|"Write"| C["flash_manifest.json"]
     B -->|"Archive"| D["artifact.tar.gz"]
 
@@ -114,8 +114,8 @@ graph LR
 - **Single Codebase**: All endpoints run identical code, differentiation via manifest assignments
 - **Manifest-Driven**: The manifest controls function distribution and routing
 - **Smart Routing**: System automatically determines if execution is local (in-process) or remote (inter-endpoint)
-- **Deployed Mode**: Unlike Live mode, endpoints are aware they're in distributed deployment with explicit role assignments
-- **Transparent Execution**: Functions can call other functions without knowing deployment topology; manifest handles routing
+- **Deployed Mode**: Unlike live mode, endpoints are aware they're in distributed deployment with explicit role assignments
+- **Transparent Execution**: `Endpoint`-decorated functions can call other functions without knowing deployment topology; manifest handles routing
 - **State Synchronization**: State Manager maintains the source of truth; endpoints sync via GraphQL
 - **Reconciliation**: The CLI reconciles the manifest with persisted state during `flash deploy`
 - **Peer-to-Peer Discovery**: All endpoints query State Manager GraphQL API directly for service discovery
@@ -139,7 +139,7 @@ Generated by `flash build` command:
   },
   "resources": {
     "endpoint_1": {
-      "resource_type": "ServerlessResource",
+      "resource_type": "Endpoint",
       "functions": [
         {
           "name": "funcA",
@@ -156,7 +156,7 @@ Generated by `flash build` command:
       ]
     },
     "endpoint_2": {
-      "resource_type": "LoadBalancerSlsResource",
+      "resource_type": "Endpoint",
       "functions": [
         {
           "name": "funcC",
@@ -203,14 +203,14 @@ Stored in State Manager with deployment metadata:
   },
   "resources": {
     "endpoint_1": {
-      "resource_type": "ServerlessResource",
+      "resource_type": "Endpoint",
       "functions": [...],
       "config_hash": "a1b2c3d4e5f6",
       "endpoint_url": "https://ep1-abc123.api.runpod.ai",
       "status": "deployed"
     },
     "endpoint_2": {
-      "resource_type": "LoadBalancerSlsResource",
+      "resource_type": "Endpoint",
       "functions": [...],
       "config_hash": "f6e5d4c3b2a1",
       "endpoint_url": "https://ep2-def456.api.runpod.ai",
@@ -238,14 +238,14 @@ All endpoints query State Manager directly for manifest synchronization. There i
   },
   "resources": {
     "endpoint_1": {
-      "resource_type": "ServerlessResource",
+      "resource_type": "Endpoint",
       "functions": [...],
       "config_hash": "a1b2c3d4e5f6",
       "endpoint_url": "https://ep1-abc123.api.runpod.ai",
       "status": "deployed"
     },
     "endpoint_2": {
-      "resource_type": "LoadBalancerSlsResource",
+      "resource_type": "Endpoint",
       "functions": [...],
       "config_hash": "f6e5d4c3b2a1",
       "endpoint_url": "https://ep2-def456.api.runpod.ai",
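
As a rough sketch of how a peer could turn the State Manager manifest shown in this file into a routing decision: the helper below looks up which resource owns a function and returns either `None` (run in-process) or the peer's `endpoint_url`. The `resolve` function is hypothetical, and the sample data fills the elided `functions` lists with the `funcA`/`funcC` names from the build manifest purely for illustration.

```python
from typing import Optional

# Sample data shaped like the manifest above; the function lists and the
# second "status" value are assumptions made for this example.
STATE_MANIFEST = {
    "resources": {
        "endpoint_1": {
            "resource_type": "Endpoint",
            "functions": [{"name": "funcA"}],
            "config_hash": "a1b2c3d4e5f6",
            "endpoint_url": "https://ep1-abc123.api.runpod.ai",
            "status": "deployed",
        },
        "endpoint_2": {
            "resource_type": "Endpoint",
            "functions": [{"name": "funcC"}],
            "config_hash": "f6e5d4c3b2a1",
            "endpoint_url": "https://ep2-def456.api.runpod.ai",
            "status": "deployed",
        },
    }
}


def resolve(manifest: dict, function_name: str, local_resource: str) -> Optional[str]:
    """Return None for in-process execution, else the owning peer's URL."""
    for resource_name, resource in manifest.get("resources", {}).items():
        names = {fn["name"] for fn in resource.get("functions", [])}
        if function_name in names:
            if resource_name == local_resource:
                return None  # the function lives on this endpoint
            return resource.get("endpoint_url")
    raise KeyError(f"{function_name!r} is not assigned to any resource")


if __name__ == "__main__":
    # Seen from endpoint_1: funcA is local, funcC lives on endpoint_2.
    print(resolve(STATE_MANIFEST, "funcA", "endpoint_1"))
    print(resolve(STATE_MANIFEST, "funcC", "endpoint_1"))
```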

diff --git a/docs/Flash_Deploy_Guide.md b/docs/Flash_Deploy_Guide.md
@@ -7,14 +7,14 @@
 
 ## Overview
 
-Flash Deploy is a distributed runtime system that enables scalable execution of `@remote` functions across dynamically provisioned Runpod serverless endpoints. It bridges the gap between local development and production cloud deployment through a unified interface.
+Flash Deploy is a distributed runtime system that enables scalable execution of `Endpoint`-decorated functions across dynamically provisioned Runpod serverless endpoints. It bridges the gap between local development and production cloud deployment through a unified interface.
 
 ### System Goals
 
-1. **Transparency**: Developers write local Python, deploy to cloud without code changes
-2. **Scalability**: Functions execute on remote serverless endpoints with resource isolation
-3. **Flexibility**: Support both queue-based and load-balanced execution models
-4. **Reliability**: Automatic resource provisioning, state reconciliation, and drift detection
+1. **Transparency**: developers write local Python, deploy to cloud without code changes
+2. **Scalability**: functions execute on remote serverless endpoints with resource isolation
+3. **Flexibility**: supports both queue-based and load-balanced execution models
+4. **Reliability**: automatic resource provisioning, state reconciliation, and drift detection
 
 ### High-Level Architecture
 
@@ -23,16 +23,16 @@ graph TB
     Developer["Developer Machine"]
 
     subgraph Build["Build Phase"]
-        Scan["Scanner<br/>Find @remote"]
+        Scan["Scanner<br/>Find Endpoint"]
         Manifest["ManifestBuilder<br/>flash_manifest.json"]
     end
 
     subgraph Cloud["Runpod Cloud"]
         S3["S3 Storage<br/>artifact.tar.gz"]
 
         subgraph Endpoints["Peer Endpoints<br/>(one per resource config)"]
-            Handler1["GPU Handler<br/>@remote functions"]
-            Handler2["CPU Handler<br/>@remote functions"]
+            Handler1["GPU Handler<br/>Endpoint functions"]
+            Handler2["CPU Handler<br/>Endpoint functions"]
             StateQuery["Service Registry<br/>Query State Manager"]
         end
     end
@@ -44,7 +44,7 @@ graph TB
     Developer -->|flash deploy --env| S3
     CLI -->|provision all endpoints| Endpoints
     Endpoints -->|query manifest<br/>peer-to-peer| Database
-    Developer -->|call @remote| Endpoints
+    Developer -->|call Endpoint functions| Endpoints
 
     style Endpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
     style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
@@ -54,11 +54,11 @@ graph TB
 
 **Endpoints**: All deployed endpoints are peers. The CLI provisions them upfront during `flash deploy`. Each endpoint loads the manifest from its `.flash/` directory and queries State Manager for peer discovery.
 
-**Worker Endpoints**: Endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`).
+**Worker Endpoints**: Endpoints that execute `Endpoint`-decorated functions. One per resource config (e.g., `gpu_worker`, `cpu_api`).
 
 **Manifest**: JSON document describing all deployed functions, their resource configs, routing rules, and metadata. Built at compile-time, distributed to all endpoints.
 
-**Resource Config**: A Python object that defines CloudResource specifications (GPU type, memory, image, etc.). Becomes a deployable endpoint.
+**Resource Config**: Derived from `Endpoint(...)` parameters (GPU type, workers, scaling, etc.). `Endpoint` internally creates the appropriate resource config class for deployment.
 
 **Service Registry**: Runtime component that maps function names to endpoint URLs and determines local vs remote execution.
 
@@ -226,8 +226,8 @@ This section walks through the entire journey from source code to executing remo
 ```mermaid
 sequenceDiagram
     Developer->>Build: flash build
-    Build->>Build: Scan files for @remote
-    Build->>Build: Find resource configs<br/>(e.g., gpu_config, cpu_config)
+    Build->>Build: Scan files for Endpoint patterns
+    Build->>Build: Find resource configs<br/>(QB decorators + LB route registrations)
     Build->>Build: Scan functions per resource<br/>Build function registry
     Build->>ManifestBuilder: Create manifest entry<br/>per resource config
     ManifestBuilder->>ManifestBuilder: Validate routes<br/>(no conflicts)
@@ -239,7 +239,7 @@ sequenceDiagram
 ```
 
 **Scanner** (`src/runpod_flash/cli/commands/build_utils/scanner.py`):
-- Decorators scanned: `@remote`, `@load_balanced`, `@cluster`
+- Patterns scanned: `@Endpoint(...)` (QB), `ep.get("/path")` / `ep.post("/path")` (LB), and legacy `@remote`
 - Extracts: function name, module path, async status, HTTP routing info
 - Groups functions by resource config
 
@@ -251,13 +251,13 @@ sequenceDiagram
     "generated_at": "2024-01-21T10:00:00Z",
     "project_name": "my_project",
     "resources": {
-      "gpu_config": {
-        "resource_type": "LiveServerless",
+      "gpu_worker": {
+        "resource_type": "Endpoint",
         "functions": [{"name": "process", "module": "main", ...}],
         "is_load_balanced": false
       }
     },
-    "function_registry": {"process": "gpu_config"},
+    "function_registry": {"process": "gpu_worker"},
     "routes": {}
   }
   ```
@@ -300,7 +300,7 @@ sequenceDiagram
 
 ### Phase 3: Endpoint Boot & Service Discovery
 
-Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call `@remote` functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access.
+Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access.
 
 ```mermaid
 sequenceDiagram
@@ -354,7 +354,7 @@ sequenceDiagram
 
 ### Phase 4: Runtime Function Execution
 
-When client calls `@remote function`:
+When a client calls an `Endpoint`-decorated function:
 
 ```mermaid
 sequenceDiagram
@@ -416,7 +416,7 @@ def handler(job: Dict[str, Any]) -> Dict[str, Any]:
 
 **Load-Balanced** (`src/runpod_flash/runtime/lb_handler.py`):
 - FastAPI app with user-defined HTTP routes
-- `/execute` endpoint for @remote execution (LiveLoadBalancer only)
+- `/execute` endpoint for internal function execution (local dev only)
 - User routes: HTTP methods + paths from manifest
 
 **Key Files:**
@@ -438,7 +438,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
 **Builder**: `ManifestBuilder` in `src/runpod_flash/cli/commands/build_utils/manifest.py`
 
 **Input**:
-- List of discovered `@remote` functions (from scanner)
+- List of discovered `Endpoint`-decorated functions (from scanner)
 - Each function has:
   - Name, module, async status
   - Resource config name
@@ -452,7 +452,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
   "project_name": "my_app",
   "resources": {
     "gpu_config": {
-      "resource_type": "LiveServerless",
+      "resource_type": "Endpoint",
       "functions": [
         {
           "name": "train",
@@ -648,7 +648,7 @@ await StateManagerClient.update_resource_state(flash_environment_id, resources)
 
 ## Remote Execution
 
-When `@remote function` is called, the client determines whether to execute locally or remotely.
+When an `Endpoint`-decorated function is called, the client determines whether to execute locally or remotely.
 
 ### Execution Modes
 
@@ -1105,7 +1105,7 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU
 
 | File | Purpose |
 |------|---------|
-| `src/runpod_flash/cli/commands/build_utils/scanner.py` | Scans for @remote decorators |
+| `src/runpod_flash/cli/commands/build_utils/scanner.py` | Scans for Endpoint patterns and legacy @remote |
 | `src/runpod_flash/cli/commands/build_utils/manifest.py` | Manifest builder and validation |
 
 ### Resource Management
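
To illustrate the two patterns the scanner is described as detecting in this diff (`@Endpoint(...)` for queue-based functions and `ep.get("/path")` / `ep.post("/path")` for load-balanced routes), here is a minimal sketch built on the standard-library `ast` module. It is not the code in `scanner.py`: the `scan` function and the shape of its output are assumptions, it only inspects top-level definitions, and it ignores legacy `@remote`.

```python
import ast
from typing import Dict, List

# Example input taken from the Cross_Endpoint_Routing.md snippet in this diff.
SOURCE = '''
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="image-processor", gpu=GpuGroup.ADA_24)
async def resize_image(path: str, size: int) -> str:
    return path

orchestrator = Endpoint(name="orchestrator", cpu="cpu3c-1-2")

@orchestrator.post("/workflow")
async def workflow():
    return {}
'''


def _is_endpoint_call(node: ast.AST) -> bool:
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "Endpoint")


def _endpoint_name(call: ast.Call) -> str:
    for kw in call.keywords:
        if kw.arg == "name" and isinstance(kw.value, ast.Constant):
            return str(kw.value.value)
    return "<unnamed>"


def scan(source: str) -> List[Dict[str, object]]:
    """Collect Endpoint-decorated functions and route registrations."""
    lb_vars: Dict[str, str] = {}  # variable name -> endpoint name
    found: List[Dict[str, object]] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and _is_endpoint_call(node.value):
            # Remember `ep = Endpoint(...)` so later `@ep.get(...)` resolves.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    lb_vars[target.id] = _endpoint_name(node.value)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            base = {"name": node.name,
                    "is_async": isinstance(node, ast.AsyncFunctionDef)}
            for dec in node.decorator_list:
                if _is_endpoint_call(dec):  # queue-based pattern
                    found.append({**base, "resource": _endpoint_name(dec),
                                  "is_load_balanced": False})
                elif (isinstance(dec, ast.Call)
                      and isinstance(dec.func, ast.Attribute)
                      and isinstance(dec.func.value, ast.Name)
                      and dec.func.value.id in lb_vars
                      and dec.func.attr in {"get", "post"}
                      and dec.args
                      and isinstance(dec.args[0], ast.Constant)):
                    # load-balanced route registration
                    found.append({**base,
                                  "resource": lb_vars[dec.func.value.id],
                                  "is_load_balanced": True,
                                  "method": dec.func.attr.upper(),
                                  "path": dec.args[0].value})
    return found


if __name__ == "__main__":
    for entry in scan(SOURCE):
        print(entry)
```

Because the source is only parsed, never imported, the sketch runs without `runpod_flash` installed. Grouping the resulting entries by `resource` would give a mapping similar in spirit to the manifest's `function_registry`.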