8 changes: 7 additions & 1 deletion .devcontainer/docker-compose.yml
@@ -16,9 +16,15 @@ services:
       - ministack
 
   ministack:
-    image: ministackorg/ministack
+    image: ministackorg/ministack:full
     ports:
       - "4566:4566"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+    environment:
+      GLUE_DOCKER_IMAGE: ministack_glue_libs_4.0.0_image_01
+      MINISTACK_ENDPOINT: http://ministack:4566
+      S3_PERSIST: "1"
     restart: unless-stopped
 
 volumes:
50 changes: 50 additions & 0 deletions .github/workflows/publish-glue-image.yml
@@ -0,0 +1,50 @@
name: Build and Publish Custom MiniStack Glue Image

on:
  push:
    branches:
      - main
      - develop
    paths:
      - 'docker/glue/**'
      - '.github/workflows/publish-glue-image.yml'
  workflow_dispatch:

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/ministack_glue_libs_4.0.0_image_01

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=latest
            type=sha,format=short

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: ./docker/glue
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
+## [0.12.2] - 2026-06-24
+
+### Added
+
+- Log forwarding module for MiniStack Glue/PySpark containers that streams stdout/stderr output to the local CloudWatch Logs endpoint.
+- GitHub Actions workflow to build and publish the custom MiniStack Glue Docker image to GHCR.
+- Auto-discovery mechanism for the emulated MiniStack host inside the Docker bridge network.
+
 ## [0.12.1] - 2026-06-24
 
 ### Added
26 changes: 24 additions & 2 deletions README.md
@@ -19,7 +19,7 @@
 </p>
 
 <p align="center">
-<img src="https://img.shields.io/badge/version-0.12.1-blue?style=flat-square" alt="Version 0.12.1" />
+<img src="https://img.shields.io/badge/version-0.12.2-blue?style=flat-square" alt="Version 0.12.2" />
 <img src="https://img.shields.io/badge/license-MIT-green?style=flat-square" alt="MIT License" />
 <img src="https://img.shields.io/badge/Next.js-16+-black?style=flat-square&logo=next.js" alt="Next.js 16+" />
 <img src="https://img.shields.io/badge/AI-Gemini%20%7C%20AWS%20Bedrock%20%7C%20WebLLM-orange?style=flat-square" alt="AI Providers" />
@@ -282,13 +282,35 @@ pnpm start
 
 ### 6. Deploy to Local AWS (MiniStack)
 
-1. Start MiniStack: `docker run -p 4566:4566 ministackorg/ministack`
+1. Start MiniStack: `docker run -p 4566:4566 -v /var/run/docker.sock:/var/run/docker.sock -e GLUE_DOCKER_IMAGE=ghcr.io/dmux/openarchflow/ministack_glue_libs_4.0.0_image_01:latest -e S3_PERSIST=1 ministackorg/ministack:full`
 2. Click the **🚀 Rocket** icon in the toolbar
 3. Click **Test Connection** — you should see "Connected"
 4. Click **Deploy All** — nodes deploy in sequence with live status badges
 5. Click any deployed node → **Open Console** to interact with the resource
 6. Run a simulation — deployed nodes receive real traffic from the simulation engine
 
+#### 📝 AWS Glue / PySpark Emulation & Live Logs
+
+To emulate AWS Glue ETL jobs, query their output with Athena SQL, and inspect execution logs directly in the OpenArchFlow UI, run MiniStack from the `:full` image together with our custom Glue image, which forwards Spark logs to CloudWatch Logs:
+
+1. **Start MiniStack with Docker Socket & Custom Image**:
+```bash
+docker run -d -p 4566:4566 \
+-v /var/run/docker.sock:/var/run/docker.sock \
+-e GLUE_DOCKER_IMAGE=ghcr.io/dmux/openarchflow/ministack_glue_libs_4.0.0_image_01:latest \
+-e S3_PERSIST=1 \
+ministackorg/ministack:full
+```
+> [!NOTE]
+> The `:full` tag of MiniStack is **required** to use the **Athena Query** tab. The default image does not include the native DuckDB engine needed to query real S3 files, so Athena queries against it return static mock data.
+
+*(Alternatively, use the provided `.devcontainer/docker-compose.yml`, which starts MiniStack `:full` with the same Docker socket mount and `S3_PERSIST=1`; note that it sets `GLUE_DOCKER_IMAGE` to the local tag `ministack_glue_libs_4.0.0_image_01` rather than the GHCR image.)*
+
+2. **Run PySpark Jobs**:
+* Open the **Glue Studio** panel on a deployed Glue Catalog node.
+* Start your job in the **Runs** tab.
+* Enable **● Live** to stream JVM and Spark logs directly to the dashboard interface!
+
 ### 7. Export Diagram
 
 - Click **Actions** → **Export as PNG**
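Because `forward_logs.py` (in this PR) writes each Spark output line to CloudWatch Logs, the same output can also be followed outside the UI by polling the CloudWatch Logs `GetLogEvents` API with the `nextForwardToken` it returns; once the stream has nothing new, the service hands back the same token it was given. A minimal sketch, assuming MiniStack's Logs emulation honors forward tokens and that the forwarder's default names (`/aws-glue/jobs/output`, `spark-stdout-stream`) are in use; `read_new_events` is a hypothetical helper, not part of OpenArchFlow, and this is not how the **● Live** view is implemented.

```python
from typing import Any, Dict, List, Optional, Tuple


def read_new_events(
    client: Any, group: str, stream: str, token: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return (new messages, token to pass on the next call).

    `client` is anything with a boto3-style `get_log_events` method
    (hypothetical helper; assumes MiniStack honors nextForwardToken).
    """
    messages: List[str] = []
    while True:
        kwargs: Dict[str, Any] = {
            "logGroupName": group,
            "logStreamName": stream,
            "startFromHead": True,  # read oldest-first when no token is given
        }
        if token is not None:
            kwargs["nextToken"] = token
        resp = client.get_log_events(**kwargs)
        messages.extend(event["message"] for event in resp["events"])
        next_token = resp["nextForwardToken"]
        if next_token == token:
            # GetLogEvents returns the token it was given once the stream is exhausted
            return messages, token
        token = next_token
```

Pass it a boto3 `logs` client created with `endpoint_url="http://localhost:4566"`, call it in a loop with a short `time.sleep` between calls, and feed the returned token back in so each call yields only lines written since the previous one.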
14 changes: 14 additions & 0 deletions docker/glue/Dockerfile
@@ -0,0 +1,14 @@
FROM amazon/aws-glue-libs:glue_libs_4.0.0_image_01

USER root

# Copy forwarder script and entrypoint wrapper
COPY forward_logs.py /opt/glue/bin/forward_logs.py
COPY entrypoint.sh /opt/glue/bin/entrypoint.sh

RUN chmod +x /opt/glue/bin/forward_logs.py /opt/glue/bin/entrypoint.sh

# Revert to the default non-root user of aws-glue-libs
USER glue_user

ENTRYPOINT ["/opt/glue/bin/entrypoint.sh"]
9 changes: 9 additions & 0 deletions docker/glue/entrypoint.sh
@@ -0,0 +1,9 @@
#!/bin/bash
set -e

# Ensure the Spark bin directories are in the PATH
export PATH="/home/glue_user/spark/bin:/home/glue_user/maven/bin:${PATH}"

# Run the command passed by MiniStack (typically spark-submit)
# and pipe both stdout and stderr through the python log forwarder.
exec "$@" 2>&1 | python3 /opt/glue/bin/forward_logs.py
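In bash, a pipeline's exit status is that of its last command unless `set -o pipefail` is enabled, so with this entrypoint a failing `spark-submit` exits with the status of `forward_logs.py` (normally 0), and `set -e` does not catch it. `exec` inside a pipeline also only replaces the subshell running that stage, not the entrypoint shell. One way to keep the child's exit code is to let the forwarder launch the command itself. A minimal sketch of that approach, not what this PR ships; `run_and_forward` and its `on_line` callback are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, List


def run_and_forward(cmd: List[str], on_line: Callable[[str], None]) -> int:
    """Run cmd with stderr merged into stdout, echo every line, pass it to
    on_line, and return the child's exit code instead of the forwarder's."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same merge as `2>&1` in entrypoint.sh
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        for line in proc.stdout:  # a pipe, because stdout=PIPE
            sys.stdout.write(line)
            sys.stdout.flush()
            try:
                on_line(line.rstrip("\n"))
            except Exception as exc:  # a forwarding failure must not kill the job
                sys.stderr.write(f"[forwarder] {exc}\n")
    # Popen.__exit__ waited for the child, so returncode is set here
    return proc.returncode
```

A Python entrypoint built on this would end with `sys.exit(run_and_forward(sys.argv[1:], forward))`, where `forward` is a hypothetical function that ships one line to CloudWatch Logs; that wiring is an assumption, not part of the image.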
84 changes: 84 additions & 0 deletions docker/glue/forward_logs.py
@@ -0,0 +1,84 @@
#!/usr/bin/env python3
import sys
import os
import time
import urllib.request
import boto3

# Discover the working MiniStack endpoint from the sibling container
endpoints = [
    os.environ.get("MINISTACK_ENDPOINT"),
    "http://host.docker.internal:4566",
    "http://172.17.0.1:4566",
    "http://ministack:4566",
    "http://localhost:4566"
]

# Filter valid and unique endpoints preserving order
endpoints = [e for e in endpoints if e]
seen = set()
endpoints = [e for e in endpoints if not (e in seen or seen.add(e))]

endpoint = "http://host.docker.internal:4566"  # fallback default
for ep in endpoints:
    try:
        # Test connection
        urllib.request.urlopen(f"{ep}/", timeout=1)
        endpoint = ep
        sys.stderr.write(f"[log-forwarder] Successfully connected to MiniStack at: {endpoint}\n")
        sys.stderr.flush()
        break
    except Exception:
        continue

log_group = os.environ.get("GLUE_LOG_GROUP", "/aws-glue/jobs/output")
log_stream = os.environ.get("GLUE_LOG_STREAM", "spark-stdout-stream")

# Set up boto3 client pointing to discovered endpoint
client = boto3.client(
    "logs",
    endpoint_url=endpoint,
    region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "mock"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "mock")
)

# Ensure group and stream exist
try:
    client.create_log_group(logGroupName=log_group)
except Exception:
    pass

try:
    client.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
except Exception:
    pass

# Pipe stdin to stdout and forward to CloudWatch Logs
try:
    for line in sys.stdin:
        sys.stdout.write(line)
        sys.stdout.flush()

        stripped = line.strip()
        if stripped:
            try:
                client.put_log_events(
                    logGroupName=log_group,
                    logStreamName=log_stream,
                    logEvents=[
                        {
                            "timestamp": int(round(time.time() * 1000)),
                            "message": stripped
                        }
                    ]
                )
            except Exception as e:
                # Log send errors to stderr but don't crash
                sys.stderr.write(f"\n[log-forwarder-send-error] {str(e)}\n")
                sys.stderr.flush()
except KeyboardInterrupt:
    pass
except Exception as e:
    sys.stderr.write(f"\n[log-forwarder-error] {str(e)}\n")
    sys.stderr.flush()
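One detail of the endpoint probe in `forward_logs.py`: `urllib.request.urlopen` raises `urllib.error.HTTPError` for any 4xx/5xx response, so an endpoint that answers `/` with, say, a 404 falls into `except Exception` and is skipped even though a server is listening there. Whether MiniStack returns an error status for `/` is not established here. A minimal sketch of a probe that counts any HTTP response as reachable and uses `dict.fromkeys` for order-preserving de-duplication; `first_reachable` is a hypothetical name.

```python
import urllib.error
import urllib.request
from typing import Iterable, Optional


def first_reachable(candidates: Iterable[Optional[str]], timeout: float = 1.0) -> Optional[str]:
    """Return the first candidate base URL that answers HTTP at all, else None."""
    # Empty/None entries are filtered; dict.fromkeys then drops repeats, keeping first-seen order
    for endpoint in dict.fromkeys(c for c in candidates if c):
        try:
            urllib.request.urlopen(f"{endpoint}/", timeout=timeout).close()
            return endpoint
        except urllib.error.HTTPError:
            # HTTPError subclasses URLError, so it must be caught first:
            # a 4xx/5xx status still means a server answered on that port
            return endpoint
        except (urllib.error.URLError, OSError, ValueError):
            # Connection refused, DNS failure, timeout, or malformed URL
            continue
    return None
```

The forwarder could then use `first_reachable(endpoints) or "http://host.docker.internal:4566"` to keep its existing fallback.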
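The loop in `forward_logs.py` issues one `PutLogEvents` request per log line, which adds up to many HTTP round trips for a verbose Spark job. CloudWatch Logs accepts up to 10,000 events per call, with the batch size counted as the UTF-8 bytes of every message plus 26 bytes per event and capped at 1,048,576 bytes. A minimal batching sketch under those limits; `LogBatcher` and `max_age_s` are hypothetical, and whether MiniStack enforces the same limits is an assumption.

```python
import time
from typing import Callable, Dict, List, Optional, Union

# PutLogEvents limits documented in the CloudWatch Logs API reference
MAX_BATCH_EVENTS = 10_000
MAX_BATCH_BYTES = 1_048_576
EVENT_OVERHEAD_BYTES = 26  # counted once per event on top of the UTF-8 message

Event = Dict[str, Union[int, str]]


class LogBatcher:
    """Buffer log lines and hand them to `send` in PutLogEvents-sized batches."""

    def __init__(self, send: Callable[[List[Event]], None], max_age_s: float = 1.0) -> None:
        self.send = send
        self.max_age_s = max_age_s
        self.events: List[Event] = []
        self.size = 0
        self.opened: Optional[float] = None  # monotonic time of the oldest buffered line

    def add(self, message: str, timestamp_ms: int) -> None:
        cost = len(message.encode("utf-8")) + EVENT_OVERHEAD_BYTES
        # Flush first if this event would push the batch past either limit
        if self.events and (
            len(self.events) >= MAX_BATCH_EVENTS or self.size + cost > MAX_BATCH_BYTES
        ):
            self.flush()
        if not self.events:
            self.opened = time.monotonic()
        self.events.append({"timestamp": timestamp_ms, "message": message})
        self.size += cost
        # Age is only checked when a line arrives; call flush() at EOF
        if self.opened is not None and time.monotonic() - self.opened >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.events:
            self.send(self.events)
        self.events, self.size, self.opened = [], 0, None
```

In `forward_logs.py` terms, `send` would wrap `client.put_log_events(logGroupName=log_group, logStreamName=log_stream, logEvents=events)`, `add` would replace the per-line call, and `flush()` would run once stdin closes. Because the age check only runs when a line arrives, a job that goes quiet keeps its last partial batch buffered until the next line or EOF; a background timer thread would be one way to address that.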
3 changes: 2 additions & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "open-arch-flow",
-  "version": "0.12.1",
+  "version": "0.12.2",
   "private": true,
   "packageManager": "pnpm@11.1.2",
   "author": "Rafael Sales <rafael.sales@gmail.com>",
@@ -26,6 +26,7 @@
   },
   "dependencies": {
     "@aws-sdk/client-api-gateway": "^3.1073.0",
+    "@aws-sdk/client-athena": "^3.1075.0",
     "@aws-sdk/client-bedrock": "^3.1073.0",
     "@aws-sdk/client-bedrock-runtime": "^3.1073.0",
     "@aws-sdk/client-cloudfront": "^3.1073.0",
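The new `@aws-sdk/client-athena` dependency presumably backs the README's **Athena Query** tab. Athena queries are asynchronous in every SDK: start the execution, poll its state until it reaches `SUCCEEDED`, `FAILED`, or `CANCELLED`, then page through the results. A minimal Python sketch of that lifecycle for scripting queries against MiniStack outside the UI, written against a boto3-style `athena` client; `run_athena_query` is hypothetical, and it assumes the `:full` image implements `StartQueryExecution`, `GetQueryExecution`, and `GetQueryResults`.

```python
import time
from typing import Any, Dict, List, Optional


def run_athena_query(
    client: Any, sql: str, output_location: str,
    poll_s: float = 0.5, timeout_s: float = 60.0,
) -> List[List[Optional[str]]]:
    """Start a query, wait for a terminal state, and return every row as strings."""
    qid = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {qid} {state}: {status.get('StateChangeReason', '')}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {qid} still {state} after {timeout_s}s")
        time.sleep(poll_s)

    rows: List[List[Optional[str]]] = []
    kwargs: Dict[str, Any] = {"QueryExecutionId": qid}
    while True:  # follow NextToken until the last page
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells come back without a VarCharValue key
            rows.append([col.get("VarCharValue") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

For a `SELECT`, the first returned row holds the column names. With boto3 it would be called with an `athena` client created using `endpoint_url="http://localhost:4566"` and an `s3://` prefix that MiniStack can write results to.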