Implement Phase 1: Security and Monitoring Features #15
Draft
kommunication wants to merge 2 commits into Ghurtchu:main from
This commit implements critical security and monitoring improvements for the distributed code execution engine.

Security Enhancements:
- Add API key-based authentication for all code execution endpoints
- Implement rate limiting (100 requests/hour per API key, configurable)
- Add comprehensive input validation:
  * Code size limits (100KB/50k chars)
  * Language validation
  * Dangerous pattern detection (rm -rf, wget, curl, etc.)
- Remove security vulnerabilities (seccomp=unconfined from docker-compose)

Monitoring & Observability:
- Add Prometheus metrics collection:
  * Request counts by language and status
  * Execution duration histograms
  * Active execution gauges
  * Authentication failure tracking
  * Rate limit violation tracking
  * Input validation error tracking
- Implement health check endpoint (/health)
- Implement readiness check endpoint (/ready)
- Add JVM metrics (memory, GC, threads)

New Components:
- monitoring/Metrics.scala: Prometheus metrics collection
- security/Authentication.scala: API key authentication middleware
- security/RateLimiter.scala: Actor-based rate limiting
- security/InputValidator.scala: Multi-stage input validation

Configuration:
- Add security.rate-limit.max-requests config option
- Add RATE_LIMIT_MAX_REQUESTS environment variable support

Documentation:
- Update README with authentication examples
- Document rate limiting behavior
- Document monitoring endpoints and available metrics
- Add architecture improvements section

Files Modified:
- build.sbt: Add Prometheus client dependencies
- ClusterSystem.scala: Integrate auth, rate limiting, validation, and metrics
- docker-compose.yaml: Remove insecure seccomp=unconfined
- application.conf: Add security configuration
- README.md: Comprehensive documentation of new features
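The multi-stage input validation described above could look roughly like the following. This is a minimal sketch, not the code in security/InputValidator.scala: the object name, the `validate` signature, the language whitelist, and the reading of 100KB as 100 * 1024 bytes are all assumptions made for illustration. Only the limits and the three example patterns come from the commit message.

```scala
object InputValidatorSketch {
  // Size limits quoted in the commit message.
  // Assumption: "100KB" means 100 * 1024 bytes of UTF-8.
  val MaxBytes: Int = 100 * 1024
  val MaxChars: Int = 50000

  // Hypothetical whitelist; the commit message does not list the accepted languages.
  val SupportedLanguages: Set[String] =
    Set("java", "python", "javascript", "ruby", "perl", "php")

  // The three patterns named in the commit message ("etc." there implies more exist).
  val DangerousPatterns: List[String] = List("rm -rf", "wget", "curl")

  // Runs the stages in order and stops at the first failure.
  // Left carries a rejection reason, Right carries the accepted code.
  def validate(language: String, code: String): Either[String, String] =
    if (!SupportedLanguages.contains(language.toLowerCase))
      Left(s"unsupported language: $language")
    else if (code.length > MaxChars)
      Left(s"code exceeds $MaxChars characters")
    else if (code.getBytes("UTF-8").length > MaxBytes)
      Left(s"code exceeds $MaxBytes bytes")
    else
      DangerousPatterns.find(p => code.contains(p)) match {
        case Some(p) => Left(s"dangerous pattern detected: $p")
        case None    => Right(code)
      }
}
```

Plain substring matching of this kind is easy to bypass (for example through string concatenation inside the submitted program), so it is best read as a first filter in front of the container sandbox rather than a replacement for it.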
This commit implements async job execution capabilities and advanced resource management for the distributed code execution engine.

Async Job Execution:
- Add JobManager actor for centralized job state management
- Implement job lifecycle tracking (Queued → Running → Completed/Failed)
- Add REST API endpoints for async job operations:
  * POST /jobs - Submit jobs for async execution
  * GET /jobs/:id - Retrieve job status and results
  * GET /jobs - List all jobs with pagination
- Add automatic job cleanup with configurable TTL (default: 1 hour)
- Maintain backward compatibility with synchronous /lang/:language endpoint

Advanced Resource Management:
- Create ResourceConfig for per-language resource profiles
- Implement configurable CPU, memory, and timeout limits per language:
  * Java: 2 CPUs, 256MB, 10s timeout
  * Python: 1 CPU, 50MB, 5s timeout
  * JavaScript: 1 CPU, 50MB, 5s timeout
  * Ruby: 1 CPU, 30MB, 5s timeout
  * Perl: 1 CPU, 20MB, 3s timeout
  * PHP: 1 CPU, 40MB, 5s timeout
- Update CodeExecutor to use configurable resource limits
- Update FileHandler and Worker to pass resource limits through execution chain

Enhanced Monitoring:
- Add job queue depth metrics (braindrill_queue_depth)
- Add queued jobs gauge (braindrill_queued_jobs)
- Add total jobs submitted counter (braindrill_jobs_submitted_total)
- Track job state transitions in metrics

New Components:
- jobs/Job.scala: Job model with state management
- jobs/JobManager.scala: Actor-based job queue and lifecycle management
- jobs/JobJsonSupport.scala: JSON serialization for job API responses
- config/ResourceConfig.scala: Per-language resource configuration

Configuration:
- Add jobs.ttl config option for job cleanup TTL
- Add per-language resource profiles

Documentation:
- Add comprehensive async API documentation to README
- Document per-language resource limits
- Add job API usage examples
- Update metrics documentation with new job-related metrics
- Update TODO list to reflect Phase 2 completion

Files Modified:
- ClusterSystem.scala: Add JobManager, async job endpoints, and dual execution modes
- Metrics.scala: Add job queue tracking metrics
- Worker.scala: Integrate ResourceConfig for dynamic resource allocation
- CodeExecutor.scala: Use configurable resource limits in docker execution
- FileHandler.scala: Pass resource limits to CodeExecutor
- application.conf: Add jobs configuration section
- README.md: Extensive documentation of Phase 2 features

Benefits:
- Non-blocking job submission for long-running code execution
- Job history and result retrieval
- Optimized resource allocation per programming language
- Better observability with job queue metrics
- Foundation for future auto-scaling implementation
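To make the per-language resource profiles concrete, here is a minimal sketch of how such a lookup might be modelled and turned into `docker run` arguments. It is not the code in config/ResourceConfig.scala: `ResourceLimits`, `limitsFor`, and `dockerFlags` are hypothetical names, and the lowercase language keys are an assumption. The numbers are the ones listed in the commit message, and `--cpus` and `--memory` are standard `docker run` options.

```scala
object ResourceConfigSketch {
  // Hypothetical model of one language profile.
  final case class ResourceLimits(cpus: Int, memoryMb: Int, timeoutSeconds: Int)

  // Values copied from the commit message; the lowercase keys are an assumption.
  val Profiles: Map[String, ResourceLimits] = Map(
    "java"       -> ResourceLimits(cpus = 2, memoryMb = 256, timeoutSeconds = 10),
    "python"     -> ResourceLimits(cpus = 1, memoryMb = 50, timeoutSeconds = 5),
    "javascript" -> ResourceLimits(cpus = 1, memoryMb = 50, timeoutSeconds = 5),
    "ruby"       -> ResourceLimits(cpus = 1, memoryMb = 30, timeoutSeconds = 5),
    "perl"       -> ResourceLimits(cpus = 1, memoryMb = 20, timeoutSeconds = 3),
    "php"        -> ResourceLimits(cpus = 1, memoryMb = 40, timeoutSeconds = 5)
  )

  // Returns None for unknown languages instead of guessing a default profile.
  def limitsFor(language: String): Option[ResourceLimits] =
    Profiles.get(language.toLowerCase)

  // Builds the CPU and memory arguments for `docker run`.
  // The timeout is not a docker flag; the caller would have to enforce it.
  def dockerFlags(limits: ResourceLimits): List[String] =
    List("--cpus", limits.cpus.toString, "--memory", s"${limits.memoryMb}m")
}
```

For example, `ResourceConfigSketch.limitsFor("perl").map(ResourceConfigSketch.dockerFlags)` yields the arguments for a 1 CPU, 20MB container, while the 3s timeout would be applied separately by whatever component waits on the process.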