82 changes: 18 additions & 64 deletions README.md
@@ -4,10 +4,8 @@
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)

<!-- PyPI badges (uncomment once package is published)
[![PyPI version](https://img.shields.io/pypi/v/openadapt-capture.svg)](https://pypi.org/project/openadapt-capture/)
[![Downloads](https://img.shields.io/pypi/dm/openadapt-capture.svg)](https://pypi.org/project/openadapt-capture/)
-->

**OpenAdapt Capture** is the data collection component of the [OpenAdapt](https://github.com/OpenAdaptAI) GUI automation ecosystem.

@@ -43,7 +41,7 @@ Capture platform-agnostic GUI interaction streams with time-aligned screenshots
|-----------|---------|------------|
| **openadapt-capture** | Record human demonstrations | [GitHub](https://github.com/OpenAdaptAI/openadapt-capture) |
| **openadapt-ml** | Train and evaluate GUI automation models | [GitHub](https://github.com/OpenAdaptAI/openadapt-ml) |
-| **openadapt-privacy** | PII scrubbing for recordings | Coming soon |
+| **openadapt-privacy** | PII scrubbing for recordings | [GitHub](https://github.com/OpenAdaptAI/openadapt-privacy) |

---

@@ -208,75 +206,29 @@ The HTML viewer includes:
uv run python scripts/generate_readme_demo.py --duration 10
```

-## Optional Extras
+## Sharing Recordings

-| Extra | Features |
-|-------|----------|
-| `audio` | Audio capture + Whisper transcription |
-| `privacy` | PII scrubbing (openadapt-privacy) |
-| `all` | Everything |
-
----
-
-## Training with OpenAdapt-ML
-
-Captured recordings can be used to train vision-language models with [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml).
-
-### End-to-End Workflow
+Share recordings between machines using [Magic Wormhole](https://magic-wormhole.readthedocs.io/):

```bash
-# 1. Capture a workflow demonstration
-uv run python -c "
-from openadapt_capture import Recorder
-
-with Recorder('./my_capture', task_description='Turn off Night Shift') as recorder:
-    input('Perform the task, then press Enter to stop...')
-"
-
-# 2. Train a model on the capture (requires openadapt-ml)
-uv pip install openadapt-ml
-uv run python -m openadapt_ml.cloud.local train \
-    --capture ./my_capture \
-    --open  # Opens training dashboard
-
-# 3. Compare human vs model predictions
-uv run python -m openadapt_ml.scripts.compare \
-    --capture ./my_capture \
-    --checkpoint checkpoints/model \
-    --open
-```
+# On the sending machine
+capture share send ./my_capture
+# Shows a code like: 7-guitarist-revenge

-### Cloud GPU Training
-
-For faster training with cloud GPUs:
-
-```bash
-# Train on Lambda Labs A10 (~$0.75/hr)
-uv run python -m openadapt_ml.cloud.lambda_labs train \
-    --capture ./my_capture \
-    --goal "Turn off Night Shift"
+# On the receiving machine
+capture share receive 7-guitarist-revenge
```

-See the [openadapt-ml documentation](https://github.com/OpenAdaptAI/openadapt-ml#6-cloud-gpu-training) for cloud setup.
-
-### Data Format
-
-OpenAdapt-ML converts captures to its Episode format automatically:
-
-```python
-from openadapt_ml.ingest.capture import capture_to_episode
+The `share` command compresses the recording, sends it via Magic Wormhole, and extracts it on the receiving end. No account or setup required - just share the code.

-episode = capture_to_episode("./my_capture")
-print(f"Loaded {len(episode.steps)} steps")
-print(f"Instruction: {episode.instruction}")
-```
+## Optional Extras

-The conversion maps capture event types to ML action types:
-- `mouse.singleclick` / `mouse.click` -> `CLICK`
-- `mouse.doubleclick` -> `DOUBLE_CLICK`
-- `mouse.drag` -> `DRAG`
-- `mouse.scroll` -> `SCROLL`
-- `key.type` -> `TYPE`
+| Extra | Features |
+|-------|----------|
+| `audio` | Audio capture + Whisper transcription |
+| `privacy` | PII scrubbing ([openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy)) |
+| `share` | Recording sharing via Magic Wormhole |
+| `all` | Everything |
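The description above says the recording is compressed before the Magic Wormhole transfer. A minimal self-contained sketch of that compression step, assuming a tar.gz packaging flow (the helper name `pack_capture` and the archive layout are illustrative, not the package's actual internals):

```python
# Sketch of the packaging step `capture share send` likely performs before
# handing the archive to Magic Wormhole (the transfer itself is elided).
import tarfile
import tempfile
from pathlib import Path

def pack_capture(capture_dir: str) -> Path:
    """Compress a capture directory into a single .tar.gz archive."""
    src = Path(capture_dir)
    archive = Path(tempfile.mkdtemp()) / f"{src.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)  # keep the top-level directory name
    return archive

# Demo against a throwaway capture directory
demo = Path(tempfile.mkdtemp()) / "my_capture"
demo.mkdir()
(demo / "events.db").write_text("stub")
archive = pack_capture(str(demo))
print(archive.name)  # my_capture.tar.gz
```

On the receiving side the inverse (extract the archive into `output_dir`) would restore the original directory layout.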

---

@@ -290,6 +242,8 @@ uv run pytest
## Related Projects

- [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) - Train and evaluate GUI automation models
+- [openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy) - PII detection and scrubbing for recordings
+- [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals) - Benchmark evaluation for GUI agents
- [Windows Agent Arena](https://github.com/microsoft/WindowsAgentArena) - Benchmark for Windows GUI agents

## License
12 changes: 6 additions & 6 deletions openadapt_capture/__init__.py
@@ -77,6 +77,12 @@

# Browser events and bridge (optional - requires websockets)
try:
+    from openadapt_capture.browser_bridge import (
+        BrowserBridge,
+        BrowserEventRecord,
+        BrowserMode,
+        run_browser_bridge,
+    )
    from openadapt_capture.browser_events import (
        BoundingBox,
        BrowserClickEvent,
@@ -93,12 +99,6 @@
        SemanticElementRef,
        VisibleElement,
    )
-    from openadapt_capture.browser_bridge import (
-        BrowserBridge,
-        BrowserEventRecord,
-        BrowserMode,
-        run_browser_bridge,
-    )
    _BROWSER_BRIDGE_AVAILABLE = True
except ImportError:
    _BROWSER_BRIDGE_AVAILABLE = False
16 changes: 5 additions & 11 deletions openadapt_capture/browser_bridge.py
@@ -381,20 +381,14 @@ async def _handle_dom_event(self, data: dict) -> None:
        self._event_count += 1

        # Parse into typed event if possible
-        typed_event = self._parse_typed_event(event_type, payload, data)
+        self._parse_typed_event(event_type, payload, data)

        # Store in CaptureStorage if available
        if self.storage is not None:
            # Store as JSON in the events table
            # Note: We store the raw event, not Pydantic model to match storage patterns
-            try:
-                from openadapt_capture.events import BaseEvent
-                # Create a minimal event for storage compatibility
-                # Browser events don't fit the standard EventType enum
-                # so we store them as raw JSON in a custom way
-                pass  # Storage integration would go here
-            except ImportError:
-                pass
+            # Storage integration would go here
+            # Browser events don't fit the standard EventType enum
+            # so we store them as raw JSON in a custom way
+            pass

        # Notify callback
        if self.on_event is not None:
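The comment above notes that browser events are kept as raw JSON because they do not fit the standard `EventType` enum. A minimal sketch of that storage approach (the table name and columns here are assumptions, not the project's actual schema):

```python
# Store heterogeneous browser events as raw JSON rows: a timestamp and type
# column for querying, plus the full event serialized into a payload column.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE browser_events (ts REAL, type TEXT, payload TEXT)")

event = {"type": "dom.click", "x": 10, "y": 20}  # raw event dict
conn.execute(
    "INSERT INTO browser_events VALUES (?, ?, ?)",
    (0.0, event["type"], json.dumps(event)),
)

row = conn.execute("SELECT payload FROM browser_events").fetchone()
print(json.loads(row[0])["type"])  # dom.click
```

The trade-off: schema-free payloads accommodate any event shape, at the cost of pushing validation to read time.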
1 change: 0 additions & 1 deletion openadapt_capture/browser_events.py
@@ -12,7 +12,6 @@

from pydantic import BaseModel, Field


# =============================================================================
# Browser Event Types
# =============================================================================
2 changes: 1 addition & 1 deletion openadapt_capture/cli.py
@@ -353,7 +353,7 @@ def share(action: str, path_or_code: str, output_dir: str = ".") -> None:
        capture share receive 7-guitarist-revenge
        capture share receive 7-guitarist-revenge ./recordings
    """
-    from openadapt_capture.share import send, receive
+    from openadapt_capture.share import receive, send

    if action == "send":
        send(path_or_code)
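For context, the `share` command dispatches on its first argument roughly as below. This is a behavioral sketch with print stubs in place of the real `send`/`receive` bodies; only the signature comes from the diff:

```python
def share(action: str, path_or_code: str, output_dir: str = ".") -> None:
    """Dispatch `capture share send|receive` to the matching handler."""
    if action == "send":
        print(f"sending {path_or_code}")       # real code calls send()
    elif action == "receive":
        print(f"receiving {path_or_code} into {output_dir}")  # real code calls receive()
    else:
        raise ValueError(f"unknown action: {action}")

share("send", "./my_capture")            # sending ./my_capture
share("receive", "7-guitarist-revenge")  # receiving 7-guitarist-revenge into .
```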
132 changes: 132 additions & 0 deletions openadapt_capture/platform/__init__.py
@@ -0,0 +1,132 @@
"""Platform-specific implementations for GUI event capture.

This module provides platform-specific implementations for:
- Screen capture
- Input event capture
- Display information (resolution, DPI, pixel ratio)

The module automatically selects the appropriate implementation based on
the current platform (darwin, win32, linux).
"""

from __future__ import annotations

import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
from typing import Protocol

class PlatformProvider(Protocol):
"""Protocol for platform-specific providers."""

@staticmethod
def get_screen_dimensions() -> tuple[int, int]:
"""Get screen dimensions in physical pixels."""
...

@staticmethod
def get_display_pixel_ratio() -> float:
"""Get display pixel ratio (physical/logical)."""
...

@staticmethod
def is_accessibility_enabled() -> bool:
"""Check if accessibility permissions are enabled."""
...


def get_platform() -> str:
"""Get the current platform identifier.

Returns:
'darwin' for macOS, 'win32' for Windows, 'linux' for Linux.
"""
return sys.platform


def get_platform_provider() -> "PlatformProvider":
"""Get the platform-specific provider for the current OS.

Returns:
Platform provider instance for the current operating system.

Raises:
NotImplementedError: If the platform is not supported.
"""
platform = get_platform()

if platform == "darwin":
from openadapt_capture.platform.darwin import DarwinPlatform
return DarwinPlatform()
elif platform == "win32":
from openadapt_capture.platform.windows import WindowsPlatform
return WindowsPlatform()
elif platform.startswith("linux"):
from openadapt_capture.platform.linux import LinuxPlatform
return LinuxPlatform()
else:
raise NotImplementedError(f"Platform not supported: {platform}")


def get_screen_dimensions() -> tuple[int, int]:
"""Get screen dimensions in physical pixels.

This returns the actual screenshot pixel dimensions, which may be
larger than logical dimensions on HiDPI/Retina displays.

Returns:
Tuple of (width, height) in physical pixels.
"""
try:
provider = get_platform_provider()
return provider.get_screen_dimensions()
except (NotImplementedError, ImportError):
# Fallback to generic implementation
try:
from PIL import ImageGrab
screenshot = ImageGrab.grab()
return screenshot.size
except Exception:
return (1920, 1080) # Default fallback


def get_display_pixel_ratio() -> float:
"""Get the display pixel ratio (physical/logical).

This is the ratio of physical pixels to logical pixels.
For example, 2.0 for Retina displays on macOS.

Returns:
Pixel ratio (e.g., 1.0 for standard displays, 2.0 for Retina).
"""
try:
provider = get_platform_provider()
return provider.get_display_pixel_ratio()
except (NotImplementedError, ImportError):
return 1.0


def is_accessibility_enabled() -> bool:
"""Check if accessibility permissions are enabled.

On macOS, this checks if the application has accessibility permissions
required for keyboard and mouse event capture.

Returns:
True if accessibility is enabled, False otherwise.
"""
try:
provider = get_platform_provider()
return provider.is_accessibility_enabled()
except (NotImplementedError, ImportError):
return True # Assume enabled on unknown platforms


__all__ = [
"get_platform",
"get_platform_provider",
"get_screen_dimensions",
"get_display_pixel_ratio",
"is_accessibility_enabled",
]
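A usage sketch of the physical/logical relationship this module exposes: logical coordinates are physical pixels divided by the display pixel ratio (the helper name `physical_to_logical` is illustrative, not part of the module):

```python
# Convert physical screenshot pixels to logical screen coordinates by
# dividing through by the display pixel ratio (2.0 on Retina displays).
def physical_to_logical(size: tuple[int, int], ratio: float) -> tuple[int, int]:
    """Map a (width, height) in physical pixels to logical coordinates."""
    return (round(size[0] / ratio), round(size[1] / ratio))

# A 2x Retina display: 2880x1800 physical -> 1440x900 logical
print(physical_to_logical((2880, 1800), 2.0))  # (1440, 900)
```

This is the conversion a consumer needs when mapping click coordinates recorded in logical space onto screenshots captured in physical pixels.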