117 changes: 71 additions & 46 deletions README.md
@@ -1,23 +1,67 @@
-# Stagehand Ruby API library
-
-The Stagehand Ruby library provides convenient access to the Stagehand REST API from any Ruby 3.2.0+ application. It ships with comprehensive types & docstrings in Yard, RBS, and RBI – [see below](https://github.com/browserbase/stagehand-ruby#Sorbet) for usage with Sorbet. The standard library's `net/http` is used as the HTTP transport, with connection pooling via the `connection_pool` gem.
-
-It is generated with [Stainless](https://www.stainless.com/).
-
-## MCP Server
-
-Use the Stagehand MCP Server to enable AI assistants to interact with this API, allowing them to explore endpoints, make test requests, and use documentation to help integrate this SDK into your application.
-
-[![Add to Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en-US/install-mcp?name=stagehand-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsInN0YWdlaGFuZC1tY3AiXX0)
-[![Install in VS Code](https://img.shields.io/badge/_-Add_to_VS_Code-blue?style=for-the-badge&logo=)](https://vscode.stainless.com/mcp/%7B%22name%22%3A%22stagehand-mcp%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22stagehand-mcp%22%5D%7D)
-
-> Note: You may need to set environment variables in your MCP client.
-
-## Documentation
-
-Documentation for releases of this gem can be found [on RubyDoc](https://gemdocs.org/gems/stagehand).
-
-The REST API documentation can be found on [docs.stagehand.dev](https://docs.stagehand.dev).
+<div id="toc" align="center" style="margin-bottom: 0;">
+<ul style="list-style: none; margin: 0; padding: 0;">
+<a href="https://stagehand.dev">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="media/dark_logo.png" />
+<img alt="Stagehand" src="media/light_logo.png" width="200" style="margin-right: 30px;" />
+</picture>
+</a>
+</ul>
+</div>
+<p align="center">
+<strong>The AI Browser Automation Framework</strong><br>
+<a href="https://docs.stagehand.dev/v3/sdk/ruby">Read the Docs</a>
+</p>
+
+<p align="center">
+<a href="https://github.com/browserbase/stagehand/tree/main?tab=MIT-1-ov-file#MIT-1-ov-file">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="media/dark_license.svg" />
+<img alt="MIT License" src="media/light_license.svg" />
+</picture>
+</a>
+<a href="https://stagehand.dev/discord">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="media/dark_discord.svg" />
+<img alt="Discord Community" src="media/light_discord.svg" />
+</picture>
+</a>
+</p>
+
+<p align="center">
+<a href="https://trendshift.io/repositories/12122" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12122" alt="browserbase%2Fstagehand | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
+</p>
+
+<p align="center">
+If you're looking for other languages, you can find them
+<a href="https://docs.stagehand.dev/v3/first-steps/introduction"> here</a>
+</p>
+
+<div align="center" style="display: flex; align-items: center; justify-content: center; gap: 4px; margin-bottom: 0;">
+<b>Vibe code</b>
+<span style="font-size: 1.05em;"> Stagehand with </span>
+<a href="https://director.ai" style="display: flex; align-items: center;">
+<span>Director</span>
+</a>
+<span> </span>
+<picture>
+<img alt="Director" src="media/director_icon.svg" width="25" />
+</picture>
+</div>
+
+## What is Stagehand?
+
+Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.
+
+## Why Stagehand?
+
+Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or use high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two) Stagehand is the natural choice for browser automations in production.
+
+1. **Choose when to write code vs. natural language**: use AI when you want to navigate unfamiliar pages, and use code when you know exactly what you want to do.
+
+2. **Go from AI-driven to repeatable workflows**: Stagehand lets you preview AI actions before running them, and also helps you easily cache repeatable actions to save time and tokens.
+
+3. **Write once, run forever**: Stagehand's auto-caching combined with self-healing remembers previous actions, runs without LLM inference, and knows when to involve AI whenever the website changes and your automation breaks.
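
The caching idea in points 2 and 3 can be illustrated with a minimal sketch. This is not Stagehand's built-in caching, only an illustration of the pattern: remember the action chosen for an instruction and replay it later, so the model is consulted only on a cache miss. `ActionCache` and its methods are hypothetical names, and the block passed to `fetch` stands in for a call such as `client.sessions.observe`.

```ruby
require "json"

# Hypothetical helper, not part of the Stagehand gem: remembers the action
# chosen for an instruction so later runs can skip the model call.
class ActionCache
  def initialize(path)
    @path = path
    @entries = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns the cached action for `instruction`. On a miss, runs the block
  # (standing in for an observe call), stores its result, and returns it.
  # Actions are round-tripped through JSON, so hash keys come back as strings.
  def fetch(instruction)
    return @entries[instruction] if @entries.key?(instruction)

    action = JSON.parse(JSON.generate(yield))
    @entries[instruction] = action
    persist
    action
  end

  # Drops a stale entry, for example after replaying it fails because the
  # page changed; the next fetch will run the block again.
  def invalidate(instruction)
    @entries.delete(instruction)
    persist
  end

  private

  def persist
    File.write(@path, JSON.pretty_generate(@entries))
  end
end
```

A caller would wrap the observe step in `cache.fetch(instruction) { ... }` and call `invalidate` when acting on the cached result fails, which loosely mirrors the "self-healing" behavior described above.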

## Installation

@@ -45,33 +89,24 @@ client = Stagehand::Client.new(
 )

 # Start a new browser session
-# x_language and x_sdk_version headers are required for the v3 API
 start_response = client.sessions.start(
-  model_name: "openai/gpt-5-nano",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  model_name: "openai/gpt-5-nano"
 )
 puts "Session started: #{start_response.data.session_id}"

 session_id = start_response.data.session_id

 # Navigate to a webpage
-# frame_id is required - use empty string for the main frame
 client.sessions.navigate(
   session_id,
-  url: "https://news.ycombinator.com",
-  frame_id: "",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  url: "https://news.ycombinator.com"
 )
 puts "Navigated to Hacker News"

 # Use Observe to find possible actions on the page
 observe_response = client.sessions.observe(
   session_id,
-  instruction: "find the link to view comments for the top post",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  instruction: "find the link to view comments for the top post"
 )

 actions = observe_response.data.result
@@ -85,9 +120,7 @@ puts "Acting on: #{action.description}"
 # Convert the observe result to a hash and ensure method is set to "click"
 act_response = client.sessions.act(
   session_id,
-  input: action.to_h.merge(method: "click"),
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  input: action.to_h.merge(method: "click")
 )
 puts "Act completed: #{act_response.data.result[:message]}"

@@ -109,9 +142,7 @@ extract_response = client.sessions.extract(
       }
     },
     required: ["comment_text"]
-  },
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  }
 )
 puts "Extracted data: #{extract_response.data.result}"

@@ -135,20 +166,14 @@ execute_response = client.sessions.execute(
       api_key: ENV["MODEL_API_KEY"]
     ),
     cua: false
-  },
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  }
 )
 puts "Agent completed: #{execute_response.data.result[:message]}"
 puts "Agent success: #{execute_response.data.result[:success]}"
 puts "Agent actions taken: #{execute_response.data.result[:actions]&.length || 0}"

 # End the session to cleanup browser resources
-client.sessions.end_(
-  session_id,
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
-)
+client.sessions.end_(session_id)
 puts "Session ended"
 ```

33 changes: 7 additions & 26 deletions examples/basic.rb
@@ -1,4 +1,4 @@
 #!/usr/bin/env ruby
Check failure on line 1 in examples/basic.rb (GitHub Actions / lint): Lint/ScriptPermission: Script file basic.rb doesn't have execute permission.
 # frozen_string_literal: true

 require "bundler/setup"
@@ -17,33 +17,24 @@
 )

 # Start a new browser session
-# x_language and x_sdk_version headers are required for the v3 API
 start_response = client.sessions.start(
-  model_name: "openai/gpt-5-nano",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  model_name: "openai/gpt-5-nano"
 )
 puts "Session started: #{start_response.data.session_id}"

 session_id = start_response.data.session_id

 # Navigate to a webpage
-# frame_id is required - use empty string for the main frame
 client.sessions.navigate(
   session_id,
-  url: "https://news.ycombinator.com",
-  frame_id: "",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  url: "https://news.ycombinator.com"
 )
 puts "Navigated to Hacker News"

 # Use Observe to find possible actions on the page
 observe_response = client.sessions.observe(
   session_id,
-  instruction: "find the link to view comments for the top post",
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  instruction: "find the link to view comments for the top post"
 )

 actions = observe_response.data.result
@@ -62,9 +53,7 @@
 # Convert the observe result to a hash and ensure method is set to "click"
 act_response = client.sessions.act(
   session_id,
-  input: action.to_h.merge(method: "click"),
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  input: action.to_h.merge(method: "click")
 )
 puts "Act completed: #{act_response.data.result[:message]}"

@@ -86,9 +75,7 @@
       }
     },
     required: ["comment_text"]
-  },
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  }
 )
 puts "Extracted data: #{extract_response.data.result}"

@@ -112,18 +99,12 @@
       api_key: ENV["MODEL_API_KEY"]
     ),
     cua: false
-  },
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
+  }
 )
 puts "Agent completed: #{execute_response.data.result[:message]}"
 puts "Agent success: #{execute_response.data.result[:success]}"
 puts "Agent actions taken: #{execute_response.data.result[:actions]&.length || 0}"

 # End the session to cleanup browser resources
-client.sessions.end_(
-  session_id,
-  x_language: :typescript,
-  x_sdk_version: "3.0.6"
-)
+client.sessions.end_(session_id)
 puts "Session ended"
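
The example script only reaches `end_` if every earlier call succeeds, so a failure partway through could leave the browser session open. One way to guarantee cleanup is to wrap the work in `begin`/`ensure`, sketched below. `with_session` is a hypothetical helper, not something the gem provides; it assumes only the `sessions.start` and `sessions.end_` calls shown in this diff.

```ruby
# Hypothetical helper, not part of the Stagehand gem. It assumes a client
# that exposes sessions.start and sessions.end_ as used in the example script.
def with_session(client, **start_params)
  start_response = client.sessions.start(**start_params)
  session_id = start_response.data.session_id
  begin
    # Hand the session id to the caller's block and return the block's value.
    yield session_id
  ensure
    # Runs whether the block returns normally or raises.
    client.sessions.end_(session_id)
  end
end
```

With this in place, the navigate, observe, act, extract, and execute calls would move inside a `with_session(client, model_name: "openai/gpt-5-nano") do |session_id| ... end` block, and the explicit `end_` call at the bottom of the script would no longer be needed.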