reset
This commit is contained in:
23
.github/workflows/ci.yml
vendored
23
.github/workflows/ci.yml
vendored
@@ -1,23 +0,0 @@
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push: {}
|
||||
pull_request: {}
|
||||
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: 3.11
|
||||
- name: Install runtime dependencies
|
||||
run: python -m pip install --upgrade pip && pip install -r requirements.txt
|
||||
- name: Install dev dependencies
|
||||
run: pip install -r requirements-dev.txt
|
||||
- name: Run lints
|
||||
run: ruff check server skill tests
|
||||
- name: Run tests
|
||||
run: pytest
|
||||
69
README.md
69
README.md
@@ -1,69 +1,2 @@
|
||||
# Clickthrough
|
||||
|
||||
Let an Agent interact with your Computer.
|
||||
|
||||
`Clickthrough` is a proof-of-concept bridge between a vision-aware agent and a headless controller. The project is split into two halves:
|
||||
|
||||
1. A Python server that accepts a static grid overlay (think of a screenshot broken into cells) and exposes lightweight endpoints to ask questions, plan actions, or even run pointer/keyboard events.
|
||||
2. A **skill** that bundles the HTTP calls/intent construction so we can hardwire the same flow inside OpenClaw later.
|
||||
|
||||
## Server surface (FastAPI)
|
||||
|
||||
- `POST /grid/init`: Accepts a base64 screenshot plus the requested rows/columns, returns a `grid_id`, cell bounds, and helpful metadata. The grid is stored in-memory so the agent can reference cells by ID in later actions.
|
||||
- `POST /grid/action`: Takes a plan (`grid_id`, optional target cell, and an action like `click`/`drag`/`type`) and returns a structured `ActionResult` with computed coordinates for tooling to consume.
|
||||
- `GET /grid/{grid_id}/summary`: Returns both a heuristic description (`GridPlanner`) and a rich descriptor so the skill can summarize what it sees.
|
||||
- `GET /grid/{grid_id}/history`: Streams back the action history for that grid so an agent or operator can audit what was done.
|
||||
- `POST /grid/{grid_id}/plan`: Lets `GridPlanner` select the target and return a preview action plan without committing to it, so we can inspect coordinates before triggering events.
|
||||
- `POST /grid/{grid_id}/refresh` + `GET /stream/screenshots`: Refresh the cached screenshot/metadata and broadcast the updated scene over a websocket so clients can redraw overlays in near real time.
|
||||
- `GET /health`: A minimal health check for deployments.
|
||||
|
||||
Vision metadata is kept on a per-grid basis, including history, layout dimensions, and any appended memo. Each `VisionGrid` also exposes a short textual summary so the skill layer can turn sensory data into sentences directly.
|
||||
|
||||
## Skill layer (OpenClaw integration)
|
||||
|
||||
The `skill/` package wraps the server calls and exposes helpers:
|
||||
|
||||
- `ClickthroughSkill.describe_grid()` builds a grid session and returns the descriptor.
|
||||
- `ClickthroughSkill.plan_action()` drives the `/grid/action` endpoint.
|
||||
- `ClickthroughSkill.plan_with_planner()` calls `/grid/{grid_id}/plan`, so you can preview the `GridPlanner` suggestion before executing it.
|
||||
- `ClickthroughSkill.grid_summary()` and `.grid_history()` surface the new metadata endpoints.
|
||||
- `ClickthroughSkill.refresh_grid()` pushes a new screenshot and memo, triggering websocket listeners.
|
||||
- `ClickthroughAgentRunner` simulates a tiny agent loop that asks the planner for a preview, executes the resulting action, and then gathers the summary/history so you can iterate on reasoning loops in tests.
|
||||
|
||||
Future work can swap the stub runner for a full OpenClaw skill that keeps reasoning inside the agent and uses these primitives to steer the mouse/keyboard.
|
||||
|
||||
## Screenshot streaming
|
||||
|
||||
Capture loops can now talk to FastAPI in two ways:
|
||||
|
||||
1. POST `/grid/{grid_id}/refresh` with fresh base64 screenshots and an optional memo; the server updates the cached grid metadata and broadcasts the change.
|
||||
2. Open a websocket to `GET /stream/screenshots` (optionally passing `grid_id` as a query param) to receive realtime deltas whenever a refresh happens. Clients can use the descriptor/payload to redraw overlays or trigger new planner runs without polling.
|
||||
|
||||
## Testing
|
||||
|
||||
1. `python3 -m pip install -r requirements.txt`
|
||||
2. `python3 -m pip install -r requirements-dev.txt`
|
||||
3. `python3 -m pytest`
|
||||
|
||||
The `tests/` suite covers grid construction, the FastAPI surface, and the skill/runner helpers.
|
||||
|
||||
## Continuous Integration
|
||||
|
||||
`.github/workflows/ci.yml` runs on pushes and PRs:
|
||||
|
||||
- Checks out the repo and sets up Python 3.11.
|
||||
- Installs dependencies (`requirements.txt` + `requirements-dev.txt`).
|
||||
- Runs `ruff check` over the Python packages.
|
||||
- Executes `pytest` to keep coverage high.
|
||||
|
||||
## Control UI
|
||||
|
||||
- `/ui/` serves a small control panel where you can bootstrap a grid from a base64 screenshot, ask the planner for a preview, execute clicks, refresh the screenshot, and watch the summary/history.
|
||||
- Most traffic is HTTP: `/grid/init`, `/grid/{id}/plan`, `/grid/{id}/action`, `/grid/{id}/refresh`, `/grid/{id}/summary`, and `/grid/{id}/history`. Only the `/stream/screenshots` websocket pushes updates after a refresh so the overlay redraws.
|
||||
- The FastAPI root now redirects to `/ui/` when the client assets are present, making the UI a lightweight entry point for demos or manual command-and-control work.
|
||||
|
||||
## Next steps
|
||||
|
||||
- Add OCR or UI heuristics so grid cells have meaningful labels before the agent reasons about them.
|
||||
- Persist grids and histories in a lightweight store so long-running sessions survive restarts.
|
||||
- Expand the UI to preview actions visually (perhaps overlaying cells on top of rendered screenshots).
|
||||
Let an Agent interact with your Computer.
|
||||
159
client/app.js
159
client/app.js
@@ -1,159 +0,0 @@
|
||||
const gridForm = document.getElementById("grid-form");
|
||||
const descriptorEl = document.getElementById("descriptor");
|
||||
const gridMetaEl = document.getElementById("grid-meta");
|
||||
const summaryEl = document.getElementById("summary");
|
||||
const historyEl = document.getElementById("history");
|
||||
const planOutput = document.getElementById("plan-output");
|
||||
const preferredInput = document.getElementById("preferred-label");
|
||||
const refreshScreenshot = document.getElementById("refresh-screenshot");
|
||||
const refreshMemo = document.getElementById("refresh-memo");
|
||||
const logEl = document.getElementById("ws-log");
|
||||
|
||||
let currentGrid = null;
|
||||
let lastPlan = null;
|
||||
let ws = null;
|
||||
let keepAliveId = null;
|
||||
|
||||
const log = (message) => {
|
||||
const timestamp = new Date().toLocaleTimeString();
|
||||
logEl.textContent = `[${timestamp}] ${message}\n${logEl.textContent}`;
|
||||
};
|
||||
|
||||
const headers = {
|
||||
"Content-Type": "application/json",
|
||||
};
|
||||
|
||||
const subscribeToGrid = (gridId) => {
|
||||
if (!gridId) return;
|
||||
if (ws) {
|
||||
ws.close();
|
||||
}
|
||||
const protocol = window.location.protocol === "https:" ? "wss" : "ws";
|
||||
ws = new WebSocket(`${protocol}://${window.location.host}/stream/screenshots?grid_id=${gridId}`);
|
||||
|
||||
ws.addEventListener("open", () => {
|
||||
log(`WebSocket listening for grid ${gridId}`);
|
||||
ws.send("ready");
|
||||
keepAliveId = setInterval(() => ws.send("ping"), 15000);
|
||||
});
|
||||
|
||||
ws.addEventListener("message", (event) => {
|
||||
log(`Update received → ${event.data}`);
|
||||
});
|
||||
|
||||
ws.addEventListener("close", () => {
|
||||
log("WebSocket disconnected");
|
||||
if (keepAliveId) {
|
||||
clearInterval(keepAliveId);
|
||||
keepAliveId = null;
|
||||
}
|
||||
});
|
||||
};
|
||||
|
||||
const updateDescriptor = (descriptor) => {
|
||||
descriptorEl.textContent = JSON.stringify(descriptor, null, 2);
|
||||
gridMetaEl.textContent = `Grid ${descriptor.grid_id} (${descriptor.rows}x${descriptor.columns}) · ${descriptor.cells.length} cells`;
|
||||
};
|
||||
|
||||
const updateSummary = async () => {
|
||||
if (!currentGrid) return;
|
||||
const [summaryResponse, historyResponse] = await Promise.all([
|
||||
fetch(`/grid/${currentGrid}/summary`),
|
||||
fetch(`/grid/${currentGrid}/history`),
|
||||
]);
|
||||
|
||||
if (summaryResponse.ok) {
|
||||
const payload = await summaryResponse.json();
|
||||
summaryEl.textContent = payload.summary;
|
||||
}
|
||||
|
||||
if (historyResponse.ok) {
|
||||
const payload = await historyResponse.json();
|
||||
historyEl.textContent = JSON.stringify(payload.history, null, 2);
|
||||
}
|
||||
};
|
||||
|
||||
const initGrid = async (event) => {
|
||||
event.preventDefault();
|
||||
const formData = new FormData(gridForm);
|
||||
const payload = {
|
||||
width: Number(formData.get("width")),
|
||||
height: Number(formData.get("height")),
|
||||
rows: Number(formData.get("rows")),
|
||||
columns: Number(formData.get("columns")),
|
||||
screenshot_base64: formData.get("screenshot"),
|
||||
};
|
||||
const response = await fetch("/grid/init", {
|
||||
method: "POST",
|
||||
headers,
|
||||
body: JSON.stringify(payload),
|
||||
});
|
||||
const descriptor = await response.json();
|
||||
currentGrid = descriptor.grid_id;
|
||||
updateDescriptor(descriptor);
|
||||
await updateSummary();
|
||||
subscribeToGrid(currentGrid);
|
||||
planOutput.textContent = "Plan preview will appear here.";
|
||||
log(`Grid ${currentGrid} initialized.`);
|
||||
};
|
||||
|
||||
document.getElementById("plan-button").addEventListener("click", async () => {
|
||||
if (!currentGrid) {
|
||||
log("Initialize a grid first.");
|
||||
return;
|
||||
}
|
||||
const response = await fetch(`/grid/${currentGrid}/plan`, {
|
||||
method: "POST",
|
||||
headers,
|
||||
body: JSON.stringify({
|
||||
preferred_label: preferredInput.value || null,
|
||||
action: "click",
|
||||
text: "ui-trigger",
|
||||
}),
|
||||
});
|
||||
const result = await response.json();
|
||||
lastPlan = result.plan;
|
||||
planOutput.textContent = JSON.stringify(result, null, 2);
|
||||
});
|
||||
|
||||
document.getElementById("run-action").addEventListener("click", async () => {
|
||||
if (!lastPlan) {
|
||||
log("Run the planner first.");
|
||||
return;
|
||||
}
|
||||
const payload = {
|
||||
grid_id: lastPlan.grid_id,
|
||||
action: lastPlan.action,
|
||||
target_cell: lastPlan.target_cell,
|
||||
text: "from-ui",
|
||||
comment: "UI action",
|
||||
};
|
||||
const response = await fetch("/grid/action", {
|
||||
method: "POST",
|
||||
headers,
|
||||
body: JSON.stringify(payload),
|
||||
});
|
||||
const result = await response.json();
|
||||
log(`Action ${result.detail} at ${result.coordinates}`);
|
||||
await updateSummary();
|
||||
});
|
||||
|
||||
document.getElementById("refresh-button").addEventListener("click", async () => {
|
||||
if (!currentGrid) {
|
||||
log("Start a grid first.");
|
||||
return;
|
||||
}
|
||||
const payload = {
|
||||
screenshot_base64: refreshScreenshot.value || "",
|
||||
memo: refreshMemo.value || undefined,
|
||||
};
|
||||
const response = await fetch(`/grid/${currentGrid}/refresh`, {
|
||||
method: "POST",
|
||||
headers,
|
||||
body: JSON.stringify(payload),
|
||||
});
|
||||
const data = await response.json();
|
||||
log(`Refresh acknowledged: ${JSON.stringify(data)}`);
|
||||
});
|
||||
|
||||
gridForm.addEventListener("submit", initGrid);
|
||||
@@ -1,85 +0,0 @@
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<title>Clickthrough Control</title>
|
||||
<link rel="stylesheet" href="styles.css" />
|
||||
</head>
|
||||
<body>
|
||||
<main>
|
||||
<header>
|
||||
<h1>Clickthrough Control Panel</h1>
|
||||
<p>Most actions use HTTP; screenshots stream over WebSocket when refreshed.</p>
|
||||
</header>
|
||||
|
||||
<section class="card">
|
||||
<h2>Grid bootstrap</h2>
|
||||
<form id="grid-form">
|
||||
<label>
|
||||
Width
|
||||
<input type="number" name="width" value="640" min="1" required />
|
||||
</label>
|
||||
<label>
|
||||
Height
|
||||
<input type="number" name="height" value="480" min="1" required />
|
||||
</label>
|
||||
<label>
|
||||
Rows
|
||||
<input type="number" name="rows" value="3" min="1" required />
|
||||
</label>
|
||||
<label>
|
||||
Columns
|
||||
<input type="number" name="columns" value="3" min="1" required />
|
||||
</label>
|
||||
<label class="stretch">
|
||||
Screenshot (base64)
|
||||
<textarea name="screenshot" id="screenshot" rows="3">AA==</textarea>
|
||||
</label>
|
||||
<button type="submit">Init grid</button>
|
||||
</form>
|
||||
</section>
|
||||
|
||||
<section class="card" id="grid-details">
|
||||
<h2>Grid status</h2>
|
||||
<div id="grid-meta">No grid yet.</div>
|
||||
<pre id="descriptor" class="monospace"></pre>
|
||||
</section>
|
||||
|
||||
<section class="card">
|
||||
<h2>Planner & Actions</h2>
|
||||
<label>
|
||||
Preferred label
|
||||
<input type="text" id="preferred-label" placeholder="button" />
|
||||
</label>
|
||||
<div class="button-row">
|
||||
<button type="button" id="plan-button">Preview plan</button>
|
||||
<button type="button" id="run-action">Run action</button>
|
||||
</div>
|
||||
<pre id="plan-output" class="monospace">Plan preview will appear here.</pre>
|
||||
</section>
|
||||
|
||||
<section class="card">
|
||||
<h2>Refresh Screenshot</h2>
|
||||
<textarea id="refresh-screenshot" rows="3" placeholder="Paste new base64 screenshot"></textarea>
|
||||
<label>
|
||||
Memo
|
||||
<input type="text" id="refresh-memo" placeholder="Describe the scene" />
|
||||
</label>
|
||||
<button type="button" id="refresh-button">Refresh grid</button>
|
||||
<p class="note">Refresh triggers /stream/screenshots so the UI can redraw.</p>
|
||||
</section>
|
||||
|
||||
<section class="card">
|
||||
<h2>Summary & history</h2>
|
||||
<pre id="summary" class="monospace">No data yet.</pre>
|
||||
<pre id="history" class="monospace">History will show here.</pre>
|
||||
</section>
|
||||
|
||||
<section class="card">
|
||||
<h2>Websocket log</h2>
|
||||
<pre id="ws-log" class="monospace">Waiting for updates…</pre>
|
||||
</section>
|
||||
</main>
|
||||
<script type="module" src="app.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
@@ -1,108 +0,0 @@
|
||||
* {
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: "Inter", "Segoe UI", system-ui, sans-serif;
|
||||
margin: 0;
|
||||
background: #121212;
|
||||
color: #f5f5f5;
|
||||
}
|
||||
|
||||
main {
|
||||
max-width: 960px;
|
||||
margin: 0 auto;
|
||||
padding: 24px;
|
||||
}
|
||||
|
||||
header {
|
||||
text-align: center;
|
||||
margin-bottom: 24px;
|
||||
}
|
||||
|
||||
header h1 {
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
|
||||
.card {
|
||||
background: #1f1f1f;
|
||||
padding: 16px;
|
||||
border-radius: 16px;
|
||||
margin-bottom: 16px;
|
||||
box-shadow: 0 20px 45px rgba(0, 0, 0, 0.35);
|
||||
}
|
||||
|
||||
label {
|
||||
display: block;
|
||||
margin-bottom: 12px;
|
||||
}
|
||||
|
||||
label input,
|
||||
label textarea {
|
||||
width: 100%;
|
||||
border-radius: 10px;
|
||||
border: 1px solid #333;
|
||||
background: #0f0f0f;
|
||||
color: #f1f1f1;
|
||||
padding: 8px 12px;
|
||||
margin-top: 4px;
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
textarea {
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
button {
|
||||
background: linear-gradient(135deg, #6d7cff, #3b82f6);
|
||||
border: none;
|
||||
padding: 10px 20px;
|
||||
color: white;
|
||||
border-radius: 999px;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
transition: transform 0.15s ease;
|
||||
}
|
||||
|
||||
button:hover {
|
||||
transform: translateY(-1px);
|
||||
}
|
||||
|
||||
.button-row {
|
||||
display: flex;
|
||||
gap: 12px;
|
||||
flex-wrap: wrap;
|
||||
margin-bottom: 12px;
|
||||
}
|
||||
|
||||
.monospace {
|
||||
background: #0c0c0c;
|
||||
border-radius: 12px;
|
||||
padding: 12px;
|
||||
border: 1px solid #333;
|
||||
min-height: 80px;
|
||||
}
|
||||
|
||||
.note {
|
||||
font-size: 0.9rem;
|
||||
margin-top: 8px;
|
||||
color: #b0b0b0;
|
||||
}
|
||||
|
||||
@media (min-width: 768px) {
|
||||
label {
|
||||
display: flex;
|
||||
gap: 12px;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
label input,
|
||||
label textarea {
|
||||
width: auto;
|
||||
flex: 1;
|
||||
}
|
||||
|
||||
.stretch textarea {
|
||||
width: 100%;
|
||||
}
|
||||
}
|
||||
@@ -1,3 +0,0 @@
|
||||
[pytest]
|
||||
testpaths = tests
|
||||
python_files = test_*.py
|
||||
@@ -1,2 +0,0 @@
|
||||
pytest>=8.0.0
|
||||
ruff>=0.0.1
|
||||
@@ -1,5 +0,0 @@
|
||||
fastapi>=0.105.2
|
||||
uvicorn[standard]>=0.23.2
|
||||
pydantic>=2.8.2
|
||||
pydantic-settings>=2.5.0
|
||||
httpx>=0.28.1
|
||||
@@ -1,5 +0,0 @@
|
||||
[tool.ruff]
|
||||
line-length = 100
|
||||
select = ["E", "F", "I", "S"]
|
||||
target-version = "py311"
|
||||
exclude = ["data", "__pycache__"]
|
||||
@@ -1 +0,0 @@
|
||||
from .main import app # noqa: F401
|
||||
@@ -1,34 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Tuple
|
||||
|
||||
from .models import ActionPayload, ActionResult
|
||||
|
||||
|
||||
class ActionEngine:
|
||||
def __init__(self, grid) -> None:
|
||||
self.grid = grid
|
||||
|
||||
def plan(self, payload: ActionPayload) -> ActionResult:
|
||||
coords = self._resolve_coords(payload.target_cell)
|
||||
detail = self._describe(payload, coords)
|
||||
return ActionResult(
|
||||
success=True,
|
||||
detail=detail,
|
||||
coordinates=coords,
|
||||
payload={"comment": payload.comment or "", "text": payload.text or ""},
|
||||
)
|
||||
|
||||
def _resolve_coords(self, target_cell: str | None) -> Tuple[int, int] | None:
|
||||
if not target_cell:
|
||||
return None
|
||||
return self.grid.resolve_cell_center(target_cell)
|
||||
|
||||
def _describe(
|
||||
self, payload: ActionPayload, coords: Tuple[int, int] | None
|
||||
) -> str:
|
||||
cell_info = payload.target_cell or "free space"
|
||||
location = f"@{cell_info}" if coords else "(no target)"
|
||||
action_hint = payload.action.value
|
||||
extra = f" text='{payload.text}'" if payload.text else ""
|
||||
return f"Plan {action_hint} {location}{extra}"
|
||||
@@ -1,14 +0,0 @@
|
||||
from pathlib import Path
|
||||
|
||||
from pydantic import ConfigDict
|
||||
from pydantic_settings import BaseSettings
|
||||
|
||||
|
||||
class ServerSettings(BaseSettings):
|
||||
grid_rows: int = 4
|
||||
grid_cols: int = 4
|
||||
cell_margin_px: int = 4
|
||||
storage_dir: Path = Path("data/screenshots")
|
||||
default_timeout: int = 10
|
||||
|
||||
model_config = ConfigDict(env_prefix="CLICKTHROUGH_", env_file=".env")
|
||||
136
server/grid.py
136
server/grid.py
@@ -1,136 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from typing import Any, Dict, List, Tuple
|
||||
import uuid
|
||||
|
||||
from .actions import ActionEngine
|
||||
from .config import ServerSettings
|
||||
from .models import (
|
||||
ActionPayload,
|
||||
ActionResult,
|
||||
GridCellModel,
|
||||
GridDescriptor,
|
||||
GridInitRequest,
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class _StoredCell:
|
||||
model: GridCellModel
|
||||
center: Tuple[int, int]
|
||||
|
||||
|
||||
class VisionGrid:
|
||||
def __init__(self, request: GridInitRequest, grid_id: str, rows: int, columns: int):
|
||||
self.grid_id = grid_id
|
||||
self.screenshot = request.screenshot_base64
|
||||
self.memo = request.memo
|
||||
self.rows = rows
|
||||
self.columns = columns
|
||||
self.width = request.width
|
||||
self.height = request.height
|
||||
self.cells: Dict[str, _StoredCell] = {}
|
||||
self._action_history: List[dict[str, Any]] = []
|
||||
self._engine = ActionEngine(self)
|
||||
self._build_cells()
|
||||
|
||||
def _build_cells(self, margin: int = 4) -> None:
|
||||
cell_width = max(1, self.width // self.columns)
|
||||
cell_height = max(1, self.height // self.rows)
|
||||
|
||||
for row in range(self.rows):
|
||||
for col in range(self.columns):
|
||||
left = col * cell_width + margin
|
||||
top = row * cell_height + margin
|
||||
right = min(self.width - margin, (col + 1) * cell_width - margin)
|
||||
bottom = min(self.height - margin, (row + 1) * cell_height - margin)
|
||||
cell_id = f"{self.grid_id}-{row}-{col}"
|
||||
bounds = (left, top, right, bottom)
|
||||
center = ((left + right) // 2, (top + bottom) // 2)
|
||||
cell = GridCellModel(
|
||||
cell_id=cell_id,
|
||||
row=row,
|
||||
column=col,
|
||||
bounds=bounds,
|
||||
)
|
||||
self.cells[cell_id] = _StoredCell(model=cell, center=center)
|
||||
|
||||
def describe(self) -> GridDescriptor:
|
||||
return GridDescriptor(
|
||||
grid_id=self.grid_id,
|
||||
rows=self.rows,
|
||||
columns=self.columns,
|
||||
cells=[cell.model for cell in self.cells.values()],
|
||||
metadata=self.metadata,
|
||||
)
|
||||
|
||||
@property
|
||||
def metadata(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"memo": self.memo or "",
|
||||
"width": self.width,
|
||||
"height": self.height,
|
||||
}
|
||||
|
||||
def resolve_cell_center(self, cell_id: str) -> Tuple[int, int]:
|
||||
cell = self.cells.get(cell_id)
|
||||
if not cell:
|
||||
raise KeyError(f"Unknown cell {cell_id}")
|
||||
return cell.center
|
||||
|
||||
def preview_action(self, payload: ActionPayload) -> ActionResult:
|
||||
return self._engine.plan(payload)
|
||||
|
||||
def apply_action(self, payload: ActionPayload) -> ActionResult:
|
||||
result = self._engine.plan(payload)
|
||||
self._action_history.append(result.model_dump())
|
||||
return result
|
||||
|
||||
def update_screenshot(self, screenshot_base64: str, memo: str | None = None) -> None:
|
||||
self.screenshot = screenshot_base64
|
||||
if memo:
|
||||
self.memo = memo
|
||||
|
||||
@property
|
||||
def action_history(self) -> List[dict[str, Any]]:
|
||||
return list(self._action_history)
|
||||
|
||||
def summary(self) -> str:
|
||||
last_action = self._action_history[-1] if self._action_history else None
|
||||
last_summary = (
|
||||
f"Last action: {last_action.get('detail')}" if last_action else "No actions recorded yet"
|
||||
)
|
||||
return (
|
||||
f"Grid {self.grid_id} ({self.rows}x{self.columns}) with {len(self.cells)} cells. {last_summary}."
|
||||
)
|
||||
|
||||
|
||||
class GridManager:
|
||||
def __init__(self, settings: ServerSettings):
|
||||
self.settings = settings
|
||||
self._grids: Dict[str, VisionGrid] = {}
|
||||
|
||||
@property
|
||||
def grid_count(self) -> int:
|
||||
return len(self._grids)
|
||||
|
||||
def create_grid(self, request: GridInitRequest) -> VisionGrid:
|
||||
rows = request.rows or self.settings.grid_rows
|
||||
columns = request.columns or self.settings.grid_cols
|
||||
grid_id = uuid.uuid4().hex
|
||||
grid = VisionGrid(request, grid_id, rows, columns)
|
||||
self._grids[grid_id] = grid
|
||||
return grid
|
||||
|
||||
def get_grid(self, grid_id: str) -> VisionGrid:
|
||||
try:
|
||||
return self._grids[grid_id]
|
||||
except KeyError as exc:
|
||||
raise KeyError(f"Grid {grid_id} not found") from exc
|
||||
|
||||
def get_history(self, grid_id: str) -> List[dict[str, Any]]:
|
||||
return self.get_grid(grid_id).action_history
|
||||
|
||||
def clear(self) -> None:
|
||||
self._grids.clear()
|
||||
133
server/main.py
133
server/main.py
@@ -1,133 +0,0 @@
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
|
||||
from fastapi.responses import RedirectResponse
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
|
||||
from .config import ServerSettings
|
||||
from .grid import GridManager
|
||||
from .models import (
|
||||
ActionPayload,
|
||||
GridDescriptor,
|
||||
GridInitRequest,
|
||||
GridPlanRequest,
|
||||
GridRefreshRequest,
|
||||
)
|
||||
from .planner import GridPlanner
|
||||
from .streamer import ScreenshotStreamer
|
||||
|
||||
|
||||
settings = ServerSettings()
|
||||
manager = GridManager(settings)
|
||||
planner = GridPlanner()
|
||||
streamer = ScreenshotStreamer()
|
||||
|
||||
app = FastAPI(
|
||||
title="Clickthrough",
|
||||
description="Grid-aware surface that lets an agent plan clicks, drags, and typing on a fake screenshot",
|
||||
version="0.3.0",
|
||||
)
|
||||
|
||||
client_dir = Path(__file__).resolve().parent.parent / "client"
|
||||
if client_dir.exists():
|
||||
app.mount("/ui", StaticFiles(directory=str(client_dir), html=True), name="ui")
|
||||
|
||||
|
||||
@app.get("/")
|
||||
async def root():
|
||||
if client_dir.exists():
|
||||
return RedirectResponse("/ui/")
|
||||
return {"status": "ok", "grid_count": manager.grid_count}
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
def health_check() -> dict[str, str]:
|
||||
return {"status": "ok", "grid_count": str(manager.grid_count)}
|
||||
|
||||
|
||||
@app.post("/grid/init", response_model=GridDescriptor)
|
||||
def init_grid(request: GridInitRequest) -> GridDescriptor:
|
||||
grid = manager.create_grid(request)
|
||||
return grid.describe()
|
||||
|
||||
|
||||
@app.post("/grid/action")
|
||||
def apply_action(payload: ActionPayload):
|
||||
try:
|
||||
grid = manager.get_grid(payload.grid_id)
|
||||
except KeyError as exc:
|
||||
raise HTTPException(status_code=404, detail=str(exc)) from exc
|
||||
return grid.apply_action(payload)
|
||||
|
||||
|
||||
@app.get("/grid/{grid_id}/summary")
|
||||
def grid_summary(grid_id: str):
|
||||
try:
|
||||
grid = manager.get_grid(grid_id)
|
||||
except KeyError as exc:
|
||||
raise HTTPException(status_code=404, detail=str(exc)) from exc
|
||||
descriptor = grid.describe()
|
||||
return {
|
||||
"grid_id": grid_id,
|
||||
"summary": planner.describe(descriptor),
|
||||
"details": grid.summary(),
|
||||
"descriptor": descriptor,
|
||||
}
|
||||
|
||||
|
||||
@app.get("/grid/{grid_id}/history")
|
||||
def grid_history(grid_id: str):
|
||||
try:
|
||||
history = manager.get_history(grid_id)
|
||||
except KeyError as exc:
|
||||
raise HTTPException(status_code=404, detail=str(exc)) from exc
|
||||
return {"grid_id": grid_id, "history": history}
|
||||
|
||||
|
||||
@app.post("/grid/{grid_id}/plan")
|
||||
def plan_grid(grid_id: str, request: GridPlanRequest):
|
||||
try:
|
||||
grid = manager.get_grid(grid_id)
|
||||
except KeyError as exc:
|
||||
raise HTTPException(status_code=404, detail=str(exc)) from exc
|
||||
descriptor = grid.describe()
|
||||
payload = planner.build_payload(
|
||||
descriptor,
|
||||
action=request.action,
|
||||
preferred_label=request.preferred_label,
|
||||
text=request.text,
|
||||
comment=request.comment,
|
||||
)
|
||||
result = grid.preview_action(payload)
|
||||
return {"plan": payload.model_dump(), "result": result, "descriptor": descriptor}
|
||||
|
||||
|
||||
@app.post("/grid/{grid_id}/refresh")
|
||||
async def refresh_grid(grid_id: str, payload: GridRefreshRequest):
|
||||
try:
|
||||
grid = manager.get_grid(grid_id)
|
||||
except KeyError as exc:
|
||||
raise HTTPException(status_code=404, detail=str(exc)) from exc
|
||||
grid.update_screenshot(payload.screenshot_base64, payload.memo)
|
||||
descriptor = grid.describe()
|
||||
await streamer.broadcast(
|
||||
grid_id,
|
||||
{
|
||||
"grid_id": grid_id,
|
||||
"timestamp": time.time(),
|
||||
"descriptor": descriptor,
|
||||
"screenshot_base64": payload.screenshot_base64,
|
||||
},
|
||||
)
|
||||
return {"status": "updated", "grid_id": grid_id}
|
||||
|
||||
|
||||
@app.websocket("/stream/screenshots")
|
||||
async def stream_screenshots(websocket: WebSocket, grid_id: str | None = None):
|
||||
key = await streamer.connect(websocket, grid_id)
|
||||
try:
|
||||
while True:
|
||||
await websocket.receive_text()
|
||||
except WebSocketDisconnect:
|
||||
streamer.disconnect(websocket, key)
|
||||
@@ -1,67 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
|
||||
class ActionType(str, Enum):
|
||||
CLICK = "click"
|
||||
DOUBLE_CLICK = "double_click"
|
||||
DRAG = "drag"
|
||||
TYPE = "type"
|
||||
SCROLL = "scroll"
|
||||
|
||||
|
||||
class GridInitRequest(BaseModel):
|
||||
width: int
|
||||
height: int
|
||||
screenshot_base64: str
|
||||
rows: Optional[int] = None
|
||||
columns: Optional[int] = None
|
||||
memo: Optional[str] = None
|
||||
|
||||
|
||||
class GridCellModel(BaseModel):
|
||||
cell_id: str
|
||||
row: int
|
||||
column: int
|
||||
bounds: Tuple[int, int, int, int]
|
||||
label: Optional[str] = None
|
||||
|
||||
|
||||
class GridDescriptor(BaseModel):
|
||||
grid_id: str
|
||||
rows: int
|
||||
columns: int
|
||||
cells: List[GridCellModel]
|
||||
metadata: Dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
|
||||
class ActionPayload(BaseModel):
|
||||
grid_id: str
|
||||
action: ActionType
|
||||
target_cell: Optional[str] = None
|
||||
text: Optional[str] = None
|
||||
comment: Optional[str] = None
|
||||
data: Dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
|
||||
class ActionResult(BaseModel):
|
||||
success: bool
|
||||
detail: str
|
||||
coordinates: Optional[Tuple[int, int]] = None
|
||||
payload: Dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
|
||||
class GridPlanRequest(BaseModel):
|
||||
preferred_label: Optional[str] = None
|
||||
action: ActionType = ActionType.CLICK
|
||||
text: Optional[str] = None
|
||||
comment: Optional[str] = None
|
||||
|
||||
|
||||
class GridRefreshRequest(BaseModel):
|
||||
screenshot_base64: str
|
||||
memo: Optional[str] = None
|
||||
@@ -1,70 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from math import hypot
|
||||
from typing import Sequence
|
||||
|
||||
from .models import ActionPayload, ActionType, GridCellModel, GridDescriptor
|
||||
|
||||
|
||||
class GridPlanner:
|
||||
"""Helper that picks a grid cell using simple heuristics."""
|
||||
|
||||
def select_cell(
|
||||
self, descriptor: GridDescriptor, preferred_label: str | None = None
|
||||
) -> GridCellModel | None:
|
||||
if not descriptor.cells:
|
||||
return None
|
||||
|
||||
if preferred_label:
|
||||
match = self._match_label(descriptor.cells, preferred_label)
|
||||
if match:
|
||||
return match
|
||||
|
||||
center_point = self._grid_center(descriptor)
|
||||
return min(descriptor.cells, key=lambda cell: self._distance(self._cell_center(cell), center_point))
|
||||
|
||||
def build_payload(
|
||||
self,
|
||||
descriptor: GridDescriptor,
|
||||
action: ActionType = ActionType.CLICK,
|
||||
preferred_label: str | None = None,
|
||||
text: str | None = None,
|
||||
comment: str | None = None,
|
||||
) -> ActionPayload:
|
||||
target = self.select_cell(descriptor, preferred_label)
|
||||
return ActionPayload(
|
||||
grid_id=descriptor.grid_id,
|
||||
action=action,
|
||||
target_cell=target.cell_id if target else None,
|
||||
text=text,
|
||||
comment=comment,
|
||||
)
|
||||
|
||||
def describe(self, descriptor: GridDescriptor) -> str:
|
||||
cell_count = len(descriptor.cells)
|
||||
return (
|
||||
f"Grid {descriptor.grid_id} is {descriptor.rows}x{descriptor.columns} with {cell_count} cells."
|
||||
)
|
||||
|
||||
def _grid_center(self, descriptor: GridDescriptor) -> tuple[float, float]:
|
||||
width = descriptor.metadata.get("width", 0)
|
||||
height = descriptor.metadata.get("height", 0)
|
||||
return (width / 2, height / 2)
|
||||
|
||||
def _cell_center(self, cell: GridCellModel) -> tuple[float, float]:
|
||||
left, top, right, bottom = cell.bounds
|
||||
return ((left + right) / 2, (top + bottom) / 2)
|
||||
|
||||
def _distance(
|
||||
self, first: tuple[float, float], second: tuple[float, float]
|
||||
) -> float:
|
||||
return hypot(first[0] - second[0], first[1] - second[1])
|
||||
|
||||
def _match_label(
|
||||
self, cells: Sequence[GridCellModel], label: str
|
||||
) -> GridCellModel | None:
|
||||
lowered = label.lower()
|
||||
for cell in cells:
|
||||
if cell.label and lowered in cell.label.lower():
|
||||
return cell
|
||||
return None
|
||||
@@ -1,38 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections import defaultdict
|
||||
from typing import Any, DefaultDict, Dict, List
|
||||
|
||||
from fastapi import WebSocket
|
||||
from websockets.exceptions import ConnectionClosedError
|
||||
|
||||
|
||||
class ScreenshotStreamer:
|
||||
"""Keeps websocket listeners and pushes screenshot updates."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._listeners: DefaultDict[str, List[WebSocket]] = defaultdict(list)
|
||||
|
||||
async def connect(self, websocket: WebSocket, grid_id: str | None = None) -> str:
|
||||
await websocket.accept()
|
||||
key = grid_id or "*"
|
||||
self._listeners[key].append(websocket)
|
||||
return key
|
||||
|
||||
def disconnect(self, websocket: WebSocket, grid_key: str | None = None) -> None:
|
||||
key = grid_key or "*"
|
||||
sockets = self._listeners.get(key)
|
||||
if not sockets:
|
||||
return
|
||||
if websocket in sockets:
|
||||
sockets.remove(websocket)
|
||||
if not sockets:
|
||||
self._listeners.pop(key, None)
|
||||
|
||||
async def broadcast(self, grid_id: str, payload: Dict[str, Any]) -> None:
|
||||
listeners = list(self._listeners.get(grid_id, [])) + list(self._listeners.get("*", []))
|
||||
for websocket in listeners:
|
||||
try:
|
||||
await websocket.send_json(payload)
|
||||
except (ConnectionClosedError, RuntimeError):
|
||||
self.disconnect(websocket, grid_id)
|
||||
@@ -1,11 +0,0 @@
|
||||
"""Utility helpers for the Clickthrough agent skill."""
|
||||
|
||||
from .agent_runner import AgentRunResult, ClickthroughAgentRunner
|
||||
from .clickthrough_skill import ActionPlan, ClickthroughSkill
|
||||
|
||||
__all__ = [
|
||||
"ClickthroughSkill",
|
||||
"ActionPlan",
|
||||
"ClickthroughAgentRunner",
|
||||
"AgentRunResult",
|
||||
]
|
||||
@@ -1,60 +0,0 @@
|
||||
from dataclasses import dataclass
|
||||
from typing import Any, Dict
|
||||
|
||||
from .clickthrough_skill import ActionPlan, ClickthroughSkill
|
||||
|
||||
|
||||
@dataclass
|
||||
class AgentRunResult:
|
||||
summary: Dict[str, Any]
|
||||
action: Dict[str, Any]
|
||||
history: Dict[str, Any]
|
||||
grid: Dict[str, Any]
|
||||
plan_preview: Dict[str, Any]
|
||||
|
||||
|
||||
class ClickthroughAgentRunner:
|
||||
def __init__(self, skill: ClickthroughSkill) -> None:
|
||||
self.skill = skill
|
||||
|
||||
def run_once(
|
||||
self,
|
||||
screenshot_base64: str,
|
||||
width: int,
|
||||
height: int,
|
||||
rows: int = 4,
|
||||
columns: int = 4,
|
||||
preferred_label: str | None = None,
|
||||
action: str = "click",
|
||||
text: str | None = None,
|
||||
) -> AgentRunResult:
|
||||
grid = self.skill.describe_grid(
|
||||
screenshot_base64=screenshot_base64,
|
||||
width=width,
|
||||
height=height,
|
||||
rows=rows,
|
||||
columns=columns,
|
||||
)
|
||||
plan_response = self.skill.plan_with_planner(
|
||||
grid_id=grid["grid_id"],
|
||||
preferred_label=preferred_label,
|
||||
action=action,
|
||||
text=text,
|
||||
)
|
||||
plan_payload = plan_response["plan"]
|
||||
plan = ActionPlan(
|
||||
grid_id=plan_payload["grid_id"],
|
||||
target_cell=plan_payload.get("target_cell"),
|
||||
action=plan_payload["action"],
|
||||
text=plan_payload.get("text"),
|
||||
)
|
||||
action_result = self.skill.plan_action(plan)
|
||||
summary = self.skill.grid_summary(grid["grid_id"])
|
||||
history = self.skill.grid_history(grid["grid_id"])
|
||||
return AgentRunResult(
|
||||
summary=summary,
|
||||
action=action_result,
|
||||
history=history,
|
||||
grid=grid,
|
||||
plan_preview=plan_response,
|
||||
)
|
||||
@@ -1,98 +0,0 @@
|
||||
from dataclasses import dataclass
|
||||
from typing import Any, Dict
|
||||
|
||||
import httpx
|
||||
|
||||
|
||||
@dataclass
|
||||
class ActionPlan:
|
||||
grid_id: str
|
||||
target_cell: str | None
|
||||
action: str
|
||||
text: str | None = None
|
||||
|
||||
|
||||
class ClickthroughSkill:
|
||||
"""Lightweight wrapper around the Clickthrough HTTP API."""
|
||||
|
||||
def __init__(self, server_url: str = "http://localhost:8000") -> None:
|
||||
self._client = httpx.Client(base_url=server_url, timeout=10)
|
||||
|
||||
def describe_grid(
|
||||
self,
|
||||
screenshot_base64: str,
|
||||
width: int,
|
||||
height: int,
|
||||
rows: int = 4,
|
||||
columns: int = 4,
|
||||
) -> Dict[str, Any]:
|
||||
payload = {
|
||||
"width": width,
|
||||
"height": height,
|
||||
"rows": rows,
|
||||
"columns": columns,
|
||||
"screenshot_base64": screenshot_base64,
|
||||
"memo": "agent-powered grid",
|
||||
}
|
||||
response = self._client.post("/grid/init", json=payload)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def plan_action(self, plan: ActionPlan) -> Dict[str, Any]:
|
||||
payload = {
|
||||
"grid_id": plan.grid_id,
|
||||
"action": plan.action,
|
||||
"target_cell": plan.target_cell,
|
||||
"text": plan.text,
|
||||
"comment": "skill-generated plan",
|
||||
}
|
||||
response = self._client.post("/grid/action", json=payload)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def grid_summary(self, grid_id: str) -> Dict[str, Any]:
|
||||
response = self._client.get(f"/grid/{grid_id}/summary")
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def grid_history(self, grid_id: str) -> Dict[str, Any]:
|
||||
response = self._client.get(f"/grid/{grid_id}/history")
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def plan_with_planner(
|
||||
self,
|
||||
grid_id: str,
|
||||
preferred_label: str | None = None,
|
||||
action: str = "click",
|
||||
text: str | None = None,
|
||||
comment: str | None = None,
|
||||
) -> Dict[str, Any]:
|
||||
payload = {
|
||||
"preferred_label": preferred_label,
|
||||
"action": action,
|
||||
"text": text,
|
||||
"comment": comment or "planner-generated",
|
||||
}
|
||||
response = self._client.post(f"/grid/{grid_id}/plan", json=payload)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def refresh_grid(self, grid_id: str, screenshot_base64: str, memo: str | None = None) -> Dict[str, Any]:
|
||||
payload = {"screenshot_base64": screenshot_base64, "memo": memo}
|
||||
response = self._client.post(f"/grid/{grid_id}/refresh", json=payload)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import base64
|
||||
|
||||
dummy = base64.b64encode(b"fake-screenshot").decode()
|
||||
skill = ClickthroughSkill()
|
||||
grid = skill.describe_grid(dummy, width=800, height=600)
|
||||
print("Grid cells:", len(grid.get("cells", [])))
|
||||
if grid.get("cells"):
|
||||
first_cell = grid["cells"][0]["cell_id"]
|
||||
result = skill.plan_action(ActionPlan(grid_id=grid["grid_id"], target_cell=first_cell, action="click"))
|
||||
print("Action result:", result)
|
||||
@@ -1,29 +0,0 @@
|
||||
import base64
|
||||
|
||||
import pytest
|
||||
|
||||
from server.main import manager
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def fake_screenshot() -> str:
|
||||
"""Return a reproducible base64 string representing a dummy screenshot."""
|
||||
return base64.b64encode(b"clickthrough-dummy").decode()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def default_grid_request(fake_screenshot):
|
||||
return {
|
||||
"width": 640,
|
||||
"height": 480,
|
||||
"screenshot_base64": fake_screenshot,
|
||||
"rows": 3,
|
||||
"columns": 3,
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_manager_state():
|
||||
manager._grids.clear()
|
||||
yield
|
||||
manager._grids.clear()
|
||||
@@ -1,79 +0,0 @@
|
||||
from typing import Any, Dict
|
||||
|
||||
from skill.agent_runner import ClickthroughAgentRunner
|
||||
from skill.clickthrough_skill import ActionPlan, ClickthroughSkill
|
||||
|
||||
|
||||
class DummySkill(ClickthroughSkill):
|
||||
def __init__(self):
|
||||
self.last_plan: ActionPlan | None = None
|
||||
|
||||
def describe_grid(
|
||||
self,
|
||||
screenshot_base64: str,
|
||||
width: int,
|
||||
height: int,
|
||||
rows: int = 4,
|
||||
columns: int = 4,
|
||||
) -> Dict[str, Any]:
|
||||
return {
|
||||
"grid_id": "dummy-grid",
|
||||
"cells": [
|
||||
{"cell_id": "dummy-grid-1", "label": "button", "bounds": [0, 0, 100, 100]},
|
||||
{"cell_id": "dummy-grid-2", "label": "target", "bounds": [100, 0, 200, 100]},
|
||||
],
|
||||
}
|
||||
|
||||
def plan_with_planner(
|
||||
self,
|
||||
grid_id: str,
|
||||
preferred_label: str | None = None,
|
||||
action: str = "click",
|
||||
text: str | None = None,
|
||||
comment: str | None = None,
|
||||
) -> Dict[str, Any]:
|
||||
cells = ["dummy-grid-1", "dummy-grid-2"]
|
||||
if preferred_label == "target":
|
||||
target = "dummy-grid-2"
|
||||
else:
|
||||
target = cells[len(cells) // 2]
|
||||
plan = {
|
||||
"grid_id": grid_id,
|
||||
"target_cell": target,
|
||||
"action": action,
|
||||
"text": text,
|
||||
"comment": comment,
|
||||
}
|
||||
return {
|
||||
"plan": plan,
|
||||
"result": {"success": True, "detail": "preview"},
|
||||
"descriptor": {"grid_id": grid_id},
|
||||
}
|
||||
|
||||
def plan_action(self, plan: ActionPlan) -> Dict[str, Any]:
|
||||
self.last_plan = plan
|
||||
return {"success": True, "target_cell": plan.target_cell}
|
||||
|
||||
def grid_summary(self, grid_id: str) -> Dict[str, Any]:
|
||||
return {"grid_id": grid_id, "summary": "ok"}
|
||||
|
||||
def grid_history(self, grid_id: str) -> Dict[str, Any]:
|
||||
return {"grid_id": grid_id, "history": []}
|
||||
|
||||
|
||||
def test_agent_runner_prefers_label():
|
||||
runner = ClickthroughAgentRunner(DummySkill())
|
||||
result = runner.run_once(
|
||||
screenshot_base64="AA==",
|
||||
width=120,
|
||||
height=80,
|
||||
preferred_label="target",
|
||||
)
|
||||
assert result.action["target_cell"] == "dummy-grid-2"
|
||||
assert result.summary["summary"] == "ok"
|
||||
|
||||
|
||||
def test_agent_runner_defaults_to_center():
|
||||
runner = ClickthroughAgentRunner(DummySkill())
|
||||
result = runner.run_once(screenshot_base64="AA==", width=120, height=80)
|
||||
assert result.action["target_cell"] == "dummy-grid-2"
|
||||
@@ -1,32 +0,0 @@
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from server.main import app, manager
|
||||
|
||||
test_client = TestClient(app)
|
||||
|
||||
|
||||
def test_plan_endpoint(default_grid_request):
|
||||
init_response = test_client.post("/grid/init", json=default_grid_request)
|
||||
grid_id = init_response.json()["grid_id"]
|
||||
|
||||
plan_response = test_client.post(
|
||||
f"/grid/{grid_id}/plan",
|
||||
json={"preferred_label": None, "action": "click", "text": "hello"},
|
||||
)
|
||||
assert plan_response.status_code == 200
|
||||
payload = plan_response.json()
|
||||
assert payload["plan"]["grid_id"] == grid_id
|
||||
assert payload["result"]["success"]
|
||||
|
||||
|
||||
def test_refresh_endpoint(default_grid_request):
|
||||
init_response = test_client.post("/grid/init", json=default_grid_request)
|
||||
grid_id = init_response.json()["grid_id"]
|
||||
|
||||
refresh_response = test_client.post(
|
||||
f"/grid/{grid_id}/refresh", json={"screenshot_base64": "AAA", "memo": "updated"}
|
||||
)
|
||||
assert refresh_response.status_code == 200
|
||||
grid = manager.get_grid(grid_id)
|
||||
assert grid.screenshot == "AAA"
|
||||
assert grid.memo == "updated"
|
||||
@@ -1,51 +0,0 @@
|
||||
from server.config import ServerSettings
|
||||
from server.grid import GridManager
|
||||
from server.models import ActionPayload, ActionType, GridInitRequest
|
||||
|
||||
|
||||
def test_grid_creation_respects_dimensions(default_grid_request):
|
||||
settings = ServerSettings(grid_rows=2, grid_cols=2)
|
||||
manager = GridManager(settings)
|
||||
request = GridInitRequest(**default_grid_request)
|
||||
grid = manager.create_grid(request)
|
||||
|
||||
descriptor = grid.describe()
|
||||
assert descriptor.grid_id
|
||||
assert descriptor.rows == 3
|
||||
assert descriptor.columns == 3
|
||||
assert len(descriptor.cells) == 9
|
||||
assert descriptor.metadata.get("width") == 640
|
||||
assert descriptor.metadata.get("height") == 480
|
||||
|
||||
|
||||
def test_grid_action_records_history(default_grid_request):
|
||||
manager = GridManager(ServerSettings())
|
||||
request = GridInitRequest(**default_grid_request)
|
||||
grid = manager.create_grid(request)
|
||||
descriptor = grid.describe()
|
||||
target_cell = descriptor.cells[0].cell_id
|
||||
|
||||
payload = ActionPayload(
|
||||
grid_id=descriptor.grid_id,
|
||||
action=ActionType.CLICK,
|
||||
target_cell=target_cell,
|
||||
comment="click test",
|
||||
)
|
||||
result = grid.apply_action(payload)
|
||||
|
||||
assert result.success
|
||||
assert result.coordinates is not None
|
||||
assert grid.action_history[-1]["coordinates"] == result.coordinates
|
||||
|
||||
|
||||
def test_manager_get_grid_missing(default_grid_request):
|
||||
manager = GridManager(ServerSettings())
|
||||
request = GridInitRequest(**default_grid_request)
|
||||
_ = manager.create_grid(request)
|
||||
|
||||
try:
|
||||
manager.get_grid("does-not-exist")
|
||||
found = True
|
||||
except KeyError:
|
||||
found = False
|
||||
assert not found
|
||||
@@ -1,32 +0,0 @@
|
||||
from server.config import ServerSettings
|
||||
from server.grid import GridManager
|
||||
from server.planner import GridPlanner
|
||||
from server.models import ActionType, GridInitRequest
|
||||
|
||||
|
||||
def test_planner_preferred_label(default_grid_request):
|
||||
settings = ServerSettings()
|
||||
manager = GridManager(settings)
|
||||
request = GridInitRequest(**default_grid_request)
|
||||
grid = manager.create_grid(request)
|
||||
descriptor = grid.describe()
|
||||
descriptor.cells[0].label = "target"
|
||||
|
||||
planner = GridPlanner()
|
||||
payload = planner.build_payload(descriptor, preferred_label="target", action=ActionType.CLICK)
|
||||
|
||||
assert payload.target_cell == descriptor.cells[0].cell_id
|
||||
|
||||
|
||||
def test_planner_falls_back_to_center(default_grid_request):
|
||||
settings = ServerSettings()
|
||||
manager = GridManager(settings)
|
||||
request = GridInitRequest(**default_grid_request)
|
||||
grid = manager.create_grid(request)
|
||||
descriptor = grid.describe()
|
||||
|
||||
planner = GridPlanner()
|
||||
payload = planner.build_payload(descriptor, action=ActionType.CLICK)
|
||||
|
||||
assert payload.target_cell is not None
|
||||
assert payload.grid_id == descriptor.grid_id
|
||||
@@ -1,41 +0,0 @@
|
||||
import asyncio
|
||||
|
||||
from server.streamer import ScreenshotStreamer
|
||||
|
||||
|
||||
class DummyWebSocket:
|
||||
def __init__(self):
|
||||
self.sent = []
|
||||
self.accepted = False
|
||||
|
||||
async def accept(self) -> None:
|
||||
self.accepted = True
|
||||
|
||||
async def send_json(self, payload):
|
||||
self.sent.append(payload)
|
||||
|
||||
|
||||
def test_streamer_broadcasts_to_grid():
|
||||
streamer = ScreenshotStreamer()
|
||||
socket = DummyWebSocket()
|
||||
|
||||
async def scenario():
|
||||
key = await streamer.connect(socket, "grid-123")
|
||||
await streamer.broadcast("grid-123", {"frame": 1})
|
||||
streamer.disconnect(socket, key)
|
||||
|
||||
asyncio.run(scenario())
|
||||
assert socket.sent == [{"frame": 1}]
|
||||
|
||||
|
||||
def test_streamer_wildcard_listener_receives_updates():
|
||||
streamer = ScreenshotStreamer()
|
||||
socket = DummyWebSocket()
|
||||
|
||||
async def scenario():
|
||||
key = await streamer.connect(socket, None)
|
||||
await streamer.broadcast("grid-456", {"frame": 2})
|
||||
streamer.disconnect(socket, key)
|
||||
|
||||
asyncio.run(scenario())
|
||||
assert socket.sent == [{"frame": 2}]
|
||||
@@ -1,12 +0,0 @@
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from server.main import app
|
||||
|
||||
|
||||
test_client = TestClient(app)
|
||||
|
||||
|
||||
def test_ui_root_serves_index():
|
||||
response = test_client.get("/ui/")
|
||||
assert response.status_code == 200
|
||||
assert "Clickthrough Control" in response.text
|
||||
Reference in New Issue
Block a user