diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
deleted file mode 100644
index 35f817c..0000000
--- a/.github/workflows/ci.yml
+++ /dev/null
@@ -1,23 +0,0 @@
-name: CI
-
-on:
- push: {}
- pull_request: {}
-
-jobs:
- test:
- runs-on: ubuntu-latest
- steps:
- - uses: actions/checkout@v4
- - name: Set up Python
- uses: actions/setup-python@v5
- with:
- python-version: 3.11
- - name: Install runtime dependencies
- run: python -m pip install --upgrade pip && pip install -r requirements.txt
- - name: Install dev dependencies
- run: pip install -r requirements-dev.txt
- - name: Run lints
- run: ruff check server skill tests
- - name: Run tests
- run: pytest
diff --git a/README.md b/README.md
index a77e716..5a2c84e 100644
--- a/README.md
+++ b/README.md
@@ -1,69 +1,2 @@
# Clickthrough
-
-Let an Agent interact with your Computer.
-
-`Clickthrough` is a proof-of-concept bridge between a vision-aware agent and a headless controller. The project is split into two halves:
-
-1. A Python server that accepts a static grid overlay (think of a screenshot broken into cells) and exposes lightweight endpoints to ask questions, plan actions, or even run pointer/keyboard events.
-2. A **skill** that bundles the HTTP calls/intent construction so we can hardwire the same flow inside OpenClaw later.
-
-## Server surface (FastAPI)
-
-- `POST /grid/init`: Accepts a base64 screenshot plus the requested rows/columns, returns a `grid_id`, cell bounds, and helpful metadata. The grid is stored in-memory so the agent can reference cells by ID in later actions.
-- `POST /grid/action`: Takes a plan (`grid_id`, optional target cell, and an action like `click`/`drag`/`type`) and returns a structured `ActionResult` with computed coordinates for tooling to consume.
-- `GET /grid/{grid_id}/summary`: Returns both a heuristic description (`GridPlanner`) and a rich descriptor so the skill can summarize what it sees.
-- `GET /grid/{grid_id}/history`: Streams back the action history for that grid so an agent or operator can audit what was done.
-- `POST /grid/{grid_id}/plan`: Lets `GridPlanner` select the target and return a preview action plan without committing to it, so we can inspect coordinates before triggering events.
-- `POST /grid/{grid_id}/refresh` + `GET /stream/screenshots`: Refresh the cached screenshot/metadata and broadcast the updated scene over a websocket so clients can redraw overlays in near real time.
-- `GET /health`: A minimal health check for deployments.
-
-Vision metadata is kept on a per-grid basis, including history, layout dimensions, and any appended memo. Each `VisionGrid` also exposes a short textual summary so the skill layer can turn sensory data into sentences directly.
-
-## Skill layer (OpenClaw integration)
-
-The `skill/` package wraps the server calls and exposes helpers:
-
-- `ClickthroughSkill.describe_grid()` builds a grid session and returns the descriptor.
-- `ClickthroughSkill.plan_action()` drives the `/grid/action` endpoint.
-- `ClickthroughSkill.plan_with_planner()` calls `/grid/{grid_id}/plan`, so you can preview the `GridPlanner` suggestion before executing it.
-- `ClickthroughSkill.grid_summary()` and `.grid_history()` surface the new metadata endpoints.
-- `ClickthroughSkill.refresh_grid()` pushes a new screenshot and memo, triggering websocket listeners.
-- `ClickthroughAgentRunner` simulates a tiny agent loop that asks the planner for a preview, executes the resulting action, and then gathers the summary/history so you can iterate on reasoning loops in tests.
-
-Future work can swap the stub runner for a full OpenClaw skill that keeps reasoning inside the agent and uses these primitives to steer the mouse/keyboard.
-
-## Screenshot streaming
-
-Capture loops can now talk to FastAPI in two ways:
-
-1. POST `/grid/{grid_id}/refresh` with fresh base64 screenshots and an optional memo; the server updates the cached grid metadata and broadcasts the change.
-2. Open a websocket to `GET /stream/screenshots` (optionally passing `grid_id` as a query param) to receive realtime deltas whenever a refresh happens. Clients can use the descriptor/payload to redraw overlays or trigger new planner runs without polling.
-
-## Testing
-
-1. `python3 -m pip install -r requirements.txt`
-2. `python3 -m pip install -r requirements-dev.txt`
-3. `python3 -m pytest`
-
-The `tests/` suite covers grid construction, the FastAPI surface, and the skill/runner helpers.
-
-## Continuous Integration
-
-`.github/workflows/ci.yml` runs on pushes and PRs:
-
-- Checks out the repo and sets up Python 3.11.
-- Installs dependencies (`requirements.txt` + `requirements-dev.txt`).
-- Runs `ruff check` over the Python packages.
-- Executes `pytest` to keep coverage high.
-
-## Control UI
-
-- `/ui/` serves a small control panel where you can bootstrap a grid from a base64 screenshot, ask the planner for a preview, execute clicks, refresh the screenshot, and watch the summary/history.
-- Most traffic is HTTP: `/grid/init`, `/grid/{id}/plan`, `/grid/{id}/action`, `/grid/{id}/refresh`, `/grid/{id}/summary`, and `/grid/{id}/history`. Only the `/stream/screenshots` websocket pushes updates after a refresh so the overlay redraws.
-- The FastAPI root now redirects to `/ui/` when the client assets are present, making the UI a lightweight entry point for demos or manual command-and-control work.
-
-## Next steps
-
-- Add OCR or UI heuristics so grid cells have meaningful labels before the agent reasons about them.
-- Persist grids and histories in a lightweight store so long-running sessions survive restarts.
-- Expand the UI to preview actions visually (perhaps overlaying cells on top of rendered screenshots).
+Let an Agent interact with your Computer.
\ No newline at end of file
diff --git a/client/app.js b/client/app.js
deleted file mode 100644
index 8fd6764..0000000
--- a/client/app.js
+++ /dev/null
@@ -1,159 +0,0 @@
-const gridForm = document.getElementById("grid-form");
-const descriptorEl = document.getElementById("descriptor");
-const gridMetaEl = document.getElementById("grid-meta");
-const summaryEl = document.getElementById("summary");
-const historyEl = document.getElementById("history");
-const planOutput = document.getElementById("plan-output");
-const preferredInput = document.getElementById("preferred-label");
-const refreshScreenshot = document.getElementById("refresh-screenshot");
-const refreshMemo = document.getElementById("refresh-memo");
-const logEl = document.getElementById("ws-log");
-
-let currentGrid = null;
-let lastPlan = null;
-let ws = null;
-let keepAliveId = null;
-
-const log = (message) => {
- const timestamp = new Date().toLocaleTimeString();
- logEl.textContent = `[${timestamp}] ${message}\n${logEl.textContent}`;
-};
-
-const headers = {
- "Content-Type": "application/json",
-};
-
-const subscribeToGrid = (gridId) => {
- if (!gridId) return;
- if (ws) {
- ws.close();
- }
- const protocol = window.location.protocol === "https:" ? "wss" : "ws";
- ws = new WebSocket(`${protocol}://${window.location.host}/stream/screenshots?grid_id=${gridId}`);
-
- ws.addEventListener("open", () => {
- log(`WebSocket listening for grid ${gridId}`);
- ws.send("ready");
- keepAliveId = setInterval(() => ws.send("ping"), 15000);
- });
-
- ws.addEventListener("message", (event) => {
- log(`Update received → ${event.data}`);
- });
-
- ws.addEventListener("close", () => {
- log("WebSocket disconnected");
- if (keepAliveId) {
- clearInterval(keepAliveId);
- keepAliveId = null;
- }
- });
-};
-
-const updateDescriptor = (descriptor) => {
- descriptorEl.textContent = JSON.stringify(descriptor, null, 2);
- gridMetaEl.textContent = `Grid ${descriptor.grid_id} (${descriptor.rows}x${descriptor.columns}) · ${descriptor.cells.length} cells`;
-};
-
-const updateSummary = async () => {
- if (!currentGrid) return;
- const [summaryResponse, historyResponse] = await Promise.all([
- fetch(`/grid/${currentGrid}/summary`),
- fetch(`/grid/${currentGrid}/history`),
- ]);
-
- if (summaryResponse.ok) {
- const payload = await summaryResponse.json();
- summaryEl.textContent = payload.summary;
- }
-
- if (historyResponse.ok) {
- const payload = await historyResponse.json();
- historyEl.textContent = JSON.stringify(payload.history, null, 2);
- }
-};
-
-const initGrid = async (event) => {
- event.preventDefault();
- const formData = new FormData(gridForm);
- const payload = {
- width: Number(formData.get("width")),
- height: Number(formData.get("height")),
- rows: Number(formData.get("rows")),
- columns: Number(formData.get("columns")),
- screenshot_base64: formData.get("screenshot"),
- };
- const response = await fetch("/grid/init", {
- method: "POST",
- headers,
- body: JSON.stringify(payload),
- });
- const descriptor = await response.json();
- currentGrid = descriptor.grid_id;
- updateDescriptor(descriptor);
- await updateSummary();
- subscribeToGrid(currentGrid);
- planOutput.textContent = "Plan preview will appear here.";
- log(`Grid ${currentGrid} initialized.`);
-};
-
-document.getElementById("plan-button").addEventListener("click", async () => {
- if (!currentGrid) {
- log("Initialize a grid first.");
- return;
- }
- const response = await fetch(`/grid/${currentGrid}/plan`, {
- method: "POST",
- headers,
- body: JSON.stringify({
- preferred_label: preferredInput.value || null,
- action: "click",
- text: "ui-trigger",
- }),
- });
- const result = await response.json();
- lastPlan = result.plan;
- planOutput.textContent = JSON.stringify(result, null, 2);
-});
-
-document.getElementById("run-action").addEventListener("click", async () => {
- if (!lastPlan) {
- log("Run the planner first.");
- return;
- }
- const payload = {
- grid_id: lastPlan.grid_id,
- action: lastPlan.action,
- target_cell: lastPlan.target_cell,
- text: "from-ui",
- comment: "UI action",
- };
- const response = await fetch("/grid/action", {
- method: "POST",
- headers,
- body: JSON.stringify(payload),
- });
- const result = await response.json();
- log(`Action ${result.detail} at ${result.coordinates}`);
- await updateSummary();
-});
-
-document.getElementById("refresh-button").addEventListener("click", async () => {
- if (!currentGrid) {
- log("Start a grid first.");
- return;
- }
- const payload = {
- screenshot_base64: refreshScreenshot.value || "",
- memo: refreshMemo.value || undefined,
- };
- const response = await fetch(`/grid/${currentGrid}/refresh`, {
- method: "POST",
- headers,
- body: JSON.stringify(payload),
- });
- const data = await response.json();
- log(`Refresh acknowledged: ${JSON.stringify(data)}`);
-});
-
-gridForm.addEventListener("submit", initGrid);
diff --git a/client/index.html b/client/index.html
deleted file mode 100644
index 86206b9..0000000
--- a/client/index.html
+++ /dev/null
@@ -1,85 +0,0 @@
-
-
-
-
- Clickthrough Control
-
-
-
-
-
- Clickthrough Control Panel
- Most actions use HTTP; screenshots stream over WebSocket when refreshed.
-
-
-
- Grid bootstrap
-
-
-
-
- Grid status
- No grid yet.
-
-
-
-
- Planner & Actions
-
-
-
-
-
- Plan preview will appear here.
-
-
-
- Refresh Screenshot
-
-
-
- Refresh triggers /stream/screenshots so the UI can redraw.