init

2026-04-05 19:15:12 +02:00
parent 101753fa14
commit a2ef50401b
10 changed files with 347 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,33 @@
 # Clickthrough
-Let an Agent interact with your Computer.
+
+Let an Agent interact with your Computer.
+
+`Clickthrough` is a proof-of-concept bridge between a vision-aware agent and a headless controller. The project is split into two halves:
+
+1. A Python server that accepts a static grid overlay (think of a screenshot broken into cells) and exposes lightweight endpoints to ask questions, plan actions, or even run pointer/keyboard events.
+2. A **skill** that bundles the HTTP calls and intent construction so the same flow can be wired into OpenClaw later.
+
+## Server surface (FastAPI)
+
+- `POST /grid/init`: Accepts a base64 screenshot plus the requested rows/columns, and returns a `grid_id`, cell bounds, and helpful metadata. The grid is stored in memory so the agent can reference cells by ID in later actions.
+- `POST /grid/action`: Takes a plan (`grid_id`, optional target cell, and an action like `click`/`drag`/`type`) and returns a structured `ActionResult` with computed coordinates for tooling to consume.
+- `GET /health`: A minimal health check for deployments.
+
+The server tracks each grid by a UUID and keeps its layout metadata so multiple agents can stay in sync with the same screenshot/scene.
+
+## Skill layer (OpenClaw integration)
+
+The `skill/` package is a placeholder for how an agent action would look in OpenClaw. It wraps the server calls, interprets the grid cells, and exposes helpers such as `describe_grid()` and `plan_action()` so future work can plug into the agent toolkit directly.
+
+## Getting started
+
+1. Install dependencies: `python -m pip install -r requirements.txt`.
+2. Run the server: `uvicorn server.main:app --reload`.
+3. Use the skill helper to bootstrap a grid, or wire the REST endpoints into a higher-level agent.
+
+## Next steps
+
+- Add real OCR/layout logic so each cell knows which on-screen labels it contains.
+- Turn the action planner into a state machine that can focus/double-click/type/drag.
+- Persist grid sessions for longer-running interactions.
+- Ship the OpenClaw skill (the `skill/` folder) as a plugin that can call `http://localhost:8000` and scaffold the agent’s reasoning.
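
The README describes `POST /grid/init` as returning cell bounds and `POST /grid/action` as returning computed coordinates. A minimal sketch of how that geometry might work is shown below. It assumes evenly sized cells, and the names `CellBounds`, `build_grid`, and the `r{row}c{col}` ID scheme are hypothetical illustrations, not taken from the server code in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CellBounds:
    """Pixel rectangle of one grid cell (hypothetical shape, not the server's schema)."""

    cell_id: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    @property
    def center(self) -> tuple[int, int]:
        # A click on a cell would most plausibly target its midpoint.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def build_grid(width: int, height: int, rows: int, cols: int) -> dict[str, CellBounds]:
    """Split a width x height screenshot into rows x cols cells keyed by ID."""
    if min(width, height, rows, cols) <= 0:
        raise ValueError("width, height, rows and cols must all be positive")
    cells: dict[str, CellBounds] = {}
    for r in range(rows):
        # Integer division on the running product keeps the last row flush
        # with the image edge even when height is not divisible by rows.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell_id = f"r{r}c{c}"  # assumed ID scheme
            cells[cell_id] = CellBounds(cell_id, left, top, right, bottom)
    return cells


grid = build_grid(1920, 1080, rows=3, cols=4)
print(grid["r1c2"].center)  # (1200, 540)
```

Computing each edge from the running product, rather than from a fixed cell size, avoids leaving an uncovered strip of pixels at the right or bottom when the screenshot dimensions are not exact multiples of the grid.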