Clickthrough

Let an Agent interact with your Computer.

Clickthrough is a proof-of-concept bridge between a vision-aware agent and a headless controller. The project is split into two halves:

  1. A Python server that accepts a screenshot, overlays a static grid on it (breaking the image into cells), and exposes lightweight endpoints to ask questions, plan actions, or even run pointer/keyboard events.
  2. A skill that bundles the HTTP calls/intent construction so we can hardwire the same flow inside OpenClaw later.

Server surface (FastAPI)

  • POST /grid/init: Accepts a base64 screenshot plus the requested rows/columns, returns a grid_id, cell bounds, and helpful metadata. The grid is stored in-memory so the agent can reference cells by ID in later actions.
  • POST /grid/action: Takes a plan (grid_id, optional target cell, and an action like click/drag/type) and returns a structured ActionResult with computed coordinates for tooling to consume.
  • GET /health: A minimal health check for deployments.
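
As a rough sketch of the geometry behind /grid/init and /grid/action, the snippet below shows one way to split a screenshot into cells and resolve a cell to a pointer coordinate. The names CellBounds and build_grid, the "r{row}c{col}" cell ID scheme, and the bounds layout are assumptions for illustration; the server's real schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellBounds:
    """Pixel rectangle for one grid cell (hypothetical layout)."""
    cell_id: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    @property
    def center(self) -> tuple[int, int]:
        # Midpoint of the cell, a natural target for a click action.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def build_grid(width: int, height: int, rows: int, cols: int) -> list[CellBounds]:
    """Divide a width x height image into rows x cols cells, row-major."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    cells = []
    for r in range(rows):
        # Integer arithmetic on the running edge keeps cells gap-free
        # even when the image size is not divisible by rows/cols.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            cells.append(CellBounds(f"r{r}c{c}", left, top, right, bottom))
    return cells
```

Computing each edge from the cell index, rather than adding a fixed cell size repeatedly, means the last row and column always end exactly at the image border.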

The server tracks each grid by a UUID and keeps layout metadata so multiple agents can keep in sync with the same screenshot/scene.
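
The in-memory tracking described above could look roughly like the following. GridStore and its method names are hypothetical, and the stored fields are only a guess at what "layout metadata" covers.

```python
import uuid
from threading import Lock


class GridStore:
    """Process-local registry of grids keyed by UUID (illustrative only)."""

    def __init__(self) -> None:
        self._grids: dict[str, dict] = {}
        self._lock = Lock()  # guards the dict if handlers run in threads

    def create(self, rows: int, cols: int, width: int, height: int) -> str:
        """Register a new grid and return its ID."""
        grid_id = str(uuid.uuid4())
        with self._lock:
            self._grids[grid_id] = {
                "rows": rows,
                "cols": cols,
                "width": width,
                "height": height,
            }
        return grid_id

    def get(self, grid_id: str) -> dict:
        """Look up layout metadata; raises KeyError for unknown IDs."""
        with self._lock:
            if grid_id not in self._grids:
                raise KeyError(f"unknown grid_id: {grid_id}")
            return self._grids[grid_id]
```

Because the store lives in process memory, every grid disappears on restart, and `--reload` restarts the process on each code change.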

Skill layer (OpenClaw integration)

The skill/ package is a placeholder for how an agent action would look in OpenClaw. It wraps the server calls, interprets the grid cells, and exposes helpers such as describe_grid() and plan_action() so future work can plug into the agent toolkit directly.
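
To make the helper idea concrete, here is a minimal sketch of what describe_grid() and plan_action() might do. The helper names come from the text above, but the signatures and every dictionary key ("grid_id", "rows", "columns", "cells", "bounds", "target_cell", "text") are assumptions, not the skill's actual interface.

```python
from typing import Optional

# Actions named in the /grid/action description; the real set may be larger.
ALLOWED_ACTIONS = {"click", "drag", "type"}


def describe_grid(grid: dict) -> str:
    """Render a grid as one text line per cell, suitable for an agent prompt."""
    lines = [f"grid {grid['grid_id']}: {grid['rows']}x{grid['columns']}"]
    for cell in grid["cells"]:
        left, top, right, bottom = cell["bounds"]
        lines.append(f"  {cell['id']}: x={left}..{right}, y={top}..{bottom}")
    return "\n".join(lines)


def plan_action(
    grid_id: str,
    action: str,
    cell_id: Optional[str] = None,
    text: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable plan for POST /grid/action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    payload = {"grid_id": grid_id, "action": action}
    # The target cell is optional, so only include it when given.
    if cell_id is not None:
        payload["target_cell"] = cell_id
    if text is not None:
        payload["text"] = text
    return payload
```

Keeping both helpers free of network calls makes them easy to test; the HTTP layer can be added around them separately.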

Getting started

  1. Install dependencies: python -m pip install -r requirements.txt.
  2. Run the server: uvicorn server.main:app --reload.
  3. Use the skill helper to bootstrap a grid, or wire the REST endpoints into a higher-level agent.
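
For step 3, wiring the REST endpoints in by hand might start with something like this, using only the standard library. The JSON field names ("image_b64", "rows", "columns") are assumptions, so check the server's request model before relying on them.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # uvicorn listens on port 8000 by default


def build_init_request(png_bytes: bytes, rows: int, columns: int) -> urllib.request.Request:
    """Prepare (but do not send) a POST /grid/init request."""
    body = json.dumps({
        # Field names here are hypothetical placeholders.
        "image_b64": base64.b64encode(png_bytes).decode("ascii"),
        "rows": rows,
        "columns": columns,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/grid/init",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running, `send(build_init_request(open("screenshot.png", "rb").read(), 6, 8))` would be the call to try first; the file name and grid size are placeholders.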

Next steps

  • Add real OCR/layout logic so cells understand labels.
  • Turn the action planner into a state machine that can focus/double-click/type/drag.
  • Persist grid sessions for longer running interactions.
  • Ship the OpenClaw skill (skill folder) as a plugin that can call http://localhost:8000 and scaffold the agent's reasoning.