Clickthrough
Let an agent interact with your computer.
Clickthrough is a proof-of-concept bridge between a vision-aware agent and a headless controller. The project is split into two halves:
- A Python server that accepts a static grid overlay (think of a screenshot broken into cells) and exposes lightweight endpoints to ask questions, plan actions, or even run pointer/keyboard events.
- A skill that bundles the HTTP calls/intent construction so we can hardwire the same flow inside OpenClaw later.
Server surface (FastAPI)
- POST /grid/init: Accepts a base64 screenshot plus the requested rows/columns; returns a grid_id, cell bounds, and helpful metadata. The grid is stored in memory so the agent can reference cells by ID in later actions.
- POST /grid/action: Takes a plan (grid_id, optional target cell, and an action like click/drag/type) and returns a structured ActionResult with computed coordinates for tooling to consume.
- GET /health: A minimal health check for deployments.
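As a rough illustration of what /grid/init computes, the sketch below splits a screenshot into evenly sized cells and returns their pixel bounds. Function and field names here are hypothetical, not the server's actual code:

```python
# Hypothetical sketch of the cell-bounds computation behind /grid/init.
# The real server's field names and rounding may differ.
from uuid import uuid4

def build_grid(width: int, height: int, rows: int, cols: int) -> dict:
    """Split a width x height screenshot into rows x cols cells with pixel bounds."""
    cell_w, cell_h = width / cols, height / rows
    cells = {}
    for r in range(rows):
        for c in range(cols):
            cells[f"r{r}c{c}"] = {
                "x": round(c * cell_w),
                "y": round(r * cell_h),
                "width": round(cell_w),
                "height": round(cell_h),
            }
    return {"grid_id": str(uuid4()), "rows": rows, "cols": cols, "cells": cells}

grid = build_grid(1920, 1080, rows=4, cols=4)
# each cell of a 4x4 grid over 1920x1080 is 480x270 pixels
```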
The server tracks each grid by a UUID and keeps layout metadata so multiple agents can stay in sync with the same screenshot/scene.
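The UUID-keyed, in-memory tracking could be as simple as the registry sketched below (class and method names are illustrative, not the server's actual implementation):

```python
# Minimal in-memory grid registry sketch; illustrative names only.
import uuid

class GridStore:
    """Maps server-issued UUIDs to grid layout metadata."""

    def __init__(self) -> None:
        self._grids: dict = {}

    def register(self, layout: dict) -> str:
        """Store a layout and return the grid_id agents will reference."""
        grid_id = str(uuid.uuid4())
        self._grids[grid_id] = layout
        return grid_id

    def get(self, grid_id: str) -> dict:
        """Look up a layout by its grid_id (raises KeyError if unknown)."""
        return self._grids[grid_id]

store = GridStore()
gid = store.register({"rows": 3, "cols": 3})
```

Because the store is in-memory, grids vanish on restart, which is what the "persist grid sessions" next step would address.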
Skill layer (OpenClaw integration)
The skill/ package is a placeholder for how an agent action would look in OpenClaw. It wraps the server calls, interprets the grid cells, and exposes helpers such as describe_grid() and plan_action() so future work can plug into the agent toolkit directly.
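A minimal sketch of what those helpers might look like is below. The names describe_grid and plan_action come from the README, but the signatures and payload shape are assumptions:

```python
# Hypothetical skill-layer helpers; signatures and field names are assumed,
# only the helper names (describe_grid, plan_action) come from the project.
from typing import Optional

def plan_action(grid_id: str, cell: str, action: str,
                text: Optional[str] = None) -> dict:
    """Build the JSON payload the skill would POST to /grid/action."""
    payload = {"grid_id": grid_id, "cell": cell, "action": action}
    if text is not None:
        payload["text"] = text  # only needed for type-style actions
    return payload

def describe_grid(layout: dict) -> str:
    """Summarize a grid layout for inclusion in the agent's prompt."""
    return f"{layout['rows']}x{layout['cols']} grid, id={layout['grid_id']}"
```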
Getting started
- Install dependencies: python -m pip install -r requirements.txt
- Run the server: uvicorn server.main:app --reload
- Use the skill helper to bootstrap a grid, or wire the REST endpoints into a higher-level agent.
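Bootstrapping a grid amounts to base64-encoding a screenshot and POSTing it to /grid/init. The sketch below builds that payload; the "image" field name is an assumption, and the POST itself (e.g. via requests) is left as a comment:

```python
# Payload-construction sketch for bootstrapping a grid over REST.
# The "image" field name is assumed; check the server's actual schema.
import base64

def make_init_payload(screenshot: bytes, rows: int, cols: int) -> dict:
    """Encode a raw screenshot for the /grid/init endpoint."""
    return {
        "image": base64.b64encode(screenshot).decode("ascii"),
        "rows": rows,
        "cols": cols,
    }

payload = make_init_payload(b"\x89PNG...", rows=4, cols=4)
# then: requests.post("http://localhost:8000/grid/init", json=payload)
```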
Next steps
- Add real OCR/layout logic so cells understand labels.
- Turn the action planner into a state machine that can focus/double-click/type/drag.
- Persist grid sessions for longer running interactions.
- Ship the OpenClaw skill (skill folder) as a plugin that can call http://localhost:8000 and scaffold the agent's reasoning.
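The focus/double-click/type/drag state machine from the next steps above could start as small as this sketch (states and transition rules are assumptions about the future design, not existing code):

```python
# Sketch of the planned action state machine: a cell must be focused
# before an action fires. States and rules are assumed, not implemented yet.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    FOCUSED = auto()

ACTIONS = {"click", "double_click", "type", "drag"}

def step(state: State, action: str) -> State:
    """Advance the machine by one action, rejecting out-of-order events."""
    if state is State.IDLE and action == "focus":
        return State.FOCUSED
    if state is State.FOCUSED and action in ACTIONS:
        return State.IDLE  # action completes; return to idle for the next cell
    raise ValueError(f"invalid action {action!r} in state {state.name}")

s = step(State.IDLE, "focus")
s = step(s, "click")
```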