---
name: clickthrough-http-control
description: Drive GUI apps with Clickthrough v2 observe/localize/act APIs. Use image-tool point localization for ambiguous targets and avoid full-screen OCR loops.
---

# Clickthrough HTTP Control (v2)

Agents do not see live desktop video. They operate on snapshots.
Use this loop: **observe -> localize -> act -> verify**.
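
The loop can be sketched as a small driver that takes each step as a function. This is a minimal sketch under stated assumptions: `run_loop` and the step signatures are hypothetical, not part of Clickthrough. The rule of trying each localization strategy only once comes from the risk policy later in this skill.

```python
from typing import Callable, Optional, Sequence

# Hypothetical step signatures (not part of the Clickthrough API):
Observe = Callable[[], str]                 # returns an observation id
Localize = Callable[[str], Optional[str]]   # observation id -> target id, or None
ActVerify = Callable[[str], bool]           # target id -> True if verification passed


def run_loop(observe: Observe, strategies: Sequence[Localize], act_verify: ActVerify) -> bool:
    """observe -> localize -> act -> verify, trying each localization strategy once."""
    for localize in strategies:
        observation_id = observe()            # fresh snapshot for every attempt
        target_id = localize(observation_id)
        if target_id is None:
            continue                          # this strategy found nothing; try the next
        if act_verify(target_id):
            return True                       # verified: stop
        # Verify failed: never repeat the click, move on to the next strategy.
    return False
```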

## Fast defaults

- Start with `POST /v2/observe` on a tight region, not full screen.
- Set `ocr_mode` to `none` unless text is required immediately.
- Use `image` tool localization for icon-heavy or dense controls.
- Use `POST /v2/act-verify` instead of manual sleep/poll loops.
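
These defaults are easiest to keep when every observe body comes from one helper. A minimal sketch: `observe_body` is a hypothetical helper, and only the field names and the `"none"` OCR value come from the observe example in the playbook; any other `ocr_mode` value is whatever your server accepts.

```python
def observe_body(x: int, y: int, width: int, height: int, ocr_mode: str = "none") -> dict:
    """Region-scoped observe body; OCR stays off unless the caller opts in."""
    if width <= 0 or height <= 0:
        raise ValueError("region must have a positive width and height")
    return {
        "mode": "region",        # tight region, not full screen
        "region_x": x,
        "region_y": y,
        "region_width": width,
        "region_height": height,
        "include_image": True,   # needed for image-tool localization
        "ocr_mode": ocr_mode,    # "none" by default; other values are server-defined
    }
```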

## Mandatory image-tool click localization

When OCR is weak or ambiguous, ask the image tool for exactly one click coordinate inside the image bounds.

Prompt template:
- "Return one click point as JSON `{\"x\":<int>,\"y\":<int>}` inside this image (`width=W`, `height=H`) for the **<exact target>** control."
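
Filling the template programmatically keeps the exact target label and the real image size in every prompt. `click_prompt` is a hypothetical helper; `width` and `height` should be the dimensions of the image you actually send to the image tool.

```python
def click_prompt(target: str, width: int, height: int) -> str:
    """Fill the click-point prompt template with the target label and image bounds."""
    return (
        'Return one click point as JSON {"x":<int>,"y":<int>} inside this image '
        f"(width={width}, height={height}) for the **{target}** control."
    )
```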

Rules:
- Ask for one point only.
- Include the image bounds in the prompt.
- If the answer does not parse as integer `x` and `y` values, re-ask once with a stricter format.
- Send the returned point to `POST /v2/localize` in the `image_tool_point` field.
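
A sketch of the parse-and-re-ask rule, assuming the image tool is reachable through some `ask(prompt) -> str` function (hypothetical; substitute your agent's tool call). `parse_point` accepts only a JSON object with integer `x` and `y` inside the image bounds; the wording of the stricter re-ask is an assumption.

```python
import json
import re
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def parse_point(answer: str, width: int, height: int) -> Optional[Point]:
    """Extract one {"x":..,"y":..} object and check that it lies inside the image."""
    match = re.search(r"\{[^{}]*\}", answer)   # first flat JSON object in the reply
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    x, y = data.get("x"), data.get("y")
    # Reject bools (a subclass of int) and anything that is not an integer.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in (x, y)):
        return None
    if not (0 <= x < width and 0 <= y < height):
        return None
    return x, y


def ask_for_point(ask: Callable[[str], str], prompt: str, width: int, height: int) -> Optional[Point]:
    """Ask once; if the reply is unusable, re-ask exactly once with a stricter format."""
    strict = prompt + " Reply with the JSON object only, with no other text."
    for attempt in (prompt, strict):
        point = parse_point(ask(attempt), width, height)
        if point is not None:
            return point
    return None
```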

## API playbook

1. **Observe**

```http
POST /v2/observe?screen=0
{
  "mode": "region",
  "region_x": 820,
  "region_y": 420,
  "region_width": 700,
  "region_height": 420,
  "include_image": true,
  "ocr_mode": "none"
}
```
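
The same request can be built with Python's standard library. The base URL is a placeholder assumption, and `observe_request` is a hypothetical helper that only constructs the request; the response shape is not documented here, so it is not parsed.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # hypothetical address; use your server's


def observe_request(body: dict, screen: int = 0, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) POST /v2/observe?screen=N with a JSON body."""
    query = urllib.parse.urlencode({"screen": screen})
    return urllib.request.Request(
        f"{base_url}/v2/observe?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = observe_request({
    "mode": "region", "region_x": 820, "region_y": 420,
    "region_width": 700, "region_height": 420,
    "include_image": True, "ocr_mode": "none",
})
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```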

2. **Localize** (choose one)

Text:
```http
POST /v2/localize
{"observation_id":"...","text_query":"Save","text_match":"exact"}
```

Image-tool point:
```http
POST /v2/localize
{"observation_id":"...","image_tool_point":{"x":312,"y":188}}
```
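
Because the two methods are alternatives, a body builder can enforce that exactly one is used. `localize_body` is a hypothetical helper; it always sends `"text_match": "exact"` for text queries because that is the only match mode shown above.

```python
from typing import Optional, Tuple


def localize_body(observation_id: str,
                  text_query: Optional[str] = None,
                  point: Optional[Tuple[int, int]] = None) -> dict:
    """Build a /v2/localize body using exactly one method: text or image-tool point."""
    if (text_query is None) == (point is None):
        raise ValueError("pass exactly one of text_query or point")
    body = {"observation_id": observation_id}
    if text_query is not None:
        body["text_query"] = text_query
        body["text_match"] = "exact"   # the only match mode shown in this skill
    else:
        x, y = point
        body["image_tool_point"] = {"x": x, "y": y}
    return body
```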

3. **Act**

```http
POST /v2/act?screen=0
{"action":{"action":"click","target":{"resolved_target_id":"..."}}}
```
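
The act body nests the action object under an `action` key, which is easy to flatten by mistake. A hypothetical helper, assuming `resolved_target_id` is the id returned by a successful localize call:

```python
def click_action(resolved_target_id: str) -> dict:
    """Body for POST /v2/act that clicks a previously localized target."""
    if not resolved_target_id:
        raise ValueError("localize first: a click needs a resolved_target_id")
    # Outer "action" wraps the action object; inner "action" names the action type.
    return {"action": {"action": "click", "target": {"resolved_target_id": resolved_target_id}}}
```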

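4. **Verify** (`act-verify` sends the action and checks the condition in one call; use it in place of step 3, not after it, so the click is not sent twice)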

```http
POST /v2/act-verify?screen=0
{
  "action":{"action":"click","target":{"resolved_target_id":"..."}},
  "condition":{"kind":"visual","state":"change","region_x":820,"region_y":420,"region_width":700,"region_height":420},
  "risk_level":"low"
}
```
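
A combined body can reuse the observed region as the verification region, so the condition watches the same pane that was localized. `act_verify_body` is a hypothetical helper; restricting `risk_level` to `low` and `high` reflects only the values this skill uses, not a documented server limit.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height


def act_verify_body(resolved_target_id: str, region: Region, risk_level: str = "low") -> dict:
    """Click a localized target and wait for a visual change in the same region."""
    if risk_level not in ("low", "high"):   # the two levels this skill uses
        raise ValueError(f"unexpected risk_level: {risk_level!r}")
    x, y, width, height = region
    return {
        "action": {"action": "click", "target": {"resolved_target_id": resolved_target_id}},
        "condition": {
            "kind": "visual", "state": "change",
            "region_x": x, "region_y": y,
            "region_width": width, "region_height": height,
        },
        "risk_level": risk_level,
    }
```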

## Risk policy

- Low risk (navigation, focus, benign clicks): a single verification signal is enough.
- High risk (delete, send, purchase, or closing with possible data loss): set `risk_level` to `high` and require two checks before acting.
- Never retry a click speculatively; after one failed verification, switch strategy.
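
The policy can be enforced on the client with a small guard. This is a sketch under assumptions: `RiskGate` is hypothetical, checks are counted by the caller, low risk is treated as needing no pre-action checks (it relies on the single verification signal), and strategies are identified by caller-chosen names such as `"text"` or `"image_tool_point"`.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RiskGate:
    """Hypothetical client-side guard for the risk policy (not a Clickthrough API)."""
    failed_strategies: Set[str] = field(default_factory=set)

    @staticmethod
    def checks_before_act(risk_level: str) -> int:
        # High risk needs two checks before acting; low risk relies on the
        # single post-action verification signal (an interpretation).
        return 2 if risk_level == "high" else 0

    def may_act(self, strategy: str, risk_level: str, checks_passed: int) -> bool:
        if strategy in self.failed_strategies:
            return False  # already failed one verify: switch strategy, do not re-click
        return checks_passed >= self.checks_before_act(risk_level)

    def record_failed_verify(self, strategy: str) -> None:
        self.failed_strategies.add(strategy)
```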

## Anti-latency rules

- Never repeat full-screen OCR by default.
- Re-observe only the active pane/region.
- Prefer keyboard + window APIs for app switching.
- Run OCR on regions only, and cap the OCR area with `max_ocr_area_px`.
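
If a region is larger than your OCR budget, one option is to shrink it around its center before observing. `fit_region` is a hypothetical client-side helper that keeps the aspect ratio; this skill does not say whether the server clips or rejects regions above `max_ocr_area_px`.

```python
import math
from typing import Tuple


def fit_region(x: int, y: int, width: int, height: int, max_area: int) -> Tuple[int, int, int, int]:
    """Shrink a region around its center so width * height <= max_area."""
    if max_area < 1:
        raise ValueError("max_area must be at least 1 pixel")
    if width * height <= max_area:
        return x, y, width, height
    scale = math.sqrt(max_area / (width * height))
    new_w = max(1, int(width * scale))
    new_h = max(1, int(height * scale))
    while new_w * new_h > max_area:   # guard against float rounding
        if new_w >= new_h:
            new_w -= 1
        else:
            new_h -= 1
    # Keep the shrunken region centered on the original one.
    return x + (width - new_w) // 2, y + (height - new_h) // 2, new_w, new_h
```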

## Setup and auth

- Send the `x-clickthrough-token` header when token auth is enabled.
- `/exec` additionally requires the `x-clickthrough-exec-secret` header.
- Check the server first with `GET /health`.
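
A sketch of header handling and the health check with the standard library. `auth_headers` and `health_request` are hypothetical helpers, the base URL is a placeholder, the exec endpoint is assumed to be exactly `/exec`, and sending the token to `/health` is an assumption (whether that endpoint checks it is not stated here).

```python
import urllib.request
from typing import Dict, Optional


def auth_headers(path: str, token: Optional[str] = None,
                 exec_secret: Optional[str] = None) -> Dict[str, str]:
    """Headers for a Clickthrough call; /exec also needs the exec secret."""
    headers: Dict[str, str] = {}
    if token:
        headers["x-clickthrough-token"] = token
    if path.split("?", 1)[0] == "/exec":   # assumes the exec endpoint is exactly /exec
        if not exec_secret:
            raise ValueError("/exec requires x-clickthrough-exec-secret")
        headers["x-clickthrough-exec-secret"] = exec_secret
    return headers


def health_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) GET /health; call it before anything else."""
    return urllib.request.Request(f"{base_url}/health",
                                  headers=auth_headers("/health", token), method="GET")
```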