docs(skill): add verify-first workflow and app-specific playbooks
All checks were successful
python-syntax / syntax-check (push) Successful in 9s
TODO.md (3 changes)
@@ -21,5 +21,6 @@
 - [x] Add exec configuration via env (`CLICKTHROUGH_EXEC_*`)
 - [x] Document exec API + config
 - [x] Create backlog issues for OCR/find/window/input/session-state improvements
-- [ ] Open PR for exec feature branch and review/merge
+- [x] Open PR for exec feature branch and review/merge
 - [x] Require configured exec secret + per-request exec secret header
+- [x] Upgrade skill with verify-before-click rules, confidence thresholds, two-phase risky actions, and Spotify playbook
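The TODO item on requiring a configured exec secret plus a per-request exec secret header can be illustrated with a minimal client-side sketch using only Python's standard library. It assumes the client reads the same secret from `CLICKTHROUGH_EXEC_SECRET` and that `/exec` accepts a POST with a caller-supplied JSON body. The helper names (`exec_headers`, `build_exec_request`) and the base URL are hypothetical. The request is only built, not sent.

```python
import os
import urllib.request
from typing import Mapping, Optional

# Env var and header names come from the commit; everything else is an assumption.
EXEC_SECRET_ENV = "CLICKTHROUGH_EXEC_SECRET"
EXEC_SECRET_HEADER = "x-clickthrough-exec-secret"


def exec_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build headers for an /exec call, failing closed if no secret is configured."""
    env = os.environ if env is None else env
    secret = env.get(EXEC_SECRET_ENV)
    if not secret:
        # Assumption: the client shares the server's secret via the same env var.
        raise RuntimeError(f"{EXEC_SECRET_ENV} is not set; refusing to call /exec")
    return {EXEC_SECRET_HEADER: secret, "Content-Type": "application/json"}


def build_exec_request(
    base_url: str, body: bytes, env: Optional[Mapping[str, str]] = None
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the hypothetical <base_url>/exec endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/exec",
        data=body,
        headers=exec_headers(env),
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen`. The commit does not show the `/exec` body schema, so the caller supplies the body.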
@@ -7,24 +7,36 @@ description: Control a local computer through the Clickthrough HTTP server using
 
 Use a strict observe-decide-act-verify loop.
 
-## Workflow
+## Core workflow (mandatory)
 
 1. Call `GET /screen` with coarse grid (e.g., 12x12).
-2. Identify likely cell/region for the target UI element.
-3. If confidence is low, call `POST /zoom` centered on the candidate and use denser grid (e.g., 20x20).
-4. Execute one minimal action via `POST /action`.
-5. Re-capture with `GET /screen` and verify the expected state change.
-6. Repeat until objective is complete.
+2. Identify likely target region and compute an initial confidence score.
+3. If confidence < 0.85, call `POST /zoom` with denser grid (e.g., 20x20) and re-evaluate.
+4. **Before any click**, verify target identity (text/icon/location consistency).
+5. Execute one minimal action via `POST /action`.
+6. Re-capture with `GET /screen` and verify the expected state change.
+7. Repeat until objective is complete.
 
+## Verify-before-click rules
+
+- Never click if target identity is ambiguous.
+- Require at least two matching signals before click (example: expected text + expected UI region).
+- If confidence is low, do not "test click"; zoom and re-localize first.
+- For high-impact actions (close/delete/send/purchase), use two-phase flow:
+1) preview intended coordinate + reason
+2) execute only after explicit confirmation.
+
 ## Precision rules
 
 - Prefer grid targets first, then use `dx/dy` for subcell precision.
 - Keep `dx/dy` in `[-1,1]`; start at `0,0` and only offset when needed.
 - Use zoom before guessing offsets.
+- Avoid stale coordinates: re-capture before action if UI moved/scrolled.
 
 ## Safety rules
 
 - Respect `dry_run` and `allowed_region` restrictions from `/health`.
+- Respect `/exec` security requirements (`CLICKTHROUGH_EXEC_SECRET` + `x-clickthrough-exec-secret`).
 - Avoid destructive shortcuts unless explicitly requested.
 - Send one action at a time unless deterministic; then use `/batch`.
 
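The revised workflow and verify-before-click rules in this hunk amount to a gate that runs before every action. The sketch below encodes that gate as a pure function, so it can be checked without a running server. The 0.85 threshold, the two-signal rule, the high-impact action kinds, and the `[-1,1]` offset range come from the skill text. The `Candidate` type, the signal names, and the returned step labels are hypothetical and are not part of the Clickthrough API.

```python
from dataclasses import dataclass, field
from typing import Set

# From workflow step 3: below this confidence, zoom and re-evaluate.
ZOOM_THRESHOLD = 0.85
# Action kinds the skill treats as high-impact (two-phase flow).
HIGH_IMPACT = {"close", "delete", "send", "purchase"}


@dataclass
class Candidate:
    """Hypothetical description of a localized click target."""
    confidence: float
    signals: Set[str] = field(default_factory=set)  # e.g. {"text", "region"}
    action_kind: str = "click"


def next_step(c: Candidate, confirmed: bool = False) -> str:
    """Decide the next step of the observe-decide-act-verify loop."""
    if c.confidence < ZOOM_THRESHOLD:
        return "zoom"     # POST /zoom with a denser grid; never "test click"
    if len(c.signals) < 2:
        return "zoom"     # identity still ambiguous: fewer than two matching signals
    if c.action_kind in HIGH_IMPACT and not confirmed:
        return "preview"  # phase 1: report coordinate + reason, wait for confirmation
    return "act"          # one minimal POST /action, then re-capture with GET /screen


def clamp_offset(v: float) -> float:
    """Keep a sub-cell dx/dy offset inside [-1, 1], per the precision rules."""
    return max(-1.0, min(1.0, v))
```

The sketch answers an ambiguous identity the same way as low confidence, by zooming instead of clicking. That is one reading of "never click if target identity is ambiguous." An agent could instead stop and ask.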
@@ -33,3 +45,20 @@ Use a strict observe-decide-act-verify loop.
 - After every meaningful action, verify with a fresh screenshot.
 - On mismatch, do not spam clicks: zoom, re-localize, and retry once.
 - Prefer short, reversible actions over long macros.
+- If two retries fail, switch strategy (hotkey/window focus/search) instead of repeating the same click.
+
+## App-specific playbooks (recommended)
+
+Build per-app routines for repetitive tasks instead of generic clicking.
+
+### Spotify playbook
+
+- Focus app window before search/navigation.
+- Prefer keyboard-first flow for song start:
+1) `Ctrl+L` (search)
+2) type exact query
+3) Enter
+4) verify exact song+artist text
+5) click/double-click row
+6) verify now-playing bar
+- If now-playing does not match target track, stop and re-localize; do not keep clicking nearby rows.
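The new retry rule and the Spotify playbook both describe ordered steps that stop at the first failed verification. The following is a minimal sketch under these assumptions. `act(kind, arg)` is a hypothetical adapter that performs one Clickthrough action or check and returns whether the fresh screenshot matched. The action-kind strings and strategy names are illustrative. The retry budget defaults to two retries per strategy, following the "if two retries fail" wording.

```python
from typing import Callable, List, Optional, Tuple

# A step pairs a label with a callable that performs one action and returns
# True only if the follow-up verification (fresh screenshot) passed.
Step = Tuple[str, Callable[[], bool]]


def run_playbook(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the label of the first failed step, or None."""
    for label, run in steps:
        if not run():
            return label  # stop and re-localize; do not keep clicking nearby rows
    return None


def with_fallback(
    attempt: Callable[[str], bool], strategies: List[str], retries: int = 2
) -> Optional[str]:
    """Try each strategy once plus up to `retries` retries; return the first that verifies."""
    for strategy in strategies:
        for _ in range(1 + retries):
            if attempt(strategy):
                return strategy
        # Retries exhausted: switch strategy instead of repeating the same click.
    return None


def spotify_play(query: str, act: Callable[[str, str], bool]) -> Optional[str]:
    """Keyboard-first song start following the Spotify playbook order."""
    steps: List[Step] = [
        ("focus window", lambda: act("focus", "Spotify")),
        ("open search", lambda: act("hotkey", "ctrl+l")),
        ("type query", lambda: act("type", query)),
        ("submit", lambda: act("key", "enter")),
        ("verify song+artist", lambda: act("verify_text", query)),
        ("start row", lambda: act("double_click", "matched row")),
        ("verify now-playing", lambda: act("verify_now_playing", query)),
    ]
    return run_playbook(steps)
```

For example, if a fake `act` returns False only for `verify_now_playing`, `spotify_play` attempts all seven steps and returns `"verify now-playing"`. The caller can then re-localize instead of clicking another row.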