docs(skill): explain screenshot analysis with image tool

2026-05-01 16:03:43 +02:00
parent 5122d416e8
commit b5fdd82494
3 changed files with 69 additions and 3 deletions


@@ -44,6 +44,8 @@ For OCR support, install the native `tesseract` binary on the host (in addition
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- The agent does **not** inherently see the remote desktop; it reasons from screenshots, OCR, and window metadata.
- When OCR is not enough, pair Clickthrough screenshots with OpenClaw's `image` tool for explicit screenshot interpretation.
- Prefer structured GUI interaction first; use `/windows`, `/launch`, `/wait`, and `/action` before reaching for `/exec`.
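
As a rough illustration of the first point above, the request body for `POST /action` might be assembled like this. This is a minimal sketch, not the skill's documented schema: the action name `"click"`, the inner `x` / `y` keys of `target`, and both helper functions are hypothetical. Only the top-level `action` and `target` fields come from the notes above.

```python
import json


def build_action_payload(action: str, x: int, y: int) -> dict:
    """Nest coordinates under "target" instead of sending them at the top level.

    Assumption: the target object carries pixel coordinates as "x" and "y".
    The exact inner schema is not given in the notes above.
    """
    return {"action": action, "target": {"x": x, "y": y}}


def has_raw_top_level_coords(payload: dict) -> bool:
    """Return True if the payload uses the discouraged top-level x / y fields."""
    return "x" in payload or "y" in payload


# Hypothetical action name and global desktop coordinates.
payload = build_action_payload("click", 640, 360)
body = json.dumps(payload)  # JSON string to send as the POST /action body
```

A check such as `has_raw_top_level_coords` could be run before sending, so that a malformed body is caught on the client side rather than by the server.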
See: