docs(skill): explain screenshot analysis with image tool

2026-05-01 16:03:43 +02:00
parent 5122d416e8
commit b5fdd82494
3 changed files with 69 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -44,6 +44,8 @@ For OCR support, install the native `tesseract` binary on the host (in addition
 Important:
 - `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
 - Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
+- The agent does **not** inherently see the remote desktop; it reasons from screenshots, OCR, and window metadata.
+- When OCR is not enough, pair Clickthrough screenshots with OpenClaw's `image` tool for explicit screenshot interpretation.
 - Prefer structured GUI interaction first; use `/windows`, `/launch`, `/wait`, and `/action` before reaching for `/exec`.

 See: