docs(skill): explain screenshot analysis with image tool
All checks were successful
python-syntax / syntax-check (push) Successful in 11s
@@ -44,6 +44,8 @@ For OCR support, install the native `tesseract` binary on the host (in addition
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- The agent does **not** inherently see the remote desktop; it reasons from screenshots, OCR, and window metadata.
- When OCR is not enough, pair Clickthrough screenshots with OpenClaw's `image` tool for explicit screenshot interpretation.
- Prefer structured GUI interaction first; use `/windows`, `/launch`, `/wait`, and `/action` before reaching for `/exec`.
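
As a rough illustration of the first two points, the sketch below builds a `POST /action` body with the coordinates nested under `target` and converts a window-relative point to global desktop coordinates first. The field names inside `target`, the `"click"` action name, and both helper functions are assumptions made for this example, not the documented Clickthrough schema; check the API reference for the real shapes.

```python
import json


def to_global(window_origin, local_point):
    """Convert a window-relative point to global desktop coordinates.

    window_origin is the window's top-left corner in global coordinates
    (hypothetically taken from window metadata).
    """
    wx, wy = window_origin
    lx, ly = local_point
    return wx + lx, wy + ly


def build_action_payload(action, x, y):
    """Build a request body with coordinates nested under `target`.

    The keys inside `target` are assumed for illustration only.
    Note that there are no raw top-level `x` / `y` fields.
    """
    return {"action": action, "target": {"x": x, "y": y}}


# Example: a button 100 px right and 25 px down inside a window whose
# top-left corner sits at (1920, 40) on the global desktop.
gx, gy = to_global((1920, 40), (100, 25))
payload = build_action_payload("click", gx, gy)
print(json.dumps(payload))
```

Keeping the conversion in one helper makes it harder to send window-local coordinates by accident, which would otherwise land the action on the wrong part of the desktop.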
See: