docs(skill): explain screenshot analysis with image tool
All checks were successful
python-syntax / syntax-check (push) Successful in 11s

2026-05-01 16:03:43 +02:00
parent 5122d416e8
commit b5fdd82494
3 changed files with 69 additions and 3 deletions


@@ -46,6 +46,9 @@ Query params:
Default response includes base64 image and metadata (`meta.region`, `meta.screen`, `meta.displays`, optional `meta.grid`).
`meta.region` uses global desktop coordinates.
These image-returning endpoints only return pixel data; receiving a screenshot does not by itself give the agent live vision.
If the caller needs visual interpretation beyond OCR, pass the returned screenshot to OpenClaw's `image` tool and ask a narrow question about the visible UI.
## `POST /zoom`
Body:
@@ -72,6 +75,8 @@ Query params:
Default response returns the cropped image and its region metadata in global pixel coordinates. `center_x` and `center_y` are also global coordinates; use the selected display's `meta.region` from `/screen?screen=X` as the coordinate base.
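
A minimal sketch of turning a point measured relative to one display into the global `center_x`/`center_y` values. It assumes three things: the display's `meta.region` has hypothetical `x`, `y`, `width`, `height` keys; `center_x`/`center_y` are sent as JSON body fields; and no other `/zoom` fields are needed. The helper names are illustrative only.

```python
def display_to_global(region: dict, local_x: int, local_y: int) -> tuple[int, int]:
    """Convert a display-relative point into global desktop pixels.

    `region` is the selected display's meta.region (hypothetical key names).
    """
    if not (0 <= local_x < region["width"] and 0 <= local_y < region["height"]):
        raise ValueError("point lies outside the selected display")
    # Global coordinates are the display origin plus the local offset.
    # This also works for displays left of/above the primary (negative x/y).
    return region["x"] + local_x, region["y"] + local_y


def zoom_body(region: dict, local_x: int, local_y: int) -> dict:
    """Build a hypothetical /zoom body containing only center_x/center_y."""
    center_x, center_y = display_to_global(region, local_x, local_y)
    return {"center_x": center_x, "center_y": center_y}


second_display = {"x": 1920, "y": 0, "width": 2560, "height": 1440}
print(zoom_body(second_display, 100, 200))  # {'center_x': 2020, 'center_y': 200}
```

Validating against the display bounds before adding the origin catches the common mistake of passing a global coordinate where a local one was expected.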
The cropped image from `POST /zoom` is often the best screenshot to pass to the `image` tool when the agent needs help judging a specific button, icon, or dialog layout.
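
If the analysis of a zoomed crop yields a pixel position inside that crop, the position has to be mapped back to global coordinates before it can be used for an action. This sketch makes the following assumptions:

- The zoom response region uses hypothetical `x`, `y`, `width`, `height` keys.
- The caller knows the returned image's pixel size.
- The crop may have been resized.

When the crop was not resized, the scale factors below are 1.0. The function name is illustrative.

```python
def crop_pixel_to_global(zoom_region: dict, image_w: int, image_h: int,
                         px: float, py: float) -> tuple[int, int]:
    """Map a pixel in the returned zoom image to global desktop pixels."""
    # Ratio of captured desktop area to returned image size on each axis.
    scale_x = zoom_region["width"] / image_w
    scale_y = zoom_region["height"] / image_h
    # Offset from the crop origin, scaled, then shifted into global space.
    return (round(zoom_region["x"] + px * scale_x),
            round(zoom_region["y"] + py * scale_y))


# A 400x300 desktop area returned as an 800x600 (2x) image:
region = {"x": 1920, "y": 100, "width": 400, "height": 300}
print(crop_pixel_to_global(region, 800, 600, 200, 150))  # (2020, 175)
```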
## `POST /action`
Body: one action.