docs(skill): explain screenshot analysis with image tool
All checks were successful
python-syntax / syntax-check (push) Successful in 11s
@@ -46,6 +46,9 @@ Query params:
Default response includes base64 image and metadata (`meta.region`, `meta.screen`, `meta.displays`, optional `meta.grid`).
`meta.region` uses global desktop coordinates.

These image-returning endpoints do not magically grant the agent live vision.
If the caller needs visual interpretation beyond OCR, pass the returned screenshot to OpenClaw's `image` tool and ask a narrow question about the visible UI.
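As a rough illustration of the hand-off described above, the sketch below decodes the base64 image from a response so the bytes can be saved and then passed to the `image` tool. The `image` field name is an assumption (the response schema is not spelled out here), and `decode_screenshot` is a hypothetical helper, not part of the API.

```python
import base64


def decode_screenshot(response: dict, image_key: str = "image") -> tuple[bytes, dict]:
    """Return (raw image bytes, meta dict) from a screenshot response.

    `image_key` is an ASSUMED field name for the base64 payload; adjust it
    to match the real response. `meta` is returned as-is (empty if absent)
    so region/display info can travel with the image.
    """
    raw = base64.b64decode(response[image_key], validate=True)
    return raw, response.get("meta", {})


# Example with a stand-in payload (not a real PNG):
example = {
    "image": base64.b64encode(b"fake image bytes").decode("ascii"),
    "meta": {"region": {"x": 0, "y": 0}},
}
image_bytes, meta = decode_screenshot(example)
# image_bytes can now be written to a file, e.g. Path("shot.png").write_bytes(image_bytes),
# and that file handed to the `image` tool with a narrow question about the visible UI.
```

Keeping `meta` next to the decoded bytes means any coordinates the agent later derives from the image can still be mapped back to global desktop coordinates.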
## `POST /zoom`

Body:
@@ -72,6 +75,8 @@ Query params:
Default response returns cropped image + region metadata in global pixel coordinates. `center_x` and `center_y` are also global coordinates; use the selected display's `meta.region` from `/screen?screen=X` as the coordinate base.

`POST /zoom` is often the best screenshot to hand to the `image` tool when the agent needs help judging a specific button, icon, or dialog layout.
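To make the coordinate base concrete, here is a minimal sketch that turns a point measured from a display's top-left corner into the global `center_x` / `center_y` values for a `/zoom` body. It assumes `meta.region` exposes `x`, `y`, `width`, and `height` keys, which is not confirmed here; `zoom_center` is a hypothetical helper, and any other `/zoom` body fields are left out.

```python
def zoom_center(region: dict, local_x: int, local_y: int) -> dict:
    """Map a display-local point to global center_x/center_y for /zoom.

    `region` is the selected display's meta.region; the key names
    x/y/width/height are ASSUMED for illustration.
    """
    inside = 0 <= local_x < region["width"] and 0 <= local_y < region["height"]
    if not inside:
        raise ValueError("point lies outside the selected display")
    # Global coordinate = display origin in desktop space + local offset.
    return {
        "center_x": region["x"] + local_x,
        "center_y": region["y"] + local_y,
    }


# Example: a second monitor placed to the right of a 1920-pixel-wide primary.
second_display = {"x": 1920, "y": 0, "width": 2560, "height": 1440}
body = zoom_center(second_display, 100, 200)
# body == {"center_x": 2020, "center_y": 200}
```

Rejecting out-of-range points early avoids zooming into a neighbouring display by accident when the local offset was computed against the wrong screen.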
## `POST /action`

Body: one action.