docs(skill): explain screenshot analysis with image tool

2026-05-01 16:03:43 +02:00
parent 5122d416e8
commit b5fdd82494
3 changed files with 69 additions and 3 deletions
--- a/docs/API.md
+++ b/docs/API.md
@@ -46,6 +46,9 @@ Query params:
 Default response includes base64 image and metadata (`meta.region`, `meta.screen`, `meta.displays`, optional `meta.grid`).
 `meta.region` uses global desktop coordinates.

+These image-returning endpoints do not by themselves give the agent live vision; they only return a base64 image and metadata.
+If the caller needs visual interpretation beyond OCR, pass the returned screenshot to OpenClaw's `image` tool and ask a narrow question about the visible UI.
+
 ## `POST /zoom`

 Body:
@@ -72,6 +75,8 @@ Query params:

 Default response returns cropped image + region metadata in global pixel coordinates. `center_x` and `center_y` are also global coordinates; use the selected display's `meta.region` from `/screen?screen=X` as the coordinate base.

+The cropped image returned by `POST /zoom` is often the best screenshot to hand to the `image` tool when the agent needs help judging a specific button, icon, or dialog layout.
+
 ## `POST /action`

 Body: one action.