Add pytesseract OCR, click_text interact action, and interact verify endpoint
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
This commit is contained in:
@@ -5,21 +5,24 @@ description: Use 3 methods to control a computer: see (screenshot+grid), interac
|
||||
|
||||
# Clickthrough Computer Control
|
||||
|
||||
Use exactly 3 methods:
|
||||
Use these methods:
|
||||
- `see`
|
||||
- `interact`
|
||||
- `interact/verify`
|
||||
- `exec`
|
||||
|
||||
## Method 1: See
|
||||
|
||||
Use `POST /see` to capture full screen or a region with a grid overlay.
|
||||
Use `POST /see/zoom` to capture a tighter crop with a denser grid.
|
||||
Use `POST /see` with `ocr=true` when text localization is needed.
|
||||
|
||||
Rules:
|
||||
- Start with coarse grid (`12x12`).
|
||||
- For precision, zoom and use denser grid (`20x20` or higher).
|
||||
- Always use returned `meta.region` and `meta.grid` when computing click targets.
|
||||
- Coordinates are global desktop coordinates.
|
||||
- OCR results are in `data.meta.ocr` and include confidence, bbox, and center.
|
||||
|
||||
## Method 2: Interact
|
||||
|
||||
@@ -27,15 +30,26 @@ Use `POST /interact` for one action at a time.
|
||||
|
||||
Mouse actions:
|
||||
- `move`, `click`, `right_click`, `double_click`, `middle_click`, `scroll`
|
||||
- `click_text` (OCR-driven click; optionally scope with `click_text.region`)
|
||||
|
||||
Keyboard actions:
|
||||
- `type`, `hotkey`
|
||||
|
||||
Rules:
|
||||
- Prefer `grid` targets derived from fresh `see`/`see/zoom` captures.
|
||||
- For text buttons/labels, prefer `click_text` and bound OCR with a region when possible.
|
||||
- Use `pixel` only when you already have reliable coordinates.
|
||||
- After each important action, call `see` again before continuing.
|
||||
|
||||
## Method 2.5: Action Verify
|
||||
|
||||
Use `POST /interact/verify` to execute one action and immediately validate nearby OCR text state.
|
||||
|
||||
Rules:
|
||||
- Keep verification narrow: use `ocr_text_near_point` with a focused radius.
|
||||
- Use short intervals/timeouts for speed (`check_interval_ms` ~250, `timeout_ms` 1000-3000).
|
||||
- Prefer this over manual re-check loops when immediate confirmation is required.
|
||||
|
||||
## Method 3: Exec
|
||||
|
||||
Use `POST /exec` only for shell/system tasks.
|
||||
@@ -49,8 +63,8 @@ Rules:
|
||||
|
||||
1. `see` capture.
|
||||
2. If needed, `see/zoom` refine.
|
||||
3. `interact` one step.
|
||||
4. `see` verify.
|
||||
3. `interact` one step (`click_text` for text UI targets).
|
||||
4. `interact/verify` for action->state confirmation, or `see` verify.
|
||||
5. Repeat.
|
||||
|
||||
## Quick Safety Rules
|
||||
|
||||
Reference in New Issue
Block a user