feat(vision): add screenshot diff and stability helpers
All checks were successful
python-syntax / syntax-check (push) Successful in 9s
All checks were successful
python-syntax / syntax-check (push) Successful in 9s
This commit is contained in:
@@ -11,6 +11,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
|
||||
- **Window lifecycle endpoints**: list/focus/restore/minimize/maximize/close windows via `GET /windows` + `POST /windows/action`
|
||||
- **Structured launch endpoint**: start an app/process without dropping to a shell via `POST /launch`
|
||||
- **Wait/sync endpoint**: poll for text, window, or visual state changes via `POST /wait`
|
||||
- **Vision helper endpoints**: compare screenshots and measure stability via `POST /vision/diff` and `POST /vision/stability`
|
||||
- **OCR endpoints**: extract text blocks or search for matching text via `POST /ocr` and `POST /ocr/find`
|
||||
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
|
||||
- **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
|
||||
|
||||
Reference in New Issue
Block a user