docs(playbooks): expand clickthrough interaction routines
All checks were successful
python-syntax / syntax-check (push) Successful in 14s
@@ -253,6 +253,82 @@ Do not collapse those steps into fake certainty.
Build per-app routines for repetitive tasks instead of generic clicking.

### Launcher / search / start app playbook

Use this when the goal is "open app X" or "bring up tool Y".

1. check `GET /windows` first in case the app is already open
2. if present, use `POST /windows/action` to focus or restore it
3. if absent, prefer `POST /launch` when you know the executable path
4. if launch path is unknown but the OS launcher/search UI is available, use a keyboard-first flow:
   - open launcher (`win`, `cmd+space`, or app-specific shortcut depending on host)
   - type exact app name
   - wait for stable results with `POST /wait` or recapture
   - verify the result text with OCR or the `image` tool
   - press Enter or click the exact result once
5. verify the app window now exists or is focused

Do not keep relaunching if the window already exists; that’s sloppy.

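The check-then-launch order in the launcher playbook can be sketched as below. This is a minimal illustration, not the service's actual client: the `call(method, path, body)` transport, the window fields (`id`, `title`), and the request bodies for `POST /windows/action` and `POST /launch` are all assumptions, since the playbook names the endpoints but not their payloads.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: call(method, path, body) -> parsed JSON response.
Call = Callable[[str, str, Optional[dict]], Any]


def find_window(call: Call, title_part: str) -> Optional[dict]:
    """Return the first window whose title contains title_part, or None."""
    # Assumed response shape: a list of {"id": ..., "title": ...} objects.
    for win in call("GET", "/windows", None):
        if title_part.lower() in win.get("title", "").lower():
            return win
    return None


def open_app(call: Call, title_part: str, exe_path: Optional[str] = None) -> str:
    """Focus the app if it is already open; otherwise launch once and verify."""
    win = find_window(call, title_part)
    if win is not None:
        # Assumed body fields; the real action payload may differ.
        call("POST", "/windows/action", {"id": win["id"], "action": "focus"})
        return "focused"
    if exe_path is None:
        # Path unknown: the caller falls back to the keyboard-first launcher flow.
        return "needs_launcher"
    call("POST", "/launch", {"path": exe_path})  # assumed body field
    # Verify the window exists instead of assuming the launch worked.
    return "launched" if find_window(call, title_part) else "unverified"
```

Returning a status string rather than raising keeps the decision about retrying with the caller, which is where the "do not keep relaunching" rule has to be enforced.
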
### Dialog confirmation playbook

Use for modals like save/discard, delete confirmation, permission prompts, and installer dialogs.

1. capture the dialog region with `POST /zoom`
2. use OCR first for title/body/button labels
3. if button hierarchy or emphasis matters, inspect the zoomed screenshot with the `image` tool
4. identify the exact intended action (`Cancel`, `Save`, `Allow`, `Delete`, etc.)
5. for destructive actions, require explicit user confirmation unless already requested
6. click once and verify the dialog disappeared or changed state

Good verification targets:

- dialog title vanished
- expected next window appeared
- destructive side effect is visible and confirmed

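Steps 4 to 6 of the dialog playbook reduce to a small decision that can be sketched in code. The helper names and the `DESTRUCTIVE` label set below are hypothetical and only illustrative; the OCR output is assumed to arrive as plain strings.

```python
from typing import Iterable, Optional

# Assumption: an illustrative, non-exhaustive set of destructive button labels.
DESTRUCTIVE = {"delete", "discard", "remove", "erase", "overwrite"}


def choose_button(
    labels: Iterable[str], intended: str, user_confirmed: bool = False
) -> Optional[str]:
    """Return the exact on-screen label to click, or None if clicking is not safe."""
    wanted = intended.strip().lower()
    matches = [label for label in labels if label.strip().lower() == wanted]
    if len(matches) != 1:
        # Missing or ambiguous button: re-inspect the dialog instead of guessing.
        return None
    if wanted in DESTRUCTIVE and not user_confirmed:
        # Destructive actions need explicit user confirmation first.
        return None
    return matches[0]


def dialog_closed(title_before: str, ocr_text_after: str) -> bool:
    """Verification target: the dialog title no longer appears on screen."""
    return title_before.strip().lower() not in ocr_text_after.lower()
```

For example, `choose_button(["Cancel", "Delete"], "Delete")` returns `None` until the caller passes `user_confirmed=True`.
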
### File picker playbook

Use for open/save dialogs.

1. verify the file picker window is focused
2. OCR the visible breadcrumb/path area, filename field, and button row
3. prefer keyboard-first entry when possible:
   - type or paste the target path/name into the focused field
   - use `tab` / `shift+tab` to move predictably between filename and action buttons
4. if the target path is uncertain, use OCR plus the `image` tool to identify the active field and selected folder/file row
5. verify the intended filename/path is visible before confirming
6. activate `Open` / `Save` once and verify the picker closes

If the picker stays open, stop and inspect why instead of hammering Enter like a maniac.

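The "verify, confirm once, then stop" rule for file pickers can be sketched as a single function. The three callables are hypothetical stand-ins for whatever OCR, key-press, and window-check calls the host actually exposes; none of them are part of the documented API.

```python
from typing import Callable


def confirm_picker(
    target: str,
    read_filename_field: Callable[[], str],
    press_enter: Callable[[], None],
    picker_is_open: Callable[[], bool],
) -> str:
    """Confirm the picker at most once; never retry blindly."""
    # Normalise whitespace and case, since OCR output is often noisy.
    seen = " ".join(read_filename_field().split()).lower()
    wanted = " ".join(target.split()).lower()
    if wanted not in seen:
        # The intended name is not visible yet: do not confirm.
        return "mismatch"
    press_enter()
    # A picker that is still open means something went wrong; the caller
    # should inspect the screen rather than press Enter again.
    return "stuck" if picker_is_open() else "done"
```

A `"stuck"` result is the signal to recapture and OCR the picker, for instance to look for an overwrite prompt or an invalid-path message.
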
### Browser tab / window playbook

Use for browser navigation, tab targeting, or web app recovery.

1. use `GET /windows` to find the correct browser window, then focus it with `POST /windows/action`
2. prefer keyboard-first navigation:
   - `ctrl+l` / `cmd+l` to focus the address bar
   - `ctrl+tab` / `ctrl+shift+tab` for tab movement when order is known
   - `ctrl+w` only for explicitly requested close actions
3. verify tab or page identity with OCR on the tab strip or page heading
4. if multiple similar tabs are open, zoom into the tab strip and use the `image` tool to distinguish active vs inactive tabs
5. after navigation, wait for visual stability or expected text before taking the next action

Do not assume a page loaded just because the click landed. Verify it.

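Waiting for expected text after navigation is a bounded polling loop. The sketch below assumes a hypothetical `read_text` callable that returns the current OCR text of the page heading or tab strip; the timeout and interval defaults are arbitrary illustrative values, not settings taken from the service.

```python
import time
from typing import Callable


def wait_for_text(
    read_text: Callable[[], str],
    expected: str,
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll until the expected text is visible or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if expected.lower() in read_text().lower():
            return True
        if time.monotonic() >= deadline:
            # Give up and let the caller inspect the page instead of acting blind.
            return False
        time.sleep(interval_s)
```

A `False` result means the page identity was never confirmed, so the next action should be a fresh capture rather than another click.
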
### Settings / preferences navigation playbook

Use when the task involves toggles, dropdowns, sidebars, or nested settings panels.

1. identify the current settings page with OCR on the heading/sidebar
2. use OCR to find the specific section label before trying to toggle anything
3. if the layout is dense, zoom into the relevant pane and use the `image` tool to distinguish labels from controls
4. prefer small reversible actions: one toggle, one dropdown, one field edit at a time
5. after each change, verify the control state changed visually or via visible text
6. if a save/apply button exists, treat it as a separate confirmation step and verify completion

Settings UIs love hiding side effects. Assume nothing.

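The "one change at a time, verify after each" rule can be sketched as a loop that stops at the first change it cannot verify. The `(label, act, verify)` tuple shape is a hypothetical convention for this example; `act` and `verify` stand in for the real click and OCR calls.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical change description: (label, act, verify).
Change = Tuple[str, Callable[[], None], Callable[[], bool]]


def apply_changes(changes: Iterable[Change]) -> List[str]:
    """Apply changes one at a time and stop at the first unverified one."""
    done: List[str] = []
    for label, act, verify in changes:
        act()
        if not verify():
            # The control did not visibly change: stop before stacking more edits.
            break
        done.append(label)
    return done
```

The returned list names only the changes that were verified, so the caller knows exactly where to resume inspection. A save/apply button would be passed as its own final entry, matching step 6.
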
### Spotify playbook

- Focus app window before search/navigation.