docs(playbooks): expand clickthrough interaction routines

2026-05-01 16:04:24 +02:00
parent b5fdd82494
commit 8857feaf7b

@@ -253,6 +253,82 @@ Do not collapse those steps into fake certainty.
Build per-app routines for repetitive tasks instead of generic clicking.
### Launcher / search / start app playbook
Use this when the goal is "open app X" or "bring up tool Y".
1. check `GET /windows` first in case the app is already open
2. if present, use `POST /windows/action` to focus or restore it
3. if absent, prefer `POST /launch` when you know the executable path
4. if launch path is unknown but the OS launcher/search UI is available, use a keyboard-first flow:
   - open launcher (`win`, `cmd+space`, or app-specific shortcut depending on host)
   - type exact app name
   - wait for stable results with `POST /wait` or recapture
   - verify the result text with OCR or the `image` tool
   - press Enter or click the exact result once
5. verify the app window now exists or is focused
Do not keep relaunching if the window already exists; that's sloppy.
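A minimal sketch of steps 1 to 3, assuming a hypothetical `api(method, path, body)` transport that returns decoded JSON. The endpoint paths come from this playbook; the response shape (`windows`, `id`, `title`) and the request bodies (`action`, `path`) are illustrative assumptions and should be checked against the real service.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON dict.
Api = Callable[[str, str, Optional[dict]], dict]


def ensure_app_open(api: Api, title: str, exe_path: Optional[str] = None) -> str:
    """Focus an existing window, or launch once if none matches.

    Returns "focused", "launched", or "needs_launcher".
    Field names in requests and responses are assumptions, not a documented API.
    """
    windows = api("GET", "/windows", None).get("windows", [])
    for win in windows:
        if title.lower() in win.get("title", "").lower():
            # Already open: focus it instead of relaunching.
            api("POST", "/windows/action", {"id": win["id"], "action": "focus"})
            return "focused"
    if exe_path:
        # Not open, but the executable path is known: launch exactly once.
        api("POST", "/launch", {"path": exe_path})
        return "launched"
    # No known path: the caller falls back to the keyboard-first launcher flow.
    return "needs_launcher"
```

Whatever the return value, the caller would still re-check `GET /windows` afterwards to cover step 5.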
### Dialog confirmation playbook
Use for modals like save/discard, delete confirmation, permission prompts, and installer dialogs.
1. capture the dialog region with `POST /zoom`
2. use OCR first for title/body/button labels
3. if button hierarchy or emphasis matters, inspect the zoomed screenshot with the `image` tool
4. identify the exact intended action (`Cancel`, `Save`, `Allow`, `Delete`, etc.)
5. for destructive actions, require explicit user confirmation unless the user already requested that exact action
6. click once and verify the dialog disappeared or changed state
Good verification targets:
- dialog title vanished
- expected next window appeared
- destructive side effect is visible and confirmed
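Steps 4 and 5 can be sketched as a small gate that runs before any click. The function name, the `DESTRUCTIVE` word list, and the `user_confirmed` flag are hypothetical; the labels are assumed to come from OCR of the zoomed dialog.

```python
# Illustrative list only; a real routine would tune this per app.
DESTRUCTIVE = {"delete", "discard", "remove", "overwrite"}


def choose_dialog_button(labels, intended, user_confirmed=False):
    """Return the exact label to click, or None if clicking is not safe yet."""
    wanted = intended.strip().lower()
    matches = [lab for lab in labels if lab.strip().lower() == wanted]
    if len(matches) != 1:
        # Missing or ambiguous button: re-inspect the dialog instead of guessing.
        return None
    if wanted in DESTRUCTIVE and not user_confirmed:
        # Destructive action without explicit confirmation: do not click.
        return None
    return matches[0]
```

Returning `None` rather than a best guess keeps the routine from clicking a near-match such as `Don't Save` when `Save` was intended.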
### File picker playbook
Use for open/save dialogs.
1. verify the file picker window is focused
2. OCR the visible breadcrumb/path area, filename field, and button row
3. prefer keyboard-first entry when possible:
   - type or paste the target path/name into the focused field
   - use `tab` / `shift+tab` to move predictably between filename and action buttons
4. if the target path is uncertain, use OCR plus the `image` tool to identify the active field and selected folder/file row
5. verify the intended filename/path is visible before confirming
6. activate `Open` / `Save` once and verify the picker closes
If the picker stays open, stop and inspect why instead of hammering Enter like a maniac.
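One way to enforce "activate once, then verify" is sketched below. Both callables are hypothetical stand-ins: `activate` would press Enter or click `Open` / `Save`, and `picker_open` would check whether the picker window is still present (for example via `GET /windows`).

```python
import time
from typing import Callable


def confirm_once(activate: Callable[[], None],
                 picker_open: Callable[[], bool],
                 timeout_s: float = 3.0,
                 poll_s: float = 0.2) -> bool:
    """Activate Open/Save exactly once, then poll until the picker closes.

    Returns True if the picker closed within the timeout, False otherwise.
    """
    activate()  # exactly one activation; this function never retries it
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not picker_open():
            return True
        time.sleep(poll_s)
    # Final check after the deadline; False means stop and inspect.
    return not picker_open()
```

A `False` result is the signal to recapture the picker and read any error text, not to call `confirm_once` again.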
### Browser tab / window playbook
Use for browser navigation, tab targeting, or web app recovery.
1. use `GET /windows` to find the correct browser window, then focus it with `POST /windows/action`
2. prefer keyboard-first navigation:
   - `ctrl+l` / `cmd+l` to focus the address bar
   - `ctrl+tab` / `ctrl+shift+tab` for tab movement when order is known
   - `ctrl+w` only for explicitly requested close actions
3. verify tab or page identity with OCR on the tab strip or page heading
4. if multiple similar tabs are open, zoom into the tab strip and use the `image` tool to distinguish active vs inactive tabs
5. after navigation, wait for visual stability or expected text before taking the next action
Do not assume a page loaded just because the click landed. Verify it.
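When tab order is known, the number of `ctrl+tab` or `ctrl+shift+tab` presses can be computed rather than guessed. The helper below is a hypothetical sketch; it assumes tab switching follows the visual strip order and wraps around at the ends, which is browser- and settings-dependent.

```python
from typing import List


def tab_moves(active: int, target: int, count: int) -> List[str]:
    """Shortest key sequence from tab `active` to tab `target` (0-based).

    Assumes `count` tabs in visual order with wrap-around switching.
    """
    if count <= 0 or not (0 <= active < count and 0 <= target < count):
        raise ValueError("tab indices out of range")
    forward = (target - active) % count   # presses of ctrl+tab
    backward = (active - target) % count  # presses of ctrl+shift+tab
    if forward <= backward:
        return ["ctrl+tab"] * forward
    return ["ctrl+shift+tab"] * backward
```

The computed sequence only gets you to the expected position; step 3 still applies, so verify the tab title or page heading after the last key press.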
### Settings / preferences navigation playbook
Use when the task involves toggles, dropdowns, sidebars, or nested settings panels.
1. identify the current settings page with OCR on the heading/sidebar
2. use OCR to find the specific section label before trying to toggle anything
3. if the layout is dense, zoom into the relevant pane and use the `image` tool to distinguish labels from controls
4. prefer small reversible actions: one toggle, one dropdown, one field edit at a time
5. after each change, verify the control state changed visually or via visible text
6. if a save/apply button exists, treat it as a separate confirmation step and verify completion
Settings UIs love hiding side effects. Assume nothing.
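Steps 4 and 5 amount to "read, change one thing, read again". A minimal sketch, assuming hypothetical `read_state` and `click` callables that wrap OCR or image inspection of the control and a single click on it:

```python
from typing import Callable


def set_toggle(read_state: Callable[[], bool],
               click: Callable[[], None],
               desired: bool) -> bool:
    """Bring one toggle to `desired` with at most one click, then verify.

    Returns True if the control shows the desired state afterwards.
    """
    if read_state() == desired:
        return True  # already correct: do not click at all
    click()  # one small, reversible action
    return read_state() == desired
```

If this returns `False`, the click either missed or the setting needs a separate save/apply step, which is handled as its own confirmation per step 6.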
### Spotify playbook
- Focus app window before search/navigation.