Commit remaining workspace updates
Some checks failed
CI / test (push) Failing after 8s

Space-Banane
2026-05-31 20:43:25 +02:00
parent 79c9e98842
commit 4123765aba
11 changed files with 4498 additions and 131 deletions


@@ -6,8 +6,10 @@ ScreenJob lets an agent execute tasks that require a real desktop UI plus termin
## Main Features
- Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring
@@ -45,6 +47,12 @@ Enhance-first click rule:
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
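The two bullets above can be sketched as a pair of small helpers. This is only an illustration, assuming an integer `scale` and `(x, y)` pixel coordinates; `clamp_scale` and `click_point` are hypothetical names, not part of ScreenJob's documented tools.

```python
# Hypothetical helpers; ScreenJob's real `enhance` and `click` tools may differ.
MIN_SCALE, MAX_SCALE = 2, 6  # zoom range stated in the rule above


def clamp_scale(scale: int) -> int:
    """Keep a requested zoom factor inside the documented 2..6 range."""
    return max(MIN_SCALE, min(MAX_SCALE, scale))


def click_point(target: tuple[int, int], offset: tuple[int, int] = (0, 0)) -> tuple[int, int]:
    """Return the coordinate to click: the same target, plus an optional small offset."""
    x, y = target
    dx, dy = offset
    return (x + dx, y + dy)
```

By default the click lands on the same coordinate that was enhanced, and the offset is passed only when the enhanced image shows the target is slightly off.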
Windows-native routing rule:
- First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
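The routing rule can be read as a two-step decision: classify the surface, then pick native tools unless they are unavailable or the UI is custom-drawn. The sketch below is a minimal illustration under that reading. The `Surface` enum, `classify_surface`, `choose_route`, and their boolean inputs are assumptions made for this example, not ScreenJob's actual API.

```python
from enum import Enum


class Surface(Enum):
    """Surface kinds named in the routing rule (enum itself is hypothetical)."""
    APP_WINDOW = "app_window"
    BROWSER_WINDOW = "browser_window"
    DIALOG = "dialog"
    FILE_PICKER = "file_picker"
    OTHER_SYSTEM = "other_system"


DIALOG_CLASS = "#32770"  # Win32 window class used by standard dialogs


def classify_surface(class_name: str, is_browser: bool = False,
                     is_file_picker: bool = False, is_system: bool = False) -> Surface:
    """Classify a window from its class name plus caller-supplied hints."""
    if is_file_picker:
        return Surface.FILE_PICKER
    if class_name == DIALOG_CLASS:
        return Surface.DIALOG
    if is_browser:
        return Surface.BROWSER_WINDOW
    if is_system:
        return Surface.OTHER_SYSTEM
    return Surface.APP_WINDOW


def choose_route(native_available: bool, custom_drawn: bool) -> str:
    """Prefer native tools; fall back to screenshots plus mouse/keyboard otherwise."""
    return "native" if native_available and not custom_drawn else "screenshot"
```

In this sketch the fallback is the exception rather than the default, which mirrors the "only when" wording of the last bullet.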
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
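One way to picture the verification rule is a guard that builds the completion payload only after the observed screen text contains the expected outcome. This is a hedged sketch: `build_completion` is a hypothetical helper, and the substring check is an assumption about what "matches" means. The payload is a plain dict because `return` is a reserved word in Python and cannot be passed as a keyword argument directly.

```python
from typing import Any, Optional


def build_completion(expected_text: str, observed_text: str,
                     data: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a task_complete-style payload only if the screen shows the expected text.

    Hypothetical helper: the real verification step may compare more than text.
    """
    if expected_text.casefold() not in observed_text.casefold():
        raise ValueError(f"expected {expected_text!r} not found on screen")
    # Keys mirror the `return` and `data` fields of the structured completion payload.
    return {"return": f"verified: {expected_text}", "data": data or {}}
```

A caller would pass text read from the latest screenshot or native element listing as `observed_text`, and treat the `ValueError` as a signal to re-inspect the screen instead of completing the task.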