SKILL.md
@@ -6,8 +6,10 @@ ScreenJob lets an agent execute tasks that require a real desktop UI plus termin
## Main Features
- Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring
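The structured completion payload can be pictured as a small mapping with the two fields named above. This is a minimal sketch, not ScreenJob's actual implementation: `build_completion_payload` and the example values are hypothetical, and only the field names `return` and `data` come from the feature list.

```python
import json


def build_completion_payload(summary: str, data: dict) -> dict:
    """Assemble the two fields named in the feature list: `return` and `data`."""
    # `return` is a reserved word in Python, so it is stored as a dict key
    # rather than passed as a keyword argument.
    return {"return": summary, "data": data}


# Hypothetical example values, for illustration only.
payload = build_completion_payload(
    "Report exported",
    {"path": "C:/Users/demo/report.pdf", "pages": 3},
)
print(json.dumps(payload, indent=2))
```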
@@ -45,6 +47,12 @@ Enhance-first click rule:
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
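The two bullets above can be sketched as a pair of helpers: one keeps `scale` inside the stated `2` to `6` range, the other keeps the click on the original target with at most a small offset. This is an illustrative sketch under assumptions: `clamp_scale`, `click_point`, the `region_default` of 3, and the 5-pixel `max_nudge` are hypothetical and not taken from ScreenJob.

```python
from typing import Optional, Tuple

MIN_SCALE, MAX_SCALE = 2, 6  # range stated in the rule above


def clamp_scale(scale: Optional[int], region_default: int = 3) -> int:
    """Return a zoom factor inside the documented 2..6 range.

    `region_default` stands in for the per-region default; its real
    value is not given in the source, so 3 is only a placeholder.
    """
    chosen = region_default if scale is None else scale
    return max(MIN_SCALE, min(MAX_SCALE, chosen))


def click_point(
    target: Tuple[int, int],
    nudge: Tuple[int, int] = (0, 0),
    max_nudge: int = 5,
) -> Tuple[int, int]:
    """Keep the original target coordinate, shifted by a bounded offset."""
    dx = max(-max_nudge, min(max_nudge, nudge[0]))
    dy = max(-max_nudge, min(max_nudge, nudge[1]))
    return (target[0] + dx, target[1] + dy)
```

With these assumed limits, `clamp_scale(10)` yields 6 and `click_point((100, 200), (20, -2))` yields `(105, 198)`, so an oversized zoom or nudge is pulled back rather than rejected.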
Windows-native routing rule:
- First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
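The routing rule amounts to a classify-then-choose decision, which might be sketched as follows. Everything here is a hypothetical illustration: the `Surface` enum, both function names, and the boolean hints (`is_browser`, `has_filename_field`, `is_system`) are assumptions. In particular, treating a `#32770` window with a file-name field as a file picker is a simplification made for this sketch, not documented ScreenJob behaviour.

```python
from enum import Enum


class Surface(Enum):
    APP_WINDOW = "app_window"
    BROWSER_WINDOW = "browser_window"
    DIALOG = "dialog"
    FILE_PICKER = "file_picker"
    OTHER_SYSTEM = "other_system"


def classify_surface(
    class_name: str,
    is_browser: bool = False,
    has_filename_field: bool = False,
    is_system: bool = False,
) -> Surface:
    """Map a window class name plus caller-supplied hints to a surface type."""
    if class_name == "#32770":  # standard Windows dialog class
        # Assumption: a dialog exposing a file-name field is a file picker.
        return Surface.FILE_PICKER if has_filename_field else Surface.DIALOG
    if is_browser:
        return Surface.BROWSER_WINDOW
    if is_system:
        return Surface.OTHER_SYSTEM
    return Surface.APP_WINDOW


def choose_route(native_available: bool, custom_drawn: bool) -> str:
    """Prefer native tools; fall back to screenshots plus mouse/keyboard."""
    if native_available and not custom_drawn:
        return "native"
    return "screenshot"
```

Keeping classification separate from route selection means the fallback condition stays a single check, regardless of how many surface types are added later.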
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
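One way to picture the verification rule is a guard that refuses to signal completion until the expected text is found on screen. This is a rough sketch under assumptions: `verify_then_complete` is a hypothetical helper, and the `read_screen_text` and `complete` callables are stand-ins for whatever screen-reading and completion calls the agent actually uses.

```python
from typing import Callable, Iterable, List, Tuple


def verify_then_complete(
    read_screen_text: Callable[[], str],
    expected_phrases: Iterable[str],
    complete: Callable[[], None],
) -> Tuple[bool, List[str]]:
    """Call `complete` only if every expected phrase appears on screen.

    Returns (True, []) on success, or (False, missing_phrases) without
    calling `complete` when something expected is absent.
    """
    text = read_screen_text().lower()
    missing = [p for p in expected_phrases if p.lower() not in text]
    if missing:
        return False, missing
    complete()
    return True, []


# Usage with stub callables (hypothetical screen text).
calls: List[str] = []
ok, missing = verify_then_complete(
    lambda: "File saved to Documents",
    ["file saved"],
    lambda: calls.append("task_complete"),
)
```

Returning the missing phrases, rather than only a boolean, gives the agent something concrete to re-check before retrying.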