feat: support key combinations in press_key function and update related tests

Space-Banane
2026-05-27 21:20:03 +02:00
parent 278f011a6d
commit 48a145d147
4 changed files with 69 additions and 8 deletions

@@ -10,6 +10,7 @@ ScreenJob lets an agent execute tasks that require a real desktop UI plus termin
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Automatic final verification screen capture on completion
- Safety gate, auth, history, and live monitoring
## Important Environment Note
@@ -30,7 +31,12 @@ Agents can use ScreenJob to launch and control GUI workflows, including orchestr
1. Submit job via CLI or API.
2. Agent performs tool loop.
3. Read final `response.return` and `response.data` from job status.
3. Read final `response.return`, `response.data`, and `verification` from job status.
Keyboard combo rule:
- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
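As an illustration of why a combo belongs in a single call, the sketch below shows one plausible way a `press_key`-style tool could expand combo syntax into ordered key events: every key goes down in the order written, then comes back up in reverse, so the modifiers stay held while the final key is pressed. The helper names `parse_key_combo` and `combo_events` are hypothetical and are not taken from ScreenJob's code; the actual implementation may differ.

```python
def parse_key_combo(combo: str) -> list[str]:
    """Split a combo such as 'ctrl+shift+esc' into normalized key names.

    Hypothetical helper: keys are lowercased and stripped of whitespace.
    A literal '+' key cannot be expressed in this simple sketch.
    """
    keys = [part.strip().lower() for part in combo.split("+")]
    if any(not key for key in keys):
        raise ValueError(f"invalid key combo: {combo!r}")
    return keys


def combo_events(combo: str) -> list[tuple[str, str]]:
    """Return (action, key) pairs: all keys down in order, then up in reverse."""
    keys = parse_key_combo(combo)
    downs = [("down", key) for key in keys]
    ups = [("up", key) for key in reversed(keys)]
    return downs + ups


if __name__ == "__main__":
    # One call carries the whole shortcut, so 'win' is still held when 'r' goes down.
    for action, key in combo_events("win+r"):
        print(action, key)
```

Two separate calls (`win`, then `r`) would release the modifier before the second key is pressed, which is the failure mode the rule above guards against.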
## API Quick Reference
@@ -79,10 +85,18 @@ Result contract in job payload:
"status": "completed",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
"data": "file1.txt\nfile2.txt",
"verification": {
"ok": true,
"path": "C:/.../screens/screen_final_verification_step_006.png"
}
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
"data": "file1.txt\nfile2.txt",
"verification": {
"ok": true,
"path": "C:/.../screens/screen_final_verification_step_006.png"
}
}
```
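A client consuming this payload might read the result fields along the following lines. This is a minimal sketch that assumes the job status has already been parsed into a Python dict with the shape shown above; `read_result` is a hypothetical helper, not part of ScreenJob, and the fallback to top-level keys simply mirrors the duplicated fields in the example payload.

```python
def read_result(job: dict) -> dict:
    """Extract return value, data, and verification info from a job payload.

    Assumption: fields live under "response" and may be mirrored at the
    top level, as in the example payload. Missing fields become None/False.
    """
    response = job.get("response") or {}
    verification = response.get("verification") or job.get("verification") or {}
    return {
        "return": response.get("return", job.get("return")),
        "data": response.get("data", job.get("data")),
        "verification_ok": bool(verification.get("ok", False)),
        "verification_path": verification.get("path"),
    }


if __name__ == "__main__":
    # Sample values copied from the payload above; the path stays elided.
    job = {
        "status": "completed",
        "response": {
            "return": "Task completed successfully",
            "data": "file1.txt\nfile2.txt",
            "verification": {
                "ok": True,
                "path": "C:/.../screens/screen_final_verification_step_006.png",
            },
        },
    }
    result = read_result(job)
    print(result["return"], result["verification_ok"])
```

Treating a missing `verification` object as "not verified" rather than raising keeps the reader compatible with payloads produced before the verification capture was added.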