# ScreenJob Skill (OpenClaw Agents)

## What ScreenJob Solves

ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.
## Main Features

- Hybrid control model: screenshot grounding plus Windows-native window, dialog, and element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse and keyboard control (`click`, `type`, `press_key`)
- Native window and dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution and waiting (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, authentication, history, and live monitoring
## Important Environment Note

ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment. Commands, file paths, and on-screen actions in a job therefore refer to that machine, not to the agent's host.

## Why It Is Useful

Agents can use ScreenJob to launch and control GUI workflows on a human's computer, including orchestrating other GUI agents and tools running there.

## Example Tasks

- Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
- Open google.com, go to my account, and change my profile picture to a provided image URL.
- Run `ls -a` in `C:/Users/username/Documents` and return the output in `data`.
## Practical Usage

1. Submit a job via the CLI or the API.
2. The ScreenJob agent runs its tool loop until the task ends, normally by calling `task_complete`.
3. Read the final `response.return` and `response.data` from the job status.

### Keyboard combo rule

- For shortcuts, use a single `press_key` call with combo syntax, for example `win+r` or `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
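A minimal Python sketch of the combo rule, assuming a hypothetical tool-call shape `{"tool": "press_key", "args": {"key": ...}}`. ScreenJob's actual call schema is not documented here. The illustrative helper `press_key_call` folds every key of a shortcut into one combo string instead of emitting one call per key.

```python
# Hypothetical helper: the {"tool": ..., "args": {"key": ...}} shape is an
# assumption for illustration, not ScreenJob's documented tool-call schema.
MODIFIERS = {"ctrl", "shift", "alt", "win"}


def press_key_call(*keys: str) -> dict:
    """Fold a shortcut into ONE press_key call, e.g. ("ctrl", "shift", "esc")."""
    parts = [k.strip().lower() for k in keys if k.strip()]
    if not parts:
        raise ValueError("at least one key is required")
    mods = [p for p in parts if p in MODIFIERS]
    others = [p for p in parts if p not in MODIFIERS]
    if len(others) > 1:
        # Two non-modifier keys form a sequence, not a combo.
        raise ValueError("a combo holds at most one non-modifier key")
    # Modifiers first, then the main key: "ctrl+shift+esc", "win+r".
    return {"tool": "press_key", "args": {"key": "+".join(mods + others)}}


print(press_key_call("Win", "R"))
# {'tool': 'press_key', 'args': {'key': 'win+r'}}
```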
### Enhance-first click rule

- Before clicking small buttons or icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels or text, use `mode="text"` to improve readability.
- Optional zoom control: set `scale` between `2` and `6` (defaults are tuned per region).
- After checking the enhanced image, click the same target coordinate, or apply a small directional offset if needed.
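The preset choice above can be sketched as two small builders. Only `region`, `mode`, `scale`, and the 2 to 6 zoom range come from this rule. The dict wrapper and the `coordinate` argument name used for `click` are hypothetical.

```python
from typing import Optional, Tuple


# Hypothetical call shape; only region/mode/scale and the 2..6 range are documented.
def enhance_call(coordinate: Tuple[int, int], text: bool = False,
                 scale: Optional[int] = None) -> dict:
    """Build the preferred enhance preset for a tiny control or label."""
    args = {"coordinate": list(coordinate), "region": "small",
            "mode": "text" if text else "ui"}
    if scale is not None:
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale  # omit scale to keep the region's tuned default
    return {"tool": "enhance", "args": args}


def click_call(coordinate: Tuple[int, int],
               offset: Tuple[int, int] = (0, 0)) -> dict:
    """Click the coordinate that was enhanced, nudged by a small offset if needed."""
    x, y = coordinate
    dx, dy = offset
    return {"tool": "click", "args": {"coordinate": [x + dx, y + dy]}}
```

A typical sequence is `enhance_call((412, 88))`, inspect the image, then `click_call((412, 88))`. Use `click_call((412, 88), (3, 0))` if the control sits slightly to the right.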
### Windows-native routing rule

- First classify the current surface: a normal app window, a browser window, a `#32770` dialog (the standard Win32 dialog-box window class), an Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
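One way to express this routing as code. It assumes hypothetical surface labels (`app_window`, `dialog_32770`, and so on) produced by an earlier classification step. The tool names come from the feature list. The tools chosen for each surface are an illustrative choice, not a documented mapping.

```python
from typing import List

# Illustrative mapping: tool names come from the feature list, but the surface
# labels and the per-surface choices are assumptions, not ScreenJob behavior.
NATIVE_TOOLS = {
    "app_window": ["find_window", "focus_window", "list_ui_elements"],
    "browser_window": ["find_window", "focus_window"],
    "dialog_32770": ["detect_dialog", "dialog_action"],
    "file_picker": ["detect_dialog", "dialog_set_filename", "dialog_action"],
}
FALLBACK_TOOLS = ["see_screen", "enhance", "click", "type", "press_key"]


def plan_tools(surface: str, native_available: bool = True,
               custom_drawn: bool = False) -> List[str]:
    """Return the tool family to try first for a classified surface."""
    if native_available and not custom_drawn and surface in NATIVE_TOOLS:
        return NATIVE_TOOLS[surface]
    # No native automation, a custom-drawn UI, or a surface this sketch does
    # not map: fall back to screenshot grounding plus mouse and keyboard.
    return FALLBACK_TOOLS
```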
### Verification rule

- Before calling `task_complete`, verify that the actual on-screen content matches the expected outcome.
- Use `see_screen` (and `enhance` if needed) for this check.
- Include a concise `observed_result` in `data` when completing the task.
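This sketch assembles the completion arguments only after a successful check. It assumes `data` may be an object that holds `observed_result` next to any task output. The `output` key and the `completion_args` helper are hypothetical. `return` is a Python keyword, so a Python caller has to pass it as a dict key (or via `**kwargs`) rather than as `return=...`.

```python
from typing import Any, Optional


def completion_args(summary: str, observed_result: str,
                    output: Optional[Any] = None) -> dict:
    """Build task_complete arguments after verifying the screen with see_screen."""
    observed = observed_result.strip()
    if not observed:
        # No observation means the verification step has not happened yet.
        raise ValueError("verify on screen first and describe what was observed")
    data = {"observed_result": observed}
    if output is not None:
        data["output"] = output  # hypothetical key for the task's actual output
    # "return" is a reserved word in Python, so it travels as a dict key.
    return {"return": summary, "data": data}
```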
### Patience / rerun rule

- If a job is still `running`, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
- Prefer waiting longer and checking for a final status or result before starting a replacement run.
- Only restart or replace a running job when there is clear evidence that it has failed or is irrecoverably stuck, or when the user explicitly asks for a restart.
- If you do replace a run, say why in one short sentence and name the specific blocker you observed.
## API Quick Reference

### Base URL

- `http://127.0.0.1:8787` (default)

### Auth

Required on all `/api/*` routes. Use either header:

- `Authorization: Bearer <SCREENJOB_TOKEN>`
- `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
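A minimal Python helper for the two header forms. `auth_headers` is a hypothetical name. Where the token comes from (for example, an environment variable) is left to the caller.

```python
def auth_headers(token: str, bearer: bool = True) -> dict:
    """Return one of the two accepted auth headers for /api/* routes."""
    if not token:
        raise ValueError("a ScreenJob token is required")
    if bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-ScreenJob-Token": token}
```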
### Create a job

- `POST /api/jobs`
- Body:

```json
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
```

- Response:

```json
{ "job_id": "job_..." }
```
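A standard-library Python sketch of the create call. The field names mirror the example body. This reference does not say whether any of them are optional, so the sketch always sends all four. `build_create_job_request` and `create_job` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Optional

BASE_URL = "http://127.0.0.1:8787"  # documented default


def build_create_job_request(token: str, job: str, model: str,
                             disabled_tools: Optional[Iterable[str]] = None,
                             safety_override: bool = False,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare POST /api/jobs without sending it."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": list(disabled_tools or []),
        "safety_override": safety_override,
    }
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def create_job(token: str, job: str, model: str, **kwargs) -> str:
    """Send the request and return the new job_id (performs a real HTTP call)."""
    req = build_create_job_request(token, job, model, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["job_id"]
```

Usage: `job_id = create_job(token, "Open amazon.de and go to my orders", "gpt-5.4-mini")`. Splitting request construction from sending keeps the payload easy to inspect before anything reaches the operator machine.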
### Job status and management

- `GET /api/jobs/{job_id}` (full job payload, including the result)
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs` (list jobs)
- `POST /api/jobs/{job_id}/cancel` (cancel a job)
- `GET /api/stats`
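Following the patience rule, a caller can poll the job payload until it leaves `running`, then hand control back instead of restarting the job on its own. This is a sketch. `fetch_job` and `wait_for_job` are hypothetical. This reference only shows the status values `running` and `completed`, so add any other in-progress states your server reports to `ACTIVE`.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8787"  # documented default
ACTIVE = {"running"}  # only in-progress status shown in this reference


def fetch_job(job_id: str, token: str, base_url: str = BASE_URL) -> dict:
    """GET /api/jobs/{job_id} and decode the job payload."""
    req = urllib.request.Request(f"{base_url}/api/jobs/{job_id}",
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(fetch: Callable[[], dict], interval: float = 10.0,
                 max_wait: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job leaves an active state; never restarts it."""
    waited = 0.0
    while True:
        job = fetch()
        if job.get("status") not in ACTIVE:
            return job  # e.g. "completed"; check status before reading results
        if waited >= max_wait:
            # Slow is not stuck: report back and let the caller inspect /events.
            raise TimeoutError(f"job still {job.get('status')} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Usage: `job = wait_for_job(lambda: fetch_job(job_id, token))`. Injecting `fetch` and `sleep` keeps the loop testable without a live server.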
### Result contract

The job payload carries the result in `response`. The `return` and `data` fields are also mirrored at the top level:

```json
{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
```
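A small reader for this contract. It prefers the nested `response` object and falls back to the top-level mirror. `read_result` is a hypothetical helper. It treats every status other than `completed` as "no result yet", because other terminal status names are not listed here.

```python
from typing import Any, Optional, Tuple


def read_result(job: dict) -> Optional[Tuple[Any, Any]]:
    """Return (return, data) for a completed job payload, else None."""
    if job.get("status") != "completed":
        return None
    response = job.get("response") or {}
    # Prefer response.return / response.data; fall back to the top-level copies.
    ret = response.get("return", job.get("return"))
    data = response.get("data", job.get("data"))
    return ret, data
```

For the example payload above, `read_result` returns `("Task completed successfully", "file1.txt\nfile2.txt")`.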
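### Artifacts (screenshots and enhanced images)

- `GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>`
- URL-encode the `path` value, since absolute paths can contain characters such as `:`, `\`, and spaces.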
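This sketch builds the artifact URL with the standard library's `urlencode`, which percent-encodes both the absolute path and the token. `artifact_url` is a hypothetical helper name. The token travels in the query string, so such URLs can end up in browser history or in server and proxy logs. Avoid sharing them.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8787"  # documented default


def artifact_url(job_id: str, artifact_path: str, token: str,
                 base_url: str = BASE_URL) -> str:
    """Build GET /api/jobs/{job_id}/artifact with an encoded path and token."""
    # urlencode escapes colons, backslashes, and slashes, and turns spaces into '+'.
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{job_id}/artifact?{query}"


print(artifact_url("job_1", "C:\\shots\\a b.png", "t0k"))
# http://127.0.0.1:8787/api/jobs/job_1/artifact?path=C%3A%5Cshots%5Ca+b.png&token=t0k
```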