Files
screenjob/SKILL.md
Space-Banane 4123765aba
Some checks failed
CI / test (push) Failing after 8s
Commit remaining workspace updates
2026-05-31 20:43:36 +02:00

126 lines
4.2 KiB
Markdown

# ScreenJob Skill (OpenClaw Agents)
## What ScreenJob Solves
ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.
## Main Features
- Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring
## Important Environment Note
ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.
## Why It Is Useful
Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.
## Example Tasks
- Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
- Open google.com, go to my account, and change my profile picture to a provided image URL.
- Run `ls -a` in `C:/Users/username/Documents` and return the output in `data`.
## Practical Usage
1. Submit job via CLI or API.
2. Agent performs tool loop.
3. Read final `response.return` and `response.data` from job status.
Keyboard combo rule:
- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
Enhance-first click rule:
- Before clicking small buttons/icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels/text: use `mode="text"` to improve readability.
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
Windows-native routing rule:
- First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
- Use `see_screen` (and `enhance` if needed) for this check.
- Include a concise `observed_result` in `data` when completing the task.
Patience / rerun rule:
- If a job is still `running`, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
- Prefer waiting longer and checking for a final status/result before starting a replacement run.
- Only restart or replace a running job when there is clear evidence it is failed, irrecoverably stuck, or the user explicitly asks for a restart.
- If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
## API Quick Reference
Base URL:
- `http://127.0.0.1:8787` (default)
Auth (required on all `/api/*` routes):
- `Authorization: Bearer <SCREENJOB_TOKEN>`
- or `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
Create a job:
- `POST /api/jobs`
- Body:
```json
{
"job": "Open amazon.de and go to my orders",
"model": "gpt-5.4-mini",
"disabled_tools": [],
"safety_override": false
}
```
- Response:
```json
{ "job_id": "job_..." }
```
Check progress/result:
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`
Result contract in job payload:
```json
{
"status": "completed",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
}
```
Artifacts (screenshots/enhanced images):
- `GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>`