screenjob/SKILL.md
Space-Banane cceed18cf1
feat: (literally) "enhance" functionality with new parameters and improved image processing
2026-05-27 22:14:32 +02:00


# ScreenJob Skill (OpenClaw Agents)
## What ScreenJob Solves
ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.
## Main Features
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring
## Important Environment Note
ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.
## Why It Is Useful
Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.
## Example Tasks
- Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
- Open google.com, go to my account, and change my profile picture to a provided image URL.
- Run `ls -a` in `C:/Users/username/Documents` and return the output in `data`.
## Practical Usage
1. Submit a job via the CLI or the API.
2. The agent runs its tool loop on the operator machine.
3. Read the final `response.return` and `response.data` from the job status.
Keyboard combo rule:
- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
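The combo rule above can be sketched as a small helper that always produces one combo string per shortcut. `combo` and the tool-call dictionary shape (including the `key` argument name) are hypothetical and only illustrate the rule; they are not part of ScreenJob.

```python
def combo(*keys: str) -> str:
    """Join keys into one combo string such as 'ctrl+shift+esc'."""
    cleaned = [k.strip().lower() for k in keys if k.strip()]
    if not cleaned:
        raise ValueError("at least one key is required")
    return "+".join(cleaned)


# Correct: one press_key call carrying the whole shortcut.
# The dict layout and the "key" argument name are assumptions for illustration.
open_run_dialog = {"tool": "press_key", "args": {"key": combo("win", "r")}}

# Incorrect (do not do this): separate calls for "win" and then "r".
```

Building the string in one place makes it harder to accidentally emit a modifier and its key as two separate calls.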
Enhance-first click rule:
- Before clicking small buttons/icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels/text: use `mode="text"` to improve readability.
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
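As a rough sketch of the enhance-first rule, the helper below assembles arguments for an `enhance` call and enforces the documented `scale` range of 2 to 6. The function name `enhance_args` and the `[x, y]` shape of `coordinate` are assumptions; only `region="small"`, `mode="ui"`, `mode="text"`, and `scale` come from the rules above.

```python
from typing import Optional


def enhance_args(x: int, y: int, mode: str = "ui",
                 region: str = "small", scale: Optional[int] = None) -> dict:
    """Build arguments for an enhance call around the target coordinate."""
    # Assumption: the coordinate is passed as an [x, y] pair.
    args = {"coordinate": [x, y], "region": region, "mode": mode}
    if scale is not None:
        # The documented zoom range is 2 to 6; omit scale to use the region default.
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    return args


# Tiny icon: inspect with the preferred preset, then click the same coordinate.
icon_check = enhance_args(412, 88)
# Tiny label: switch to text mode and zoom in further.
label_check = enhance_args(412, 88, mode="text", scale=4)
```

Leaving `scale` unset keeps the region-tuned default, which is likely the safer choice unless the enhanced image is still unreadable.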
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
- Use `see_screen` (and `enhance` if needed) for this check.
- Include a concise `observed_result` in `data` when completing the task.
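One way to picture the verification rule is a completion payload whose `data` carries an `observed_result` field. This is a sketch under the assumption that `data` may be a JSON object; the helper name `completion_payload` and the example strings are hypothetical.

```python
def completion_payload(summary: str, observed_result: str, **extra) -> dict:
    """Assemble the return/data pair passed to task_complete."""
    # Assumption: data can be an object, so observed_result sits next to other fields.
    data = {"observed_result": observed_result, **extra}
    # "return" is a Python keyword, so it is used here as a dict key, not a kwarg.
    return {"return": summary, "data": data}


payload = completion_payload(
    "Profile picture updated",
    observed_result="Account page shows the new avatar",
)
```

The `observed_result` text should describe what `see_screen` actually showed, not what the task was expected to produce.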
## API Quick Reference
Base URL:
- `http://127.0.0.1:8787` (default)
Auth (required on all `/api/*` routes):
- `Authorization: Bearer <SCREENJOB_TOKEN>`
- or `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
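The two header forms above could be produced by a helper like the following. Reading the token from an environment variable named `SCREENJOB_TOKEN` is an assumption made for this sketch, as is the helper name `auth_headers`.

```python
import os
from typing import Optional


def auth_headers(token: Optional[str] = None, use_bearer: bool = True) -> dict:
    """Return one of the two accepted auth headers for /api/* routes."""
    # Assumption: the token may be supplied via a SCREENJOB_TOKEN environment variable.
    token = token or os.environ.get("SCREENJOB_TOKEN", "")
    if not token:
        raise ValueError("no ScreenJob token provided")
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-ScreenJob-Token": token}
```

Either header should be sufficient on its own; sending both is not required by the reference above.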
Create a job:
- `POST /api/jobs`
- Body:
```json
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
```
- Response:
```json
{ "job_id": "job_..." }
```
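A minimal sketch of job creation using only the Python standard library is shown below. The function names are hypothetical, and the code assumes the server returns JSON containing `job_id` as in the response example; the request is built separately from sending it so it can be inspected without a running server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # documented default base URL


def build_create_job_request(token: str, job: str, model: str = "gpt-5.4-mini",
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /api/jobs request."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": [],
        "safety_override": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(token: str, job: str, **kwargs) -> str:
    """Send the request and return the job_id from the response."""
    req = build_create_job_request(token, job, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["job_id"]
```

Calling `create_job(token, "Open amazon.de and go to my orders")` would then return the job ID to use with the status endpoints.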
Check progress/result (plus job listing, cancellation, and stats):
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`
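Polling the status endpoint until the job finishes might look like the sketch below. Only the `completed` status is confirmed by this document; the other terminal status names, the presence of a `status` field in the `/status` response, and all function names are assumptions. The fetch function is injected so the loop can be exercised without a server.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable

BASE_URL = "http://127.0.0.1:8787"
# Assumption: only "completed" is documented; the other names are guesses.
TERMINAL_STATUSES = ("completed", "failed", "cancelled")


def http_fetch_status(job_id: str, token: str, base_url: str = BASE_URL) -> dict:
    """GET /api/jobs/{job_id}/status and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/jobs/{job_id}/status",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(fetch_status: Callable[[str], dict], job_id: str,
                 terminal: Iterable[str] = TERMINAL_STATUSES,
                 interval: float = 2.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in terminal:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

Typical usage would be `wait_for_job(lambda j: http_fetch_status(j, token), job_id)`; GUI jobs can run for minutes, so a generous timeout is probably appropriate.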
Result contract in the job payload (`return` and `data` appear under `response` and are mirrored at the top level):
```json
{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
```
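Reading the result out of a payload shaped like the example above can be sketched as follows. The helper name `extract_result` is hypothetical; it prefers the nested `response` object and falls back to the top-level fields.

```python
from typing import Any, Tuple


def extract_result(payload: dict) -> Tuple[Any, Any]:
    """Return (return_message, data) from a job payload."""
    response = payload.get("response") or {}
    message = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return message, data


example = {
    "status": "completed",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}

message, data = extract_result(example)
files = data.splitlines()  # one entry per line of command output
```

The fallback to the top-level fields is a defensive choice for this sketch, not a documented requirement.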
Artifacts (screenshots/enhanced images):
- `GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>`
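Because the artifact route takes an absolute path and the token as query parameters, both need URL encoding. A small sketch using the standard library follows; the helper name `artifact_url` and the example path in the usage note are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8787"  # documented default base URL


def artifact_url(job_id: str, artifact_path: str, token: str,
                 base_url: str = BASE_URL) -> str:
    """Build the artifact download URL with properly encoded query parameters."""
    # urlencode escapes characters such as ':', '/', and spaces in the path.
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"
```

For example, `artifact_url("job_1", "C:/shots/a b.png", token)` encodes the colon, slashes, and space in the path before they reach the server.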