ScreenJob Skill (OpenClaw Agents)
What ScreenJob Solves
ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.
Main Features
- Screen perception (see_screen, enhance)
- Mouse/keyboard control (click, type, press_key)
- Terminal execution (execute_command, sleep)
- Structured completion payload (task_complete(return=..., data=...))
- Safety gate, auth, history, and live monitoring
Important Environment Note
ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.
Why It Is Useful
Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.
Example Tasks
- Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
- Open google.com, go to my account, and change my profile picture to a provided image URL.
- Run ls -a in C:/Users/username/Documents and return the output in data.
Practical Usage
- Submit a job via the CLI or API.
- The agent runs its tool loop until the task is done.
- Read the final response.return and response.data from the job status.
Keyboard combo rule:
- For shortcuts, use one press_key call with combo syntax, for example: win+r, ctrl+shift+esc.
- Do not split modifier combos into separate calls.
Enhance-first click rule:
- Before clicking small buttons/icons, dense UI, or ambiguous targets, call enhance first.
- Preferred preset for tiny controls: enhance(coordinate, region="small", mode="ui").
- For tiny labels/text: use mode="text" to improve readability.
- Optional zoom control: set scale from 2 to 6 (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
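The preset and zoom range above can be sketched as a small argument builder. This is only an illustration: the helper name enhance_args, the (x, y) coordinate shape, and the idea of passing arguments as a dict are assumptions, not part of ScreenJob itself. Only the region, mode, and scale values come from the rules above.

```python
def enhance_args(coordinate, region="small", mode="ui", scale=None):
    """Build arguments for an enhance(...) tool call (hypothetical helper).

    coordinate is assumed to be an (x, y) pair in screen pixels.
    """
    x, y = coordinate
    args = {"coordinate": [int(x), int(y)], "region": region, "mode": mode}
    if scale is not None:
        # The rule above allows an optional zoom factor from 2 to 6.
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    # When scale is omitted, the region-tuned default applies.
    return args


# Tiny icon: use the preferred preset. Tiny label: switch to text mode.
icon_call = enhance_args((1204, 38))
label_call = enhance_args((310, 512), mode="text", scale=4)
```

Leaving scale out by default keeps the region-tuned zoom in effect, which is what the rule recommends unless a specific zoom level is needed.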
Verification rule:
- Before task_complete, verify actual on-screen content matches the expected outcome.
- Use see_screen (and enhance if needed) for this check.
- Include a concise observed_result in data when completing the task.
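One way to carry observed_result in the completion payload is sketched below. It assumes data may be a JSON object. The result example in this document shows data as a plain string, so the structured form and the helper name completion_args are assumptions.

```python
def completion_args(summary, observed_result, **extra):
    """Build task_complete arguments with observed_result inside data.

    Hypothetical helper: assumes data accepts a JSON object.
    """
    observed = observed_result.strip()
    if not observed:
        raise ValueError("observed_result must describe what was seen on screen")
    data = {"observed_result": observed, **extra}
    # "return" is a Python keyword, so the arguments are built as a dict.
    return {"return": summary, "data": data}


args = completion_args(
    "Profile picture updated",
    "Account page shows the new avatar in the header",
)
```

Rejecting an empty observed_result makes it harder to complete a task without having checked the screen first.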
Patience / rerun rule:
- If a job is still running, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
- Prefer waiting longer and checking for a final status/result before starting a replacement run.
- Only restart or replace a running job when there is clear evidence it is failed, irrecoverably stuck, or the user explicitly asks for a restart.
- If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
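The patience rule could be implemented as a polling loop with a generous timeout, roughly as below. This is a sketch under assumptions: fetch_status is a caller-supplied function (for example a wrapper around GET /api/jobs/{job_id}/status), and the status names other than running and completed are guesses, since only those two appear in this document.

```python
import time

# Only "running" and "completed" appear in this document;
# "failed" and "cancelled" are assumed terminal states.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, timeout_s=1800.0, interval_s=10.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status(job_id) is a hypothetical callable returning the job
    payload as a dict with a "status" key.
    """
    deadline = clock() + timeout_s
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if clock() >= deadline:
            # A timeout alone is not proof the job is stuck. The caller
            # should look for a specific blocker before replacing the run.
            raise TimeoutError(f"job {job_id} still running after {timeout_s}s")
        sleep(interval_s)
```

The loop never cancels or restarts anything by itself. That decision stays with the caller, who can name the observed blocker if a replacement run is started.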
API Quick Reference
Base URL:
- http://127.0.0.1:8787 (default)
Auth (required on all /api/* routes):
- Authorization: Bearer <SCREENJOB_TOKEN>
- or X-ScreenJob-Token: <SCREENJOB_TOKEN>
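The two header forms can be produced by a small helper like the one below. Reading the token from an environment variable named SCREENJOB_TOKEN is an assumption made for illustration, since this document uses that name only as a placeholder.

```python
import os


def auth_headers(token=None, use_custom_header=False):
    """Return one of the two accepted auth header forms.

    Falls back to the SCREENJOB_TOKEN environment variable, which is
    an assumed convention and not confirmed by this document.
    """
    token = token or os.environ.get("SCREENJOB_TOKEN", "")
    if not token:
        raise ValueError("no ScreenJob token provided")
    if use_custom_header:
        return {"X-ScreenJob-Token": token}
    return {"Authorization": f"Bearer {token}"}
```

Either form should work on any /api/* route, so a client only needs to send one of them.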
Create a job:
- POST /api/jobs
- Body:
{
"job": "Open amazon.de and go to my orders",
"model": "gpt-5.4-mini",
"disabled_tools": [],
"safety_override": false
}
- Response:
{ "job_id": "job_..." }
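A minimal Python client for this endpoint might look like the sketch below, using only the standard library. The function names are hypothetical, and the Content-Type header and the 30-second timeout are assumptions. The body fields mirror the example above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default base URL from this document


def build_create_job_request(job, model, token, base_url=BASE_URL):
    """Build (without sending) the POST /api/jobs request."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": [],
        "safety_override": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed to be expected
        },
        method="POST",
    )


def create_job(job, model, token, base_url=BASE_URL):
    """Send the request and return the job_id from the response."""
    request = build_create_job_request(job, model, token, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["job_id"]
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and body without a running ScreenJob server.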
Check progress/result:
- GET /api/jobs/{job_id}
- GET /api/jobs/{job_id}/status
- GET /api/jobs/{job_id}/events
- GET /api/jobs
- POST /api/jobs/{job_id}/cancel
- GET /api/stats
Result contract in job payload:
{
"status": "completed",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
}
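Because the payload carries return and data both at the top level and under response, a reader can prefer the nested values and fall back to the top-level ones. The sketch below assumes that a job that has not completed has no final result yet. The helper name extract_result is hypothetical.

```python
def extract_result(payload):
    """Return (return_text, data) for a completed job, else None."""
    if payload.get("status") != "completed":
        return None
    response = payload.get("response") or {}
    # Prefer the nested response fields, falling back to the top level.
    return_text = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return return_text, data


example = {
    "status": "completed",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}
return_text, data = extract_result(example)
files = data.splitlines()  # ["file1.txt", "file2.txt"]
```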
Artifacts (screenshots/enhanced images):
GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>
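Absolute artifact paths often contain characters such as colons, slashes, or spaces, so the query string should be URL-encoded. A sketch using the standard library follows. The helper name and the example path are hypothetical.

```python
from urllib.parse import quote, urlencode


def artifact_url(job_id, artifact_path, token, base_url="http://127.0.0.1:8787"):
    """Build the artifact download URL with an encoded path and token."""
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"


# Hypothetical artifact path, used only to show the encoding.
url = artifact_url("job_1", "C:/shots/step 3.png", "secret")
```

Since the token travels in the query string here, the resulting URL should be treated as a secret and kept out of logs.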