screenjob/SKILL.md
Luna a521142b89
docs: add patience rule for rerunning jobs
2026-05-31 18:35:35 +00:00

ScreenJob Skill (OpenClaw Agents)

What ScreenJob Solves

ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.

Main Features

  • Screen perception (see_screen, enhance)
  • Mouse/keyboard control (click, type, press_key)
  • Terminal execution (execute_command, sleep)
  • Structured completion payload (task_complete(return=..., data=...))
  • Safety gate, auth, history, and live monitoring

Important Environment Note

ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.

Why It Is Useful

Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.

Example Tasks

  • Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
  • Open google.com, go to my account, and change my profile picture to a provided image URL.
  • Run ls -a in C:/Users/username/Documents and return the output in data.

Practical Usage

  1. Submit a job via the CLI or the API.
  2. The agent works through its tool loop until it calls task_complete.
  3. Read the final response.return and response.data from the job status.

Keyboard combo rule:

  • For shortcuts, use one press_key call with combo syntax, for example: win+r, ctrl+shift+esc.
  • Do not split modifier combos into separate calls.
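
As a rough illustration of the combo rule, the sketch below joins individual key names into the single combo string that one press_key call expects. The helper name to_combo is hypothetical and not part of ScreenJob; only the "+"-separated syntax (win+r, ctrl+shift+esc) comes from the rule above.

```python
def to_combo(*keys: str) -> str:
    """Join key names into one combo string, e.g. 'ctrl+shift+esc'.

    Hypothetical helper: ScreenJob only documents the resulting syntax.
    """
    cleaned = [key.strip().lower() for key in keys if key.strip()]
    if not cleaned:
        raise ValueError("at least one key is required")
    # One string means one press_key call, so modifiers are never split up.
    return "+".join(cleaned)


print(to_combo("win", "r"))
print(to_combo("Ctrl", "Shift", "Esc"))
```

The point of building the string first is that the whole shortcut reaches press_key in one call instead of as separate modifier presses.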

Enhance-first click rule:

  • Before clicking small buttons/icons, dense UI, or ambiguous targets, call enhance first.
  • Preferred preset for tiny controls: enhance(coordinate, region="small", mode="ui").
  • For tiny labels/text: use mode="text" to improve readability.
  • Optional zoom control: set scale from 2 to 6 (defaults are tuned by region).
  • After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
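
The enhance arguments above could be assembled roughly as follows. This is a sketch under assumptions: the helper name enhance_args, the dict layout, and the (x, y) coordinate format are illustrative; only the parameter names region, mode, and scale and the 2 to 6 scale range come from the rule.

```python
from typing import Any, Dict, Optional, Tuple


def enhance_args(
    coordinate: Tuple[int, int],
    region: str = "small",
    mode: str = "ui",
    scale: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an argument dict for an enhance call (hypothetical layout)."""
    x, y = coordinate
    args: Dict[str, Any] = {"coordinate": [x, y], "region": region, "mode": mode}
    if scale is not None:
        # The rule above allows a scale from 2 to 6; omit it to use the defaults.
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    return args


# Tiny icon: preferred preset. Tiny label: switch to mode="text".
icon_args = enhance_args((640, 412))
label_args = enhance_args((640, 412), mode="text", scale=4)
```

After inspecting the enhanced image, the follow-up click would reuse the same coordinate that was passed here.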

Verification rule:

  • Before task_complete, verify actual on-screen content matches the expected outcome.
  • Use see_screen (and enhance if needed) for this check.
  • Include a concise observed_result in data when completing the task.
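
One way to keep observed_result from being forgotten is to build the completion payload through a small helper. The sketch assumes that data may be a structured object; the result contract in this document only shows a string, so treat the dict shape and the helper name completion_payload as assumptions.

```python
from typing import Any, Dict


def completion_payload(summary: str, observed_result: str, **extra: Any) -> Dict[str, Any]:
    """Build the return/data pair for task_complete (hypothetical helper)."""
    observed = observed_result.strip()
    if not observed:
        # Refuse to complete without a note on what was actually seen on screen.
        raise ValueError("observed_result must describe what was seen on screen")
    return {"return": summary, "data": {"observed_result": observed, **extra}}


payload = completion_payload(
    "Profile picture updated",
    "Account page shows the new avatar in the header",
)
```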

Patience / rerun rule:

  • If a job is still running, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
  • Prefer waiting longer and checking for a final status/result before starting a replacement run.
  • Only restart or replace a running job when there is clear evidence that it has failed or is irrecoverably stuck, or when the user explicitly asks for a restart.
  • If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
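
The patience rule can be read as a small decision function: keep waiting unless one of the named conditions holds, and produce the one-sentence reason when it does. The sketch below is illustrative only; RunObservation and replacement_reason are hypothetical names, and the "failed" status value is an assumption, since this document only shows "completed".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunObservation:
    status: str                           # "failed" is an assumed status name
    user_requested_restart: bool = False
    blocker: Optional[str] = None         # specific, observed evidence of being stuck


def replacement_reason(obs: RunObservation) -> Optional[str]:
    """Return a one-sentence reason to replace the run, or None to keep waiting."""
    if obs.user_requested_restart:
        return "Replacing the run because the user explicitly asked for a restart."
    if obs.status == "failed":
        return "Replacing the run because it reported a failed status."
    if obs.blocker:
        return f"Replacing the run because of an observed blocker: {obs.blocker}."
    # Slow, repetitive, or token-heavy runs are not evidence of failure.
    return None
```

A None result means the right move is to wait and check the final status again rather than start a second run.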

API Quick Reference

Base URL:

  • http://127.0.0.1:8787 (default)

Auth (required on all /api/* routes):

  • Authorization: Bearer <SCREENJOB_TOKEN>
  • or X-ScreenJob-Token: <SCREENJOB_TOKEN>
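
The two header forms above could be produced by a helper like this one. The function name auth_headers and the use_bearer flag are illustrative; the header names are the ones listed above.

```python
from typing import Dict


def auth_headers(token: str, use_bearer: bool = True) -> Dict[str, str]:
    """Return one of the two accepted auth headers for /api/* routes."""
    if not token:
        raise ValueError("a ScreenJob token is required")
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-ScreenJob-Token": token}
```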

Create a job:

  • POST /api/jobs
  • Body:
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
  • Response:
{ "job_id": "job_..." }
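
A minimal sketch of the create-job call using only the Python standard library. It builds the request without sending it; the function name build_create_job_request and the Content-Type header are assumptions, while the URL, body fields, and Bearer auth follow this reference.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default base URL from this reference


def build_create_job_request(
    job: str,
    token: str,
    model: str = "gpt-5.4-mini",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/jobs request."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": [],
        "safety_override": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in this reference
        },
        method="POST",
    )
```

Sending it would look like `urllib.request.urlopen(build_create_job_request(...))`, after which the JSON response should carry the job_id shown above.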

Check progress/result:

  • GET /api/jobs/{job_id}
  • GET /api/jobs/{job_id}/status
  • GET /api/jobs/{job_id}/events
  • GET /api/jobs
  • POST /api/jobs/{job_id}/cancel
  • GET /api/stats
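
Polling the status endpoint might look like the loop below. It is a sketch, not ScreenJob's client: fetch_status is a hypothetical callable that performs GET /api/jobs/{job_id}/status and returns the decoded JSON, and the terminal status names other than "completed" are assumptions.

```python
import time
from typing import Any, Callable, Dict

# Only "completed" appears in this reference; the other names are assumptions.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reports a terminal status, then return that payload."""
    for _ in range(max_polls):
        payload = fetch_status(job_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

The generous default of 360 polls at 5 seconds each is deliberate: it waits about half an hour before giving up, in keeping with the patience rule.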

Result contract in job payload:

{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
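
Because the payload above carries return and data both inside response and at the top level, a reader can prefer the nested values and fall back to the top-level ones. The helper name extract_result is illustrative.

```python
from typing import Any, Dict, Tuple


def extract_result(payload: Dict[str, Any]) -> Tuple[Any, Any]:
    """Return (return_message, data), preferring the nested response object."""
    response = payload.get("response") or {}
    message = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return message, data


example = {
    "status": "completed",
    "response": {"return": "Task completed successfully", "data": "file1.txt\nfile2.txt"},
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}
message, data = extract_result(example)
print(message)            # Task completed successfully
print(data.splitlines())  # ['file1.txt', 'file2.txt']
```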

Artifacts (screenshots/enhanced images):

  • GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>
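
Artifact paths are absolute file paths, so they need URL-encoding before they go into the query string. A minimal sketch, assuming the default base URL; the function name artifact_url and the example path are illustrative.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8787"  # default base URL from this reference


def artifact_url(job_id: str, artifact_path: str, token: str, base_url: str = BASE_URL) -> str:
    """Build the artifact download URL with an encoded path and token."""
    # urlencode escapes separators, colons, and spaces in the absolute path.
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"


url = artifact_url("job_1", "/tmp/shots/step 1.png", "tok")
print(url)
```

Since the token travels in the query string here, the resulting URL is worth keeping out of logs and shared transcripts.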