screenjob/SKILL.md
Space-Banane 4123765aba
2026-05-31 20:43:36 +02:00

ScreenJob Skill (OpenClaw Agents)

What ScreenJob Solves

ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.

Main Features

  • Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
  • Screen perception (see_screen, enhance)
  • Mouse/keyboard control (click, type, press_key)
  • Native window/dialog control (list_windows, find_window, focus_window, detect_dialog, dialog_action, dialog_set_filename, list_ui_elements)
  • Terminal execution (execute_command, sleep)
  • Structured completion payload (task_complete(return=..., data=...))
  • Safety gate, auth, history, and live monitoring

Important Environment Note

ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.

Why It Is Useful

Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.

Example Tasks

  • Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
  • Open google.com, go to my account, and change my profile picture to a provided image URL.
  • Run ls -a in C:/Users/username/Documents and return the output in data.

Practical Usage

  1. Submit a job via the CLI or the API.
  2. The agent runs its tool loop until it calls task_complete.
  3. Read the final response.return and response.data from the job status.

Keyboard combo rule:

  • For shortcuts, use one press_key call with combo syntax, for example: win+r, ctrl+shift+esc.
  • Do not split modifier combos into separate calls.
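
The combo rule can be sketched as a small helper that joins keys into one string before a single press_key call. The helper name `combo` and the `{"tool": ..., "args": {"key": ...}}` call shape are assumptions made for illustration; only the `win+r` style syntax comes from the rule above.

```python
def combo(*keys: str) -> str:
    """Join keys into one combo string such as 'ctrl+shift+esc'."""
    if not keys:
        raise ValueError("at least one key is required")
    normalized = [k.strip().lower() for k in keys]
    if any(not k or "+" in k for k in normalized):
        raise ValueError("keys must be non-empty and must not contain '+'")
    return "+".join(normalized)


# One press_key call for the whole shortcut (call shape is hypothetical).
run_dialog_call = {"tool": "press_key", "args": {"key": combo("win", "r")}}
```

Building the string in one place makes it harder to emit a modifier and its key as two separate calls by accident.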

Enhance-first click rule:

  • Before clicking small buttons/icons, dense UI, or ambiguous targets, call enhance first.
  • Preferred preset for tiny controls: enhance(coordinate, region="small", mode="ui").
  • For tiny labels/text: use mode="text" to improve readability.
  • Optional zoom control: set scale from 2 to 6 (defaults are tuned by region).
  • After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
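
As a rough sketch of the enhance-first rule, the arguments for an enhance call could be assembled like this. The function name, the (x, y) coordinate format, and the dictionary layout are assumptions; the `region`, `mode`, and `scale` names and the 2 to 6 range are taken from the bullets above.

```python
from typing import Optional, Tuple


def build_enhance_args(
    coordinate: Tuple[int, int],
    region: str = "small",
    mode: str = "ui",
    scale: Optional[int] = None,
) -> dict:
    """Build arguments for an enhance call (argument layout is assumed)."""
    args = {"coordinate": list(coordinate), "region": region, "mode": mode}
    if scale is not None:
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    # When scale is omitted, the region-tuned default is left in place.
    return args


tiny_button = build_enhance_args((640, 360))
tiny_label = build_enhance_args((640, 360), mode="text", scale=4)
```

The same coordinate passed to enhance is then reused for the click, optionally with a small offset.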

Windows-native routing rule:

  • First classify whether the current surface is a normal app window, browser window, #32770 dialog, Explorer file picker, or another system surface.
  • Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
  • Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
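
The routing decision above might look roughly like the following. The surface labels and both function names are hypothetical, and the sketch covers only part of the surface list; the one fixed fact it relies on is that #32770 is the standard Windows dialog window class.

```python
def classify_surface(window_class: str, is_browser: bool = False) -> str:
    """Map a window to a coarse surface label (labels are hypothetical)."""
    if window_class == "#32770":  # standard Windows dialog window class
        return "dialog"
    return "browser_window" if is_browser else "app_window"


def choose_route(native_available: bool, custom_drawn: bool) -> str:
    """Prefer native tools; fall back to screenshots plus mouse/keyboard."""
    if native_available and not custom_drawn:
        return "native"
    return "screenshot"
```

For example, a save dialog with exposed controls would route to the native dialog tools, while a custom-drawn game launcher would route to screenshots.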

Verification rule:

  • Before task_complete, verify actual on-screen content matches the expected outcome.
  • Use see_screen (and enhance if needed) for this check.
  • Include a concise observed_result in data when completing the task.
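
A minimal sketch of a completion payload that carries the verification result. It assumes that data may be a JSON object with an observed_result key; the helper name is hypothetical. Because `return` is a reserved word in Python, the payload is built as a dictionary rather than with keyword arguments.

```python
from typing import Optional


def build_completion(
    summary: str, observed_result: str, extra: Optional[dict] = None
) -> dict:
    """Build task_complete arguments, refusing to complete without a check."""
    if not observed_result.strip():
        raise ValueError("verify the screen and record observed_result first")
    data = {"observed_result": observed_result}
    data.update(extra or {})
    return {"return": summary, "data": data}


completion = build_completion(
    "Saved the document",
    "Title bar shows report.txt without an unsaved-changes marker",
)
```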

Patience / rerun rule:

  • If a job is still running, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
  • Prefer waiting longer and checking for a final status/result before starting a replacement run.
  • Only restart or replace a running job when there is clear evidence it is failed, irrecoverably stuck, or the user explicitly asks for a restart.
  • If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
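
One way to encode the rerun rule is a function that returns a one-sentence reason only when replacement is justified. The function and parameter names are hypothetical, and the caller is assumed to pass an observed blocker only when there is clear evidence of failure.

```python
from typing import Optional


def replacement_reason(
    observed_blocker: Optional[str] = None,
    user_requested_restart: bool = False,
    looks_slow: bool = False,
) -> Optional[str]:
    """Return a short reason to replace a run, or None to keep waiting."""
    if user_requested_restart:
        return "Replacing the run because the user asked for a restart."
    if observed_blocker:
        return f"Replacing the run because of an observed blocker: {observed_blocker}."
    # Slowness, repetition, or token use alone never justifies a replacement.
    return None
```

Here `looks_slow` is accepted but deliberately ignored, which mirrors the rule that a slow job is not assumed to be stuck.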

API Quick Reference

Base URL:

  • http://127.0.0.1:8787 (default)

Auth (required on all /api/* routes):

  • Authorization: Bearer <SCREENJOB_TOKEN>
  • or X-ScreenJob-Token: <SCREENJOB_TOKEN>
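
The two header forms can be produced by a small helper. The function name and the `use_bearer` flag are illustrative; the header names are the ones listed above.

```python
def auth_headers(token: str, use_bearer: bool = True) -> dict:
    """Return one of the two documented auth headers for /api/* routes."""
    if not token:
        raise ValueError("a ScreenJob token is required")
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-ScreenJob-Token": token}
```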

Create a job:

  • POST /api/jobs
  • Body:
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
  • Response:
{ "job_id": "job_..." }
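
A sketch of the create-job call using only the Python standard library. The function names are hypothetical, and sending a Content-Type header is an assumption; the route, body fields, and job_id response field follow the reference above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # documented default


def build_create_job_request(
    token: str, job: str, model: str = "gpt-5.4-mini", base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /api/jobs request without sending it."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": [],
        "safety_override": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed to be accepted
        },
        method="POST",
    )


def create_job(token: str, job: str) -> str:
    """Send the request and return the job_id from the response."""
    request = build_create_job_request(token, job)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["job_id"]
```

Splitting request construction from sending keeps the body easy to inspect before anything reaches the operator machine.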

Check progress/result:

  • GET /api/jobs/{job_id}
  • GET /api/jobs/{job_id}/status
  • GET /api/jobs/{job_id}/events
  • GET /api/jobs
  • POST /api/jobs/{job_id}/cancel
  • GET /api/stats
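
Polling one of the status routes could be sketched as below. Only the "completed" status appears in this document; the other terminal status names, the default intervals, and the function name are assumptions. The fetch step is passed in as a callable so the loop stays independent of the HTTP client.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    terminal: Iterable[str] = ("completed", "failed", "cancelled"),  # assumed
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job payload reports a terminal status."""
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout_seconds
    while True:
        payload = fetch_status()
        if payload.get("status") in terminal_set:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(poll_seconds)
```

A generous timeout fits the patience rule: the loop waits for a final status instead of treating a slow job as stuck.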

Result contract in job payload:

{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
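
Since the contract shows return and data both nested under response and repeated at the top level, a reader can prefer the nested values and fall back to the top-level ones. The function name is hypothetical, and the fallback order is a design choice of this sketch, not documented behavior.

```python
def extract_result(payload: dict) -> tuple:
    """Return (return_text, data) from a job payload."""
    response = payload.get("response") or {}
    return_text = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return return_text, data


sample = {
    "status": "completed",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}
return_text, data = extract_result(sample)
files = data.splitlines()
```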

Artifacts (screenshots/enhanced images):

  • GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>
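
Artifact paths are absolute and may contain characters such as colons, slashes, or spaces, so the query string should be percent-encoded. A minimal sketch, with a hypothetical function name and an example path and token invented for illustration:

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def artifact_url(
    job_id: str,
    artifact_path: str,
    token: str,
    base_url: str = "http://127.0.0.1:8787",
) -> str:
    """Build the artifact download URL with an encoded path and token."""
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"


url = artifact_url("job_123", "C:/artifacts/shot 1.png", "example-token")
# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
```

Because the token travels in the URL on this route, the resulting link is best kept out of logs and shared messages.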