screenjob/SKILL.md
Space-Banane cceed18cf1
feat: (literally) "enhance" functionality with new parameters and improved image processing
2026-05-27 22:14:32 +02:00


ScreenJob Skill (OpenClaw Agents)

What ScreenJob Solves

ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.

Main Features

  • Screen perception (see_screen, enhance)
  • Mouse/keyboard control (click, type, press_key)
  • Terminal execution (execute_command, sleep)
  • Structured completion payload (task_complete(return=..., data=...))
  • Safety gate, auth, history, and live monitoring

Important Environment Note

ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.

Why It Is Useful

Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents or tools on the operator's computer.

Example Tasks

  • Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
  • Open google.com, go to my account, and change my profile picture to a provided image URL.
  • Run ls -a in C:/Users/username/Documents and return the output in data.

Practical Usage

  1. Submit a job via the CLI or the API.
  2. The agent runs its tool loop on the operator machine.
  3. Read the final response.return and response.data from the job status.

Keyboard combo rule:

  • For shortcuts, use a single press_key call with combo syntax, for example win+r or ctrl+shift+esc.
  • Do not split modifier combos into separate calls.
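A minimal Python sketch of the combo rule. The helper names and the {"tool": ..., "args": {"key": ...}} dict shape are hypothetical illustrations, not ScreenJob's documented tool-call format. The point is that a shortcut becomes one "+"-joined string sent in a single press_key call, and that a lone modifier press is a warning sign of a split combo.

```python
# Sketch of the keyboard combo rule. The {"tool": ..., "args": {"key": ...}} shape
# and the helper names are hypothetical, not ScreenJob's documented format.

MODIFIERS = {"ctrl", "shift", "alt", "win"}


def press_key_call(*keys: str) -> dict:
    """Build ONE press_key call for a key or shortcut, e.g. ("ctrl", "shift", "esc")."""
    if not keys:
        raise ValueError("at least one key is required")
    combo = "+".join(k.strip().lower() for k in keys)
    return {"tool": "press_key", "args": {"key": combo}}


def lone_modifier_presses(calls: list) -> list:
    """Return indices of press_key calls that send only a modifier.

    Such calls usually mean a combo was split across calls (e.g. "ctrl" then "c").
    A deliberate lone "win" press is flagged too, so treat hits as warnings.
    """
    return [
        i
        for i, call in enumerate(calls)
        if call["tool"] == "press_key" and call["args"]["key"] in MODIFIERS
    ]


if __name__ == "__main__":
    print(press_key_call("win", "r"))  # {'tool': 'press_key', 'args': {'key': 'win+r'}}
    split = [press_key_call("ctrl"), press_key_call("shift"), press_key_call("esc")]
    print(lone_modifier_presses(split))  # [0, 1]
```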

Enhance-first click rule:

  • Before clicking small buttons/icons, dense UI, or ambiguous targets, call enhance first.
  • Preferred preset for tiny controls: enhance(coordinate, region="small", mode="ui").
  • For tiny labels/text: use mode="text" to improve readability.
  • Optional zoom control: set scale to a value between 2 and 6. If scale is omitted, the default depends on region.
  • After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
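The enhance-first rule can be sketched as two separate call builders, so the click is only built after the enhanced image has been inspected. The [x, y] coordinate format, the dict shape, and the function names are assumptions for illustration; region and mode are passed through unchanged, and only the documented 2 to 6 scale range is enforced.

```python
# Sketch of the enhance-first click rule. The tool-call dict shape and the
# [x, y] coordinate format are assumptions, not ScreenJob's documented API.
from typing import Optional, Tuple


def enhance_call(
    coordinate: Tuple[int, int],
    region: str = "small",
    mode: str = "ui",  # "ui" for tiny controls, "text" for tiny labels
    scale: Optional[int] = None,
) -> dict:
    """Build the enhance call for a target; inspect its image before clicking."""
    args = {"coordinate": list(coordinate), "region": region, "mode": mode}
    if scale is not None:
        # Documented range is 2..6; when omitted, ScreenJob uses a region-tuned default.
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    return {"tool": "enhance", "args": args}


def click_call(coordinate: Tuple[int, int], offset: Tuple[int, int] = (0, 0)) -> dict:
    """Click the same target coordinate, optionally nudged by a small offset."""
    x, y = coordinate
    dx, dy = offset
    return {"tool": "click", "args": {"coordinate": [x + dx, y + dy]}}


if __name__ == "__main__":
    target = (412, 87)
    print(enhance_call(target))
    # ...inspect the enhanced image, then click the same coordinate (or nudge it)...
    print(click_call(target, offset=(0, 2)))
```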

Verification rule:

  • Before task_complete, verify actual on-screen content matches the expected outcome.
  • Use see_screen (and enhance if needed) for this check.
  • Include a concise observed_result in data when completing the task.
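A small sketch of the completion step: task_complete is only built once a non-empty observed_result, written from what see_screen (or enhance) actually showed, is available. Passing data as a dict is an assumption; the result example in this doc shows data as a plain string, so serialize it (for example with json.dumps) if your setup expects text. The function name and dict shape are hypothetical.

```python
# Sketch of the verification rule. Passing data as a dict is an assumption;
# the doc's result example shows data as a plain string.


def task_complete_call(summary: str, observed_result: str, **extra) -> dict:
    """Build a task_complete call whose data carries a concise observed_result."""
    observed = observed_result.strip()
    if not observed:
        # Refuse to complete without a recorded on-screen observation.
        raise ValueError("observed_result must describe what is actually on screen")
    data = {"observed_result": observed, **extra}
    return {"tool": "task_complete", "args": {"return": summary, "data": data}}


if __name__ == "__main__":
    call = task_complete_call(
        "Profile picture updated",
        "Account page shows the new avatar image",
    )
    print(call["args"]["data"])  # {'observed_result': 'Account page shows the new avatar image'}
```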

API Quick Reference

Base URL:

  • http://127.0.0.1:8787 (default)

Auth (required on all /api/* routes):

  • Authorization: Bearer <SCREENJOB_TOKEN>
  • or X-ScreenJob-Token: <SCREENJOB_TOKEN>
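Either header form works on /api/* routes. The helper below builds one of them; reading the token from an environment variable named SCREENJOB_TOKEN is an assumption made for this sketch, since the doc only uses that name as a placeholder.

```python
# Sketch: build one of the two accepted auth headers for /api/* routes.
# The SCREENJOB_TOKEN environment variable is an assumed convention.
import os


def auth_headers(token: str, use_x_header: bool = False) -> dict:
    """Return either the Bearer header or the X-ScreenJob-Token header."""
    if not token:
        raise ValueError("a ScreenJob token is required on all /api/* routes")
    if use_x_header:
        return {"X-ScreenJob-Token": token}
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    token = os.environ.get("SCREENJOB_TOKEN", "")
    if token:
        print(sorted(auth_headers(token)))  # ['Authorization']
```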

Create a job:

  • POST /api/jobs
  • Body:
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
  • Response:
{ "job_id": "job_..." }
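A standard-library sketch of job creation, split into a request builder (testable offline) and a sender. Sending Content-Type: application/json is an assumption about what the server expects. Because the doc does not say which body fields are required, model is only included when given; the function names are hypothetical.

```python
# Sketch: create a job using only the standard library. Content-Type:
# application/json and the function names are assumptions for illustration.
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://127.0.0.1:8787"  # default base URL


def create_job_request(
    job: str,
    token: str,
    model: Optional[str] = None,
    disabled_tools: Optional[List[str]] = None,
    safety_override: bool = False,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/jobs request."""
    body = {"job": job, "disabled_tools": disabled_tools or [], "safety_override": safety_override}
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def create_job(job: str, token: str, **kwargs) -> str:
    """Send the request and return the job_id from the JSON response."""
    with urllib.request.urlopen(create_job_request(job, token, **kwargs), timeout=30) as resp:
        return json.load(resp)["job_id"]
```

Usage, against a running server: job_id = create_job("Open amazon.de and go to my orders", token, model="gpt-5.4-mini").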

Check progress/result:

  • GET /api/jobs/{job_id}
  • GET /api/jobs/{job_id}/status
  • GET /api/jobs/{job_id}/events
  • GET /api/jobs
  • POST /api/jobs/{job_id}/cancel
  • GET /api/stats
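A polling sketch that reads GET /api/jobs/{job_id} until the status is terminal. Only "completed" is named in this doc; "failed" and "cancelled" are assumed status names, so adjust TERMINAL_STATUSES to what your server actually returns. The fetch function is injected so the loop can be exercised without a server.

```python
# Sketch: poll a job until it reaches a terminal status. "failed" and
# "cancelled" are assumed status names; only "completed" is documented.
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8787"
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def http_get_json(path: str, token: str, base_url: str = BASE_URL) -> dict:
    """GET an /api/* path with bearer auth and decode the JSON body."""
    req = urllib.request.Request(f"{base_url}{path}", headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Return the job payload once its status is terminal; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(f"/api/jobs/{job_id}")
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

Usage: payload = wait_for_job(job_id, lambda path: http_get_json(path, token)). On TimeoutError, the job can be stopped with POST /api/jobs/{job_id}/cancel.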

Result contract in the job payload (return and data appear both under response and at the top level):

{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
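Since return and data are mirrored under response and at the top level, a client can read either. This sketch prefers the nested response object and falls back to the top-level copies; treating any status other than "completed" as an error is a choice made for this example, and the function name is hypothetical.

```python
# Sketch: extract (return, data) from a finished job payload, preferring the
# nested response object and falling back to the top-level copies.
from typing import Any, Tuple


def job_result(payload: dict) -> Tuple[Any, Any]:
    """Return (return, data) for a completed job; raise for any other status."""
    status = payload.get("status")
    if status != "completed":
        raise ValueError(f"job is not completed (status={status!r})")
    response = payload.get("response") or {}
    ret = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return ret, data


if __name__ == "__main__":
    example = {
        "status": "completed",
        "response": {"return": "Task completed successfully", "data": "file1.txt\nfile2.txt"},
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    }
    ret, data = job_result(example)
    print(ret)                # Task completed successfully
    print(data.splitlines())  # ['file1.txt', 'file2.txt']
```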

Artifacts (screenshots/enhanced images):

  • GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>
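Because path is an absolute file path, it needs to be percent-encoded so that slashes, drive colons, and spaces survive as a single query value. A minimal sketch using urllib.parse; the function name is hypothetical.

```python
# Sketch: build an artifact download URL. urlencode escapes the absolute path
# (slashes, colons, spaces) so it travels as one query value.
from urllib.parse import quote, urlencode


def artifact_url(job_id: str, artifact_path: str, token: str, base_url: str = "http://127.0.0.1:8787") -> str:
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"


if __name__ == "__main__":
    print(artifact_url("job_1", "C:/Users/me/shot.png", "abc"))
```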