screenjob/SKILL.md

# ScreenJob Skill (OpenClaw Agents)

## What ScreenJob Solves

ScreenJob lets an agent execute tasks that require a real desktop UI plus terminal access, with structured tool calls and job tracking.

## Main Features

- Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring

## Important Environment Note

ScreenJob runs on a separate computer (the human/operator machine), not inside the agent's own runtime environment.

## Why It Is Useful

Agents can use ScreenJob to launch and control GUI workflows, including orchestrating other GUI agents/tools on a human computer.

## Example Tasks

- Open amazon.de and buy a USB-C to USB-C cable for 10 EUR or less.
- Open google.com, go to my account, and change my profile picture to a provided image URL.
- Run `ls -a` in `C:/Users/username/Documents` and return the output in `data`.

## Practical Usage

1. Submit a job via the CLI or the API.
2. The agent runs its tool loop until the task completes.
3. Read the final `response.return` and `response.data` from the job status.

Keyboard combo rule:

- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
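As a small illustration of the combo rule, the sketch below joins key names into one combo string that can be passed to a single `press_key` call. The helper name `combo` is hypothetical and not part of ScreenJob; it only shows the `+`-separated syntax described above.

```python
def combo(*keys: str) -> str:
    """Join key names into one combo string, e.g. 'ctrl+shift+esc'.

    Hypothetical helper: ScreenJob only needs the resulting string,
    passed to a single press_key call.
    """
    cleaned = [key.strip().lower() for key in keys if key.strip()]
    if not cleaned:
        raise ValueError("at least one key is required")
    return "+".join(cleaned)


run_dialog = combo("win", "r")
task_manager = combo("Ctrl", "Shift", "Esc")
```

Here `run_dialog` and `task_manager` each hold one string, so each shortcut maps to exactly one `press_key` call.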

Enhance-first click rule:

- Before clicking small buttons/icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels/text: use `mode="text"` to improve readability.
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
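The enhance-first rule can be sketched as a small argument builder. This is only an illustration: `enhance_args` is a hypothetical helper, and the `(x, y)` tuple used for `coordinate` is an assumption, since the coordinate format is not specified here. The `region`, `mode`, and `scale` values follow the bullets above.

```python
def enhance_args(coordinate, target="control", scale=None):
    """Build keyword arguments for an enhance call on a small target.

    target: "control" for tiny buttons/icons, "text" for tiny labels.
    scale:  optional zoom from 2 to 6; omit it to use the tuned default.
    """
    mode = "text" if target == "text" else "ui"
    args = {"coordinate": coordinate, "region": "small", "mode": mode}
    if scale is not None:
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    return args


icon_args = enhance_args((120, 48))
label_args = enhance_args((120, 48), target="text", scale=4)
```

After inspecting the enhanced image, the subsequent `click` reuses the same coordinate that was passed to `enhance`.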

Windows-native routing rule:

- First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.

Verification rule:

- Before calling `task_complete`, verify that the actual on-screen content matches the expected outcome.
- Use `see_screen` (and `enhance` if needed) for this check.
- Include a concise `observed_result` in `data` when completing the task.

Patience / rerun rule:

- If a job is still `running`, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
- Prefer waiting longer and checking for a final status/result before starting a replacement run.
- Only restart or replace a running job when there is clear evidence that it has failed or is irrecoverably stuck, or when the user explicitly asks for a restart.
- If you do replace a run, say why in one short sentence and reference the specific blocker you observed.

## API Quick Reference

Base URL:

- `http://127.0.0.1:8787` (default)

Auth (required on all `/api/*` routes):

- `Authorization: Bearer <SCREENJOB_TOKEN>`
- or `X-ScreenJob-Token: <SCREENJOB_TOKEN>`

Create a job:

- `POST /api/jobs`
- Body:

```json
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
```

- Response:

```json
{ "job_id": "job_..." }
```
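A minimal sketch of job submission using only the Python standard library. The function names are hypothetical, and the `Content-Type` header is an assumption (it is the usual choice for a JSON body, but this document does not state it). The URL, bearer header, and body fields come from the reference above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default base URL from this document


def build_create_job_request(job, model, token, base_url=BASE_URL):
    """Build (but do not send) the POST /api/jobs request."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": [],
        "safety_override": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


def create_job(job, model, token, base_url=BASE_URL):
    """Send the request and return the job_id from the response."""
    request = build_create_job_request(job, model, token, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]
```

Splitting request construction from sending keeps the headers and body easy to inspect without a running ScreenJob server.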

Check progress/result:

- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`
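A patient polling loop over the job status endpoint might look like the sketch below. It takes a caller-supplied `fetch_status` function (hypothetical) that returns the parsed JSON payload for a job, so the loop itself makes no HTTP assumptions. Only `running` and `completed` appear in this document; the other terminal status names are assumptions and should be adjusted to whatever the server actually reports.

```python
import time

# Assumption: "failed" and "cancelled" are placeholder names; only
# "running" and "completed" are confirmed by this document.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, poll_seconds=5.0,
                 timeout_seconds=1800.0, sleep=time.sleep,
                 clock=time.monotonic):
    """Poll until the job reaches a terminal status or the timeout expires.

    A job that is still running is never cancelled or replaced here;
    on timeout the caller decides what to do next.
    """
    deadline = clock() + timeout_seconds
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still not finished")
        sleep(poll_seconds)
```

The generous default timeout reflects the patience rule: a slow or repetitive job is not treated as stuck, and the loop never restarts anything on its own.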

Result contract in job payload:

```json
{
  "status": "completed",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
```
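Reading the result from a job payload can be sketched as follows. The helper name `extract_result` is hypothetical. It prefers the nested `response` object and falls back to the top-level `return` and `data` fields, on the assumption that the two locations mirror each other as in the example above.

```python
def extract_result(payload):
    """Return (return_text, data) from a job payload dict."""
    response = payload.get("response") or {}
    return_text = response.get("return", payload.get("return"))
    data = response.get("data", payload.get("data"))
    return return_text, data


example_payload = {
    "status": "completed",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}

return_text, data = extract_result(example_payload)
files = data.splitlines()  # ["file1.txt", "file2.txt"]
```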

Artifacts (screenshots/enhanced images):

- `GET /api/jobs/{job_id}/artifact?path=<absolute_artifact_path>&token=<SCREENJOB_TOKEN>`
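Because absolute artifact paths usually contain characters such as `:`, `\`, or spaces, the query string should be percent-encoded. A minimal sketch, assuming the route shape shown above; the helper name `artifact_url` and the example path are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def artifact_url(job_id, artifact_path, token,
                 base_url="http://127.0.0.1:8787"):
    """Build the artifact download URL with an encoded query string."""
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"


# Hypothetical artifact path, used only to show the encoding round trip.
url = artifact_url("job_123", r"C:\Users\username\shots\step 1.png", "secret")
decoded = parse_qs(urlsplit(url).query)
```

Note that this route carries the token in the URL, so the resulting link should be handled like a credential.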