space/screenjob

Space-Banane 114ddd80d6

Add Windows service host and system tray controller

2026-05-28 13:30:27 +02:00

ScreenJob

ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.

What It Solves

Runs agent-driven tasks that require a graphical interface.
Exposes both CLI and HTTP API modes.
Stores job history and events in SQLite.
Streams live monitoring updates over WebSocket.
Returns structured agent output as:
- return: human-readable completion message
- data: structured payload (for example command output)

Core Features

Tool-based agent loop (execute_command, see_screen, enhance, click, type, press_key, sleep, task_complete)
Safety pre-check with override support
Per-job tool disable list
Live/final usage and cost estimates
Read-only Tailwind monitoring UI
Persistent job and event history
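
The per-job tool disable list can be pictured as a filter over the tool names listed above. This is a minimal sketch, not the project's implementation: the helper name `enabled_tools` is hypothetical, and rejecting unknown tool names is an assumption.

```python
from typing import Iterable, List

# Tool names as listed under Core Features.
ALL_TOOLS = (
    "execute_command", "see_screen", "enhance", "click",
    "type", "press_key", "sleep", "task_complete",
)


def enabled_tools(disabled_tools: Iterable[str]) -> List[str]:
    """Return the tools that remain available for one job (hypothetical helper)."""
    disabled = set(disabled_tools)
    unknown = disabled - set(ALL_TOOLS)
    if unknown:
        # Assumption: an unknown name is treated as a caller error.
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    # Preserve the documented order of the remaining tools.
    return [tool for tool in ALL_TOOLS if tool not in disabled]


print(enabled_tools(["click", "type"]))
```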

Project Layout

main.py
screenjob.py
requirements.txt
start_backend.ps1
src/
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
  utils.py
tests/
  test_agent_tools.py
  test_pricing.py
  test_server_api.py
  test_storage.py
.gitea/workflows/ci.yml

Setup

Install Python 3.11+.
Install dependencies:

pip install -r requirements.txt

Create .env in project root:

OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token

# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
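
As a rough illustration of how these settings fit together, the sketch below parses KEY=VALUE text and checks the two required keys. It is not the loader in src/config.py; `parse_env` and `load_settings` are hypothetical names, and treating the optional values shown above as defaults is an assumption.

```python
from typing import Dict

REQUIRED = ("OPENAI_API_KEY", "SCREENJOB_TOKEN")

# Assumption: the optional values from the example .env act as defaults.
ASSUMED_DEFAULTS = {
    "SCREENJOB_DEFAULT_MODEL": "gpt-5.4-mini",
    "SCREENJOB_SAFETY_MODEL": "gpt-5.4-mini",
    "SCREENJOB_HOST": "127.0.0.1",
    "SCREENJOB_PORT": "8787",
    "DISABLE_UI": "false",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> Dict[str, str]:
    """Merge parsed values over the assumed defaults and check required keys."""
    settings = dict(ASSUMED_DEFAULTS)
    settings.update(parse_env(text))
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {missing}")
    return settings
```

Calling `load_settings(open(".env").read())` would then return a plain dict of strings, with values such as the port left for the caller to convert.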

Usage

CLI

python main.py run "Open amazon.de and go to my orders"

CLI JSON output includes both legacy and structured fields:

{
  "completed": true,
  "result": "Task completed successfully",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
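
Because the output carries the same values in several places, a caller can prefer the structured fields and fall back to the legacy one. The sketch below shows one way to do that; `extract_result` is a hypothetical helper, and the lookup order is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Tuple


def extract_result(payload: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (message, data), preferring response.* over top-level aliases."""
    response = payload.get("response") or {}
    # Fall back from response.return to the top-level alias, then to legacy result.
    message = response.get(
        "return", payload.get("return", payload.get("result", ""))
    )
    data = response.get("data", payload.get("data"))
    return message, data


# The example output above, written as a Python dict.
sample = {
    "completed": True,
    "result": "Task completed successfully",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}

message, data = extract_result(sample)
print(message)
print(data.splitlines())
```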

Server

python main.py server

Or use the PowerShell launcher:

.\start_backend.ps1

Windows Service

Run these commands from an elevated PowerShell session (Run as Administrator). The installer requires .NET SDK 10+ because it publishes a native service host executable.

Install and start at boot:

.\install_backend_service.ps1 -ForceReinstall -StartAfterInstall -DelayedAutoStart

Check status:

Get-Service -Name ScreenJobBackend

Stop/start manually:

Stop-Service -Name ScreenJobBackend
Start-Service -Name ScreenJobBackend

Uninstall:

.\uninstall_backend_service.ps1

Service logs are written to:

screenjob_runs/service/backend-service.stdout.log
screenjob_runs/service/backend-service.stderr.log

System Tray Icon (Windows)

Start tray icon now:

powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1

Install startup shortcut (current user):

.\install_tray_startup_shortcut.ps1

Install startup shortcut for all users:

.\install_tray_startup_shortcut.ps1 -AllUsers

Remove startup shortcut:

.\install_tray_startup_shortcut.ps1 -Remove

Tray menu actions:

Refresh service status
Start/Stop/Restart service (prompts for admin/UAC)
Open the dashboard URL built from SCREENJOB_HOST and SCREENJOB_PORT in .env
Open service logs folder
Exit tray icon process

All API routes require the token, passed in one of the following ways:

Authorization: Bearer <SCREENJOB_TOKEN>
X-ScreenJob-Token: <SCREENJOB_TOKEN>
Query fallback ?token= (mainly for UI/websocket/artifact fetch)
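
The two header forms can be produced by a small client-side helper. This is a sketch only; `auth_headers` and its `style` parameter are hypothetical and not part of ScreenJob.

```python
from typing import Dict


def auth_headers(token: str, style: str = "bearer") -> Dict[str, str]:
    """Build request headers for one of the two documented header forms."""
    if style == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if style == "custom":
        return {"X-ScreenJob-Token": token}
    raise ValueError("style must be 'bearer' or 'custom'")


print(auth_headers("my-token"))
print(auth_headers("my-token", style="custom"))
```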

Create Job

POST /api/jobs

{
  "job": "run \"ls -a\" in C:/Users/username/Documents and return output",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}

Response:

{ "job_id": "job_..." }
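
A client could submit this request with the standard library alone. The sketch below assumes the server is reachable at the given base URL and that `model` may be omitted; both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Optional


def build_create_job_request(
    base_url: str,
    token: str,
    job: str,
    model: Optional[str] = None,
    disabled_tools: Optional[Iterable[str]] = None,
    safety_override: bool = False,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/jobs request."""
    body = {
        "job": job,
        "disabled_tools": list(disabled_tools or []),
        "safety_override": safety_override,
    }
    if model:
        # Assumption: leaving out "model" selects the server-side default.
        body["model"] = model
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(request: urllib.request.Request) -> str:
    """Send the request and return the job_id from the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))["job_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect before anything reaches the server, for example `create_job(build_create_job_request("http://127.0.0.1:8787", token, "..."))`.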

Job Status / History

GET /api/jobs/{job_id}
GET /api/jobs/{job_id}/status
GET /api/jobs/{job_id}/events
GET /api/jobs
POST /api/jobs/{job_id}/cancel
GET /api/stats

Each job payload includes:

result (compat string)
response.return
response.data
top-level return and data aliases

Monitoring UI

URL: /
Read-only dashboard (no run controls)
Requires token input
Live updates via /ws
Analytics dashboards for success rate by objective category and daily averages
Set DISABLE_UI=true to disable UI
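
For a client that wants to follow live updates itself, the addresses can be derived from the host and port settings. This sketch assumes plain HTTP (hence the `ws://` scheme) and uses the `?token=` query fallback for the WebSocket; `monitor_urls` is a hypothetical helper.

```python
from typing import Dict
from urllib.parse import urlencode


def monitor_urls(host: str, port: int, token: str) -> Dict[str, str]:
    """Build the dashboard URL and a token-bearing WebSocket URL."""
    base = f"{host}:{port}"
    return {
        "dashboard": f"http://{base}/",
        # urlencode escapes characters in the token that are unsafe in a query.
        "websocket": f"ws://{base}/ws?{urlencode({'token': token})}",
    }


urls = monitor_urls("127.0.0.1", 8787, "my-token")
print(urls["dashboard"])
print(urls["websocket"])
```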

Analytics API

GET /api/analytics
Returns objective-category success rates plus average steps/cost over time

Agent Instructions (Practical)

Prefer execute_command for deterministic actions (opening URLs, filesystem checks).
Use see_screen before UI interaction.
Use enhance before clicking small/ambiguous targets; prefer region="small" for compact controls.
Use enhance mode="text" for tiny labels/text, or mode="ui" for general UI.
Optionally set enhance scale (2-6) for tighter zoom control.
Use press_key for non-text keys (Enter, Tab, arrows, Escape).
For shortcuts, use one press_key call with combo syntax (example: win+r).
Use click offsets via offset_up/down/left/right and optional sleep_after_seconds.
When done, call:
- task_complete(return="...", data=...)
Before task_complete, verify expected on-screen content with see_screen (and enhance if needed), and include an observed_result summary in data.

data should contain useful structured output for the requester (text, object, list, etc.).
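
The argument shapes described above can be sketched as plain dictionaries. Both helpers are hypothetical: limiting `mode` to the two values mentioned here, wrapping non-dict data under a `value` key, and the exact dict layout passed to the tools are all assumptions.

```python
from typing import Any, Dict, Optional


def enhance_args(
    mode: str = "ui", scale: Optional[int] = None, region: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble enhance arguments, checking the documented 2-6 scale range."""
    if mode not in ("text", "ui"):
        raise ValueError("mode must be 'text' or 'ui'")
    args: Dict[str, Any] = {"mode": mode}
    if scale is not None:
        if not 2 <= scale <= 6:
            raise ValueError("scale must be between 2 and 6")
        args["scale"] = scale
    if region is not None:
        args["region"] = region
    return args


def task_complete_args(
    return_message: str, data: Any, observed_result: str
) -> Dict[str, Any]:
    """Assemble task_complete arguments with an observed_result summary in data."""
    # Copy dict data so the caller's object is not mutated; wrap anything else.
    payload = dict(data) if isinstance(data, dict) else {"value": data}
    payload["observed_result"] = observed_result
    # "return" is a Python keyword, so it is used as a dict key, not a kwarg.
    return {"return": return_message, "data": payload}


print(enhance_args("text", scale=4, region="small"))
print(task_complete_args("Listed files", {"output": "file1.txt"}, "Terminal shows file1.txt"))
```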

Verification

Local:

pytest -q

CI:

.gitea/workflows/ci.yml runs compile checks and tests on every push and pull request.

Compatibility Entry Point

python screenjob.py "<job>" remains supported as a wrapper to main.py.

License

Apache License 2.0. See LICENSE.