ScreenJob
ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.
What It Solves
- Runs agent-driven tasks that require a graphical interface.
- Exposes both CLI and HTTP API modes.
- Stores job history and events in SQLite.
- Streams live monitoring updates over WebSocket.
- Returns structured agent output as:
  - return: human-readable completion message
  - data: structured payload (for example command output)
Core Features
- Tool-based agent loop (execute_command, see_screen, enhance, click, type, press_key, sleep, task_complete)
- Safety pre-check with override support
- Per-job tool disable list
- Live/final usage and cost estimates
- Read-only Tailwind monitoring UI
- Persistent job and event history
Project Layout
main.py
screenjob.py
requirements.txt
start_backend.ps1
src/
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
  utils.py
tests/
  test_agent_tools.py
  test_pricing.py
  test_server_api.py
  test_storage.py
.gitea/workflows/ci.yml
Setup
- Install Python 3.11+.
- Install dependencies:
pip install -r requirements.txt
- Create .env in project root:
OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token
# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
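As a rough illustration of how the .env file above could be read, here is a minimal sketch. It is not the code in src/config.py, which may work differently. The function names parse_env and load_settings are hypothetical, and treating the example values as fallbacks and the first two keys as required are assumptions drawn only from their position relative to the "# Optional" comment.

```python
# Hypothetical sketch; the real loader lives in src/config.py and may differ.

# Values copied from the example .env; using them as fallbacks is an assumption.
EXAMPLE_VALUES = {
    "SCREENJOB_HOST": "127.0.0.1",
    "SCREENJOB_PORT": "8787",
    "DISABLE_UI": "false",
}

# Assumed required because they appear above the "# Optional" comment.
ASSUMED_REQUIRED = ("OPENAI_API_KEY", "SCREENJOB_TOKEN")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> dict:
    """Merge parsed values over the example values and check assumed-required keys."""
    settings = dict(EXAMPLE_VALUES)
    settings.update(parse_env(text))
    missing = [key for key in ASSUMED_REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings
```

Calling load_settings(open(".env").read()) returns a plain dict of strings; converting SCREENJOB_PORT to an int or DISABLE_UI to a bool is left to the caller.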
Usage
CLI
python main.py run "Open amazon.de and go to my orders"
CLI JSON output includes both legacy and structured fields:
{
"completed": true,
"result": "Task completed successfully",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
}
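A caller consuming this output might prefer the structured fields and fall back to the legacy ones. The sketch below shows one way to do that; extract_result is a hypothetical helper, and the fallback order (response, then top-level aliases, then result) is an assumption rather than documented behavior.

```python
import json
from typing import Any, Optional, Tuple


def extract_result(payload: dict) -> Tuple[Optional[str], Any]:
    """Return (message, data), preferring response.* over the legacy fields."""
    response = payload.get("response") or {}
    message = response.get("return", payload.get("return", payload.get("result")))
    data = response.get("data", payload.get("data"))
    return message, data


# Sample taken from the CLI output shown above (raw string keeps the JSON \n escape).
sample = r'''{
  "completed": true,
  "result": "Task completed successfully",
  "response": {"return": "Task completed successfully", "data": "file1.txt\nfile2.txt"},
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}'''

message, data = extract_result(json.loads(sample))
```

With the sample above, data is a single string containing the two file names separated by a newline.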
Server
python main.py server
Or use the PowerShell launcher:
.\start_backend.ps1
Windows Service
Run these from an elevated PowerShell session (Run as Administrator). Requires .NET SDK 10+ (the installer publishes a native service host executable).
Install and start at boot:
.\install_backend_service.ps1 -ForceReinstall -StartAfterInstall -DelayedAutoStart
Check status:
Get-Service -Name ScreenJobBackend
Stop/start manually:
Stop-Service -Name ScreenJobBackend
Start-Service -Name ScreenJobBackend
Uninstall:
.\uninstall_backend_service.ps1
Service logs are written to:
screenjob_runs/service/backend-service.stdout.log
screenjob_runs/service/backend-service.stderr.log
System Tray Icon (Windows)
Start tray icon now:
powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1
Install startup shortcut (current user):
.\install_tray_startup_shortcut.ps1
Install startup shortcut for all users:
.\install_tray_startup_shortcut.ps1 -AllUsers
Remove startup shortcut:
.\install_tray_startup_shortcut.ps1 -Remove
Tray menu actions:
- Refresh service status
- Start/Stop/Restart service (prompts for admin/UAC)
- Open dashboard URL from .env SCREENJOB_HOST/SCREENJOB_PORT
- Open service logs folder
- Exit tray icon process
Auth for all API routes:
- Authorization: Bearer <SCREENJOB_TOKEN>
- X-ScreenJob-Token: <SCREENJOB_TOKEN>
- Query fallback ?token= (mainly for UI/websocket/artifact fetch)
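The three auth options can be expressed on the client side roughly as follows. This is a sketch only: auth_headers and with_token_query are hypothetical helper names, and the header and query parameter names are taken from the list above.

```python
from urllib.parse import urlencode


def auth_headers(token: str, scheme: str = "bearer") -> dict:
    """Build request headers for one of the two header-based schemes."""
    if scheme == "bearer":
        return {"Authorization": "Bearer " + token}
    if scheme == "header":
        return {"X-ScreenJob-Token": token}
    raise ValueError("unknown scheme: " + repr(scheme))


def with_token_query(url: str, token: str) -> str:
    """Append the ?token= fallback, URL-encoding the token value."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"token": token})
```

The header forms keep the token out of URLs and server access logs, which is presumably why the query form is described as a fallback.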
Create Job
POST /api/jobs
{
"job": "run \"ls -a\" in C:/Users/username/Documents and return output",
"model": "gpt-5.4-mini",
"disabled_tools": [],
"safety_override": false
}
Response:
{ "job_id": "job_..." }
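A minimal client sketch for this endpoint, using only the standard library. The helper names build_create_job_request and create_job are hypothetical, the base URL is assumed to come from SCREENJOB_HOST/SCREENJOB_PORT, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, job, model,
                             disabled_tools=(), safety_override=False):
    """Build (but do not send) the POST /api/jobs request."""
    body = {
        "job": job,
        "model": model,
        "disabled_tools": list(disabled_tools),
        "safety_override": safety_override,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(base_url, token, job, model, **options):
    """Send the request and return the job_id from the JSON response."""
    request = build_create_job_request(base_url, token, job, model, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["job_id"]
```

For example, create_job("http://127.0.0.1:8787", token, "run \"ls -a\" and return output", "gpt-5.4-mini") would return the job_id string if the server is running and the token is valid.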
Job Status / History
- GET /api/jobs/{job_id}
- GET /api/jobs/{job_id}/status
- GET /api/jobs/{job_id}/events
- GET /api/jobs
- POST /api/jobs/{job_id}/cancel
- GET /api/stats
Each job payload includes:
- result (compat string)
- response.return
- response.data
- top-level return and data aliases
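The status and history routes can be wrapped in a small request builder, sketched below. The ROUTES names ("detail", "status", and so on) and build_request are hypothetical labels for illustration; only the methods and paths come from this README, and the response shapes beyond the fields listed above are not assumed.

```python
import urllib.request
from urllib.parse import quote

# Methods and paths as listed in this README; the dictionary keys are made up.
ROUTES = {
    "detail": ("GET", "/api/jobs/{job_id}"),
    "status": ("GET", "/api/jobs/{job_id}/status"),
    "events": ("GET", "/api/jobs/{job_id}/events"),
    "list": ("GET", "/api/jobs"),
    "cancel": ("POST", "/api/jobs/{job_id}/cancel"),
    "stats": ("GET", "/api/stats"),
}


def build_request(base_url, token, name, job_id=None):
    """Build an authenticated request for one of the routes above."""
    method, template = ROUTES[name]
    if "{job_id}" in template:
        if not job_id:
            raise ValueError("route " + repr(name) + " needs a job_id")
        # Quote the id so unexpected characters cannot alter the path.
        path = template.format(job_id=quote(job_id, safe=""))
    else:
        path = template
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": "Bearer " + token},
        method=method,
    )
```

Passing the returned object to urllib.request.urlopen sends it; the JSON body can then be read with json.load.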
Monitoring UI
- URL: /
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via /ws
- Analytics dashboards for success rate by objective category and daily averages
- Set DISABLE_UI=true to disable UI
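A custom monitoring client would need the WebSocket address with the token attached as a query parameter. The sketch below only builds that URL; monitor_ws_url is a hypothetical helper, the secure flag assumes a TLS-terminating proxy in front of the server, and the format of the messages sent over /ws is not documented here.

```python
from urllib.parse import urlencode, urlunsplit


def monitor_ws_url(host: str, port: int, token: str, secure: bool = False) -> str:
    """Build the /ws URL using the ?token= query fallback."""
    scheme = "wss" if secure else "ws"
    netloc = "{}:{}".format(host, port)
    return urlunsplit((scheme, netloc, "/ws", urlencode({"token": token}), ""))
```

The resulting string can be handed to any WebSocket client library.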
Analytics API
- GET /api/analytics
- Returns objective-category success rates plus average steps/cost over time
Agent Instructions (Practical)
- Prefer execute_command for deterministic actions (opening URLs, filesystem checks).
- Use see_screen before UI interaction.
- Use enhance before clicking small/ambiguous targets; prefer region="small" for compact controls.
- Use enhance mode="text" for tiny labels/text, or mode="ui" for general UI.
- Optionally set enhance scale (2-6) for tighter zoom control.
- Use press_key for non-text keys (Enter, Tab, arrows, Escape).
- For shortcuts, use one press_key call with combo syntax (example: win+r).
- Use click offsets via offset_up/down/left/right and optional sleep_after_seconds.
- When done, call:
task_complete(return="...", data=...)
- Before task_complete, verify expected on-screen content with see_screen (and enhance if needed), and include an observed_result summary in data.
data should contain useful structured output for the requester (text, object, list, etc.).
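To make the shape of a final call concrete, the sketch below assembles task_complete arguments as a plain dictionary. The helper build_task_complete_args and the extra key command_output are hypothetical; only the return and data argument names and the observed_result summary come from the instructions above. Note that return is a reserved word in Python, so the arguments are built as a dict rather than passed as keyword arguments.

```python
import json


def build_task_complete_args(message, observed_result, **extra):
    """Assemble the arguments for a task_complete tool call."""
    data = {"observed_result": observed_result}
    data.update(extra)  # any additional structured output for the requester
    return {"return": message, "data": data}


args = build_task_complete_args(
    "Listed the Documents folder",
    "Terminal shows two entries: file1.txt and file2.txt",
    command_output="file1.txt\nfile2.txt",  # hypothetical extra field
)
serialized = json.dumps(args)  # tool-call arguments are typically sent as JSON
```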
Verification
Local:
pytest -q
CI:
.gitea/workflows/ci.yml runs compile checks + tests on push/PR.
Compatibility Entry Point
python screenjob.py "<job>" remains supported as a wrapper to main.py.
License
Apache License 2.0. See LICENSE.