ScreenJob
ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.
What It Solves
- Runs agent-driven tasks that require a graphical interface.
- Exposes both CLI and HTTP API modes.
- Stores job history and events in SQLite.
- Streams live monitoring updates over WebSocket.
- Returns structured agent output as:
  - return: human-readable completion message
  - data: structured payload (for example command output)
Core Features
- Tool-based agent loop (execute_command, see_screen, enhance, click, type, press_key, sleep, task_complete)
- Safety pre-check with override support
- Per-job tool disable list
- Live/final usage and cost estimates
- Read-only Tailwind monitoring UI
- Persistent job and event history
Project Layout
main.py
screenjob.py
requirements.txt
start_backend.ps1
src/
agent.py
app_main.py
cli.py
config.py
models.py
pricing.py
runtime.py
safety.py
server.py
storage.py
task_manager.py
ui.py
utils.py
tests/
test_agent_tools.py
test_pricing.py
test_server_api.py
test_storage.py
.gitea/workflows/ci.yml
Setup
- Install Python 3.11+.
- Install dependencies:
pip install -r requirements.txt
- Create .env in the project root:
OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token
# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
Usage
CLI
python main.py run "Open amazon.de and go to my orders"
CLI JSON output includes both legacy and structured fields:
{
"completed": true,
"result": "Task completed successfully",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
}
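A caller that has to cope with both the legacy and the structured fields could read the output roughly as follows. This is a sketch: extract_result is a hypothetical helper, and the lookup order (response first, then the top-level aliases, then result) is an assumption rather than documented behaviour.

```python
from typing import Any


def extract_result(payload: dict[str, Any]) -> tuple[Any, Any]:
    """Return (message, data) from a ScreenJob result payload."""
    response = payload.get("response") or {}
    # Prefer the structured fields, then the top-level aliases,
    # and finally the legacy "result" string.
    message = response.get("return", payload.get("return", payload.get("result")))
    data = response.get("data", payload.get("data"))
    return message, data


# Same shape as the example output above.
sample = {
    "completed": True,
    "result": "Task completed successfully",
    "response": {
        "return": "Task completed successfully",
        "data": "file1.txt\nfile2.txt",
    },
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt",
}

message, data = extract_result(sample)
print(message)
print(data.splitlines())
```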
Server
python main.py server
Or use the PowerShell launcher:
.\start_backend.ps1
Auth for all API routes:
- Authorization: Bearer <SCREENJOB_TOKEN>
- X-ScreenJob-Token: <SCREENJOB_TOKEN>
- Query fallback ?token= (mainly for UI/websocket/artifact fetch)
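The three auth options can be sketched as small helpers. The function names are hypothetical; only the header names and the token query parameter come from the list above.

```python
from urllib.parse import urlencode


def bearer_headers(token: str) -> dict[str, str]:
    """Standard Authorization header form."""
    return {"Authorization": f"Bearer {token}"}


def custom_headers(token: str) -> dict[str, str]:
    """ScreenJob-specific header form."""
    return {"X-ScreenJob-Token": token}


def with_token_query(url: str, token: str) -> str:
    """Append the token as a query parameter (the fallback form)."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({'token': token})}"
```

Header-based auth keeps the token out of URLs and logs, so the query form is best kept for cases where headers cannot be set.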
Create Job
POST /api/jobs
{
"job": "run \"ls -a\" in C:/Users/username/Documents and return output",
"model": "gpt-5.4-mini",
"disabled_tools": [],
"safety_override": false
}
Response:
{ "job_id": "job_..." }
Job Status / History
- GET /api/jobs/{job_id}
- GET /api/jobs/{job_id}/status
- GET /api/jobs/{job_id}/events
- GET /api/jobs
- POST /api/jobs/{job_id}/cancel
- GET /api/stats
Each job payload includes:
- result (compat string)
- response.return
- response.data
- top-level return and data aliases
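A client that waits for a job to finish might poll the status route in a loop like the one below. This is a sketch under stated assumptions: wait_for_job is hypothetical, the fetch callable stands in for a GET on /api/jobs/{job_id}/status, and the default completion check (a truthy completed field) is borrowed from the CLI output and may not match the real status payload. Pass your own is_done if the server reports state differently.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool] = lambda p: bool(p.get("completed")),
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch() until is_done(payload) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        payload = fetch()
        if is_done(payload):
            return payload
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

Injecting sleep and clock keeps the loop easy to test without real delays.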
Monitoring UI
- URL: /
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via /ws
- Set DISABLE_UI=true to disable UI
Agent Instructions (Practical)
- Prefer execute_command for deterministic actions (opening URLs, filesystem checks).
- Use see_screen before UI interaction.
- Use enhance when text is unclear.
- Use press_key for non-text keys (Enter, Tab, arrows, Escape).
- For shortcuts, use one press_key call with combo syntax (example: win+r).
- Use click offsets via offset_up/down/left/right and optional sleep_after_seconds.
- When done, call:
task_complete(return="...", data=...)
- Before task_complete, verify expected on-screen content with see_screen (and enhance if needed), and include an observed_result summary in data.
data should contain useful structured output for the requester (text, object, list, etc.).
Verification
Local:
pytest -q
CI:
.gitea/workflows/ci.yml runs compile checks + tests on push/PR.
Compatibility Entry Point
python screenjob.py "<job>" remains supported as a wrapper to main.py.
License
Apache License 2.0. See LICENSE.