# ScreenJob

ScreenJob is an autonomous desktop-and-terminal execution service. It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.

## What It Solves

- Runs agent-driven tasks that require a graphical interface.
- Exposes both CLI and HTTP API modes.
- Stores job history and events in SQLite.
- Streams live monitoring updates over WebSocket.
- Returns structured agent output as:
  - `return`: human-readable completion message
  - `data`: structured payload (for example, command output)

## Core Features

- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
- Safety pre-check with override support
- Per-job tool disable list
- Live and final usage and cost estimates
- Read-only Tailwind monitoring UI
- Persistent job and event history

## Project Layout

```text
main.py
screenjob.py
requirements.txt
start_backend.ps1
src/
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
  utils.py
tests/
  test_agent_tools.py
  test_pricing.py
  test_server_api.py
  test_storage.py
.gitea/workflows/ci.yml
```

## Setup

1. Install Python 3.11+.
2. Install dependencies:

   ```powershell
   pip install -r requirements.txt
   ```

3. Create `.env` in the project root:

   ```env
   OPENAI_API_KEY=...
   SCREENJOB_TOKEN=choose_a_strong_token

   # Optional
   SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
   SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
   SCREENJOB_HOST=127.0.0.1
   SCREENJOB_PORT=8787
   DISABLE_UI=false
   ```

## Usage

### CLI

```powershell
python main.py run "Open amazon.de and go to my orders"
```

The CLI prints JSON that includes both the legacy `result` field and the structured `response` object. The `return` and `data` fields from `response` are also repeated at the top level:

```json
{
  "completed": true,
  "result": "Task completed successfully",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
```
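
If you call the CLI from another Python program, the minimal sketch below reads the JSON output shown above. It makes two assumptions: `python main.py run` writes exactly one JSON document to stdout, and it is started from the project root. `parse_cli_result` and `run_job` are hypothetical helper names, not part of ScreenJob. The parser prefers the structured `response` object, falls back to the top-level aliases, and finally falls back to the legacy `result` string.

```python
import json
import subprocess
import sys
from typing import Any


def parse_cli_result(stdout: str) -> tuple[bool, str, Any]:
    """Extract (completed, return message, data) from ScreenJob CLI JSON output."""
    payload = json.loads(stdout)
    response = payload.get("response") or {}
    # Prefer response.return, then the top-level alias, then the legacy result string.
    message = response.get("return", payload.get("return", payload.get("result", "")))
    data = response.get("data", payload.get("data"))
    return bool(payload.get("completed", False)), message, data


def run_job(job: str, project_root: str = ".") -> tuple[bool, str, Any]:
    """Run one job through the CLI and parse its JSON output."""
    proc = subprocess.run(
        [sys.executable, "main.py", "run", job],
        cwd=project_root,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return parse_cli_result(proc.stdout)
```

Keeping the parsing separate from the subprocess call lets you check the fallback logic against saved output without launching an agent.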

### Server

```powershell
python main.py server
```

Or use the PowerShell launcher:

```powershell
.\start_backend.ps1
```

### Windows Service

Run these commands from an elevated PowerShell session (Run as Administrator). The installer requires .NET SDK 10+ because it publishes a native service host executable.

Install the service and start it at boot:

```powershell
.\install_backend_service.ps1 -ForceReinstall -StartAfterInstall -DelayedAutoStart
```

Check status:

```powershell
Get-Service -Name ScreenJobBackend
```

Stop or start the service manually:

```powershell
Stop-Service -Name ScreenJobBackend
Start-Service -Name ScreenJobBackend
```

Uninstall:

```powershell
.\uninstall_backend_service.ps1
```

Service logs are written to:

```text
screenjob_runs/service/backend-service.stdout.log
screenjob_runs/service/backend-service.stderr.log
```

### System Tray Icon (Windows)

Start the tray icon now:

```powershell
powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1
```

Install a startup shortcut for the current user:

```powershell
.\install_tray_startup_shortcut.ps1
```

Install a startup shortcut for all users:

```powershell
.\install_tray_startup_shortcut.ps1 -AllUsers
```

Remove the startup shortcut:

```powershell
.\install_tray_startup_shortcut.ps1 -Remove
```

Tray menu actions:

- Refresh service status
- Start, stop, or restart the service (prompts for admin/UAC)
- Open the dashboard URL built from `SCREENJOB_HOST` and `SCREENJOB_PORT` in `.env`
- Open the service logs folder
- Exit the tray icon process

### API Authentication

All API routes require the `SCREENJOB_TOKEN` value, passed in one of these ways:

- `Authorization: Bearer <SCREENJOB_TOKEN>`
- `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
- Query fallback `?token=<SCREENJOB_TOKEN>` (mainly for UI, WebSocket, and artifact fetches)
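
The sketch below produces each of the three forms using only the standard library. The header names and the `token` query parameter come from the list above; the helper names are hypothetical. Prefer a header whenever the client supports one, because query strings tend to end up in server logs and browser history.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bearer_headers(token: str) -> dict[str, str]:
    """Standard bearer form: Authorization: Bearer <token>."""
    return {"Authorization": f"Bearer {token}"}


def token_header(token: str) -> dict[str, str]:
    """Alternative form using the X-ScreenJob-Token header."""
    return {"X-ScreenJob-Token": token}


def with_query_token(url: str, token: str) -> str:
    """Append ?token=<token>, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, assuming the default host and port, `with_query_token("ws://127.0.0.1:8787/ws", token)` yields a URL a browser WebSocket client can open, since browsers cannot set custom headers on WebSocket handshakes.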

### Create Job

`POST /api/jobs`

```json
{
  "job": "run \"ls -a\" in C:/Users/username/Documents and return output",
  "model": "gpt-5.4-mini",
  "disabled_tools": [],
  "safety_override": false
}
```

Response:

```json
{ "job_id": "job_..." }
```
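
A minimal Python client for this endpoint might look like the sketch below. It makes three assumptions: the server listens on the example `SCREENJOB_HOST`/`SCREENJOB_PORT` values from Setup, the endpoint accepts a JSON body sent with `Content-Type: application/json`, and omitting `model` lets the server fall back to `SCREENJOB_DEFAULT_MODEL`. `build_create_job_request` and `create_job` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:8787"  # assumption: example host/port from Setup


def build_create_job_request(
    token: str,
    job: str,
    model: Optional[str] = None,
    disabled_tools: Optional[list[str]] = None,
    safety_override: bool = False,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request."""
    body: dict[str, Any] = {
        "job": job,
        "disabled_tools": disabled_tools or [],
        "safety_override": safety_override,
    }
    if model is not None:
        # Assumption: omitting "model" lets the server use SCREENJOB_DEFAULT_MODEL.
        body["model"] = model
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def create_job(token: str, job: str, **kwargs: Any) -> str:
    """Submit a job and return its job_id."""
    req = build_create_job_request(token, job, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["job_id"]
```

Building the request separately from sending it keeps the payload easy to inspect. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, such as a rejected token.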

### Job Status / History

- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`

Each job payload includes:

- `result` (compatibility string)
- `response.return`
- `response.data`
- top-level `return` and `data` aliases
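
To wait for a job from a script, one approach is to poll `GET /api/jobs/{job_id}` and to cancel through `POST /api/jobs/{job_id}/cancel` on timeout. This sketch assumes the job payload exposes a `status` field whose terminal values are `completed`, `failed`, and `cancelled`. Those names are guesses, so check a real response (or `GET /api/jobs/{job_id}/status`) before relying on them. The fetch and cancel functions are passed in so the loop can be tested without a server. `api_call` and `wait_for_job` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional

BASE_URL = "http://127.0.0.1:8787"  # assumption: example host/port from Setup

# Assumption: hypothetical field name and values; verify against a real payload.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def api_call(path: str, token: str, method: str = "GET") -> dict[str, Any]:
    """Call a ScreenJob API path with bearer auth and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=b"" if method == "POST" else None,
        headers={"Authorization": f"Bearer {token}"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]],
    cancel: Optional[Callable[[str], Any]] = None,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal status; cancel it and raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch(f"/api/jobs/{job_id}")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            if cancel is not None:
                cancel(f"/api/jobs/{job_id}/cancel")
            raise TimeoutError(f"job {job_id} not finished after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Usage: call `wait_for_job(job_id, fetch=lambda p: api_call(p, token), cancel=lambda p: api_call(p, token, "POST"))`, then read `response.return` and `response.data` from the returned payload.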

### Monitoring UI

- URL: `/`
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via `/ws`
- Analytics dashboards for success rate by objective category and daily averages
- Set `DISABLE_UI=true` to disable the UI

### Analytics API

- `GET /api/analytics`
- Returns success rates per objective category, plus average steps and cost over time

## Agent Instructions (Practical)

- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
- Use `see_screen` before UI interaction.
- Use `enhance` before clicking small or ambiguous targets; prefer `region="small"` for compact controls.
- Use `enhance` with `mode="text"` for tiny labels or text, or `mode="ui"` for general UI.
- Optionally set the `enhance` `scale` (2-6) for tighter zoom control.
- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
- For shortcuts, use a single `press_key` call with combo syntax (for example, `win+r`).
- Use `click` offsets via `offset_up/down/left/right` and the optional `sleep_after_seconds`.
- When done, call:
  - `task_complete(return="...", data=...)`
- Before `task_complete`, verify the expected on-screen content with `see_screen` (and `enhance` if needed), and include an `observed_result` summary in `data`.

`data` should contain useful structured output for the requester (text, object, list, etc.).
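
The constraints above can be checked before a tool call is issued. The sketch below is a set of hypothetical argument builders for `enhance` and `task_complete`, not ScreenJob's actual tool schema. The argument names `mode`, `region`, `scale`, `return`, and `data` come from the instructions above. The `ui` default for `mode` and the `value` wrapper for non-object `data` are assumptions.

```python
from typing import Any, Optional

ENHANCE_MODES = {"text", "ui"}  # modes named in the instructions above


def enhance_args(mode: str = "ui", region: Optional[str] = None,
                 scale: Optional[float] = None) -> dict[str, Any]:
    """Build `enhance` arguments, rejecting an unknown mode or a scale outside 2-6."""
    if mode not in ENHANCE_MODES:
        raise ValueError(f"mode must be one of {sorted(ENHANCE_MODES)}, got {mode!r}")
    args: dict[str, Any] = {"mode": mode}
    if region is not None:
        args["region"] = region  # e.g. "small" for compact controls
    if scale is not None:
        if not 2 <= scale <= 6:
            raise ValueError(f"scale must be between 2 and 6, got {scale}")
        args["scale"] = scale
    return args


def task_complete_args(message: str, data: Any, observed_result: str) -> dict[str, Any]:
    """Build `task_complete` arguments with an observed_result summary inside `data`."""
    if isinstance(data, dict):
        payload = {**data, "observed_result": observed_result}
    else:
        # Assumption: non-object data is wrapped under a hypothetical "value" key.
        payload = {"value": data, "observed_result": observed_result}
    return {"return": message, "data": payload}
```

Because `data` must carry an `observed_result` summary, plain text or list output has to be wrapped in an object somehow; the `value` key here is just one possible choice.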

## Verification

Local:

```powershell
pytest -q
```

CI:

- `.gitea/workflows/ci.yml` runs compile checks and tests on every push and pull request.
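
To approximate the compile step locally before pushing, the standard-library `compileall` module can byte-compile the sources. This assumes the CI compile check is roughly equivalent; the actual commands in `.gitea/workflows/ci.yml` may differ. `compile_check` is a hypothetical helper.

```python
import compileall
from pathlib import Path


def compile_check(*paths: str) -> bool:
    """Byte-compile each file or directory tree; return True only if everything compiles."""
    ok = True
    for p in paths:
        path = Path(p)
        if path.is_dir():
            # quiet=1 suppresses the file listing but still prints errors.
            result = compileall.compile_dir(str(path), quiet=1)
        else:
            result = compileall.compile_file(str(path), quiet=1)
        # Evaluate the compile first so every path is checked, even after a failure.
        ok = bool(result) and ok
    return ok
```

From the project root, `compile_check("main.py", "screenjob.py", "src", "tests")` returns `False` if any file has a syntax error.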

## Compatibility Entry Point

- `python screenjob.py "<job>"` remains supported as a wrapper around `main.py`.

## License

Apache License 2.0. See `LICENSE`.