feat: finalize production cleanup with structured agent responses and project governance

Space-Banane
2026-05-27 18:08:52 +02:00
parent a19b285232
commit c09f0ee9c0
17 changed files with 737 additions and 126 deletions

README.md

@@ -1,23 +1,66 @@
# ScreenJob
Desktop-and-terminal task agent with:
ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.
- CLI runner
- FastAPI job server
- SQLite task history
- WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation
## What It Solves
## Install
- Runs agent-driven tasks that require a graphical interface.
- Exposes both CLI and HTTP API modes.
- Stores job history and events in SQLite.
- Streams live monitoring updates over WebSocket.
- Returns structured agent output as:
- `return`: human-readable completion message
- `data`: structured payload (for example command output)
```powershell
pip install openai pillow pyautogui python-dotenv fastapi uvicorn
## Core Features
- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
- Safety pre-check with override support
- Per-job tool disable list
- Live/final usage and cost estimates
- Read-only Tailwind monitoring UI
- Persistent job and event history
## Project Layout
```text
main.py
screenjob.py
requirements.txt
docker-compose.yml
src/
agent.py
app_main.py
cli.py
config.py
models.py
pricing.py
runtime.py
safety.py
server.py
storage.py
task_manager.py
ui.py
utils.py
tests/
test_agent_tools.py
test_pricing.py
test_server_api.py
test_storage.py
.gitea/workflows/ci.yml
```
## Environment
## Setup
Create `.env` in project root:
1. Install Python 3.11+.
2. Install dependencies:
```powershell
pip install -r requirements.txt
```
3. Create `.env` in project root:
```env
OPENAI_API_KEY=...
@@ -31,44 +74,50 @@ SCREENJOB_PORT=8787
DISABLE_UI=false
```
## Entry Points
## Usage
- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`
## CLI Usage
### CLI
```powershell
python main.py run "Open amazon.de and go to my orders"
```
Useful flags:
CLI JSON output includes both legacy and structured fields:
- `--model gpt-5.4-mini`
- `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`
```json
{
"completed": true,
"result": "Task completed successfully",
"response": {
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
},
"return": "Task completed successfully",
"data": "file1.txt\nfile2.txt"
}
```
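A consumer of this output might prefer the structured fields and fall back to the legacy `result` string only when they are absent. The sketch below assumes the payload shape shown above; `extract_response` is a hypothetical helper for illustration and is not part of ScreenJob.

```python
import json


def extract_response(payload: dict) -> tuple[str, object]:
    """Return (message, data), preferring structured fields over legacy ones.

    Hypothetical helper: lookup order is response.return, then top-level
    return, then the legacy result string.
    """
    response = payload.get("response") or {}
    message = response.get("return")
    if message is None:
        message = payload.get("return", payload.get("result", ""))
    data = response.get("data", payload.get("data"))
    return message, data


# Sample payload copied from the CLI output example (raw string keeps \n for JSON).
raw = r'''
{
  "completed": true,
  "result": "Task completed successfully",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
'''

message, data = extract_response(json.loads(raw))
```

Reading `response` first keeps callers working if the top-level aliases are ever dropped, while the `result` fallback covers output produced before the structured fields existed.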
## HTTP API
### Server
All API routes require token auth using `SCREENJOB_TOKEN`:
```powershell
python main.py server
```
- `Authorization: Bearer <token>` or
- `X-ScreenJob-Token: <token>`
- (for browser/image fetch) `?token=<token>` query parameter
Auth for all API routes:
- `Authorization: Bearer <SCREENJOB_TOKEN>`
- `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
- Query fallback `?token=` (mainly for UI/websocket/artifact fetch)
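The three auth options listed above could be produced on the client side roughly as follows. This is a minimal sketch: `auth_headers` and `with_token_query` are hypothetical helper names, and the example URL assumes a local server on the `SCREENJOB_PORT` value 8787.

```python
from urllib.parse import urlencode


def auth_headers(token: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers for one of the two header-based auth options."""
    if style == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if style == "header":
        return {"X-ScreenJob-Token": token}
    raise ValueError(f"unknown auth style: {style!r}")


def with_token_query(url: str, token: str) -> str:
    """Append the ?token= fallback, URL-encoding the token value."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({'token': token})}"


bearer = auth_headers("secret")
custom = auth_headers("secret", style="header")
ws_url = with_token_query("http://localhost:8787/ws", "secret")
```

Header-based auth keeps the token out of URLs and server logs, so the query fallback is probably best reserved for cases where headers cannot be set.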
### Create Job
`POST /api/jobs`
Body:
```json
{
"job": "Open amazon.de and go to my orders",
"job": "run \"ls -a\" in C:/Users/username/Documents and return output",
"model": "gpt-5.4-mini",
"disabled_tools": ["click"],
"disabled_tools": [],
"safety_override": false
}
```
@@ -79,103 +128,68 @@ Response:
{ "job_id": "job_..." }
```
### Status / Output
### Job Status / History
- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
- `GET /api/jobs`: list active + past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`
## Monitoring UI
Each job payload includes:
- Served at `/` when `DISABLE_UI=false`
- Tailwind-based read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in UI before data loads
- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls in UI (monitoring only)
- `result` (compat string)
- `response.return`
- `response.data`
- top-level `return` and `data` aliases
If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
### Monitoring UI
## Safety
- URL: `/`
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via `/ws`
- Set `DISABLE_UI=true` to disable UI
Before execution, each task is classified by a model safety gate:
## Agent Instructions (Practical)
- Safe: task runs
- Unsafe: task is rejected and recorded
- Override: set `safety_override=true` (or `--skip-safety-check` in CLI)
- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
- Use `see_screen` before UI interaction.
- Use `enhance` when text is unclear.
- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
- When done, call:
- `task_complete(return="...", data=...)`
## Tool Controls
`data` should contain useful structured output for the requester (text, object, list, etc.).
Per-job tool allowlisting via disable list:
## Docker Compose
- API: `disabled_tools: ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`
Run server in container:
Available tools:
- `execute_command(command)`
- `sleep(seconds)`
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
- `type(text)`
- `press_key(key, repeats=1)`
- `task_complete(result)`
## Cost Estimation
Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and model pricing rates in `src/pricing.py`.
- Live: exposed in `GET /api/jobs/{job_id}` during execution
- Final: persisted in SQLite and returned in status output
## Persistence
- SQLite DB: `screenjob.db`
- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and UI)
## Project Layout
```text
main.py
screenjob.py
src/
__init__.py
agent.py
app_main.py
cli.py
config.py
models.py
pricing.py
runtime.py
safety.py
server.py
storage.py
task_manager.py
ui.py
tests/
conftest.py
test_pricing.py
test_server_api.py
test_storage.py
.gitea/
workflows/
ci.yml
```powershell
docker compose up --build
```
The service uses the official Python image and reads `.env`.
## Verification
Run local verification:
Local:
```powershell
pytest -q
```
Gitea CI pipeline:
CI:
- File: `.gitea/workflows/ci.yml`
- Runs compile checks + pytest on push and PR.
- `.gitea/workflows/ci.yml` runs compile checks + tests on push/PR.
## Compatibility Entry Point
- `python screenjob.py "<job>"` remains supported as a wrapper around `main.py`.
## License
Apache License 2.0. See `LICENSE`.