feat: finalize production cleanup with structured agent responses and project governance

2026-05-27 18:08:52 +02:00
parent a19b285232
commit c09f0ee9c0
17 changed files with 737 additions and 126 deletions
--- a/README.md
+++ b/README.md
@@ -1,23 +1,66 @@
 # ScreenJob

-Desktop-and-terminal task agent with:
+ScreenJob is an autonomous desktop-and-terminal execution service.  
+It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.

- CLI runner
- FastAPI job server
- SQLite task history
- WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation
+## What It Solves

-## Install
+- Runs agent-driven tasks that require a graphical interface.
+- Exposes both CLI and HTTP API modes.
+- Stores job history and events in SQLite.
+- Streams live monitoring updates over WebSocket.
+- Returns structured agent output as:
+  - `return`: human-readable completion message
+  - `data`: structured payload (for example command output)

-```powershell
-pip install openai pillow pyautogui python-dotenv fastapi uvicorn
+## Core Features
+
+- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
+- Safety pre-check with override support
+- Per-job tool disable list
+- Live/final usage and cost estimates
+- Read-only Tailwind monitoring UI
+- Persistent job and event history
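
Note on the per-job tool disable list: a minimal sketch of how such a list could be applied to the tool names given above. The tool names come from the README; the `ALL_TOOLS` constant and the `enabled_tools` helper are hypothetical and are not taken from `src/agent.py`.

```python
# Tool names as listed in the README; the filtering helper itself is hypothetical.
ALL_TOOLS = (
    "execute_command", "see_screen", "enhance", "click",
    "type", "press_key", "sleep", "task_complete",
)


def enabled_tools(disabled_tools):
    """Return the tools left after applying a per-job disable list."""
    disabled = set(disabled_tools)
    unknown = disabled - set(ALL_TOOLS)
    if unknown:
        # Rejecting unknown names early surfaces typos in the disable list.
        raise ValueError(f"unknown tool(s): {sorted(unknown)}")
    return [name for name in ALL_TOOLS if name not in disabled]


print(enabled_tools(["click", "type"]))
```

Whether the real service rejects unknown tool names or silently ignores them is not stated in the README; the strict check here is an assumption.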
+
+## Project Layout
+
+```text
+main.py
+screenjob.py
+requirements.txt
+docker-compose.yml
+src/
+  agent.py
+  app_main.py
+  cli.py
+  config.py
+  models.py
+  pricing.py
+  runtime.py
+  safety.py
+  server.py
+  storage.py
+  task_manager.py
+  ui.py
+  utils.py
+tests/
+  test_agent_tools.py
+  test_pricing.py
+  test_server_api.py
+  test_storage.py
+.gitea/workflows/ci.yml
 ```

-## Environment
+## Setup

-Create `.env` in project root:
+1. Install Python 3.11+.
+2. Install dependencies:
+
+```powershell
+pip install -r requirements.txt
+```
+
+3. Create `.env` in project root:

 ```env
 OPENAI_API_KEY=...
@@ -31,44 +74,50 @@ SCREENJOB_PORT=8787
 DISABLE_UI=false
 ```
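
Note on the `.env` settings: a rough sketch of how these values might be read once they are in the process environment. The variable names appear in the README; the `load_settings` helper, the set of accepted truthy strings, and the use of 8787 as a default are assumptions, not the behavior of `src/config.py`.

```python
import os


def load_settings(env=os.environ):
    """Read a few ScreenJob settings from an environment-like mapping (hypothetical helper)."""
    return {
        "token": env.get("SCREENJOB_TOKEN", ""),
        # 8787 mirrors the example .env; treating it as the default is an assumption.
        "port": int(env.get("SCREENJOB_PORT", "8787")),
        # Accepted truthy spellings are an assumption.
        "disable_ui": env.get("DISABLE_UI", "false").strip().lower() in {"1", "true", "yes"},
    }


settings = load_settings({"SCREENJOB_PORT": "8787", "DISABLE_UI": "false"})
print(settings)
```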

-## Entry Points
+## Usage

- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`
-
-## CLI Usage
+### CLI

 ```powershell
 python main.py run "Open amazon.de and go to my orders"
 ```

-Useful flags:
+CLI JSON output includes both legacy and structured fields:

- `--model gpt-5.4-mini`
- `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`
+```json
+{
+  "completed": true,
+  "result": "Task completed successfully",
+  "response": {
+    "return": "Task completed successfully",
+    "data": "file1.txt\nfile2.txt"
+  },
+  "return": "Task completed successfully",
+  "data": "file1.txt\nfile2.txt"
+}
+```
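
Note on consuming this output: a minimal sketch of a client that prefers the structured `response` object and falls back to the legacy `result` string. The field names are taken from the JSON above; the `extract_response` helper is hypothetical.

```python
def extract_response(payload):
    """Return (message, data), preferring the structured `response` object."""
    response = payload.get("response") or {}
    # Fall back from response.return to the top-level alias, then to legacy `result`.
    message = response.get("return", payload.get("return", payload.get("result")))
    data = response.get("data", payload.get("data"))
    return message, data


structured = {
    "completed": True,
    "result": "Task completed successfully",
    "response": {"return": "Task completed successfully", "data": "file1.txt\nfile2.txt"},
}
legacy_only = {"completed": True, "result": "Task completed successfully"}

print(extract_response(structured))
print(extract_response(legacy_only))
```

Reading `response` first keeps the client working if the top-level aliases are ever dropped, while the fallback still handles output that only carries `result`.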

-## HTTP API
+### Server

-All API routes require token auth using `SCREENJOB_TOKEN`:
+```powershell
+python main.py server
+```

- `Authorization: Bearer <token>` or
- `X-ScreenJob-Token: <token>`
- (for browser/image fetch) `?token=<token>` query parameter
+Auth for all API routes:
+
+- `Authorization: Bearer <SCREENJOB_TOKEN>`
+- `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
+- Query parameter fallback `?token=<SCREENJOB_TOKEN>` (mainly for UI, WebSocket, and artifact fetches)

 ### Create Job

 `POST /api/jobs`

-Body:
-
 ```json
 {
-  "job": "Open amazon.de and go to my orders",
+  "job": "run \"ls -a\" in C:/Users/username/Documents and return output",
  "model": "gpt-5.4-mini",
-  "disabled_tools": ["click"],
+  "disabled_tools": [],
  "safety_override": false
 }
 ```
@@ -79,103 +128,68 @@ Response:
 { "job_id": "job_..." }
 ```
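
Note on calling `POST /api/jobs`: a sketch of building the authenticated request with the standard library. The route, body fields, and `Authorization: Bearer` header come from the README; the base URL, the example token, and the `build_create_job_request` helper are assumptions for illustration.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, job, model="gpt-5.4-mini"):
    """Build (but do not send) a create-job request; hypothetical helper."""
    body = {"job": job, "model": model, "disabled_tools": [], "safety_override": False}
    return urllib.request.Request(
        f"{base_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Host, port, and token are placeholders, not values from the project.
req = build_create_job_request("http://127.0.0.1:8787", "example-token", "list files in Documents")
print(req.get_method(), req.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(req) as resp:
#     job_id = json.load(resp)["job_id"]
```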

-### Status / Output
+### Job Status / History

- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
- `GET /api/jobs`: list active + past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics
+- `GET /api/jobs/{job_id}`
+- `GET /api/jobs/{job_id}/status`
+- `GET /api/jobs/{job_id}/events`
+- `GET /api/jobs`
+- `POST /api/jobs/{job_id}/cancel`
+- `GET /api/stats`

-## Monitoring UI
+Each job payload includes:

- Served at `/` when `DISABLE_UI=false`
- Tailwind-based read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in UI before data loads
- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls in UI (monitoring only)
+- `result` (backward-compatible string)
+- `response.return`
+- `response.data`
+- top-level `return` and `data` aliases

-If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
+### Monitoring UI

-## Safety
+- URL: `/`
+- Read-only dashboard (no run controls)
+- Requires token input
+- Live updates via `/ws`
+- Set `DISABLE_UI=true` to disable UI

-Before execution, each task is classified by a model safety gate:
+## Agent Instructions (Practical)

- Safe: task runs
- Unsafe: task is rejected and recorded
- Override: set `safety_override=true` (or `--skip-safety-check` in CLI)
+- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
+- Use `see_screen` before UI interaction.
+- Use `enhance` when text is unclear.
+- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
+- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
+- When done, call:
+  - `task_complete(return="...", data=...)`

-## Tool Controls
+`data` should contain useful structured output for the requester (text, object, list, etc.).
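
Note on `task_complete(return="...", data=...)`: `return` is a reserved word in Python, so the call shown above cannot be written literally as Python source; it presumably describes the tool-call arguments the model emits. A minimal sketch, assuming the arguments arrive as a JSON object and using a hypothetical handler (not the one in `src/agent.py`):

```python
import json


def task_complete(**arguments):
    """Hypothetical handler: `return` is read from keyword arguments because it is a reserved word."""
    return {"return": str(arguments.get("return", "")), "data": arguments.get("data")}


# Tool-call arguments as they might arrive from the model, encoded as JSON.
raw_arguments = json.dumps({"return": "Listed 2 files", "data": ["file1.txt", "file2.txt"]})

# Unpacking a dict with ** allows the "return" key even though it is a keyword.
result = task_complete(**json.loads(raw_arguments))
print(result)
```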

-Per-job tool allowlisting via disable list:
+## Docker Compose

- API: `disabled_tools: ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`
+Run the server in a container:

-Available tools:
-
- `execute_command(command)`
- `sleep(seconds)`
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
- `type(text)`
- `press_key(key, repeats=1)`
- `task_complete(result)`
-
-## Cost Estimation
-
-Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and model pricing rates in `src/pricing.py`.
-
- Live: exposed in `GET /api/jobs/{job_id}` during execution
- Final: persisted in SQLite and returned in status output
-
-## Persistence
-
- SQLite DB: `screenjob.db`
- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and UI)
-
-## Project Layout
-
-```text
-main.py
-screenjob.py
-src/
-  __init__.py
-  agent.py
-  app_main.py
-  cli.py
-  config.py
-  models.py
-  pricing.py
-  runtime.py
-  safety.py
-  server.py
-  storage.py
-  task_manager.py
-  ui.py
-tests/
-  conftest.py
-  test_pricing.py
-  test_server_api.py
-  test_storage.py
-.gitea/
-  workflows/
-    ci.yml
+```powershell
+docker compose up --build
 ```

+The service uses the official Python image and reads `.env`.
+
 ## Verification

-Run local verification:
+Local:

 ```powershell
 pytest -q
 ```

-Gitea CI pipeline:
+CI:

- File: `.gitea/workflows/ci.yml`
- Runs compile checks + pytest on push and PR.
+- `.gitea/workflows/ci.yml` runs compile checks + tests on push/PR.
+
+## Compatibility Entry Point
+
+- `python screenjob.py "<job>"` remains supported as a wrapper around `main.py`.
+
+## License
+
+Apache License 2.0. See `LICENSE`.