feat: add authenticated artifact streaming and UI visual previews

2026-05-27 17:50:21 +02:00
parent 10355bf11a
commit 8fe6ad2d75
6 changed files with 184 additions and 57 deletions
--- a/README.md
+++ b/README.md
@@ -1,42 +1,123 @@
 # ScreenJob

-Single-file behavior, split into maintainable modules under `src/`.
+Desktop-and-terminal task agent with:

-## Entry point
-
-- Primary: `python main.py "<task>"`
-- Backward compatible: `python screenjob.py "<task>"`
+- CLI runner
+- FastAPI job server
+- SQLite task history
+- WebSocket-powered monitoring UI
+- Safety pre-check and per-job tool disable controls
+- Live/final token and cost estimation

 ## Install

 ```powershell
-pip install openai pillow pyautogui python-dotenv
+pip install openai pillow pyautogui python-dotenv fastapi uvicorn
 ```

-## Configure
+## Environment

-Create a `.env` file in project root:
+Create `.env` in project root:

 ```env
-OPENAI_API_KEY=your_key_here
+OPENAI_API_KEY=...
+SCREENJOB_TOKEN=choose_a_strong_token
+
+# Optional
+SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
+SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
+SCREENJOB_HOST=127.0.0.1
+SCREENJOB_PORT=8787
+DISABLE_UI=false
 ```

-## Usage
+## Entry Points
+
+- `python main.py run "<job>"`
+- `python main.py server`
+- Backward-compatible wrapper: `python screenjob.py "<job>"`
+
+## CLI Usage

 ```powershell
-python main.py "Open amazon.de and go to my orders"
+python main.py run "Open amazon.de and go to my orders"
 ```

-Optional flags:
+Useful flags:

-```powershell
-python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
+- `--model gpt-5.4-mini`
+- `--disable-tool click --disable-tool type`
+- `--skip-safety-check`
+- `--max-steps 80`
+
+## HTTP API
+
+All API routes require the `SCREENJOB_TOKEN` value, sent in one of three ways:
+
+- `Authorization: Bearer <token>` header
+- `X-ScreenJob-Token: <token>` header
+- `?token=<token>` query parameter (for browser/image fetches)
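
The three token locations listed above could be checked along the following lines. This is a minimal sketch, not the server's actual code: `extract_token` and `is_authorized` are hypothetical names, and the lookup order (Bearer header, then custom header, then query string) is an assumption.

```python
import hmac
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def extract_token(headers: Mapping[str, str], url: str) -> Optional[str]:
    """Return the first token found (hypothetical helper, assumed lookup order)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    if "x-screenjob-token" in lowered:
        return lowered["x-screenjob-token"].strip()
    # Fall back to the ?token=<token> query parameter.
    values = parse_qs(urlsplit(url).query).get("token")
    return values[0] if values else None


def is_authorized(headers: Mapping[str, str], url: str, expected: str) -> bool:
    """Compare the supplied token with the expected one in constant time."""
    token = extract_token(headers, url)
    if token is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
```

Comparing with `hmac.compare_digest` rather than `==` is a design choice of this sketch; the README does not say how the server compares tokens.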
+
+### Create Job
+
+`POST /api/jobs`
+
+Body:
+
+```json
+{
+  "job": "Open amazon.de and go to my orders",
+  "model": "gpt-5.4-mini",
+  "disabled_tools": ["click"],
+  "safety_override": false
+}
 ```

-## Tools exposed to the model
+Response:
+
+```json
+{ "job_id": "job_..." }
+```
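
A client-side sketch of the Create Job call, using only the Python standard library. `build_create_job_request` is a hypothetical helper; treating `model` and `disabled_tools` as optional fields is an assumption, and the base URL is built from the default host and port shown in the Environment section.

```python
import json
import urllib.request
from typing import Iterable, Optional


def build_create_job_request(
    base_url: str,
    token: str,
    job: str,
    model: Optional[str] = None,
    disabled_tools: Optional[Iterable[str]] = None,
    safety_override: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request."""
    payload = {"job": job, "safety_override": safety_override}
    # Assumption: optional fields may simply be left out of the body.
    if model is not None:
        payload["model"] = model
    if disabled_tools is not None:
        payload["disabled_tools"] = list(disabled_tools)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_job_request(
    "http://127.0.0.1:8787",
    "choose_a_strong_token",
    "Open amazon.de and go to my orders",
    model="gpt-5.4-mini",
    disabled_tools=["click"],
)
```

Passing `request` to `urllib.request.urlopen` against a running server would send it; the JSON response is then expected to carry `job_id`.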
+
+### Status / Output
+
+- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
+- `GET /api/jobs/{job_id}/status`: status alias
+- `GET /api/jobs/{job_id}/events`: detailed timeline
+- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
+- `GET /api/jobs`: list active + past jobs
+- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
+- `GET /api/stats`: aggregate metrics
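
Because the artifact route takes an absolute file path and a token in the query string, both values need URL encoding (Windows paths contain backslashes and often spaces). A small sketch, with `artifact_url` as a hypothetical helper name:

```python
from urllib.parse import quote, urlencode


def artifact_url(base_url: str, job_id: str, path: str, token: str) -> str:
    """Build the artifact URL with a correctly encoded query string."""
    # urlencode percent-encodes backslashes, '&', and other reserved characters.
    query = urlencode({"path": path, "token": token})
    return f"{base_url.rstrip('/')}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"
```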
+
+## Monitoring UI
+
+- Served at `/` when `DISABLE_UI=false`
+- Tailwind-based read-only dashboard
+- Requires entering `SCREENJOB_TOKEN` in the UI before data loads
+- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
+- No task launch controls in UI (monitoring only)
+
+If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
+
+## Safety
+
+Before execution, each task is classified by a model safety gate:
+
+- Safe: task runs
+- Unsafe: task is rejected and recorded
+- Override: set `safety_override` to `true` in the request body (or pass `--skip-safety-check` on the CLI)
+
+## Tool Controls
+
+Per-job tool restriction via a disable list (denylist):
+
+- API: `disabled_tools: ["type", "click"]`
+- CLI: `--disable-tool type --disable-tool click`
+
+Available tools:

 - `execute_command(command)`
-- `sleep(seconds)` (replaces shell-based sleep calls)
+- `sleep(seconds)`
 - `see_screen()`
 - `enhance(coordinate)`
 - `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
@@ -44,51 +125,36 @@ python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
 - `press_key(key, repeats=1)`
 - `task_complete(result)`
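
The per-job disable list can be read as a filter over the tool names. The sketch below is illustrative only: `enabled_tools` is a hypothetical helper, the tuple holds only the tool names that appear in this README, and rejecting unknown names is an assumption about desirable behaviour, not documented behaviour.

```python
from typing import Iterable, List, Sequence

# Tool names taken from this README; the real registry may contain others.
KNOWN_TOOLS = (
    "execute_command",
    "sleep",
    "see_screen",
    "enhance",
    "click",
    "type",
    "press_key",
    "task_complete",
)


def enabled_tools(available: Sequence[str], disabled: Iterable[str]) -> List[str]:
    """Return the available tools minus the disabled ones, preserving order."""
    blocked = set(disabled)
    unknown = blocked - set(available)
    if unknown:
        # Assumption: a misspelled tool name should fail loudly, not be ignored.
        raise ValueError(f"unknown tool(s): {sorted(unknown)}")
    return [name for name in available if name not in blocked]
```

For example, `enabled_tools(KNOWN_TOOLS, ["type", "click"])` keeps every listed tool except `type` and `click`.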

-### Offset examples
+## Cost Estimation

-- `{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}`
-- `{"coordinate":{"x":1000,"y":500},"offset_right":4}`
+Live and final cost is computed from the token usage reported in OpenAI responses (`input`, `cached_input`, `output`) and the per-model pricing rates in `src/pricing.py`.

-### Multi-tool calls in one step
+- Live: exposed in `GET /api/jobs/{job_id}` during execution
+- Final: persisted in SQLite and returned in status output
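
The cost arithmetic can be sketched as a weighted sum of the three token counts. Everything specific here is an assumption: the `Rates` type, prices quoted per one million tokens, `input` counting only uncached tokens, and the placeholder numbers in the example. The actual rates and formula live in `src/pricing.py`.

```python
from dataclasses import dataclass
from typing import Mapping

TOKENS_PER_UNIT = 1_000_000  # assumption: rates are quoted per 1M tokens


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-model prices in USD per 1M tokens."""
    input: float
    cached_input: float
    output: float


def estimate_cost(usage: Mapping[str, int], rates: Rates) -> float:
    """Return the estimated cost in USD for one usage record."""
    # Assumption: `input` excludes cached tokens, so the buckets do not overlap.
    total = (
        usage.get("input", 0) * rates.input
        + usage.get("cached_input", 0) * rates.cached_input
        + usage.get("output", 0) * rates.output
    )
    return total / TOKENS_PER_UNIT


# Placeholder rates, not real prices.
example = estimate_cost(
    {"input": 1_000_000, "cached_input": 500_000, "output": 200_000},
    Rates(input=1.0, cached_input=0.1, output=4.0),
)
```

With these placeholder rates the example works out to 1.00 + 0.05 + 0.80 = 1.85 USD. A live estimate would apply the same function to the usage accumulated so far.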

-The agent supports multiple tool calls in a single model response and executes them in order.  
-Example sequence in one step:
+## Persistence

-1. `click(...)`
-2. `sleep({"seconds": 1.5})`
+- SQLite DB: `screenjob.db`
+- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
+- Full event log per job (for history and UI)

-You can also use `click(..., sleep_after_seconds=1.5)` for a one-call variant.
-
-## Output
-
-Each run creates:
-
-- `screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log`
-- `screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png`
-- `screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png`
-
-Final stdout is JSON:
-
-```json
-{
-  "completed": true,
-  "result": "...",
-  "steps": 13,
-  "elapsed_seconds": 59.691,
-  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
-}
-```
-
-## Project layout
+## Project Layout

 ```text
 main.py
 screenjob.py
 src/
  __init__.py
-  cli.py
  agent.py
+  app_main.py
+  cli.py
+  config.py
  models.py
-  utils.py
+  pricing.py
+  runtime.py
+  safety.py
+  server.py
+  storage.py
+  task_manager.py
+  ui.py
 ```
-