feat: add authenticated artifact streaming and UI visual previews

Space-Banane
2026-05-27 17:50:21 +02:00
parent 10355bf11a
commit 8fe6ad2d75
6 changed files with 184 additions and 57 deletions

166
README.md

@@ -1,42 +1,123 @@
# ScreenJob
Single-file behavior, split into maintainable modules under `src/`.
Desktop-and-terminal task agent with:
## Entry point
- Primary: `python main.py "<task>"`
- Backward compatible: `python screenjob.py "<task>"`
- CLI runner
- FastAPI job server
- SQLite task history
- WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation
## Install
```powershell
pip install openai pillow pyautogui python-dotenv
pip install openai pillow pyautogui python-dotenv fastapi uvicorn
```
## Configure
## Environment
Create a `.env` file in project root:
Create `.env` in project root:
```env
OPENAI_API_KEY=your_key_here
OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token
# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
```
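As a rough illustration of how these variables might be read, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from `src/config.py`, and treating the sample values above as fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    token: str
    default_model: str
    safety_model: str
    host: str
    port: int
    disable_ui: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (the process environment by default)."""
    env = os.environ if env is None else env
    return Settings(
        # Required keys: a missing one raises KeyError.
        api_key=env["OPENAI_API_KEY"],
        token=env["SCREENJOB_TOKEN"],
        # Assumed fallbacks, copied from the sample .env above.
        default_model=env.get("SCREENJOB_DEFAULT_MODEL", "gpt-5.4-mini"),
        safety_model=env.get("SCREENJOB_SAFETY_MODEL", "gpt-5.4-mini"),
        host=env.get("SCREENJOB_HOST", "127.0.0.1"),
        port=int(env.get("SCREENJOB_PORT", "8787")),
        disable_ui=env.get("DISABLE_UI", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Calling `load_dotenv()` from `python-dotenv` before `load_settings()` would copy the `.env` values into the process environment first.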
## Usage
## Entry Points
- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`
## CLI Usage
```powershell
python main.py "Open amazon.de and go to my orders"
python main.py run "Open amazon.de and go to my orders"
```
Optional flags:
Useful flags:
```powershell
python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
```
- `--model gpt-5.4-mini`
- `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`
## HTTP API
All API routes require token authentication with `SCREENJOB_TOKEN`, passed in one of these ways:
- `Authorization: Bearer <token>` header
- `X-ScreenJob-Token: <token>` header
- `?token=<token>` query parameter (for browser/image fetches)
### Create Job
`POST /api/jobs`
Body:
```json
{
"job": "Open amazon.de and go to my orders",
"model": "gpt-5.4-mini",
"disabled_tools": ["click"],
"safety_override": false
}
```
## Tools exposed to the model
Response:
```json
{ "job_id": "job_..." }
```
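A client call to `POST /api/jobs` could look roughly like the sketch below, which uses only the standard library. `build_create_job_request` is a hypothetical helper, and treating `model` and `disabled_tools` as optional body fields is an assumption.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, job, model=None,
                             disabled_tools=None, safety_override=False):
    """Build (but do not send) an authenticated POST /api/jobs request."""
    payload = {"job": job, "safety_override": safety_override}
    # Assumption: the server accepts a body without these two fields.
    if model is not None:
        payload["model"] = model
    if disabled_tools:
        payload["disabled_tools"] = list(disabled_tools)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and decoding the JSON body should yield the `job_id` shown above.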
### Status / Output
- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
- `GET /api/jobs`: list active + past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics
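Because the artifact route takes an absolute file path and a token in the query string, both values need URL-encoding (Windows paths contain backslashes and colons). A minimal sketch, with `artifact_url` as a hypothetical helper name:

```python
from urllib.parse import quote, urlencode


def artifact_url(base_url, job_id, artifact_path, token):
    """Return the artifact fetch URL with path and token percent-encoded."""
    query = urlencode({"path": artifact_path, "token": token})
    return f"{base_url.rstrip('/')}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"
```

The resulting URL can be used directly as an `<img src>` value, which is why the query-parameter form of the token exists.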
## Monitoring UI
- Served at `/` when `DISABLE_UI=false`
- Tailwind-based read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in UI before data loads
- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls in UI (monitoring only)
If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
## Safety
Before execution, each task is classified by a model-based safety gate:
- Safe: task runs
- Unsafe: task is rejected and recorded
- Override: set `safety_override=true` (or `--skip-safety-check` in CLI)
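The three outcomes above can be sketched as a small decision function. `gate_task` and the `classify` callable are hypothetical names, and skipping the classifier entirely when the override is set is an assumption (suggested by the flag name `--skip-safety-check`).

```python
from typing import Callable


def gate_task(task: str, classify: Callable[[str], bool],
              safety_override: bool = False) -> str:
    """Return "run" or "rejected" for a task."""
    if safety_override:
        # Assumed behavior: the classifier is not consulted at all.
        return "run"
    # classify() stands in for the model call and returns True when safe.
    return "run" if classify(task) else "rejected"
```

A real implementation would also record rejected tasks, as described above.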
## Tool Controls
Per-job tool restriction via a disable list (a denylist, not an allowlist):
- API: `disabled_tools: ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`
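Applying a disable list amounts to filtering the tool names before they are offered to the model. A minimal sketch follows; `enabled_tools` is a hypothetical helper, and rejecting unknown names is an assumption rather than documented behavior.

```python
def enabled_tools(available, disabled):
    """Return the available tool names minus the disabled ones, in order."""
    disabled_set = set(disabled)
    unknown = disabled_set - set(available)
    if unknown:
        # Assumption: a typo in a tool name should fail loudly.
        raise ValueError(f"unknown tool(s): {sorted(unknown)}")
    return [name for name in available if name not in disabled_set]
```

For example, `enabled_tools(["see_screen", "click", "type"], ["type", "click"])` leaves only `see_screen`.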
Available tools:
- `execute_command(command)`
- `sleep(seconds)` (replaces shell-based sleep calls)
- `sleep(seconds)`
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
@@ -44,51 +125,36 @@ python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
- `press_key(key, repeats=1)`
- `task_complete(result)`
### Offset examples
## Cost Estimation
- `{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}`
- `{"coordinate":{"x":1000,"y":500},"offset_right":4}`
Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and model pricing rates in `src/pricing.py`.
### Multi-tool calls in one step
- Live: exposed in `GET /api/jobs/{job_id}` during execution
- Final: persisted in SQLite and returned in status output
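The calculation described above might look like this sketch. `Rates` and `estimate_cost` are hypothetical names, rates are assumed to be in USD per million tokens, and the three usage counts are assumed to be disjoint buckets; the actual `src/pricing.py` may differ on all of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Assumed unit: USD per 1,000,000 tokens.
    input: float
    cached_input: float
    output: float


def estimate_cost(usage: dict, rates: Rates) -> float:
    """Sum each token bucket multiplied by its per-million rate."""
    total = (
        usage.get("input", 0) * rates.input
        + usage.get("cached_input", 0) * rates.cached_input
        + usage.get("output", 0) * rates.output
    )
    return total / 1_000_000
```

Calling it after every model response with the accumulated usage gives the live figure; the last call gives the final one.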
The agent supports multiple tool calls in a single model response and executes them in order.
Example sequence in one step:
## Persistence
1. `click(...)`
2. `sleep({"seconds": 1.5})`
- SQLite DB: `screenjob.db`
- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and UI)
You can also use `click(..., sleep_after_seconds=1.5)` for a one-call variant.
## Output
Each run creates:
- `screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png`
Final stdout is JSON:
```json
{
"completed": true,
"result": "...",
"steps": 13,
"elapsed_seconds": 59.691,
"artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
```
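A caller that wraps the CLI can parse this JSON and check for the fields shown above. A minimal sketch, assuming the JSON object is the only thing written to stdout; `parse_run_output` is a hypothetical helper:

```python
import json

EXPECTED_KEYS = {"completed", "result", "steps", "elapsed_seconds", "artifacts_dir"}


def parse_run_output(stdout_text: str) -> dict:
    """Parse the final stdout JSON and verify the expected keys are present."""
    data = json.loads(stdout_text)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```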
## Project layout
## Project Layout
```text
main.py
screenjob.py
src/
__init__.py
cli.py
agent.py
app_main.py
cli.py
config.py
models.py
utils.py
pricing.py
runtime.py
safety.py
server.py
storage.py
task_manager.py
ui.py
```