feat: add authenticated artifact streaming and UI visual previews
README.md
# ScreenJob

Desktop-and-terminal task agent with:

- CLI runner
- FastAPI job server
- SQLite task history
- WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation

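## Install

```powershell
pip install openai pillow pyautogui python-dotenv fastapi uvicorn
```

The monitoring UI's live updates use a WebSocket (`/ws`). A bare `uvicorn` install ships without a WebSocket library, so if `/ws` connections fail, also install `uvicorn[standard]` (or `websockets`).
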
## Environment

Create `.env` in the project root:

```env
OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token

# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
```

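A minimal sketch of how these variables could be read and validated. `Settings` and `load_settings` are hypothetical names, not the actual API of `src/config.py`, and treating the values under `# Optional` as defaults is an assumption. The sketch reads a plain mapping (by default `os.environ`), so it runs without python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings container; fields mirror the .env keys above.
    openai_api_key: str
    token: str
    default_model: str
    safety_model: str
    host: str
    port: int
    disable_ui: bool


def _parse_bool(value: str) -> bool:
    # Accept common truthy spellings ("true", "1", "yes", "on"); anything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail fast if a required key is missing or empty.
    missing = [key for key in ("OPENAI_API_KEY", "SCREENJOB_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        token=env["SCREENJOB_TOKEN"],
        # ASSUMPTION: the values listed under "# Optional" act as defaults.
        default_model=env.get("SCREENJOB_DEFAULT_MODEL", "gpt-5.4-mini"),
        safety_model=env.get("SCREENJOB_SAFETY_MODEL", "gpt-5.4-mini"),
        host=env.get("SCREENJOB_HOST", "127.0.0.1"),
        port=int(env.get("SCREENJOB_PORT", "8787")),
        disable_ui=_parse_bool(env.get("DISABLE_UI", "false")),
    )
```

To pick up values from `.env`, call `load_dotenv()` (from the `dotenv` module of the `python-dotenv` package installed above) before `load_settings()`.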
## Entry Points

- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`

## CLI Usage

```powershell
python main.py run "Open amazon.de and go to my orders"
```

Useful flags:

- `--model gpt-5.4-mini`
- `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`

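The `run`/`server` commands and the flags above could be declared with `argparse` subcommands roughly as follows. This parser is a hypothetical reconstruction, not copied from `src/cli.py`. Treating `--disable-tool` as a repeatable `append` option follows the example above; leaving `--model` and `--max-steps` unset by default (so the program's own defaults apply) is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI definition mirroring the documented commands and flags.
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # python main.py run "<job>" [flags]
    run = sub.add_parser("run", help="run one job from the terminal")
    run.add_argument("job", help="natural-language task description")
    # None = not given on the command line (ASSUMPTION: the app then falls
    # back to SCREENJOB_DEFAULT_MODEL and its own step limit).
    run.add_argument("--model", default=None)
    run.add_argument("--max-steps", type=int, default=None)
    # Repeatable: each occurrence appends one tool name to args.disabled_tools.
    run.add_argument("--disable-tool", action="append", default=[],
                     dest="disabled_tools", metavar="TOOL")
    run.add_argument("--skip-safety-check", action="store_true")

    # python main.py server
    sub.add_parser("server", help="start the job server and monitoring UI")
    return parser
```

Parsing `["run", "Open amazon.de", "--disable-tool", "click", "--disable-tool", "type"]` yields `args.disabled_tools == ["click", "type"]`, the same shape as the API's `disabled_tools` list.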
## HTTP API

All API routes require token auth with the value of `SCREENJOB_TOKEN`, sent in one of three ways:

- `Authorization: Bearer <token>` header
- `X-ScreenJob-Token: <token>` header
- `?token=<token>` query parameter (for browser/image fetches)

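A minimal, framework-agnostic sketch of the token check described above. `extract_token` and `is_authorized` are hypothetical helpers, not ScreenJob's server code; checking the three sources in the listed order and using a constant-time comparison are assumptions about how such a check should behave.

```python
import hmac
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    # Hypothetical helper. HTTP header names are case-insensitive, so normalize first.
    lowered = {k.lower(): v for k, v in headers.items()}

    # 1. Authorization: Bearer <token>
    scheme, _, credentials = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()

    # 2. X-ScreenJob-Token: <token>
    if lowered.get("x-screenjob-token"):
        return lowered["x-screenjob-token"]

    # 3. ?token=<token>, e.g. for <img src="..."> requests that cannot set headers.
    return query.get("token") or None


def is_authorized(headers: Mapping[str, str], query: Mapping[str, str], expected: str) -> bool:
    token = extract_token(headers, query)
    # compare_digest avoids leaking how many leading characters matched.
    return token is not None and hmac.compare_digest(token.encode(), expected.encode())
```

In FastAPI, a check like this is typically wired in as a dependency so every route shares it.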
### Create Job

`POST /api/jobs`

Body:

```json
{
  "job": "Open amazon.de and go to my orders",
  "model": "gpt-5.4-mini",
  "disabled_tools": ["click"],
  "safety_override": false
}
```

Response:

```json
{ "job_id": "job_..." }
```

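A standard-library client sketch for this endpoint, using the `Authorization: Bearer` header. `create_job_request` and `create_job` are hypothetical helper names, and omitting `model` so the server picks its default is an assumption. Building the request separately from sending it keeps the payload easy to inspect.

```python
import json
import urllib.request
from typing import Optional, Sequence


def create_job_request(base_url: str, token: str, job: str, model: Optional[str] = None,
                       disabled_tools: Sequence[str] = (),
                       safety_override: bool = False) -> urllib.request.Request:
    # Hypothetical helper: builds (but does not send) POST /api/jobs.
    payload = {"job": job, "disabled_tools": list(disabled_tools),
               "safety_override": safety_override}
    if model is not None:
        payload["model"] = model  # ASSUMPTION: omitted -> server-side default model
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def create_job(base_url: str, token: str, job: str, **kwargs) -> str:
    # Sends the request and returns the "job_id" field of the JSON response.
    req = create_job_request(base_url, token, job, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["job_id"]
```

Usage: `job_id = create_job("http://127.0.0.1:8787", token, "Open amazon.de and go to my orders", disabled_tools=["click"])`, then poll `GET /api/jobs/{job_id}` for status.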
### Status / Output

- `GET /api/jobs/{job_id}`: full status, output, and live/final usage and cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed event timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated fetch of an artifact file (screenshots and enhanced images)
- `GET /api/jobs`: list active and past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics

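The artifact endpoint takes an absolute filesystem path as a query parameter, so the path has to be percent-encoded (Windows paths contain `\` and `:`). A small sketch that builds such a URL, for example for an `<img src>` in a dashboard; `artifact_url` is a hypothetical helper, and the actual paths come from the job's events or status output.

```python
from urllib.parse import quote, urlencode


def artifact_url(base_url: str, job_id: str, path: str, token: str) -> str:
    # Hypothetical helper. quote() encodes the job id as one path segment;
    # urlencode() percent-encodes the query values (backslashes, colons, spaces).
    query = urlencode({"path": path, "token": token})
    return f"{base_url.rstrip('/')}/api/jobs/{quote(job_id, safe='')}/artifact?{query}"
```

Standard query-string parsers decode the `+` that `urlencode` emits for spaces back into a space. Because tokens in URLs can end up in server logs and browser history, the header-based options are preferable wherever the client can set headers.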
## Monitoring UI

- Served at `/` when `DISABLE_UI=false`
- Tailwind-based, read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in the UI before any data loads
- Uses the WebSocket endpoint `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls (monitoring only)

If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only the API endpoints remain available.

## Safety

Before execution, each task is classified by a model-based safety gate:

- Safe: the task runs.
- Unsafe: the task is rejected, and the rejection is recorded.
- Override: set `"safety_override": true` in the API request body (or pass `--skip-safety-check` on the CLI) to skip the check.

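The decision flow above can be sketched as a small function. This is a hypothetical illustration: `SafetyVerdict`, `gate_task`, and the injected `classify` callable stand in for whatever `src/safety.py` actually does (there, classification is a model call). Recording whether the check actually ran is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical classifier signature: task text -> (is_safe, reason).
# In ScreenJob this role is played by a model call; here it is injected.
Classifier = Callable[[str], Tuple[bool, str]]


@dataclass(frozen=True)
class SafetyVerdict:
    allowed: bool
    reason: str
    checked: bool  # False when the gate was bypassed by an override


def gate_task(task: str, classify: Classifier, override: bool = False) -> SafetyVerdict:
    if override:
        # safety_override=true / --skip-safety-check: never call the classifier.
        return SafetyVerdict(allowed=True, reason="safety check skipped", checked=False)
    is_safe, reason = classify(task)
    # allowed=False means the caller should reject the job and record the reason.
    return SafetyVerdict(allowed=is_safe, reason=reason, checked=True)
```

With a hypothetical `my_model_classifier`, `gate_task(job, my_model_classifier, override=args.skip_safety_check)` returns `checked=False` when the override is set, so stored records can distinguish skipped checks from passed ones.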
## Tool Controls

Tools are disabled per job with a disable list (a denylist); every tool not on the list stays available:

- API: `"disabled_tools": ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`

Available tools:

- `execute_command(command)`
- `sleep(seconds)`
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
- `press_key(key, repeats=1)`
- `task_complete(result)`

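Resolving the enabled tool set for a job is a set difference over the full tool registry. A hypothetical sketch (`resolve_enabled_tools` is not ScreenJob's API) that also rejects unknown names, so a typo such as `--disable-tool clik` does not silently leave `click` enabled; whether ScreenJob validates names this way is an assumption.

```python
from typing import Iterable, List


def resolve_enabled_tools(all_tools: Iterable[str], disabled: Iterable[str]) -> List[str]:
    # Hypothetical helper: all_tools is the registry, disabled is the per-job denylist.
    available = list(all_tools)
    disabled_set = set(disabled)
    # Reject names that match no tool instead of ignoring them silently.
    unknown = disabled_set.difference(available)
    if unknown:
        raise ValueError(f"unknown tool(s) in disable list: {sorted(unknown)}")
    # Keep registry order for the tools that remain enabled.
    return [name for name in available if name not in disabled_set]
```

Raising on unknown names is a design choice; a server could instead ignore them and log a warning.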
## Cost Estimation

Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and the per-model pricing rates in `src/pricing.py`.

- Live: exposed in `GET /api/jobs/{job_id}` while the job runs
- Final: persisted in SQLite and returned in the status output

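A worked sketch of the cost arithmetic, assuming (as in OpenAI's usage reporting) that `cached_input` tokens are a subset of `input` tokens billed at a discounted rate, and that rates are quoted in USD per 1M tokens: cost = ((input - cached_input) * rate_input + cached_input * rate_cached + output * rate_output) / 1,000,000. `Rates` and `estimate_cost` are hypothetical names, and the rates in any example are placeholders, not the values in `src/pricing.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical rate table, USD per 1M tokens (placeholders only).
    input: float
    cached_input: float
    output: float


def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int,
                  rates: Rates) -> float:
    # ASSUMPTION: cached tokens are counted inside input_tokens.
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input must be between 0 and input")
    uncached = input_tokens - cached_input_tokens
    # Cached prompt tokens use the cached rate; the rest use the full input rate.
    usd = (uncached * rates.input
           + cached_input_tokens * rates.cached_input
           + output_tokens * rates.output)
    return usd / 1_000_000
```

For a live figure, accumulate the three token counts across model responses and reapply the formula; because the cost is linear in each count, this equals the sum of per-response costs.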
## Persistence

- SQLite DB: `screenjob.db`
- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and the UI)

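Run directories are named with a `run_YYYYMMDD_HHMMSS` timestamp, which maps directly onto a `strftime`/`strptime` format string. A sketch for naming and listing run folders, assuming local wall-clock time and folders placed directly under `screenjob_runs/`; `new_run_dir_name`, `parse_run_dir_name`, and `list_runs` are hypothetical helpers.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

RUN_FORMAT = "run_%Y%m%d_%H%M%S"  # e.g. run_20250102_030405


def new_run_dir_name(now: Optional[datetime] = None) -> str:
    # Hypothetical helper. ASSUMPTION: local wall-clock time, as datetime.now() returns.
    return (now if now is not None else datetime.now()).strftime(RUN_FORMAT)


def parse_run_dir_name(name: str) -> Optional[datetime]:
    try:
        return datetime.strptime(name, RUN_FORMAT)
    except ValueError:
        return None  # not a run folder


def list_runs(root: Path = Path("screenjob_runs")) -> List[Tuple[datetime, Path]]:
    if not root.is_dir():
        return []
    runs = []
    for child in root.iterdir():
        started = parse_run_dir_name(child.name)
        if started is not None and child.is_dir():
            runs.append((started, child))
    # Newest run first.
    return sorted(runs, reverse=True)
```

Because the format is fixed-width and zero-padded, sorting directory names as plain strings gives the same order as sorting by timestamp.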
## Project Layout

```text
main.py
screenjob.py
src/
  __init__.py
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
```