feat: add authenticated artifact streaming and UI visual previews

This commit is contained in:
Space-Banane
2026-05-27 17:50:21 +02:00
parent 10355bf11a
commit 8fe6ad2d75
6 changed files with 184 additions and 57 deletions

2
.gitignore vendored
View File

@@ -15,8 +15,8 @@ env/
# Runtime artifacts # Runtime artifacts
screenjob_runs/ screenjob_runs/
result.json result.json
screenjob.db
# IDE # IDE
.vscode/ .vscode/
.idea/ .idea/

166
README.md
View File

@@ -1,42 +1,123 @@
# ScreenJob # ScreenJob
Single-file behavior, split into maintainable modules under `src/`. Desktop-and-terminal task agent with:
## Entry point - CLI runner
- FastAPI job server
- Primary: `python main.py "<task>"` - SQLite task history
- Backward compatible: `python screenjob.py "<task>"` - WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation
## Install ## Install
```powershell ```powershell
pip install openai pillow pyautogui python-dotenv pip install openai pillow pyautogui python-dotenv fastapi uvicorn
``` ```
## Configure ## Environment
Create a `.env` file in project root: Create `.env` in project root:
```env ```env
OPENAI_API_KEY=your_key_here OPENAI_API_KEY=...
SCREENJOB_TOKEN=choose_a_strong_token
# Optional
SCREENJOB_DEFAULT_MODEL=gpt-5.4-mini
SCREENJOB_SAFETY_MODEL=gpt-5.4-mini
SCREENJOB_HOST=127.0.0.1
SCREENJOB_PORT=8787
DISABLE_UI=false
``` ```
## Usage ## Entry Points
- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`
## CLI Usage
```powershell ```powershell
python main.py "Open amazon.de and go to my orders" python main.py run "Open amazon.de and go to my orders"
``` ```
Optional flags: Useful flags:
```powershell - `--model gpt-5.4-mini`
python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80 - `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`
## HTTP API
All API routes require token auth using `SCREENJOB_TOKEN`:
- `Authorization: Bearer <token>` or
- `X-ScreenJob-Token: <token>`
- (for browser/image fetch) `?token=<token>` query parameter
### Create Job
`POST /api/jobs`
Body:
```json
{
"job": "Open amazon.de and go to my orders",
"model": "gpt-5.4-mini",
"disabled_tools": ["click"],
"safety_override": false
}
``` ```
## Tools exposed to the model Response:
```json
{ "job_id": "job_..." }
```
### Status / Output
- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
- `GET /api/jobs`: list active + past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics
## Monitoring UI
- Served at `/` when `DISABLE_UI=false`
- Tailwind-based read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in UI before data loads
- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls in UI (monitoring only)
If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
## Safety
Before execution, each task is classified by a model safety gate:
- Safe: task runs
- Unsafe: task is rejected and recorded
- Override: set `safety_override=true` (or `--skip-safety-check` in CLI)
## Tool Controls
Per-job tool allowlisting via disable list:
- API: `disabled_tools: ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`
Available tools:
- `execute_command(command)` - `execute_command(command)`
- `sleep(seconds)` (replaces shell-based sleep calls) - `sleep(seconds)`
- `see_screen()` - `see_screen()`
- `enhance(coordinate)` - `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)` - `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
@@ -44,51 +125,36 @@ python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
- `press_key(key, repeats=1)` - `press_key(key, repeats=1)`
- `task_complete(result)` - `task_complete(result)`
### Offset examples ## Cost Estimation
- `{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}` Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and model pricing rates in `src/pricing.py`.
- `{"coordinate":{"x":1000,"y":500},"offset_right":4}`
### Multi-tool calls in one step - Live: exposed in `GET /api/jobs/{job_id}` during execution
- Final: persisted in SQLite and returned in status output
The agent supports multiple tool calls in a single model response and executes them in order. ## Persistence
Example sequence in one step:
1. `click(...)` - SQLite DB: `screenjob.db`
2. `sleep({"seconds": 1.5})` - Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and UI)
You can also use `click(..., sleep_after_seconds=1.5)` for a one-call variant. ## Project Layout
## Output
Each run creates:
- `screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png`
Final stdout is JSON:
```json
{
"completed": true,
"result": "...",
"steps": 13,
"elapsed_seconds": 59.691,
"artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
```
## Project layout
```text ```text
main.py main.py
screenjob.py screenjob.py
src/ src/
__init__.py __init__.py
cli.py
agent.py agent.py
app_main.py
cli.py
config.py
models.py models.py
utils.py pricing.py
runtime.py
safety.py
server.py
storage.py
task_manager.py
ui.py
``` ```

View File

@@ -1 +1 @@
# Root package marker for local imports like: from src.cli import main # Root package marker for local imports.

View File

@@ -6,6 +6,7 @@ from pathlib import Path
from typing import Any from typing import Any
from fastapi import Depends, FastAPI, Header, HTTPException, Query, WebSocket, WebSocketDisconnect from fastapi import Depends, FastAPI, Header, HTTPException, Query, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse
from fastapi.responses import HTMLResponse, JSONResponse from fastapi.responses import HTMLResponse, JSONResponse
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
@@ -86,7 +87,13 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
async def _on_startup() -> None: async def _on_startup() -> None:
ws_hub.set_loop(asyncio.get_running_loop()) ws_hub.set_loop(asyncio.get_running_loop())
def _extract_token(authorization: str | None, x_screenjob_token: str | None) -> str: def _extract_token(
authorization: str | None,
x_screenjob_token: str | None,
query_token: str | None,
) -> str:
if query_token:
return query_token.strip()
if x_screenjob_token: if x_screenjob_token:
return x_screenjob_token.strip() return x_screenjob_token.strip()
if authorization: if authorization:
@@ -99,9 +106,10 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
def require_token( def require_token(
authorization: str | None = Header(default=None), authorization: str | None = Header(default=None),
x_screenjob_token: str | None = Header(default=None), x_screenjob_token: str | None = Header(default=None),
token: str | None = Query(default=None),
) -> None: ) -> None:
token = _extract_token(authorization, x_screenjob_token) resolved = _extract_token(authorization, x_screenjob_token, token)
if not token or not secrets.compare_digest(token, app_config.screenjob_token): if not resolved or not secrets.compare_digest(resolved, app_config.screenjob_token):
raise HTTPException(status_code=401, detail="Unauthorized") raise HTTPException(status_code=401, detail="Unauthorized")
@app.post("/api/jobs") @app.post("/api/jobs")
@@ -130,6 +138,13 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
raise HTTPException(status_code=404, detail="Job not found") raise HTTPException(status_code=404, detail="Job not found")
return job return job
@app.get("/api/jobs/{job_id}/status")
def get_job_status(job_id: str, _: None = Depends(require_token)) -> dict[str, Any]:
job = manager.get_job(job_id)
if job is None:
raise HTTPException(status_code=404, detail="Job not found")
return job
@app.get("/api/jobs/{job_id}/events") @app.get("/api/jobs/{job_id}/events")
def get_job_events( def get_job_events(
job_id: str, job_id: str,
@@ -149,6 +164,28 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
accepted = manager.cancel_job(job_id) accepted = manager.cancel_job(job_id)
return {"job_id": job_id, "cancel_requested": bool(accepted)} return {"job_id": job_id, "cancel_requested": bool(accepted)}
@app.get("/api/jobs/{job_id}/artifact")
def get_job_artifact(
job_id: str,
path: str = Query(..., min_length=1),
_: None = Depends(require_token),
) -> FileResponse:
job = manager.get_job(job_id)
if job is None:
raise HTTPException(status_code=404, detail="Job not found")
artifacts_dir_raw = str(job.get("artifacts_dir") or "").strip()
if not artifacts_dir_raw:
raise HTTPException(status_code=404, detail="Artifacts not available yet")
artifacts_dir = Path(artifacts_dir_raw).resolve()
requested = Path(path).resolve()
try:
requested.relative_to(artifacts_dir)
except ValueError as exc:
raise HTTPException(status_code=400, detail="Artifact path is outside job artifacts directory") from exc
if not requested.exists() or not requested.is_file():
raise HTTPException(status_code=404, detail="Artifact not found")
return FileResponse(str(requested))
@app.get("/api/stats") @app.get("/api/stats")
def stats(_: None = Depends(require_token)) -> dict[str, Any]: def stats(_: None = Depends(require_token)) -> dict[str, Any]:
return manager.stats() return manager.stats()

View File

@@ -191,6 +191,13 @@ class JobManager:
def on_event(event: dict[str, Any]) -> None: def on_event(event: dict[str, Any]) -> None:
self._publish(job_id, event) self._publish(job_id, event)
if event.get("event_type") == "job_started":
run_id = str(((event.get("payload") or {}).get("run_id") or "")).strip()
if run_id:
self.db.update_job(
job_id,
artifacts_dir=str((self.config.runs_dir / f"run_{run_id}").resolve()),
)
if event.get("event_type") == "usage_update": if event.get("event_type") == "usage_update":
usage = (event.get("payload") or {}).get("usage") or {} usage = (event.get("payload") or {}).get("usage") or {}
self.db.update_job( self.db.update_job(

View File

@@ -37,6 +37,10 @@ def monitoring_page_html() -> str:
<div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3"> <div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<h2 class="font-semibold">Job Detail</h2> <h2 class="font-semibold">Job Detail</h2>
<pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre> <pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre>
<h3 class="font-semibold text-sm">Latest Visual</h3>
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<img id="latestVisual" alt="Latest visual update" class="max-h-[24vh] w-full object-contain rounded" />
</div>
<h3 class="font-semibold text-sm">Live Events</h3> <h3 class="font-semibold text-sm">Live Events</h3>
<div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div> <div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div>
</div> </div>
@@ -51,6 +55,7 @@ def monitoring_page_html() -> str:
const jobDetailEl = document.getElementById("jobDetail"); const jobDetailEl = document.getElementById("jobDetail");
const eventsEl = document.getElementById("events"); const eventsEl = document.getElementById("events");
const statsEl = document.getElementById("stats"); const statsEl = document.getElementById("stats");
const latestVisualEl = document.getElementById("latestVisual");
const state = { const state = {
token: localStorage.getItem("screenjob_token") || "", token: localStorage.getItem("screenjob_token") || "",
@@ -123,6 +128,15 @@ def monitoring_page_html() -> str:
} }
} }
function updateLatestVisualFromEvent(ev) {
if (!ev || ev.event_type !== "visual_update") return;
if (!state.selectedJobId || ev.job_id !== state.selectedJobId) return;
const imagePath = ev.payload && ev.payload.image_meta && ev.payload.image_meta.path;
if (!imagePath) return;
const q = encodeURIComponent(imagePath);
latestVisualEl.src = `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
async function refreshJobs() { async function refreshJobs() {
const payload = await api("/api/jobs?limit=100"); const payload = await api("/api/jobs?limit=100");
state.jobs = payload.jobs || []; state.jobs = payload.jobs || [];
@@ -143,7 +157,10 @@ def monitoring_page_html() -> str:
]); ]);
jobDetailEl.textContent = JSON.stringify(job, null, 2); jobDetailEl.textContent = JSON.stringify(job, null, 2);
eventsEl.innerHTML = ""; eventsEl.innerHTML = "";
for (const ev of (events.events || []).slice().reverse()) pushEventLine(ev); const list = (events.events || []).slice().reverse();
for (const ev of list) pushEventLine(ev);
const visual = list.find((ev) => ev.event_type === "visual_update");
if (visual) updateLatestVisualFromEvent(visual);
} }
function connectWs() { function connectWs() {
@@ -158,6 +175,7 @@ def monitoring_page_html() -> str:
try { try {
const payload = JSON.parse(event.data); const payload = JSON.parse(event.data);
pushEventLine(payload); pushEventLine(payload);
updateLatestVisualFromEvent(payload);
if (!state.selectedJobId || payload.job_id === state.selectedJobId) { if (!state.selectedJobId || payload.job_id === state.selectedJobId) {
await refreshJobDetail(); await refreshJobDetail();
} }
@@ -190,4 +208,3 @@ def monitoring_page_html() -> str:
</body> </body>
</html> </html>
""" """