feat: finalize production cleanup with structured agent responses and project governance
236
README.md
@@ -1,23 +1,66 @@
# ScreenJob

Desktop-and-terminal task agent with:
ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.

- CLI runner
- FastAPI job server
- SQLite task history
- WebSocket-powered monitoring UI
- Safety pre-check and per-job tool disable controls
- Live/final token and cost estimation
## What It Solves

## Install
- Runs agent-driven tasks that require a graphical interface.
- Exposes both CLI and HTTP API modes.
- Stores job history and events in SQLite.
- Streams live monitoring updates over WebSocket.
- Returns structured agent output as:
  - `return`: human-readable completion message
  - `data`: structured payload (for example command output)

```powershell
pip install openai pillow pyautogui python-dotenv fastapi uvicorn
## Core Features

- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
- Safety pre-check with override support
- Per-job tool disable list
- Live/final usage and cost estimates
- Read-only Tailwind monitoring UI
- Persistent job and event history

## Project Layout

```text
main.py
screenjob.py
requirements.txt
docker-compose.yml
src/
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
  utils.py
tests/
  test_agent_tools.py
  test_pricing.py
  test_server_api.py
  test_storage.py
.gitea/workflows/ci.yml
```

## Environment
## Setup

Create `.env` in project root:
1. Install Python 3.11+.
2. Install dependencies:

```powershell
pip install -r requirements.txt
```

3. Create `.env` in project root:

```env
OPENAI_API_KEY=...
@@ -31,44 +74,50 @@ SCREENJOB_PORT=8787
DISABLE_UI=false
```
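
The `.env` keys visible in this diff (`OPENAI_API_KEY`, `SCREENJOB_PORT`, `DISABLE_UI`) plus the `SCREENJOB_TOKEN` used for API auth could be read roughly as follows. This is a minimal sketch, not the project's `src/config.py`: the `Settings` class, the `load_settings` helper, the accepted truthy spellings, and the fallback port are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment values named in the README."""
    openai_api_key: str
    token: str
    port: int
    disable_ui: bool


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "true"; the real parser may differ.
    truthy = {"1", "true", "yes", "on"}
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        token=env.get("SCREENJOB_TOKEN", ""),
        # 8787 is the value shown in the diff context; using it as a default is an assumption.
        port=int(env.get("SCREENJOB_PORT", "8787")),
        disable_ui=env.get("DISABLE_UI", "false").strip().lower() in truthy,
    )


settings = load_settings({"SCREENJOB_PORT": "9000", "DISABLE_UI": "True"})
print(settings.port, settings.disable_ui)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.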

## Entry Points
## Usage

- `python main.py run "<job>"`
- `python main.py server`
- Backward-compatible wrapper: `python screenjob.py "<job>"`

## CLI Usage
### CLI

```powershell
python main.py run "Open amazon.de and go to my orders"
```

Useful flags:
CLI JSON output includes both legacy and structured fields:

- `--model gpt-5.4-mini`
- `--disable-tool click --disable-tool type`
- `--skip-safety-check`
- `--max-steps 80`
```json
{
  "completed": true,
  "result": "Task completed successfully",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
```
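
A caller consuming this JSON might prefer the structured `response` object and fall back to the top-level aliases and the legacy `result` string. The sketch below assumes exactly the field names shown in the sample above; `extract_response` is a hypothetical helper, not part of ScreenJob.

```python
import json


def extract_response(payload: dict) -> tuple[str, object]:
    """Return (message, data), preferring `response`, then aliases, then `result`."""
    response = payload.get("response") or {}
    message = response.get("return", payload.get("return", payload.get("result", "")))
    data = response.get("data", payload.get("data"))
    return message, data


# Raw string so that "\n" stays a JSON escape instead of a literal newline.
sample = json.loads(r'''
{
  "completed": true,
  "result": "Task completed successfully",
  "response": {
    "return": "Task completed successfully",
    "data": "file1.txt\nfile2.txt"
  },
  "return": "Task completed successfully",
  "data": "file1.txt\nfile2.txt"
}
''')

message, data = extract_response(sample)
print(message)
print(data.splitlines())
```

The fallback order is a design choice for this sketch: it lets the same helper read output from older builds that only emit `result`.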

## HTTP API
### Server

All API routes require token auth using `SCREENJOB_TOKEN`:
```powershell
python main.py server
```

- `Authorization: Bearer <token>` or
- `X-ScreenJob-Token: <token>`
- (for browser/image fetch) `?token=<token>` query parameter
Auth for all API routes:

- `Authorization: Bearer <SCREENJOB_TOKEN>`
- `X-ScreenJob-Token: <SCREENJOB_TOKEN>`
- Query fallback `?token=` (mainly for UI/websocket/artifact fetch)

### Create Job

`POST /api/jobs`

Body:

```json
{
  "job": "Open amazon.de and go to my orders",
  "job": "run \"ls -a\" in C:/Users/username/Documents and return output",
  "model": "gpt-5.4-mini",
  "disabled_tools": ["click"],
  "disabled_tools": [],
  "safety_override": false
}
```
@@ -79,103 +128,68 @@ Response:
{ "job_id": "job_..." }
```
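
The create-job call described above can be sketched with the standard library. The host `127.0.0.1`, the token value, the explicit `Content-Type` header, and the `build_create_job_request` helper are assumptions for illustration; only the route, the body fields, and the `Authorization: Bearer` scheme come from the README. Whether `model` may be omitted is also an assumption.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, job, model=None,
                             disabled_tools=(), safety_override=False):
    """Build (but do not send) an authenticated POST /api/jobs request."""
    body = {
        "job": job,
        "disabled_tools": list(disabled_tools),
        "safety_override": safety_override,
    }
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_job_request(
    "http://127.0.0.1:8787",
    "my-token",
    'run "ls -a" in C:/Users/username/Documents and return output',
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a running server:
# with urllib.request.urlopen(request) as resp:
#     job_id = json.load(resp)["job_id"]
```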

### Status / Output
### Job Status / History

- `GET /api/jobs/{job_id}`: full status + output + live/final usage/cost
- `GET /api/jobs/{job_id}/status`: status alias
- `GET /api/jobs/{job_id}/events`: detailed timeline
- `GET /api/jobs/{job_id}/artifact?path=<absolute_path>&token=<token>`: authenticated artifact file fetch for screenshots/enhancements
- `GET /api/jobs`: list active + past jobs
- `POST /api/jobs/{job_id}/cancel`: graceful cancellation
- `GET /api/stats`: aggregate metrics
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs`
- `POST /api/jobs/{job_id}/cancel`
- `GET /api/stats`
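
Where a header cannot be set (for example an image fetch in a browser), the `?token=` query fallback applies, and path values need URL-encoding. The sketch below builds such URLs; `job_url`, the job id `job_123`, the host, and the file path are hypothetical.

```python
from urllib.parse import quote, urlencode


def job_url(base_url, job_id, suffix="", **query):
    """Compose /api/jobs/{job_id}{suffix} with an optional encoded query string."""
    path = f"/api/jobs/{quote(job_id, safe='')}{suffix}"
    query_string = f"?{urlencode(query)}" if query else ""
    return f"{base_url.rstrip('/')}{path}{query_string}"


base = "http://127.0.0.1:8787"
events_url = job_url(base, "job_123", "/events")
artifact_url = job_url(base, "job_123", "/artifact",
                       path="C:/runs/shot 1.png", token="my-token")
print(events_url)
print(artifact_url)
```

Putting a token in a URL exposes it to logs and browser history, so the header forms are usually the better default when the client allows them.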

## Monitoring UI
Each job payload includes:

- Served at `/` when `DISABLE_UI=false`
- Tailwind-based read-only dashboard
- Requires entering `SCREENJOB_TOKEN` in UI before data loads
- Uses WebSocket `/ws` for live updates (tool calls, step events, usage/cost updates)
- No task launch controls in UI (monitoring only)
- `result` (compat string)
- `response.return`
- `response.data`
- top-level `return` and `data` aliases

If `DISABLE_UI=true`, `/` returns `{ "ui_disabled": true }` and only API endpoints remain.
### Monitoring UI

## Safety
- URL: `/`
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via `/ws`
- Set `DISABLE_UI=true` to disable UI

Before execution, each task is classified by a model safety gate:
## Agent Instructions (Practical)

- Safe: task runs
- Unsafe: task is rejected and recorded
- Override: set `safety_override=true` (or `--skip-safety-check` in CLI)
- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
- Use `see_screen` before UI interaction.
- Use `enhance` when text is unclear.
- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
- When done, call:
  - `task_complete(return="...", data=...)`

## Tool Controls
`data` should contain useful structured output for the requester (text, object, list, etc.).
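
Since `return` is a reserved word in Python, `task_complete(return=...)` is best read as a tool call whose arguments travel as a JSON object rather than as Python keyword arguments. A minimal sketch of such an argument object, with a hypothetical `task_complete_arguments` helper and made-up values:

```python
import json


def task_complete_arguments(message, data=None):
    """Build the argument object for a task_complete tool call."""
    # "return" must be a dict key here; it cannot be a Python keyword argument.
    return {"return": message, "data": data}


arguments = task_complete_arguments(
    "Listed the Documents folder",
    ["file1.txt", "file2.txt"],
)
print(json.dumps(arguments))
```

How ScreenJob itself maps these arguments onto the job payload is not shown in this diff, so treat the shape above as illustrative only.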

Per-job tool allowlisting via disable list:
## Docker Compose

- API: `disabled_tools: ["type", "click"]`
- CLI: `--disable-tool type --disable-tool click`
Run server in container:

Available tools:

- `execute_command(command)`
- `sleep(seconds)`
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
- `type(text)`
- `press_key(key, repeats=1)`
- `task_complete(result)`
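
The effect of a per-job disable list can be pictured as filtering the tool names listed above. This is a sketch under assumptions: `enabled_tools` is a hypothetical helper, and rejecting unknown names is a choice made here, not documented ScreenJob behaviour.

```python
ALL_TOOLS = (
    "execute_command",
    "sleep",
    "see_screen",
    "enhance",
    "click",
    "type",
    "press_key",
    "task_complete",
)


def enabled_tools(disabled_tools):
    """Return the tool names still available after applying a disable list."""
    disabled = set(disabled_tools)
    unknown = disabled - set(ALL_TOOLS)
    if unknown:
        # Assumption: fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"unknown tool(s): {sorted(unknown)}")
    return [name for name in ALL_TOOLS if name not in disabled]


print(enabled_tools(["type", "click"]))
```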

## Cost Estimation

Live/final cost is computed from OpenAI response usage (`input`, `cached_input`, `output`) and model pricing rates in `src/pricing.py`.

- Live: exposed in `GET /api/jobs/{job_id}` during execution
- Final: persisted in SQLite and returned in status output
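
The arithmetic behind such an estimate can be sketched as a weighted sum of token counts. Everything concrete here is an assumption: the `Rates` class, the per-million-token unit, the placeholder prices, and the treatment of `input` and `cached_input` as disjoint counts. The real rates and rules live in `src/pricing.py`, which this diff does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in USD per one million tokens."""
    input: float
    cached_input: float
    output: float


def estimate_cost(usage: dict, rates: Rates) -> float:
    """Weighted sum of token counts, converted from per-million pricing."""
    tokens_per_unit = 1_000_000
    total = (
        usage.get("input", 0) * rates.input
        + usage.get("cached_input", 0) * rates.cached_input
        + usage.get("output", 0) * rates.output
    )
    return total / tokens_per_unit


placeholder_rates = Rates(input=1.00, cached_input=0.10, output=4.00)
usage = {"input": 12_000, "cached_input": 8_000, "output": 1_500}
cost = estimate_cost(usage, placeholder_rates)
print(round(cost, 6))
```

A live estimate would call the same function on the running totals after each model response, while the final figure uses the totals at job completion.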

## Persistence

- SQLite DB: `screenjob.db`
- Runs/artifacts: `screenjob_runs/run_YYYYMMDD_HHMMSS/...`
- Full event log per job (for history and UI)

## Project Layout

```text
main.py
screenjob.py
src/
  __init__.py
  agent.py
  app_main.py
  cli.py
  config.py
  models.py
  pricing.py
  runtime.py
  safety.py
  server.py
  storage.py
  task_manager.py
  ui.py
tests/
  conftest.py
  test_pricing.py
  test_server_api.py
  test_storage.py
.gitea/
  workflows/
    ci.yml
```powershell
docker compose up --build
```

Service uses official Python image and reads `.env`.

## Verification

Run local verification:
Local:

```powershell
pytest -q
```

Gitea CI pipeline:
CI:

- File: `.gitea/workflows/ci.yml`
- Runs compile checks + pytest on push and PR.
- `.gitea/workflows/ci.yml` runs compile checks + tests on push/PR.

## Compatibility Entry Point

- `python screenjob.py "<job>"` remains supported as a wrapper to `main.py`.

## License

Apache License 2.0. See `LICENSE`.