chore: initialize screenjob project baseline

2026-05-27 17:31:49 +02:00
commit 84b0df520c
9 changed files with 1045 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,94 @@
+# ScreenJob
+
+ScreenJob keeps the behavior of the original single-file script, now split into maintainable modules under `src/`.
+
+## Entry point
+
+- Primary: `python main.py "<task>"`
+- Backward compatible: `python screenjob.py "<task>"`
+
+## Install
+
+```powershell
+pip install openai pillow pyautogui python-dotenv
+```
+
+## Configure
+
+Create a `.env` file in the project root:
+
+```env
+OPENAI_API_KEY=your_key_here
+```
+
+## Usage
+
+```powershell
+python main.py "Open amazon.de and go to my orders"
+```
+
+Optional flags:
+
+```powershell
+python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
+```
+
+## Tools exposed to the model
+
+- `execute_command(command)`
+- `sleep(seconds)` (replaces shell-based sleep calls)
+- `see_screen()`
+- `enhance(coordinate)`
+- `click(coordinate, offset_up, offset_down, offset_left, offset_right, sleep_after_seconds)`
+- `type(text)`
+- `press_key(key, repeats=1)`
+- `task_complete(result)`
+
+### Offset examples
+
+- `{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}`
+- `{"coordinate":{"x":1000,"y":500},"offset_right":4}`
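A minimal sketch of how the offset arguments above could be resolved into a final click position. It assumes offsets may be either a bare number or a string with a `px` suffix (as the two examples suggest) and that screen coordinates grow rightward and downward, so `offset_up` reduces `y`. The helper names `parse_offset` and `apply_offsets` are hypothetical and are not taken from the project's source.

```python
import re


def parse_offset(value):
    """Convert an offset such as "2px" or 4 into a whole number of pixels."""
    if isinstance(value, bool):
        raise TypeError("offset must be a number or a string like '2px'")
    if isinstance(value, (int, float)):
        return int(round(value))
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(?:px)?\s*", str(value))
    if match is None:
        raise ValueError(f"unrecognized offset: {value!r}")
    return int(round(float(match.group(1))))


def apply_offsets(args):
    """Return the (x, y) position after applying any offset_* arguments.

    Assumption: y increases downward, so "up" subtracts from y.
    """
    x = args["coordinate"]["x"]
    y = args["coordinate"]["y"]
    x += parse_offset(args.get("offset_right", 0))
    x -= parse_offset(args.get("offset_left", 0))
    y += parse_offset(args.get("offset_down", 0))
    y -= parse_offset(args.get("offset_up", 0))
    return x, y


print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_up": "2px"}))
print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_right": 4}))
```

Under these assumptions the first call moves the click two pixels up and the second moves it four pixels to the right; the real agent may treat units or direction differently.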
+
+### Multi-tool calls in one step
+
+The agent supports multiple tool calls in a single model response and executes them in order.  
+Example sequence in one step:
+
+1. `click(...)`
+2. `sleep({"seconds": 1.5})`
+
+To get the same effect in a single call, use `click(..., sleep_after_seconds=1.5)`.
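The in-order execution described above can be sketched as a simple dispatch loop. This is an illustration only: the tool-call shape (`name` plus `arguments`), the `run_tool_calls` function, and the stub handlers are assumptions, not the project's actual code, and the stubs record events instead of clicking or sleeping.

```python
def run_tool_calls(tool_calls, handlers):
    """Run each tool call in the order the model returned it."""
    results = []
    for call in tool_calls:
        handler = handlers[call["name"]]  # KeyError for an unknown tool
        results.append(handler(**call.get("arguments", {})))
    return results


events = []  # records what the stub handlers were asked to do


def fake_click(coordinate):
    # Stand-in for a real click; only records the request.
    events.append(("click", coordinate["x"], coordinate["y"]))
    return "clicked"


def fake_sleep(seconds):
    # Stand-in for a real sleep; only records the requested delay.
    events.append(("sleep", seconds))
    return "slept"


step = [
    {"name": "click", "arguments": {"coordinate": {"x": 1000, "y": 500}}},
    {"name": "sleep", "arguments": {"seconds": 1.5}},
]
results = run_tool_calls(step, {"click": fake_click, "sleep": fake_sleep})
print(events)
```

Keeping the handlers in a dictionary keyed by tool name makes the ordering guarantee easy to see: the loop never reorders or batches calls.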
+
+## Output
+
+Each run creates:
+
+- `screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log`
+- `screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png`
+- `screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png`
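As a rough sketch, the per-run paths listed above could be derived from a timestamp like this. The `run_paths` helper and its parameters are hypothetical; only the directory layout and the `YYYYMMDD_HHMMSS` pattern come from the README.

```python
from datetime import datetime
from pathlib import Path


def run_paths(base="screenjob_runs", now=None):
    """Build the log file and image directory paths for one run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / f"run_{stamp}"
    return {
        "log": run_dir / "logs" / "screenjob.log",
        "screens": run_dir / "screens",
        "enhanced": run_dir / "enhanced",
    }


paths = run_paths(now=datetime(2026, 5, 27, 17, 31, 49))
print(paths["log"].as_posix())
```

The sketch only computes paths; creating the directories (for example with `Path.mkdir(parents=True, exist_ok=True)`) is left out.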
+
+Final stdout is JSON:
+
+```json
+{
+  "completed": true,
+  "result": "...",
+  "steps": 13,
+  "elapsed_seconds": 59.691,
+  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
+}
+```
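A caller that wraps ScreenJob might consume the summary along these lines. This sketch assumes the captured text contains only the JSON object and checks for the five keys shown in the sample; `parse_run_summary` is a hypothetical helper, and the sample text is copied from the README with its elided values left as they are.

```python
import json

EXPECTED_KEYS = {"completed", "result", "steps", "elapsed_seconds", "artifacts_dir"}


def parse_run_summary(stdout_text):
    """Parse the JSON summary and verify that the documented keys are present."""
    summary = json.loads(stdout_text)
    missing = EXPECTED_KEYS - set(summary)
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    return summary


# Sample output from the README; the raw string keeps the JSON escapes intact.
sample = r"""
{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
"""

summary = parse_run_summary(sample)
print(summary["completed"], summary["steps"], summary["elapsed_seconds"])
```

If the real tool prints log lines before the summary, the caller would need to isolate the JSON portion first.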
+
+## Project layout
+
+```text
+main.py
+screenjob.py
+src/
+  __init__.py
+  cli.py
+  agent.py
+  models.py
+  utils.py
+```
+