# ScreenJob

Single-file behavior, split into maintainable modules under `src/`.

## Entry point

- Primary: `python main.py ""`
- Backward compatible: `python screenjob.py ""`

## Install

```powershell
pip install openai pillow pyautogui python-dotenv
```

## Configure

Create a `.env` file in project root:

```env
OPENAI_API_KEY=your_key_here
```

## Usage

```powershell
python main.py "Open amazon.de and go to my orders"
```

Optional flags:

```powershell
python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
```

## Tools exposed to the model

- `execute_command(command)`
- `sleep(seconds)` (replaces shell-based sleep calls)
- `see_screen()`
- `enhance(coordinate)`
- `click(coordinate, offset_up/down/left/right, sleep_after_seconds)`
- `type(text)`
- `press_key(key, repeats=1)`
- `task_complete(result)`

### Offset examples

- `{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}`
- `{"coordinate":{"x":1000,"y":500},"offset_right":4}`

### Multi-tool calls in one step

The agent supports multiple tool calls in a single model response and executes them in order.

Example sequence in one step:

1. `click(...)`
2. `sleep({"seconds": 1.5})`

You can also use `click(..., sleep_after_seconds=1.5)` for a one-call variant.

## Output

Each run creates:

- `screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png`
- `screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png`

Final stdout is JSON:

```json
{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
```

## Project layout

```text
main.py
screenjob.py
src/
  __init__.py
  cli.py
  agent.py
  models.py
  utils.py
```
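
The `click` offset arguments listed under "Offset examples" accept either a bare number or a string such as `"2px"`. The sketch below shows one way such arguments might be resolved into a final click position. It is not taken from the ScreenJob source: the helper names `parse_offset` and `apply_offsets` are hypothetical, and it assumes screen coordinates where `y` grows downward (as in pyautogui), so `offset_up` subtracts from `y`.

```python
import re

# Hypothetical pattern: an optional sign, a number, and an optional "px" suffix.
_OFFSET_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:px)?\s*$")


def parse_offset(value):
    """Convert 4, 4.0, "4", or "2px" to a pixel count (float). None means 0."""
    if value is None:
        return 0.0
    if isinstance(value, bool):
        # bool is a subclass of int; reject it explicitly.
        raise ValueError(f"invalid offset: {value!r}")
    if isinstance(value, (int, float)):
        return float(value)
    match = _OFFSET_RE.match(str(value))
    if match is None:
        raise ValueError(f"invalid offset: {value!r}")
    return float(match.group(1))


def apply_offsets(args):
    """Return the (x, y) position after applying any offset_* arguments.

    Assumes y increases downward, so offset_up moves toward smaller y.
    """
    x = args["coordinate"]["x"]
    y = args["coordinate"]["y"]
    x += parse_offset(args.get("offset_right")) - parse_offset(args.get("offset_left"))
    y += parse_offset(args.get("offset_down")) - parse_offset(args.get("offset_up"))
    return (x, y)


if __name__ == "__main__":
    # The two argument objects from "Offset examples".
    print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_up": "2px"}))
    print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_right": 4}))
```

Under these assumptions, the first example resolves to two pixels above the original point and the second to four pixels to its right.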