ScreenJob
ScreenJob keeps its original single-file behavior, with the code split into maintainable modules under src/.
Entry point
- Primary: python main.py "<task>"
- Backward compatible: python screenjob.py "<task>"
Install
pip install openai pillow pyautogui python-dotenv
Configure
Create a .env file in project root:
OPENAI_API_KEY=your_key_here
Usage
python main.py "Open amazon.de and go to my orders"
Optional flags:
python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
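As an illustration of how the positional task and the two optional flags above could be parsed, here is a minimal argparse sketch. It is not taken from src/cli.py: the function name `build_parser`, the help texts, and the `None` defaults are assumptions made for the example, and the real defaults are not stated in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: main.py "<task>" [--model NAME] [--max-steps N]."""
    parser = argparse.ArgumentParser(prog="main.py", description="Run a ScreenJob task.")
    parser.add_argument("task", help="Natural-language task for the agent")
    # Placeholder defaults (None); ScreenJob's actual defaults are not documented here.
    parser.add_argument("--model", default=None, help="Model name to use")
    parser.add_argument("--max-steps", type=int, default=None, help="Upper bound on agent steps")
    return parser


# Parse the same arguments as the "Optional flags" example.
args = build_parser().parse_args(["Open amazon.de", "--model", "gpt-5.2", "--max-steps", "80"])
print(args.task, args.model, args.max_steps)
```

argparse exposes `--max-steps` as the attribute `max_steps`, so the value is read as `args.max_steps`.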
Tools exposed to the model
- execute_command(command)
- sleep(seconds) (replaces shell-based sleep calls)
- see_screen()
- enhance(coordinate)
- click(coordinate, offset_up/down/left/right, sleep_after_seconds)
- type(text)
- press_key(key, repeats=1)
- task_complete(result)
Offset examples
- {"coordinate":{"x":1000,"y":500},"offset_up":"2px"}
- {"coordinate":{"x":1000,"y":500},"offset_right":4}
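The offset examples above accept both a string with a px suffix and a bare number. The sketch below shows one way such arguments might be normalized and applied to a coordinate. The helper names `parse_offset` and `apply_offsets` are hypothetical, and the sign convention (up decreases y, right increases x, matching a top-left screen origin) is an assumption rather than documented ScreenJob behavior.

```python
from typing import Any, Dict, Tuple


def parse_offset(value: Any) -> int:
    """Turn 4, "4", or "2px" into whole pixels; a missing offset counts as 0."""
    if value is None:
        return 0
    if isinstance(value, str):
        text = value.strip().lower()
        if text.endswith("px"):
            text = text[:-2]
        value = text
    return int(float(value))


def apply_offsets(arguments: Dict[str, Any]) -> Tuple[int, int]:
    """Return the (x, y) point after applying any offset_* arguments."""
    coordinate = arguments["coordinate"]
    # Assumed convention: screen origin is top-left, so "up" subtracts from y.
    x = coordinate["x"] + parse_offset(arguments.get("offset_right")) - parse_offset(arguments.get("offset_left"))
    y = coordinate["y"] + parse_offset(arguments.get("offset_down")) - parse_offset(arguments.get("offset_up"))
    return x, y


up_example = apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_up": "2px"})
right_example = apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_right": 4})
print(up_example, right_example)
```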
Multi-tool calls in one step
The agent supports multiple tool calls in a single model response and executes them in order.
Example sequence in one step:
- click(...)
- sleep({"seconds": 1.5})
You can also use click(..., sleep_after_seconds=1.5) for a one-call variant.
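To make the in-order execution concrete, here is a small dispatcher sketch. It assumes each tool call arrives as a name plus a JSON-encoded argument string; that shape, the function `run_tool_calls`, and the `fake_click` / `fake_sleep` stand-ins are illustrative assumptions, not the code in src/agent.py. The stand-ins only record what they were asked to do, so the example runs without touching the mouse or waiting.

```python
import json
from typing import Any, Callable, Dict, List

log: List[tuple] = []


def fake_click(coordinate: Dict[str, int], sleep_after_seconds: float = 0.0) -> str:
    """Stand-in for click(): records the target instead of moving the mouse."""
    log.append(("click", coordinate["x"], coordinate["y"]))
    if sleep_after_seconds:
        log.append(("sleep", sleep_after_seconds))
    return "clicked"


def fake_sleep(seconds: float) -> str:
    """Stand-in for sleep(): records the duration instead of waiting."""
    log.append(("sleep", seconds))
    return "slept"


def run_tool_calls(tool_calls: List[Dict[str, str]],
                   handlers: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Execute every tool call from one model response, in the order given."""
    results = []
    for call in tool_calls:
        handler = handlers[call["name"]]
        arguments = json.loads(call["arguments"])
        results.append(handler(**arguments))
    return results


handlers = {"click": fake_click, "sleep": fake_sleep}
calls = [
    {"name": "click", "arguments": '{"coordinate": {"x": 1000, "y": 500}}'},
    {"name": "sleep", "arguments": '{"seconds": 1.5}'},
]
results = run_tool_calls(calls, handlers)
print(results, log)
```

In this sketch, a single click call with sleep_after_seconds=1.5 produces the same two log entries as the click-then-sleep pair.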
Output
Each run creates:
- screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log
- screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png
- screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png
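The run_YYYYMMDD_HHMMSS naming above corresponds to a timestamp formatted with strftime. The sketch below builds the same directory layout inside a temporary folder. The function `create_run_dirs` is a hypothetical helper written for this example and may differ from how ScreenJob creates its run folders.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def create_run_dirs(base: Path, now: datetime) -> Path:
    """Create screenjob_runs/run_<timestamp>/ with logs, screens, and enhanced."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    run_dir = base / "screenjob_runs" / ("run_" + stamp)
    for sub in ("logs", "screens", "enhanced"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir


# Use a fixed timestamp and a temporary base directory so the example is repeatable.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = create_run_dirs(Path(tmp), datetime(2025, 1, 2, 3, 4, 5))
    run_name = run_dir.name
    created = sorted(p.name for p in run_dir.iterdir())

print(run_name, created)
```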
Final stdout is JSON:
{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
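A caller that wraps ScreenJob in a script may want to read this JSON back into a typed object. The sketch below parses the sample shown above. The `RunResult` dataclass and `parse_run_result` function are assumptions for illustration, and the code assumes stdout contains only the JSON document.

```python
import json
from dataclasses import dataclass


@dataclass
class RunResult:
    completed: bool
    result: str
    steps: int
    elapsed_seconds: float
    artifacts_dir: str


def parse_run_result(stdout: str) -> RunResult:
    """Decode the final JSON printed by a run into a RunResult."""
    return RunResult(**json.loads(stdout))


# Raw string so the JSON escapes (\\) reach json.loads unchanged.
sample_stdout = r'''
{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
'''

run = parse_run_result(sample_stdout)
print(run.completed, run.steps, run.elapsed_seconds)
```

If a future version adds keys to the output, `RunResult(**...)` will raise a TypeError, which makes schema drift visible rather than silently ignored.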
Project layout
main.py
screenjob.py
src/
  __init__.py
  cli.py
  agent.py
  models.py
  utils.py