ScreenJob

ScreenJob keeps the behavior of the original single-file script, with the code split into maintainable modules under src/.

Entry points

  • Primary: python main.py "<task>"
  • Backward compatible: python screenjob.py "<task>"

Install

pip install openai pillow pyautogui python-dotenv

Configure

Create a .env file in the project root:

OPENAI_API_KEY=your_key_here

Usage

python main.py "Open amazon.de and go to my orders"

Optional flags:

python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80

Tools exposed to the model

  • execute_command(command)
  • sleep(seconds) (replaces shell-based sleep calls)
  • see_screen()
  • enhance(coordinate)
  • click(coordinate, offset_up/down/left/right, sleep_after_seconds)
  • type(text)
  • press_key(key, repeats=1)
  • task_complete(result)

Offset examples

  • {"coordinate":{"x":1000,"y":500},"offset_up":"2px"}
  • {"coordinate":{"x":1000,"y":500},"offset_right":4}
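The two examples above suggest that an offset may be given either as a pixel string ("2px") or as a plain number (4). A minimal sketch of how such values could be normalized and applied to a coordinate follows. The helper names parse_offset and apply_offsets are hypothetical and not taken from the project, and the sketch assumes the usual screen convention where y grows downward, so offset_up subtracts from y.

```python
def parse_offset(value):
    """Return an offset in whole pixels from 4, "4", or "2px" (assumed formats)."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip().lower()
    if text.endswith("px"):
        text = text[:-2]
    return int(float(text))


def apply_offsets(args):
    """Shift the coordinate by the offset_* fields (hypothetical helper).

    Assumes the screen origin is top-left, so "up" decreases y.
    """
    x = args["coordinate"]["x"]
    y = args["coordinate"]["y"]
    x += parse_offset(args.get("offset_right", 0)) - parse_offset(args.get("offset_left", 0))
    y += parse_offset(args.get("offset_down", 0)) - parse_offset(args.get("offset_up", 0))
    return x, y


print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_up": "2px"}))  # (1000, 498)
print(apply_offsets({"coordinate": {"x": 1000, "y": 500}, "offset_right": 4}))   # (1004, 500)
```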

Multi-tool calls in one step

The agent supports multiple tool calls in a single model response and executes them in order.
Example sequence in one step:

  1. click(...)
  2. sleep({"seconds": 1.5})

You can also use click(..., sleep_after_seconds=1.5) for a one-call variant.
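In-order execution of several tool calls can be pictured with the sketch below. It is an illustration only, not the agent's actual dispatch code: the run_tool_calls function, the handler table, and the shape of each call (a dict with "name" and "arguments") are assumptions, and the click handler just records the coordinate instead of moving the mouse.

```python
import time

events = []  # records what each stub handler did, in order


def click(coordinate, sleep_after_seconds=0.0, **_offsets):
    """Stub click: records the target instead of driving the mouse."""
    events.append(("click", coordinate["x"], coordinate["y"]))
    if sleep_after_seconds:
        time.sleep(sleep_after_seconds)
    return "clicked"


def sleep(seconds):
    """Stub sleep tool: waits inside the process, no shell involved."""
    time.sleep(seconds)
    events.append(("sleep", seconds))
    return f"slept {seconds}s"


HANDLERS = {"click": click, "sleep": sleep}  # hypothetical tool table


def run_tool_calls(tool_calls):
    """Execute the calls from one model response strictly in the order given."""
    return [HANDLERS[call["name"]](**call["arguments"]) for call in tool_calls]


results = run_tool_calls([
    {"name": "click", "arguments": {"coordinate": {"x": 1000, "y": 500}}},
    {"name": "sleep", "arguments": {"seconds": 0.1}},
])
print(results)  # ['clicked', 'slept 0.1s']
```

The one-call variant would pass sleep_after_seconds to the click handler instead of issuing a separate sleep call.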

Output

Each run creates:

  • screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log
  • screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png
  • screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png
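For scripts that post-process a run, the layout above can be reconstructed from a timestamp. The following is a sketch under the assumption that the directory name uses the local start time in the YYYYMMDD_HHMMSS pattern shown; the run_paths helper is hypothetical and not part of the project.

```python
from datetime import datetime
from pathlib import Path


def run_paths(base, started_at):
    """Build the expected artifact paths for one run (hypothetical helper)."""
    run_dir = Path(base) / f"run_{started_at:%Y%m%d_%H%M%S}"
    return {
        "log": run_dir / "logs" / "screenjob.log",
        "screens": run_dir / "screens",    # full screenshots (*.png)
        "enhanced": run_dir / "enhanced",  # enhanced crops (*.png)
    }


paths = run_paths("screenjob_runs", datetime(2025, 1, 31, 14, 5, 9))
print(paths["log"].as_posix())  # screenjob_runs/run_20250131_140509/logs/screenjob.log
```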

Final stdout is JSON:

{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
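A calling script can read this report with the standard json module. The sketch below parses the sample shown above, with its elided values kept as-is. The summarize helper is hypothetical, and it assumes the captured stdout contains only the JSON object.

```python
import json

# Sample copied from the report above; elided values are left untouched.
sample_stdout = r'''
{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
'''


def summarize(stdout_text):
    """Turn the final JSON report into a one-line status string."""
    report = json.loads(stdout_text)
    status = "done" if report["completed"] else "not completed"
    return f"{status} in {report['steps']} steps ({report['elapsed_seconds']:.1f}s)"


print(summarize(sample_stdout))  # done in 13 steps (59.7s)
```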

Project layout

main.py
screenjob.py
src/
  __init__.py
  cli.py
  agent.py
  models.py
  utils.py

Description

Agents interacting with agents to make your computer do things without you.

License

Apache-2.0