Space-Banane 84b0df520c chore: initialize screenjob project baseline

2026-05-27 17:31:49 +02:00

ScreenJob

Same behavior as the single-file script, now split into maintainable modules under src/.

Entry point

Primary: python main.py "<task>"
Backward compatible: python screenjob.py "<task>"

Install

pip install openai pillow pyautogui python-dotenv

Configure

Create a .env file in project root:

OPENAI_API_KEY=your_key_here
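As a rough sketch of how the key could be checked at startup: the snippet below loads `.env` through python-dotenv when that package is available and then verifies that `OPENAI_API_KEY` is set. The helper name `require_api_key` and the rejection of the `your_key_here` placeholder are assumptions for illustration, not code taken from the project.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    # Keeps this sketch importable when python-dotenv is not installed.
    load_dotenv = None


def require_api_key(env=None) -> str:
    """Return the API key, or raise if it is missing (hypothetical helper)."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # reads a .env file and populates os.environ
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the README placeholder as "not configured" (an assumption).
    if not key or key == "your_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.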

Usage

python main.py "Open amazon.de and go to my orders"

Optional flags:

python main.py "Open amazon.de" --model gpt-5.2 --max-steps 80
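The command line above could be parsed along these lines. This is a minimal argparse sketch, not the contents of `src/cli.py`: the function name `build_parser`, the help texts, and the default values are assumptions (the defaults simply reuse the values from the example command).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: main.py "<task>" [--model M] [--max-steps N]."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run a ScreenJob task.",
    )
    parser.add_argument("task", help="natural-language task for the agent")
    # Placeholder defaults: the real project defaults are not documented here.
    parser.add_argument("--model", default="gpt-5.2", help="model name to use")
    parser.add_argument(
        "--max-steps",
        type=int,
        default=80,
        help="upper bound on agent steps",
    )
    return parser
```

For example, `build_parser().parse_args(["Open amazon.de", "--max-steps", "20"])` yields a namespace with `task`, `model`, and `max_steps` attributes.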

Tools exposed to the model

execute_command(command)
sleep(seconds) (replaces shell-based sleep calls)
see_screen()
enhance(coordinate)
click(coordinate, offset_up, offset_down, offset_left, offset_right, sleep_after_seconds)
type(text)
press_key(key, repeats=1)
task_complete(result)
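To illustrate how one of these tools might be described to the model, here is a sketch of a definition for `press_key(key, repeats=1)`. It assumes the function-tool shape used by the OpenAI Chat Completions API; the project's actual schema, descriptions, and the `with_defaults` helper are not shown in this README and are hypothetical.

```python
# Hypothetical tool definition for press_key(key, repeats=1).
PRESS_KEY_TOOL = {
    "type": "function",
    "function": {
        "name": "press_key",
        "description": "Press a keyboard key one or more times.",
        "parameters": {
            "type": "object",
            "properties": {
                "key": {"type": "string"},
                "repeats": {"type": "integer", "minimum": 1, "default": 1},
            },
            "required": ["key"],
        },
    },
}


def with_defaults(tool: dict, arguments: dict) -> dict:
    """Fill in schema defaults for arguments the model left out."""
    properties = tool["function"]["parameters"]["properties"]
    filled = {
        name: spec["default"]
        for name, spec in properties.items()
        if "default" in spec
    }
    filled.update(arguments)  # explicit arguments win over defaults
    return filled
```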

Offset examples

{"coordinate":{"x":1000,"y":500},"offset_up":"2px"}
{"coordinate":{"x":1000,"y":500},"offset_right":4}
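The two payloads above might resolve to a final click position roughly as follows. This sketch assumes screen coordinates where y grows downward (so `offset_up` subtracts from y) and that offsets are either integers or strings such as "2px"; the helper names `_to_pixels` and `apply_offsets` are hypothetical.

```python
import re


def _to_pixels(value) -> int:
    """Convert 4 or "2px" to an integer pixel count."""
    if isinstance(value, bool):
        raise TypeError("offset must be an int or a string like '2px'")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = re.fullmatch(r"\s*(\d+)\s*(?:px)?\s*", value)
        if match:
            return int(match.group(1))
    raise ValueError(f"unsupported offset: {value!r}")


def apply_offsets(args: dict) -> tuple:
    """Return the (x, y) position after applying the optional offsets."""
    x = args["coordinate"]["x"]
    y = args["coordinate"]["y"]
    x += _to_pixels(args.get("offset_right", 0)) - _to_pixels(args.get("offset_left", 0))
    # Assumption: y increases toward the bottom of the screen.
    y += _to_pixels(args.get("offset_down", 0)) - _to_pixels(args.get("offset_up", 0))
    return x, y
```

Under these assumptions the first example moves the click two pixels up and the second moves it four pixels to the right.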

Multi-tool calls in one step

The agent supports multiple tool calls in a single model response and executes them in order.
Example sequence in one step:

click(...)
sleep({"seconds": 1.5})

You can also use click(..., sleep_after_seconds=1.5) for a one-call variant.
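In-order execution of several tool calls from one response can be sketched as a small dispatcher. The call shape (`name` plus `arguments`, with arguments given either as a JSON string or a dict), the function `run_tool_calls`, and the handler mapping are assumptions for illustration and do not reflect the code in `src/agent.py`.

```python
import json


def run_tool_calls(tool_calls, handlers):
    """Execute tool calls sequentially and collect (name, result) pairs."""
    results = []
    for call in tool_calls:
        handler = handlers[call["name"]]  # KeyError means an unknown tool
        arguments = call["arguments"]
        if isinstance(arguments, str):
            # Models often return arguments as a JSON-encoded string.
            arguments = json.loads(arguments)
        results.append((call["name"], handler(**arguments)))
    return results
```

Because the loop is strictly sequential, a `sleep` call placed after a `click` only starts once the click handler has returned, which is what makes the two-call sequence equivalent in effect to `sleep_after_seconds`.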

Output

Each run creates:

screenjob_runs/run_YYYYMMDD_HHMMSS/logs/screenjob.log
screenjob_runs/run_YYYYMMDD_HHMMSS/screens/*.png
screenjob_runs/run_YYYYMMDD_HHMMSS/enhanced/*.png
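A per-run directory tree like the one listed above could be created as follows. This is only a sketch: the function name `create_run_dirs` and its parameters are hypothetical, and it assumes the timestamp is the local start time of the run.

```python
from datetime import datetime
from pathlib import Path


def create_run_dirs(base="screenjob_runs", now=None):
    """Create run_YYYYMMDD_HHMMSS with logs/, screens/ and enhanced/ inside."""
    now = now or datetime.now()
    run_dir = Path(base) / f"run_{now:%Y%m%d_%H%M%S}"
    dirs = {name: run_dir / name for name in ("logs", "screens", "enhanced")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    dirs["run"] = run_dir
    return dirs
```

Accepting `now` as a parameter keeps the directory name deterministic in tests.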

Final stdout is JSON:

{
  "completed": true,
  "result": "...",
  "steps": 13,
  "elapsed_seconds": 59.691,
  "artifacts_dir": "C:\\...\\screenjob_runs\\run_..."
}
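A calling script could read this final JSON roughly as shown below. The sketch assumes stdout contains only the JSON object and nothing else; the `RunResult` dataclass and `parse_result` function are hypothetical names, while the field names come from the example above.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RunResult:
    completed: bool
    result: str
    steps: int
    elapsed_seconds: float
    artifacts_dir: str


def parse_result(stdout: str) -> RunResult:
    """Parse the final JSON object, ignoring any unknown extra keys."""
    data = json.loads(stdout)
    known = {f.name for f in fields(RunResult)}
    return RunResult(**{k: v for k, v in data.items() if k in known})
```

A missing field raises `TypeError`, which surfaces format changes early instead of silently returning partial data.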

Project layout

main.py
screenjob.py
src/
  __init__.py
  cli.py
  agent.py
  models.py
  utils.py
