Commit remaining workspace updates

Switch backend startup to interactive session
docs: add patience rule for rerunning jobs
2026-05-31 20:43:36 +02:00 · 2026-05-31 20:43:36 +02:00 · 2026-05-31 18:35:35 +00:00 · 2026-05-28 13:44:31 +02:00 · 2026-05-28 13:30:27 +02:00
28 changed files with 6136 additions and 138 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -20,3 +20,8 @@ screenjob.db
 # IDE
 .vscode/
 .idea/
 # Service host build/publish artifacts
 service_host/**/bin/
 service_host/**/obj/
 service_host/publish/
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # ScreenJob
 ScreenJob is an autonomous desktop-and-terminal execution service.  
-It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.
+It lets an LLM use controlled local tools (screen, mouse, keyboard, clipboard, shell) to complete GUI-heavy tasks on a real computer.
 ## What It Solves
@@ -15,7 +15,8 @@ It lets an LLM use controlled local tools (screen, click, type, shell) to comple
 ## Core Features
- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
+- Hybrid control model: screenshot grounding plus Windows-native window, dialog, and UI-element helpers when available
 - Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `list_windows`, `find_window`, `focus_window`, `close_window`, `wait_for_window`, `wait_for_focus_change`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `wait_for_dialog_close`, `list_ui_elements`, `invoke_ui_element`, `set_ui_element_value`, `select_ui_element`, `wait_for_ui_element`, `click`, `scroll`, `drag`, `move_mouse`, `type`, `press_key`, `clipboard_get`, `clipboard_set`, `get_cursor_position`, `get_active_window`, `sleep`, `task_complete`)
 - Safety pre-check with override support
 - Per-job tool disable list
 - Live/final usage and cost estimates
@@ -109,6 +110,80 @@ Or use the PowerShell launcher:
 .\start_backend.ps1
 ```
 ### Backend Startup
 For screenshot-driven automation, start the backend in the logged-in user session.
 That gives `pyautogui` access to the interactive desktop, which Windows services do not have because they run in the isolated session 0.
 If you previously installed the legacy service, remove it once from an elevated PowerShell session with `.\uninstall_backend_service.ps1`.
 Install a sign-in launcher for the current user:
 ```powershell
 .\install_backend_service.ps1
 ```
 Install it for all users:
 ```powershell
 .\install_backend_service.ps1 -AllUsers
 ```
 Start it immediately after installing:
 ```powershell
 .\install_backend_service.ps1 -StartNow
 ```
 Remove the launcher:
 ```powershell
 .\uninstall_backend_service.ps1
 ```
 The launcher runs `start_backend.ps1` hidden via `start_backend_hidden.vbs`.
 If you need to start the backend manually, run:
 ```powershell
 .\start_backend.ps1
 ```
 The legacy Windows service host remains in the tree for reference, but it is not the recommended path for GUI tasks.
 ### System Tray Icon (Windows)
 Start tray icon now:
 ```powershell
 powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1
 ```
 Install startup shortcut (current user):
 ```powershell
 .\install_tray_startup_shortcut.ps1
 ```
 Install startup shortcut for all users:
 ```powershell
 .\install_tray_startup_shortcut.ps1 -AllUsers
 ```
 Remove startup shortcut:
 ```powershell
 .\install_tray_startup_shortcut.ps1 -Remove
 ```
 Tray menu actions (the service controls apply to the legacy Windows service host):
 - Refresh service status
 - Start/Stop/Restart service (prompts for admin/UAC)
 - Open dashboard URL from `.env` `SCREENJOB_HOST` / `SCREENJOB_PORT`
 - Open service logs folder
 - Open project folder
 - Exit tray icon process
 Auth for all API routes:
 - `Authorization: Bearer <SCREENJOB_TOKEN>`
@@ -123,6 +198,11 @@ Auth for all API routes:
 {
  "job": "run \"ls -a\" in C:/Users/username/Documents and return output",
  "model": "gpt-5.4-mini",
  "native_automation_mode": "prefer",
  "dialog_timeout_seconds": 12,
  "focus_timeout_seconds": 8,
  "ui_element_timeout_seconds": 8,
  "max_retries_per_surface": 3,
  "disabled_tools": [],
  "safety_override": false
 }
@@ -167,17 +247,28 @@ Each job payload includes:
 ## Agent Instructions (Practical)
 - Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
 - First classify the current Windows surface, then choose the control channel.
 - Prefer native window/dialog/element tools for focus changes, file pickers, modal confirmations, and browser-owned dialogs when available.
 - Use `see_screen` before UI interaction.
 - Use `enhance` before clicking small/ambiguous targets; prefer `region="small"` for compact controls.
 - Use `enhance` `mode="text"` for tiny labels/text, or `mode="ui"` for general UI.
 - Optionally set `enhance` `scale` (2-6) for tighter zoom control.
 - Use `list_windows`, `find_window`, `focus_window`, and `wait_for_focus_change` instead of blind Alt+Tab retries.
 - Use `detect_dialog`, `dialog_set_filename`, `dialog_action`, and `wait_for_dialog_close` for native open/save/confirm flows.
 - Use `list_ui_elements`, `invoke_ui_element`, `set_ui_element_value`, `select_ui_element`, and `wait_for_ui_element` when controls are exposed natively.
 - Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
 - For shortcuts, use one `press_key` call with combo syntax (example: `win+r`).
- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
+- Use `click` offsets via `offset_up/down/left/right`; set `button` and `click_count` there instead of inventing one-off click tools.
 - Use `move_mouse` when you need hover-only behavior and `drag` for slider adjustments, selections, or window moves.
 - Use `scroll` for vertical navigation; positive amounts scroll up and negative amounts scroll down.
 - Use `clipboard_get` / `clipboard_set` for copy-paste workflows, `get_cursor_position` for cursor inspection, and `get_active_window` before interacting with uncertain focus.
 - If native automation is unavailable or disabled, ScreenJob falls back to screenshots plus mouse/keyboard control and emits fallback events.
 - When done, call:
  - `task_complete(return="...", data=...)`
 - Before `task_complete`, verify expected on-screen content with `see_screen` (and `enhance` if needed), and include an `observed_result` summary in `data`.
 Per-job `disabled_tools` must match the built-in tool allowlist. `task_complete` cannot be disabled.
 `data` should contain useful structured output for the requester (text, object, list, etc.).
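A minimal sketch of how a client might assemble the job payload and enforce the `disabled_tools` rule before submitting. The field names and defaults mirror the payload example and CLI defaults in this change; `KNOWN_TOOLS` is an abbreviated, illustrative subset of the allowlist, and `build_job_payload` is a hypothetical helper, not part of ScreenJob.

```python
import json

# Illustrative subset of the tool allowlist (assumption); the real list lives in the agent code.
KNOWN_TOOLS = {
    "execute_command", "see_screen", "enhance", "click", "type",
    "press_key", "scroll", "drag", "sleep", "task_complete",
}


def build_job_payload(job, model, disabled_tools=(), safety_override=False):
    """Build a job payload dict, rejecting invalid disabled_tools entries."""
    # Normalize: strip whitespace, drop empties, de-duplicate while keeping order.
    cleaned = list(dict.fromkeys(t.strip() for t in disabled_tools if t.strip()))
    unknown = [t for t in cleaned if t not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"Unknown tools in disabled_tools: {unknown}")
    if "task_complete" in cleaned:
        raise ValueError("task_complete cannot be disabled")
    return {
        "job": job,
        "model": model,
        "native_automation_mode": "prefer",
        "dialog_timeout_seconds": 12,
        "focus_timeout_seconds": 8,
        "ui_element_timeout_seconds": 8,
        "max_retries_per_surface": 3,
        "disabled_tools": cleaned,
        "safety_override": safety_override,
    }


if __name__ == "__main__":
    payload = build_job_payload("open Notepad and type hello", "gpt-5.4-mini", ["drag"])
    print(json.dumps(payload, indent=2))
```

Validating on the client side is optional; the point is only that a request naming an unknown tool or `task_complete` should be expected to fail.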
 ## Verification
--- a/SKILL.md
+++ b/SKILL.md
@@ -6,8 +6,10 @@ ScreenJob lets an agent execute tasks that require a real desktop UI plus termin
 ## Main Features
 - Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
 - Screen perception (`see_screen`, `enhance`)
 - Mouse/keyboard control (`click`, `type`, `press_key`)
 - Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
 - Terminal execution (`execute_command`, `sleep`)
 - Structured completion payload (`task_complete(return=..., data=...)`)
 - Safety gate, auth, history, and live monitoring
@@ -45,12 +47,25 @@ Enhance-first click rule:
 - Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
 - After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
 Windows-native routing rule:
 - First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
 - Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
 - Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
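
The routing rule above can be sketched roughly as follows. This is an illustration only: `classify_surface`, `choose_control_channel`, the surface labels, and the browser process list are hypothetical names, and treating a `#32770` window with a filename field as a file picker is an assumption made for the example.

```python
# Illustrative browser process names (assumption, not taken from ScreenJob).
BROWSER_PROCESSES = {"chrome.exe", "msedge.exe", "firefox.exe"}


def classify_surface(window_class, process_name, has_filename_field=False):
    """Label the current surface from its window class and owning process."""
    if window_class == "#32770":
        # "#32770" is the Win32 dialog window class.
        return "file_picker" if has_filename_field else "dialog"
    if process_name.lower() in BROWSER_PROCESSES:
        return "browser_window"
    if window_class:
        return "app_window"
    return "system_surface"


def choose_control_channel(native_available, custom_drawn):
    """Prefer native helpers; fall back to pixels only when they cannot work."""
    if not native_available or custom_drawn:
        return "pixel_fallback"
    return "native"


if __name__ == "__main__":
    surface = classify_surface("#32770", "notepad.exe", has_filename_field=True)
    print(surface, choose_control_channel(native_available=True, custom_drawn=False))
```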
 Verification rule:
 - Before `task_complete`, verify actual on-screen content matches the expected outcome.
 - Use `see_screen` (and `enhance` if needed) for this check.
 - Include a concise `observed_result` in `data` when completing the task.
 Patience / rerun rule:
 - If a job is still `running`, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
 - Prefer waiting longer and checking for a final status/result before starting a replacement run.
 - Only restart or replace a running job when there is clear evidence it is failed, irrecoverably stuck, or the user explicitly asks for a restart.
 - If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
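
One way to apply the patience rule is a polling loop that waits for a final status and never restarts the job on its own. This is a sketch under assumptions: `fetch_status` is a caller-supplied function, and the terminal status names are guesses (only `running` appears in this document).

```python
import time

# Assumed terminal status names; only "running" is confirmed by the docs above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_final_status(fetch_status, poll_seconds=5.0, max_wait_seconds=1800.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the wait budget runs out."""
    deadline = clock() + max_wait_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            # Still running after a generous wait: report it and let the caller
            # decide, with evidence, whether a replacement run is justified.
            return "still_running"
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.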
 ## API Quick Reference
 Base URL:
--- a/install_backend_service.ps1
+++ b/install_backend_service.ps1
@@ -0,0 +1,84 @@
 [CmdletBinding(SupportsShouldProcess = $true)]
 param(
    [switch]$Remove,
    [switch]$AllUsers,
    [switch]$StartNow
 )
 Set-StrictMode -Version Latest
 $ErrorActionPreference = "Stop"
 $scriptDir = Split-Path -Parent $PSCommandPath
 $backendScript = Join-Path $scriptDir "start_backend.ps1"
 $vbsLauncher = Join-Path $scriptDir "start_backend_hidden.vbs"
 $shortcutName = "ScreenJob Backend.lnk"
 if (-not (Test-Path -LiteralPath $backendScript)) {
    throw "Backend launcher script not found: $backendScript"
 }
 if (-not (Test-Path -LiteralPath $vbsLauncher)) {
    throw "Hidden backend launcher file not found: $vbsLauncher"
 }
 function Test-IsAdministrator {
    $identity = [Security.Principal.WindowsIdentity]::GetCurrent()
    $principal = New-Object Security.Principal.WindowsPrincipal($identity)
    return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
 }
 $legacyService = Get-Service -Name "ScreenJobBackend" -ErrorAction SilentlyContinue
 if ($null -ne $legacyService) {
    if (Test-IsAdministrator) {
        if ($PSCmdlet.ShouldProcess("ScreenJobBackend", "Remove legacy Windows service")) {
            if ($legacyService.Status -ne "Stopped") {
                Stop-Service -Name "ScreenJobBackend" -Force -ErrorAction Stop
            }
            & sc.exe delete ScreenJobBackend | Out-Null
            if ($LASTEXITCODE -ne 0) {
                throw "Failed to delete legacy service 'ScreenJobBackend' (sc.exe exit code $LASTEXITCODE)."
            }
            Write-Host "Removed legacy Windows service: ScreenJobBackend"
        }
    } else {
        Write-Warning "Legacy Windows service 'ScreenJobBackend' is still installed. Run uninstall_backend_service.ps1 from an elevated PowerShell session once to remove it."
    }
 }
 $startupFolder = if ($AllUsers) {
    [Environment]::GetFolderPath("CommonStartup")
 } else {
    [Environment]::GetFolderPath("Startup")
 }
 $shortcutPath = Join-Path $startupFolder $shortcutName
 if ($Remove) {
    if (Test-Path -LiteralPath $shortcutPath) {
        if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove backend startup shortcut")) {
            Remove-Item -LiteralPath $shortcutPath -Force
            Write-Host "Removed backend startup shortcut: $shortcutPath"
        }
    } else {
        Write-Host "No backend startup shortcut found at: $shortcutPath"
    }
    return
 }
 if ($PSCmdlet.ShouldProcess($shortcutPath, "Create backend startup shortcut")) {
    $shell = New-Object -ComObject WScript.Shell
    $shortcut = $shell.CreateShortcut($shortcutPath)
    $shortcut.TargetPath = "$env:SystemRoot\System32\wscript.exe"
    $shortcut.Arguments = '"' + $vbsLauncher + '"'
    $shortcut.WorkingDirectory = $scriptDir
    $shortcut.Description = "Launch ScreenJob backend at sign-in in the current user session."
    $shortcut.Save()
    Write-Host "Created backend startup shortcut: $shortcutPath"
 }
 if ($StartNow) {
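    # Quote the launcher path: Start-Process does not quote arguments, so a path with spaces would be split.
    Start-Process -FilePath "$env:SystemRoot\System32\wscript.exe" -ArgumentList ('"' + $vbsLauncher + '"') -WorkingDirectory $scriptDir | Out-Null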
    Write-Host "Started backend launcher now."
 }
--- a/install_tray_startup_shortcut.ps1
+++ b/install_tray_startup_shortcut.ps1
@@ -0,0 +1,47 @@
 [CmdletBinding(SupportsShouldProcess = $true)]
 param(
    [switch]$Remove,
    [switch]$AllUsers
 )
 Set-StrictMode -Version Latest
 $ErrorActionPreference = "Stop"
 $scriptDir = Split-Path -Parent $PSCommandPath
 $vbsLauncher = Join-Path $scriptDir "start_screenjob_tray_hidden.vbs"
 $shortcutName = "ScreenJob Tray.lnk"
 if (-not (Test-Path -LiteralPath $vbsLauncher)) {
    throw "Launcher file not found: $vbsLauncher"
 }
 $startupFolder = if ($AllUsers) {
    [Environment]::GetFolderPath("CommonStartup")
 } else {
    [Environment]::GetFolderPath("Startup")
 }
 $shortcutPath = Join-Path $startupFolder $shortcutName
 if ($Remove) {
    if (Test-Path -LiteralPath $shortcutPath) {
        if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove startup shortcut")) {
            Remove-Item -LiteralPath $shortcutPath -Force
            Write-Host "Removed startup shortcut: $shortcutPath"
        }
    } else {
        Write-Host "No startup shortcut found at: $shortcutPath"
    }
    return
 }
 if ($PSCmdlet.ShouldProcess($shortcutPath, "Create startup shortcut")) {
    $shell = New-Object -ComObject WScript.Shell
    $shortcut = $shell.CreateShortcut($shortcutPath)
    $shortcut.TargetPath = "$env:SystemRoot\System32\wscript.exe"
    $shortcut.Arguments = '"' + $vbsLauncher + '"'
    $shortcut.WorkingDirectory = $scriptDir
    $shortcut.Description = "Launch ScreenJob tray icon at sign-in."
    $shortcut.Save()
    Write-Host "Created startup shortcut: $shortcutPath"
 }
--- a/screenjob_tray.ps1
+++ b/screenjob_tray.ps1
@@ -0,0 +1,307 @@
 param(
    [string]$ServiceName = "ScreenJobBackend"
 )
 Set-StrictMode -Version Latest
 $ErrorActionPreference = "Stop"
 Add-Type -AssemblyName System.Windows.Forms
 Add-Type -AssemblyName System.Drawing
 $scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
 $controlScript = Join-Path $scriptDir "tray_service_control.ps1"
 $logsDir = Join-Path $scriptDir "screenjob_runs\service"
 $defaultHost = "127.0.0.1"
 $defaultPort = "8787"
 function Read-EnvConfig {
    param([string]$EnvFilePath)
    $result = @{}
    if (-not (Test-Path -LiteralPath $EnvFilePath)) {
        return $result
    }
    foreach ($line in Get-Content -Path $EnvFilePath) {
        $trimmed = $line.Trim()
        if ($trimmed.Length -eq 0 -or $trimmed.StartsWith("#")) {
            continue
        }
        $parts = $trimmed.Split("=", 2)
        if ($parts.Count -eq 2) {
            $key = $parts[0].Trim()
            $value = $parts[1].Trim()
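            # Require at least two characters so a lone quote does not make Substring throw.
            if ($value.Length -ge 2 -and (($value.StartsWith('"') -and $value.EndsWith('"')) -or ($value.StartsWith("'") -and $value.EndsWith("'")))) {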
                $value = $value.Substring(1, $value.Length - 2)
            }
            $result[$key] = $value
        }
    }
    return $result
 }
 function Get-ServiceStatusSafe {
    param([string]$Name)
    try {
        $svc = Get-Service -Name $Name -ErrorAction Stop
        return $svc.Status.ToString()
    } catch {
        return "NotInstalled"
    }
 }
 function Invoke-ServiceActionElevated {
    param(
        [Parameter(Mandatory = $true)][string]$Action,
        [Parameter(Mandatory = $true)][string]$Name
    )
    if (-not (Test-Path -LiteralPath $controlScript)) {
        [System.Windows.Forms.MessageBox]::Show(
            "Missing control script: $controlScript",
            "ScreenJob Tray",
            [System.Windows.Forms.MessageBoxButtons]::OK,
            [System.Windows.Forms.MessageBoxIcon]::Error
        ) | Out-Null
        return
    }
    $argList = @(
        "-NoProfile",
        "-ExecutionPolicy", "Bypass",
        "-File", "`"$controlScript`"",
        "-Action", $Action,
        "-ServiceName", $Name
    )
    try {
        Start-Process -FilePath "powershell.exe" -ArgumentList $argList -Verb RunAs -WindowStyle Hidden | Out-Null
    } catch {
        # User canceled UAC prompt or launch failed.
    }
 }
 function Get-DashboardUrl {
    $envFile = Join-Path $scriptDir ".env"
    $envVars = Read-EnvConfig -EnvFilePath $envFile
    $dashboardHost = $defaultHost
    $dashboardPort = $defaultPort
    if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
        $dashboardHost = $envVars["SCREENJOB_HOST"]
    }
    if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
        $dashboardPort = $envVars["SCREENJOB_PORT"]
    }
    $connectHost = Resolve-ConnectHost -ConfiguredHost $dashboardHost
    return "http://{0}:{1}/" -f $connectHost, $dashboardPort
 }
 function Resolve-ConnectHost {
    param([string]$ConfiguredHost)
    if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
        return "127.0.0.1"
    }
    switch ($ConfiguredHost.Trim().ToLowerInvariant()) {
        "0.0.0.0" { return "127.0.0.1" }
        "::" { return "127.0.0.1" }
        "*" { return "127.0.0.1" }
        default { return $ConfiguredHost }
    }
 }
 function Get-HealthCheckHosts {
    param([string]$ConfiguredHost)
    if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
        return @("127.0.0.1", "localhost")
    }
    $normalized = $ConfiguredHost.Trim().ToLowerInvariant()
    switch ($normalized) {
        "0.0.0.0" { return @("127.0.0.1", "localhost", "::1") }
        "::" { return @("127.0.0.1", "localhost", "::1") }
        "*" { return @("127.0.0.1", "localhost", "::1") }
        default { return @($ConfiguredHost) }
    }
 }
 function Test-TcpEndpoint {
    param(
        [Parameter(Mandatory = $true)][string]$HostName,
        [Parameter(Mandatory = $true)][int]$Port,
        [int]$TimeoutMs = 1200
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $async = $client.BeginConnect($HostName, $Port, $null, $null)
        $connected = $async.AsyncWaitHandle.WaitOne($TimeoutMs, $false)
        if (-not $connected) {
            return $false
        }
        $client.EndConnect($async) | Out-Null
        return $true
    } catch {
        return $false
    } finally {
        $client.Dispose()
    }
 }
 function Get-BackendReachability {
    $envFile = Join-Path $scriptDir ".env"
    $envVars = Read-EnvConfig -EnvFilePath $envFile
    $configuredHost = $defaultHost
    $configuredPort = $defaultPort
    if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
        $configuredHost = $envVars["SCREENJOB_HOST"]
    }
    if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
        $configuredPort = $envVars["SCREENJOB_PORT"]
    }
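    $portNumber = 0
    if (-not [int]::TryParse([string]$configuredPort, [ref]$portNumber)) {
        # TryParse resets the target to 0 on failure, so restore the default port explicitly.
        $portNumber = [int]$defaultPort
    }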
    $hostsToTry = Get-HealthCheckHosts -ConfiguredHost $configuredHost
    foreach ($candidateHost in $hostsToTry) {
        if (Test-TcpEndpoint -HostName $candidateHost -Port $portNumber) {
            return $true
        }
    }
    return $false
 }
 function Update-TrayState {
    param(
        [System.Windows.Forms.NotifyIcon]$NotifyIcon,
        [System.Windows.Forms.ToolStripMenuItem]$StatusItem,
        [string]$Name
    )
    $status = Get-ServiceStatusSafe -Name $Name
    $isBackendReachable = Get-BackendReachability
    $displayStatus = $status
    if ($status -eq "Running" -and -not $isBackendReachable) {
        $displayStatus = "Running (Backend Down)"
    } elseif ($status -eq "Stopped" -and $isBackendReachable) {
        $displayStatus = "Stopped (Backend Up)"
    } elseif ($status -eq "NotInstalled" -and $isBackendReachable) {
        $displayStatus = "NotInstalled (Backend Up)"
    }
    $StatusItem.Text = "Status: $displayStatus"
    switch ($displayStatus) {
        "Running" {
            $NotifyIcon.Icon = [System.Drawing.SystemIcons]::Information
        }
        "Stopped" {
            $NotifyIcon.Icon = [System.Drawing.SystemIcons]::Warning
        }
        default {
            $NotifyIcon.Icon = [System.Drawing.SystemIcons]::Error
        }
    }
    $tooltip = "ScreenJob Backend: $displayStatus"
    if ($tooltip.Length -gt 63) {
        $tooltip = $tooltip.Substring(0, 63)
    }
    $NotifyIcon.Text = $tooltip
 }
 $appContext = New-Object System.Windows.Forms.ApplicationContext
 $notifyIcon = New-Object System.Windows.Forms.NotifyIcon
 $notifyIcon.Visible = $false
 $menu = New-Object System.Windows.Forms.ContextMenuStrip
 $statusItem = New-Object System.Windows.Forms.ToolStripMenuItem "Status: Unknown"
 $statusItem.Enabled = $false
 $refreshItem = New-Object System.Windows.Forms.ToolStripMenuItem "Refresh Status"
 $refreshItem.Add_Click({
    Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
 })
 $startItem = New-Object System.Windows.Forms.ToolStripMenuItem "Start Service (Admin)"
 $startItem.Add_Click({
    Invoke-ServiceActionElevated -Action "start" -Name $ServiceName
 })
 $stopItem = New-Object System.Windows.Forms.ToolStripMenuItem "Stop Service (Admin)"
 $stopItem.Add_Click({
    Invoke-ServiceActionElevated -Action "stop" -Name $ServiceName
 })
 $restartItem = New-Object System.Windows.Forms.ToolStripMenuItem "Restart Service (Admin)"
 $restartItem.Add_Click({
    Invoke-ServiceActionElevated -Action "restart" -Name $ServiceName
 })
 $dashboardItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Dashboard"
 $dashboardItem.Add_Click({
    $url = Get-DashboardUrl
    Start-Process $url | Out-Null
 })
 $logsItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Service Logs"
 $logsItem.Add_Click({
    if (-not (Test-Path -LiteralPath $logsDir)) {
        New-Item -ItemType Directory -Path $logsDir -Force | Out-Null
    }
    Start-Process explorer.exe $logsDir | Out-Null
 })
 $openFolderItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Project Folder"
 $openFolderItem.Add_Click({
    Start-Process explorer.exe $scriptDir | Out-Null
 })
 $exitItem = New-Object System.Windows.Forms.ToolStripMenuItem "Exit Tray"
 $exitItem.Add_Click({
    $refreshTimer.Stop()
    $notifyIcon.Visible = $false
    $notifyIcon.Dispose()
    $menu.Dispose()
    $appContext.ExitThread()
 })
 [void]$menu.Items.Add($statusItem)
 [void]$menu.Items.Add($refreshItem)
 [void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
 [void]$menu.Items.Add($startItem)
 [void]$menu.Items.Add($stopItem)
 [void]$menu.Items.Add($restartItem)
 [void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
 [void]$menu.Items.Add($dashboardItem)
 [void]$menu.Items.Add($logsItem)
 [void]$menu.Items.Add($openFolderItem)
 [void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
 [void]$menu.Items.Add($exitItem)
 $notifyIcon.ContextMenuStrip = $menu
 $notifyIcon.Visible = $true
 $notifyIcon.Add_DoubleClick({
    $url = Get-DashboardUrl
    Start-Process $url | Out-Null
 })
 $refreshTimer = New-Object System.Windows.Forms.Timer
 $refreshTimer.Interval = 5000
 $refreshTimer.Add_Tick({
    Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
 })
 Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
 $refreshTimer.Start()
 [System.Windows.Forms.Application]::Run($appContext)
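
For readers more at home in the Python side of the repo, the tray script's host resolution and reachability probe can be sketched as below. This is not part of the change; the function names are hypothetical, and the wildcard mapping and 1.2 second timeout follow `Resolve-ConnectHost`, `Get-HealthCheckHosts`, and `Test-TcpEndpoint` above.

```python
import socket

# Bind-all addresses that cannot be used as a connect target.
WILDCARD_HOSTS = {"0.0.0.0", "::", "*"}


def resolve_connect_host(configured_host):
    """Map an empty or wildcard bind address to loopback for the dashboard URL."""
    host = (configured_host or "").strip()
    if not host or host.lower() in WILDCARD_HOSTS:
        return "127.0.0.1"
    return host


def health_check_hosts(configured_host):
    """Return the candidate hosts to probe for a given configured host."""
    host = (configured_host or "").strip()
    if not host:
        return ["127.0.0.1", "localhost"]
    if host.lower() in WILDCARD_HOSTS:
        return ["127.0.0.1", "localhost", "::1"]
    return [host]


def tcp_reachable(host, port, timeout_seconds=1.2):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def backend_reachable(configured_host, port):
    """Probe each candidate host and stop at the first one that accepts."""
    return any(tcp_reachable(h, port) for h in health_check_hosts(configured_host))


def dashboard_url(configured_host, port):
    return f"http://{resolve_connect_host(configured_host)}:{port}/"
```

A successful TCP connect only shows that something is listening on the port; it does not confirm that the listener is the ScreenJob backend.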
--- a/service_host/ScreenJob.WindowsServiceHost/BackendProcessService.cs
+++ b/service_host/ScreenJob.WindowsServiceHost/BackendProcessService.cs
@@ -0,0 +1,138 @@
 using System.Diagnostics;
 using Microsoft.Extensions.Hosting;
 using Microsoft.Extensions.Logging;
 namespace ScreenJob.WindowsServiceHost;
 internal sealed class BackendProcessService : BackgroundService
 {
    private readonly ILogger<BackendProcessService> _logger;
    private readonly ServiceOptions _options;
    private readonly object _logLock = new();
    private Process? _backendProcess;
    private string _stdoutLogPath = string.Empty;
    private string _stderrLogPath = string.Empty;
    public BackendProcessService(ILogger<BackendProcessService> logger, ServiceOptions options)
    {
        _logger = logger;
        _options = options;
    }
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        Directory.CreateDirectory(_options.LogDirectory);
        _stdoutLogPath = Path.Combine(_options.LogDirectory, "backend-service.stdout.log");
        _stderrLogPath = Path.Combine(_options.LogDirectory, "backend-service.stderr.log");
        LogStdOut("Service host starting backend process.");
        LogStdOut($"Script: {_options.BackendScriptPath}");
        LogStdOut($"Working directory: {_options.WorkingDirectory}");
        var powershellPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Windows),
            "System32",
            "WindowsPowerShell",
            "v1.0",
            "powershell.exe");
        var startInfo = new ProcessStartInfo
        {
            FileName = powershellPath,
            Arguments = $"-NoProfile -ExecutionPolicy Bypass -File \"{_options.BackendScriptPath}\"",
            WorkingDirectory = _options.WorkingDirectory,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        _backendProcess = new Process { StartInfo = startInfo };
        if (!_backendProcess.Start())
        {
            throw new InvalidOperationException("Failed to start backend process.");
        }
        LogStdOut($"Backend process started with PID {_backendProcess.Id}.");
        _logger.LogInformation("Backend process started with PID {Pid}.", _backendProcess.Id);
        var stdoutPump = PumpStreamAsync(_backendProcess.StandardOutput, LogStdOut, stoppingToken);
        var stderrPump = PumpStreamAsync(_backendProcess.StandardError, LogStdErr, stoppingToken);
        try
        {
            await _backendProcess.WaitForExitAsync(stoppingToken);
            var exitCode = _backendProcess.ExitCode;
            LogStdErr($"Backend process exited unexpectedly with code {exitCode}.");
            _logger.LogError("Backend process exited unexpectedly with code {ExitCode}.", exitCode);
            Environment.ExitCode = exitCode == 0 ? 1 : exitCode;
            throw new InvalidOperationException(
                $"Backend process ended unexpectedly. Service host exit code: {Environment.ExitCode}.");
        }
        catch (OperationCanceledException)
        {
            LogStdOut("Service stop requested.");
        }
        finally
        {
            await Task.WhenAll(stdoutPump, stderrPump);
        }
    }
    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        if (_backendProcess is { HasExited: false })
        {
            try
            {
                LogStdOut("Stopping backend process.");
                _backendProcess.Kill(entireProcessTree: true);
            }
            catch (Exception ex)
            {
                LogStdErr($"Failed to stop backend process cleanly: {ex.Message}");
                _logger.LogError(ex, "Failed to stop backend process cleanly.");
            }
        }
        await base.StopAsync(cancellationToken);
    }
    private async Task PumpStreamAsync(
        StreamReader reader,
        Action<string> sink,
        CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var line = await reader.ReadLineAsync();
            if (line is null)
            {
                break;
            }
            sink(line);
        }
    }
    private void LogStdOut(string message)
    {
        WriteLog(_stdoutLogPath, message);
    }
    private void LogStdErr(string message)
    {
        WriteLog(_stderrLogPath, message);
    }
    private void WriteLog(string path, string message)
    {
        var stamp = DateTimeOffset.Now.ToString("yyyy-MM-dd HH:mm:ss");
        var line = $"[{stamp}] {message}{Environment.NewLine}";
        lock (_logLock)
        {
            File.AppendAllText(path, line);
        }
    }
 }
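
The pump-and-log pattern used by `BackendProcessService` (one reader per stream, timestamped lines, wait for exit, then drain) can be sketched in Python as follows. This is an illustration of the pattern, not a port of the service host; `run_and_capture`, `pump`, and `stamped` are hypothetical names.

```python
import subprocess
import sys
import threading
from datetime import datetime


def stamped(message):
    """Prefix a message with a timestamp in the same format as the service logs."""
    return f"[{datetime.now():%Y-%m-%d %H:%M:%S}] {message}"


def pump(stream, sink):
    """Forward each line from a child-process stream to sink until EOF."""
    for line in stream:
        sink(line.rstrip("\r\n"))


def run_and_capture(argv, sink_out, sink_err):
    """Start a child process, pump stdout and stderr concurrently, return its exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, sink_out)),
        threading.Thread(target=pump, args=(proc.stderr, sink_err)),
    ]
    for thread in threads:
        thread.start()
    exit_code = proc.wait()
    # Join the pumps after exit so buffered output is fully drained.
    for thread in threads:
        thread.join()
    proc.stdout.close()
    proc.stderr.close()
    return exit_code


if __name__ == "__main__":
    code = run_and_capture(
        [sys.executable, "-c", "print('backend up')"],
        lambda line: print(stamped(line)),
        lambda line: print(stamped(line), file=sys.stderr),
    )
    print("exit code:", code)
```

Reading both streams concurrently matters because a child that fills one pipe while the parent blocks on the other can deadlock.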
--- a/service_host/ScreenJob.WindowsServiceHost/Program.cs
+++ b/service_host/ScreenJob.WindowsServiceHost/Program.cs
@@ -0,0 +1,18 @@
 using Microsoft.Extensions.DependencyInjection;
 using Microsoft.Extensions.Hosting;
 using ScreenJob.WindowsServiceHost;
 var options = ServiceOptions.Parse(args);
 Host.CreateDefaultBuilder(args)
    .UseWindowsService(serviceOptions =>
    {
        serviceOptions.ServiceName = "ScreenJobBackend";
    })
    .ConfigureServices(services =>
    {
        services.AddSingleton(options);
        services.AddHostedService<BackendProcessService>();
    })
    .Build()
    .Run();
--- a/service_host/ScreenJob.WindowsServiceHost/ScreenJob.WindowsServiceHost.csproj
+++ b/service_host/ScreenJob.WindowsServiceHost/ScreenJob.WindowsServiceHost.csproj
@@ -0,0 +1,12 @@
 <Project Sdk="Microsoft.NET.Sdk.Worker">
  <PropertyGroup>
    <TargetFramework>net10.0-windows</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <OutputType>Exe</OutputType>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.0" />
  </ItemGroup>
 </Project>
--- a/service_host/ScreenJob.WindowsServiceHost/ServiceOptions.cs
+++ b/service_host/ScreenJob.WindowsServiceHost/ServiceOptions.cs
@@ -0,0 +1,77 @@
 namespace ScreenJob.WindowsServiceHost;
 internal sealed record ServiceOptions(
    string BackendScriptPath,
    string WorkingDirectory,
    string LogDirectory)
 {
    public static ServiceOptions Parse(string[] args)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (var i = 0; i < args.Length; i++)
        {
            var raw = args[i];
            if (!raw.StartsWith("--", StringComparison.Ordinal))
            {
                continue;
            }
            var key = raw[2..];
            if (string.IsNullOrWhiteSpace(key))
            {
                continue;
            }
            if (i + 1 < args.Length && !args[i + 1].StartsWith("--", StringComparison.Ordinal))
            {
                map[key] = args[++i];
            }
            else
            {
                map[key] = "true";
            }
        }
        if (!map.TryGetValue("backend-script", out var backendScript) || string.IsNullOrWhiteSpace(backendScript))
        {
            throw new ArgumentException("Missing required argument: --backend-script <absolute-path-to-start_backend.ps1>.");
        }
        if (!Path.IsPathRooted(backendScript))
        {
            throw new ArgumentException("The --backend-script value must be an absolute path.");
        }
        if (!File.Exists(backendScript))
        {
            throw new FileNotFoundException("Backend script not found.", backendScript);
        }
        if (!map.TryGetValue("working-dir", out var workingDir) || string.IsNullOrWhiteSpace(workingDir))
        {
            workingDir = Path.GetDirectoryName(backendScript)
                ?? throw new ArgumentException("Could not resolve working directory from backend script path.");
        }
        if (!Path.IsPathRooted(workingDir))
        {
            throw new ArgumentException("The --working-dir value must be an absolute path.");
        }
        if (!map.TryGetValue("log-dir", out var logDir) || string.IsNullOrWhiteSpace(logDir))
        {
            logDir = Path.Combine(workingDir, "screenjob_runs", "service");
        }
        if (!Path.IsPathRooted(logDir))
        {
            throw new ArgumentException("The --log-dir value must be an absolute path.");
        }
        return new ServiceOptions(
            Path.GetFullPath(backendScript),
            Path.GetFullPath(workingDir),
            Path.GetFullPath(logDir));
    }
 }
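
The flag-map step of `ServiceOptions.Parse` can be read as the following Python sketch. `parse_flag_map` is a hypothetical name, and lowercasing keys is used here as a stand-in for the case-insensitive dictionary in the C# code.

```python
def parse_flag_map(args):
    """Collect `--key value` pairs; a flag with no following value maps to "true"."""
    flags = {}
    i = 0
    while i < len(args):
        raw = args[i]
        i += 1
        if not raw.startswith("--"):
            continue  # Stray positional tokens are ignored.
        key = raw[2:].lower()
        if not key.strip():
            continue  # A bare "--" carries no key.
        if i < len(args) and not args[i].startswith("--"):
            flags[key] = args[i]
            i += 1  # Consume the value token.
        else:
            flags[key] = "true"
    return flags


if __name__ == "__main__":
    print(parse_flag_map(["--backend-script", r"C:\sj\start_backend.ps1", "--verbose"]))
```

Note that a value beginning with `--` can never be passed under this scheme, which is acceptable for absolute Windows paths.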
--- a/src/agent.py
+++ b/src/agent.py
--- a/src/app_main.py
+++ b/src/app_main.py
@@ -30,6 +30,7 @@ def main(argv: list[str] | None = None) -> int:
            print("  OPENAI_API_KEY=...")
            print("  SCREENJOB_TOKEN=...")
            print("  DISABLE_UI=true|false (optional)")
            print("  SCREENJOB_PROHIBITED_KEY_COMBOS=ctrl+shift+s,alt+f4 (optional)")
            return 0
        server.main()
        return 0
--- a/src/cli.py
+++ b/src/cli.py
@@ -5,6 +5,7 @@ import json
 import sys
 from pathlib import Path
 from .agent import normalize_disabled_tools
 from .config import load_app_config
 from .models import RuntimeOptions
 from .runtime import create_openai_client, run_job
@@ -40,8 +41,55 @@ def build_parser() -> argparse.ArgumentParser:
        default=4,
        help="Compact model context every N steps to decay old screenshots (0 disables).",
    )
    parser.add_argument(
        "--max-visual-context-images",
        type=int,
        default=3,
        help="Maximum screenshots/enhanced images retained in model-visible context during rebases.",
    )
    parser.add_argument(
        "--native-automation-mode",
        choices=["off", "prefer", "require_fallback"],
        default="prefer",
        help="How strongly the agent should prefer Windows-native automation helpers over pixel fallback.",
    )
    parser.add_argument(
        "--dialog-timeout-seconds",
        type=float,
        default=12.0,
        help="Timeout for dialog-oriented waits and retries.",
    )
    parser.add_argument(
        "--focus-timeout-seconds",
        type=float,
        default=8.0,
        help="Timeout for focus-change waits and verification.",
    )
    parser.add_argument(
        "--ui-element-timeout-seconds",
        type=float,
        default=8.0,
        help="Timeout for native UI element lookup waits.",
    )
    parser.add_argument(
        "--max-retries-per-surface",
        type=int,
        default=3,
        help="Maximum repeated retries on the same classified window/dialog surface before the agent must pivot.",
    )
    parser.add_argument(
        "--pretty-logs",
        action="store_true",
        help="Emit expanded multi-line tool call/result logs for easier debugging.",
    )
    parser.add_argument("--disable-tool", action="append", default=[], help="Disable a tool by name.")
-    parser.add_argument("--skip-safety-check", action="store_true", help="Bypass pre-flight safety check.")
+    parser.add_argument(
        "--skip-safety-check",
        "--skip-safety-chec",
        dest="skip_safety_check",
        action="store_true",
        help="Bypass pre-flight safety check.",
    )
    parser.add_argument("--no-failsafe", action="store_true", help="Disable PyAutoGUI fail-safe.")
    return parser
@@ -57,7 +105,10 @@ def main(argv: list[str] | None = None) -> int:
        return 2
    model = args.model or config.default_model
-    disabled_tools = sorted({str(x).strip() for x in args.disable_tool if str(x).strip()})
+    try:
        disabled_tools = normalize_disabled_tools(args.disable_tool)
    except ValueError as exc:
        parser.error(str(exc))
    if not args.skip_safety_check:
        safety_client = create_openai_client(config.openai_api_key)
@@ -92,7 +143,15 @@ def main(argv: list[str] | None = None) -> int:
        click_pause=args.click_pause,
        reasoning_effort=args.reasoning_effort,
        screen_context_decay_steps=max(0, int(args.screen_context_decay_steps)),
        max_visual_context_images=max(0, int(args.max_visual_context_images)),
        native_automation_mode=args.native_automation_mode,
        dialog_timeout_seconds=max(0.5, float(args.dialog_timeout_seconds)),
        focus_timeout_seconds=max(0.5, float(args.focus_timeout_seconds)),
        ui_element_timeout_seconds=max(0.5, float(args.ui_element_timeout_seconds)),
        max_retries_per_surface=max(1, int(args.max_retries_per_surface)),
        pretty_logs=bool(args.pretty_logs),
        disable_tools=set(disabled_tools),
        prohibited_key_combos=set(config.prohibited_key_combos),
    )
    try:
        result, artifacts = run_job(
--- a/src/config.py
+++ b/src/config.py
@@ -14,6 +14,13 @@ def _env_bool(name: str, default: bool = False) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}
 def _env_csv(name: str) -> list[str]:
    raw = os.getenv(name)
    if raw is None:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]
@dataclass(frozen=True)
 class AppConfig:
    openai_api_key: str
@@ -25,6 +32,7 @@ class AppConfig:
    port: int
    runs_dir: Path
    db_path: Path
    prohibited_key_combos: tuple[str, ...] = ()
 def load_app_config(cwd: Path) -> AppConfig:
@@ -38,6 +46,7 @@ def load_app_config(cwd: Path) -> AppConfig:
    runs_dir = cwd / "screenjob_runs"
    db_path = cwd / "screenjob.db"
    disable_ui = _env_bool("DISABLE_UI", default=False)
    prohibited_key_combos = tuple(_env_csv("SCREENJOB_PROHIBITED_KEY_COMBOS"))
    return AppConfig(
        openai_api_key=openai_api_key,
        screenjob_token=screenjob_token,
@@ -48,5 +57,5 @@ def load_app_config(cwd: Path) -> AppConfig:
        port=port,
        runs_dir=runs_dir,
        db_path=db_path,
        prohibited_key_combos=prohibited_key_combos,
    )
--- a/src/desktop_overlay.py
+++ b/src/desktop_overlay.py
@@ -0,0 +1,272 @@
 from __future__ import annotations
 import logging
 import os
 import queue
 import threading
 from dataclasses import dataclass
 from typing import Any
@dataclass(frozen=True)
 class CompletionOverlayPayload:
    job_id: str
    objective: str
    return_message: str
    steps: int
    elapsed_seconds: float
 class DesktopOverlayManager:
    def __init__(self, logger: logging.Logger | None = None, *, auto_dismiss_seconds: float = 10.0) -> None:
        self.logger = logger or logging.getLogger("screenjob.overlay")
        self._queue: queue.Queue[CompletionOverlayPayload] = queue.Queue()
        self._thread: threading.Thread | None = None
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._disabled = False
        self._warned = False
        self._auto_dismiss_ms = max(0, int(round(float(auto_dismiss_seconds) * 1000)))
    def show_completion(
        self,
        *,
        job_id: str,
        objective: str,
        return_message: str,
        steps: int,
        elapsed_seconds: float,
    ) -> None:
        if os.name != "nt":
            self._disable_once("Desktop completion HUD is only enabled on Windows.")
            return
        if not self._ensure_thread():
            return
        self._queue.put(
            CompletionOverlayPayload(
                job_id=job_id,
                objective=objective,
                return_message=return_message,
                steps=max(0, int(steps)),
                elapsed_seconds=max(0.0, float(elapsed_seconds)),
            )
        )
    def _ensure_thread(self) -> bool:
        with self._lock:
            if self._disabled:
                return False
            if self._thread is None or not self._thread.is_alive():
                self._ready.clear()
                self._thread = threading.Thread(target=self._ui_main, name="screenjob-overlay", daemon=True)
                self._thread.start()
        self._ready.wait(timeout=2.0)
        return not self._disabled
    def _disable_once(self, reason: str) -> None:
        with self._lock:
            self._disabled = True
            already_warned = self._warned
            self._warned = True
            self._ready.set()
        if not already_warned:
            self.logger.warning("%s Overlay notifications disabled.", reason)
    def _format_elapsed(self, elapsed_seconds: float) -> str:
        total_seconds = max(0, int(round(elapsed_seconds)))
        minutes, seconds = divmod(total_seconds, 60)
        hours, minutes = divmod(minutes, 60)
        if hours:
            return f"{hours}h {minutes}m {seconds}s"
        if minutes:
            return f"{minutes}m {seconds}s"
        return f"{seconds}s"
    def _shorten(self, text: str, limit: int) -> str:
        raw = " ".join(str(text or "").split())
        if len(raw) <= limit:
            return raw
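        # Reserve room for the three-character ellipsis so the result fits within `limit` (for limit >= 3).
        return raw[: max(0, limit - 3)].rstrip() + "..."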
    def _ui_main(self) -> None:
        try:
            import tkinter as tk
        except Exception as exc:  # noqa: BLE001
            self._disable_once(f"tkinter is unavailable ({type(exc).__name__}: {exc}).")
            return
        try:
            root = tk.Tk()
            root.withdraw()
            root.update_idletasks()
        except Exception as exc:  # noqa: BLE001
            self._disable_once(f"Desktop overlay could not initialize ({type(exc).__name__}: {exc}).")
            return
        cards: list[dict[str, Any]] = []
        self._ready.set()
        def reposition() -> None:
            screen_width = root.winfo_screenwidth()
            top = 24
            for entry in cards:
                window = entry["window"]
                if not bool(window.winfo_exists()):
                    continue
                window.update_idletasks()
                width = max(320, int(window.winfo_width() or 360))
                height = max(120, int(window.winfo_height() or 160))
                left = max(12, screen_width - width - 24)
                window.geometry(f"{width}x{height}+{left}+{top}")
                top += height + 16
        def dismiss(window: Any) -> None:
            for index, entry in enumerate(list(cards)):
                if entry["window"] is window:
                    after_id = entry.get("after_id")
                    if after_id is not None:
                        try:
                            window.after_cancel(after_id)
                        except Exception:  # noqa: BLE001
                            pass
                    cards.pop(index)
                    break
            try:
                if bool(window.winfo_exists()):
                    window.destroy()
            except Exception:  # noqa: BLE001
                pass
            if cards:
                reposition()
        def add_card(payload: CompletionOverlayPayload) -> None:
            card = tk.Toplevel(root)
            card.withdraw()
            card.overrideredirect(True)
            card.attributes("-topmost", True)
            card.configure(bg="#0f172a")
            frame = tk.Frame(card, bg="#0f172a", highlightthickness=1, highlightbackground="#22c55e", bd=0)
            frame.pack(fill="both", expand=True)
            close_button = tk.Button(
                frame,
                text="×",
                command=lambda win=card: dismiss(win),
                bg="#0f172a",
                fg="#cbd5e1",
                activebackground="#111827",
                activeforeground="#ffffff",
                relief="flat",
                borderwidth=0,
                font=("Segoe UI", 14, "bold"),
                padx=6,
                pady=0,
            )
            close_button.place(relx=1.0, x=-8, y=6, anchor="ne")
            header = tk.Label(
                frame,
                text="Completed",
                bg="#0f172a",
                fg="#86efac",
                font=("Segoe UI", 10, "bold"),
                anchor="w",
            )
            header.pack(fill="x", padx=14, pady=(12, 2))
            title = tk.Label(
                frame,
                text=self._shorten(payload.objective, 72) or "Job complete",
                bg="#0f172a",
                fg="#f8fafc",
                font=("Segoe UI", 11, "bold"),
                justify="left",
                wraplength=320,
                anchor="w",
            )
            title.pack(fill="x", padx=14)
            job_row = tk.Label(
                frame,
                text=f"Job {payload.job_id}",
                bg="#0f172a",
                fg="#94a3b8",
                font=("Segoe UI", 9),
                justify="left",
                anchor="w",
            )
            job_row.pack(fill="x", padx=14, pady=(2, 8))
            message = tk.Label(
                frame,
                text=self._shorten(payload.return_message, 180) or "Task completed.",
                bg="#0f172a",
                fg="#e2e8f0",
                font=("Segoe UI", 9),
                justify="left",
                wraplength=320,
                anchor="w",
            )
            message.pack(fill="x", padx=14)
            footer = tk.Label(
                frame,
                text=f"{payload.steps} step(s)  |  {self._format_elapsed(payload.elapsed_seconds)}",
                bg="#0f172a",
                fg="#94a3b8",
                font=("Segoe UI", 9),
                justify="left",
                anchor="w",
            )
            footer.pack(fill="x", padx=14, pady=(10, 12))
            after_id = None
            if self._auto_dismiss_ms > 0:
                after_id = card.after(self._auto_dismiss_ms, lambda win=card: dismiss(win))
            cards.insert(0, {"window": card, "after_id": after_id})
            while len(cards) > 3:
                stale = cards.pop()
                try:
                    stale_after_id = stale.get("after_id")
                    if stale_after_id is not None:
                        stale["window"].after_cancel(stale_after_id)
                    stale["window"].destroy()
                except Exception:  # noqa: BLE001
                    pass
            card.update_idletasks()
            reposition()
            card.deiconify()
        def pump_queue() -> None:
            try:
                while True:
                    add_card(self._queue.get_nowait())
            except queue.Empty:
                pass
            try:
                root.after(120, pump_queue)
            except Exception:  # noqa: BLE001
                self._disable_once("Desktop overlay event loop stopped unexpectedly.")
        pump_queue()
        try:
            root.mainloop()
        except Exception as exc:  # noqa: BLE001
            self._disable_once(f"Desktop overlay main loop failed ({type(exc).__name__}: {exc}).")
 _overlay_singleton: DesktopOverlayManager | None = None
 _overlay_lock = threading.Lock()
 def get_desktop_overlay_manager(logger: logging.Logger | None = None) -> DesktopOverlayManager:
    global _overlay_singleton
    with _overlay_lock:
        if _overlay_singleton is None:
            _overlay_singleton = DesktopOverlayManager(logger=logger)
        elif logger is not None:
            _overlay_singleton.logger = logger
        return _overlay_singleton
--- a/src/models.py
+++ b/src/models.py
@@ -60,4 +60,12 @@ class RuntimeOptions:
    click_pause: float = 0.10
    reasoning_effort: str = "medium"
    screen_context_decay_steps: int = 4
    max_visual_context_images: int = 3
    native_automation_mode: str = "prefer"
    dialog_timeout_seconds: float = 12.0
    focus_timeout_seconds: float = 8.0
    ui_element_timeout_seconds: float = 8.0
    max_retries_per_surface: int = 3
    pretty_logs: bool = False
    disable_tools: set[str] | None = None
    prohibited_key_combos: set[str] | None = None
--- a/src/server.py
+++ b/src/server.py
@@ -12,6 +12,7 @@ from fastapi.responses import FileResponse
 from fastapi.responses import HTMLResponse, JSONResponse
 from pydantic import BaseModel, Field
 from .agent import normalize_disabled_tools
 from .config import AppConfig, load_app_config
 from .storage import HistoryDB
 from .task_manager import JobManager
@@ -28,6 +29,13 @@ class CreateJobRequest(BaseModel):
    click_pause: float = Field(0.10, ge=0.0, le=2.0)
    reasoning_effort: str = Field("medium", pattern="^(low|medium|high)$")
    screen_context_decay_steps: int = Field(4, ge=0, le=50)
    max_visual_context_images: int = Field(3, ge=0, le=12)
    native_automation_mode: str = Field("prefer", pattern="^(off|prefer|require_fallback)$")
    dialog_timeout_seconds: float = Field(12.0, ge=0.5, le=120.0)
    focus_timeout_seconds: float = Field(8.0, ge=0.5, le=120.0)
    ui_element_timeout_seconds: float = Field(8.0, ge=0.5, le=120.0)
    max_retries_per_surface: int = Field(3, ge=1, le=10)
    pretty_logs: bool = False
    disabled_tools: list[str] = Field(default_factory=list)
    safety_override: bool = False
    no_failsafe: bool = False
@@ -297,6 +305,8 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
    @app.post("/api/jobs")
    def create_job(payload: CreateJobRequest, _: None = Depends(require_token)) -> dict[str, str]:
        try:
            disabled_tools = normalize_disabled_tools(payload.disabled_tools)
            job_id = manager.submit_job(
                objective=payload.job,
                model=payload.model,
@@ -306,10 +316,19 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
                click_pause=payload.click_pause,
                reasoning_effort=payload.reasoning_effort,
                screen_context_decay_steps=payload.screen_context_decay_steps,
-            disabled_tools=payload.disabled_tools,
+                max_visual_context_images=payload.max_visual_context_images,
                native_automation_mode=payload.native_automation_mode,
                dialog_timeout_seconds=payload.dialog_timeout_seconds,
                focus_timeout_seconds=payload.focus_timeout_seconds,
                ui_element_timeout_seconds=payload.ui_element_timeout_seconds,
                max_retries_per_surface=payload.max_retries_per_surface,
                pretty_logs=payload.pretty_logs,
                disabled_tools=disabled_tools,
                safety_override=payload.safety_override,
                no_failsafe=payload.no_failsafe,
            )
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc)) from exc
        return {"job_id": job_id}
    @app.get("/api/jobs")
--- a/src/task_manager.py
+++ b/src/task_manager.py
@@ -8,7 +8,9 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Callable
 from .agent import normalize_disabled_tools
 from .config import AppConfig
 from .desktop_overlay import DesktopOverlayManager, get_desktop_overlay_manager
 from .models import RuntimeOptions
 from .runtime import create_openai_client, run_job
 from .safety import assess_task_safety
@@ -32,10 +34,12 @@ class JobManager:
        config: AppConfig,
        db: HistoryDB,
        broadcast: Callable[[dict[str, Any]], None] | None = None,
        overlay_manager: DesktopOverlayManager | None = None,
    ) -> None:
        self.config = config
        self.db = db
        self.broadcast = broadcast
        self.overlay_manager = overlay_manager or get_desktop_overlay_manager()
        self._running: dict[str, _RunningJob] = {}
        self._lock = threading.Lock()
@@ -50,6 +54,13 @@ class JobManager:
        click_pause: float = 0.10,
        reasoning_effort: str = "medium",
        screen_context_decay_steps: int = 4,
        max_visual_context_images: int = 3,
        native_automation_mode: str = "prefer",
        dialog_timeout_seconds: float = 12.0,
        focus_timeout_seconds: float = 8.0,
        ui_element_timeout_seconds: float = 8.0,
        max_retries_per_surface: int = 3,
        pretty_logs: bool = False,
        disabled_tools: list[str] | None = None,
        safety_override: bool = False,
        no_failsafe: bool = False,
@@ -57,7 +68,7 @@ class JobManager:
        job_id = f"job_{int(time.time())}_{uuid.uuid4().hex[:8]}"
        created_at = utc_now_iso()
        selected_model = (model or self.config.default_model).strip() or self.config.default_model
-        disabled = sorted({tool.strip() for tool in (disabled_tools or []) if tool.strip()})
+        disabled = normalize_disabled_tools(disabled_tools)
        self.db.create_job(
            job_id=job_id,
            objective=objective,
@@ -97,6 +108,13 @@ class JobManager:
                "click_pause": click_pause,
                "reasoning_effort": reasoning_effort,
                "screen_context_decay_steps": screen_context_decay_steps,
                "max_visual_context_images": max_visual_context_images,
                "native_automation_mode": native_automation_mode,
                "dialog_timeout_seconds": dialog_timeout_seconds,
                "focus_timeout_seconds": focus_timeout_seconds,
                "ui_element_timeout_seconds": ui_element_timeout_seconds,
                "max_retries_per_surface": max_retries_per_surface,
                "pretty_logs": pretty_logs,
                "no_failsafe": no_failsafe,
                "cancel_event": cancel_event,
            },
@@ -127,6 +145,13 @@ class JobManager:
        click_pause: float,
        reasoning_effort: str,
        screen_context_decay_steps: int,
        max_visual_context_images: int,
        native_automation_mode: str,
        dialog_timeout_seconds: float,
        focus_timeout_seconds: float,
        ui_element_timeout_seconds: float,
        max_retries_per_surface: int,
        pretty_logs: bool,
        no_failsafe: bool,
        cancel_event: threading.Event,
    ) -> None:
@@ -226,7 +251,15 @@ class JobManager:
            click_pause=click_pause,
            reasoning_effort=reasoning_effort,
            screen_context_decay_steps=max(0, int(screen_context_decay_steps)),
            max_visual_context_images=max(0, int(max_visual_context_images)),
            native_automation_mode=str(native_automation_mode or "prefer").strip().lower() or "prefer",
            dialog_timeout_seconds=max(0.5, float(dialog_timeout_seconds)),
            focus_timeout_seconds=max(0.5, float(focus_timeout_seconds)),
            ui_element_timeout_seconds=max(0.5, float(ui_element_timeout_seconds)),
            max_retries_per_surface=max(1, int(max_retries_per_surface)),
            pretty_logs=bool(pretty_logs),
            disable_tools=set(disabled_tools),
            prohibited_key_combos=set(self.config.prohibited_key_combos),
        )
        try:
            result, artifacts = run_job(
@@ -297,6 +330,14 @@ class JobManager:
                },
            },
        )
        if status == "completed":
            self.overlay_manager.show_completion(
                job_id=job_id,
                objective=objective,
                return_message=result.return_message,
                steps=result.steps,
                elapsed_seconds=max(0.0, float(result.ended_at - result.started_at)),
            )
        with self._lock:
            self._running.pop(job_id, None)
--- a/start_backend.ps1
+++ b/start_backend.ps1
@@ -15,10 +15,76 @@ function Test-EnvVarLine {
    return [bool](Select-String -Path $FilePath -Pattern ("^\s*" + [regex]::Escape($Name) + "=") -Quiet)
 }
-if (-not (Get-Command python -ErrorAction SilentlyContinue)) {
-    throw "Python was not found in PATH. Install Python 3.11+ and retry."
+function Resolve-PythonExecutable {
+    $venvPython = Join-Path $scriptDir ".venv\Scripts\python.exe"
    if (Test-Path -LiteralPath $venvPython) {
        return $venvPython
    }
    $pythonCmd = Get-Command python -ErrorAction SilentlyContinue
    if ($null -ne $pythonCmd -and (Test-Path -LiteralPath $pythonCmd.Source)) {
        return $pythonCmd.Source
    }
    $candidatePyLaunchers = @()
    $pyFromPath = Get-Command py -ErrorAction SilentlyContinue
    if ($null -ne $pyFromPath -and (Test-Path -LiteralPath $pyFromPath.Source)) {
        $candidatePyLaunchers += $pyFromPath.Source
    }
    $candidatePyLaunchers += "C:\Windows\py.exe"
    if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
        $repoUserHome = $Matches[0]
        $candidatePyLaunchers += (Join-Path $repoUserHome "AppData\Local\Programs\Python\Launcher\py.exe")
    }
    foreach ($pyLauncher in ($candidatePyLaunchers | Select-Object -Unique)) {
        if (-not (Test-Path -LiteralPath $pyLauncher)) {
            continue
        }
        try {
            $resolved = (& $pyLauncher -3 -c "import sys; print(sys.executable)" 2>$null | Select-Object -Last 1).Trim()
            if ($resolved -and (Test-Path -LiteralPath $resolved)) {
                return $resolved
            }
        } catch {
            continue
        }
    }
    $candidatePythonPaths = @()
    if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
        $repoUserHome = $Matches[0]
        $pythonBase = Join-Path $repoUserHome "AppData\Local\Programs\Python"
        if (Test-Path -LiteralPath $pythonBase) {
            $candidatePythonPaths += (Get-ChildItem -LiteralPath $pythonBase -Directory -ErrorAction SilentlyContinue |
                Sort-Object Name -Descending |
                ForEach-Object { Join-Path $_.FullName "python.exe" })
        }
    }
    $candidatePythonPaths += @(
        "C:\Python314\python.exe",
        "C:\Python313\python.exe",
        "C:\Python312\python.exe",
        "C:\Python311\python.exe",
        "C:\Program Files\Python314\python.exe",
        "C:\Program Files\Python313\python.exe",
        "C:\Program Files\Python312\python.exe",
        "C:\Program Files\Python311\python.exe"
    )
    foreach ($candidate in ($candidatePythonPaths | Select-Object -Unique)) {
        if (Test-Path -LiteralPath $candidate) {
            return $candidate
        }
    }
    throw "Python was not found. Install Python 3.11+ system-wide, or create .venv in the repo root."
 }
 $pythonExe = Resolve-PythonExecutable
 $envFile = Join-Path $scriptDir ".env"
 if (-not (Test-Path -LiteralPath $envFile)) {
    Write-Warning ".env was not found at $envFile. Server startup may fail if required vars are missing."
@@ -31,5 +97,5 @@ if (-not (Test-Path -LiteralPath $envFile)) {
    }
 }
-Write-Host "Starting ScreenJob backend on configured host/port..." -ForegroundColor Cyan
+Write-Host "Starting ScreenJob backend with Python: $pythonExe" -ForegroundColor Cyan
-python main.py server
+& $pythonExe main.py server
--- a/start_backend_hidden.vbs
+++ b/start_backend_hidden.vbs
@@ -0,0 +1,11 @@
 Option Explicit
 Dim shell, fso, scriptDir, psScript, command
 Set shell = CreateObject("WScript.Shell")
 Set fso = CreateObject("Scripting.FileSystemObject")
 scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)
 psScript = """" & fso.BuildPath(scriptDir, "start_backend.ps1") & """"
 command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -WindowStyle Hidden -STA -File " & psScript
 shell.Run command, 0, False
--- a/start_screenjob_tray_hidden.vbs
+++ b/start_screenjob_tray_hidden.vbs
@@ -0,0 +1,11 @@
 Option Explicit
 Dim shell, fso, scriptDir, psScript, command
 Set shell = CreateObject("WScript.Shell")
 Set fso = CreateObject("Scripting.FileSystemObject")
 scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)
 psScript = """" & fso.BuildPath(scriptDir, "screenjob_tray.ps1") & """"
 command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -WindowStyle Hidden -STA -File " & psScript
 shell.Run command, 0, False
--- a/tests/test_agent_tools.py
+++ b/tests/test_agent_tools.py
@@ -1,8 +1,11 @@
 from __future__ import annotations
 import json
 import logging
 from pathlib import Path
 from typing import Any
 import pytest
 from PIL import Image
 import src.agent as agent_module
@@ -15,8 +18,12 @@ class _DummyPyAutoGUI:
    def __init__(self) -> None:
        self.last_move_to: tuple[int, int] | None = None
-        self.last_click: tuple[int, int] | None = None
+        self.last_move_duration: float | None = None
        self.last_click: dict[str, object] | None = None
        self.last_hotkey: tuple[str, ...] | None = None
        self.last_drag_to: dict[str, object] | None = None
        self.last_scroll: int | None = None
        self.current_position: tuple[int, int] = (640, 360)
    def screenshot(self) -> Image.Image:
        return Image.new("RGB", (1280, 720), color=(24, 24, 24))
@@ -26,9 +33,26 @@ class _DummyPyAutoGUI:
    def moveTo(self, x: int, y: int, duration: float = 0.0) -> None:  # noqa: N802
        self.last_move_to = (x, y)
        self.last_move_duration = duration
        self.current_position = (x, y)
-    def click(self, x: int, y: int) -> None:
-        self.last_click = (x, y)
+    def click(
+        self,
        x: int,
        y: int,
        clicks: int = 1,
        interval: float = 0.0,
        button: str = "left",
    ) -> None:
        self.last_click = {"x": x, "y": y, "clicks": clicks, "interval": interval, "button": button}
        self.current_position = (x, y)
    def dragTo(self, x: int, y: int, duration: float = 0.0, button: str = "left") -> None:  # noqa: N802
        self.last_drag_to = {"x": x, "y": y, "duration": duration, "button": button}
        self.current_position = (x, y)
    def scroll(self, amount: int) -> None:
        self.last_scroll = amount
    def write(self, _: str, interval: float = 0.0) -> None:
        return None
@@ -39,6 +63,10 @@ class _DummyPyAutoGUI:
    def hotkey(self, *keys: str) -> None:
        self.last_hotkey = tuple(keys)
    def position(self):
        x, y = self.current_position
        return type("Point", (), {"x": x, "y": y})()
 def _build_agent(tmp_path: Path, monkeypatch) -> agent_module.ScreenJobAgent:
    dummy_gui = _DummyPyAutoGUI()
@@ -84,11 +112,158 @@ def test_click_supports_directional_offsets(tmp_path: Path, monkeypatch) -> None
            "offset_up": "2px",
            "offset_right": 7,
            "offset": {"x": 3, "y": 4},
            "button": "right",
            "click_count": 2,
            "interval_seconds": "0.5s",
            "duration_seconds": "0.2s",
            "sleep_after_seconds": 0,
        }
    )
    assert click_result["ok"] is True
    assert click_result["clicked"] == {"x": 110, "y": 102}
    assert click_result["button"] == "right"
    assert click_result["click_count"] == 2
    assert click_result["interval_seconds"] == 0.5
    assert click_result["duration_seconds"] == 0.2
    assert agent_module.pyautogui.last_click == {
        "x": 110,
        "y": 102,
        "clicks": 2,
        "interval": 0.5,
        "button": "right",
    }
 def test_scroll_supports_direction_and_amount(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    result = agent._tool_scroll(
        {
            "amount": 8,
            "direction": "down",
            "coordinate": {"x": 1400, "y": -5},
            "sleep_after_seconds": 0,
        }
    )
    assert result["ok"] is True
    assert result["amount"] == -8
    assert result["direction"] == "down"
    assert result["moved_to"] == {"x": 1279, "y": 0}
    assert agent_module.pyautogui.last_scroll == -8
 def test_drag_translates_coordinates_and_button(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    result = agent._tool_drag(
        {
            "start_coordinate": {"x": -10, "y": 100},
            "end_coordinate": {"x": 1285, "y": 800},
            "button": "middle",
            "duration_seconds": "0.3s",
            "sleep_after_seconds": 0,
        }
    )
    assert result["ok"] is True
    assert result["from"] == {"x": 0, "y": 100}
    assert result["to"] == {"x": 1279, "y": 719}
    assert result["button"] == "middle"
    assert result["duration_seconds"] == 0.3
    assert agent_module.pyautogui.last_drag_to == {
        "x": 1279,
        "y": 719,
        "duration": 0.3,
        "button": "middle",
    }
 def test_move_mouse_clamps_target_coordinate(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    result = agent._tool_move_mouse({"coordinate": {"x": 1500, "y": -5}, "duration_seconds": "0.4s"})
    assert result["ok"] is True
    assert result["moved_to"] == {"x": 1279, "y": 0}
    assert result["duration_seconds"] == 0.4
    assert agent_module.pyautogui.last_move_to == (1279, 0)
 def test_clipboard_get_and_set_round_trip(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    state = {"text": ""}
    monkeypatch.setattr(agent, "_clipboard_set_text", lambda text: state.__setitem__("text", text))
    monkeypatch.setattr(agent, "_clipboard_get_text", lambda: state["text"])
    monkeypatch.setattr(
        agent,
        "_clipboard_get_metadata",
        lambda: {"has_text": bool(state["text"]), "has_image": True, "available_formats": ["CF_UNICODETEXT", "CF_DIB"]},
    )
    set_result = agent._tool_clipboard_set({"text": "hello clipboard"})
    get_result = agent._tool_clipboard_get({})
    assert set_result["ok"] is True
    assert set_result["length"] == 15
    assert get_result["ok"] is True
    assert get_result["text"] == "hello clipboard"
    assert get_result["length"] == 15
    assert get_result["has_text"] is True
    assert get_result["has_image"] is True
    assert get_result["available_formats"] == ["CF_UNICODETEXT", "CF_DIB"]
 def test_clipboard_set_falls_back_to_powershell_when_native_path_fails(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    state = {"text": ""}
    def fail_native(_: str) -> None:
        raise OSError("[WinError 6] The handle is invalid.")
    def shell_fallback(text: str) -> None:
        state["text"] = text
    monkeypatch.setattr(agent, "_clipboard_set_text", fail_native)
    monkeypatch.setattr(agent, "_clipboard_set_text_via_shell", shell_fallback)
    result = agent._tool_clipboard_set({"text": "Example Domain"})
    assert result["ok"] is True
    assert result["used_shell_fallback"] is True
    assert "WinError 6" in result["native_error"]
    assert state["text"] == "Example Domain"
 def test_get_cursor_position_returns_current_mouse_location(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent_module.pyautogui.current_position = (321, 654)
    result = agent._tool_get_cursor_position({})
    assert result["ok"] is True
    assert result["position"] == {"x": 321, "y": 654}
 def test_get_active_window_returns_metadata_shape(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    monkeypatch.setattr(
        agent,
        "_get_active_window_info",
        lambda: {
            "available": True,
            "hwnd": 1234,
            "title": "Settings",
            "class_name": "ApplicationFrameWindow",
            "thread_id": 44,
            "process_id": 77,
            "is_visible": True,
            "rect": {"left": 10, "top": 20, "right": 410, "bottom": 320, "width": 400, "height": 300},
        },
    )
    result = agent._tool_get_active_window({})
    assert result["ok"] is True
    assert result["window"]["title"] == "Settings"
    assert result["window"]["rect"]["width"] == 400
 def test_enhance_defaults_to_small_ui_preset(tmp_path: Path, monkeypatch) -> None:
@@ -135,6 +310,32 @@ def test_press_key_supports_hotkey_combo(tmp_path: Path, monkeypatch) -> None:
    assert agent_module.pyautogui.last_hotkey == ("win", "r")
 def test_press_key_blocks_prohibited_combo(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.options.prohibited_key_combos = {"ctrl+shift+s"}
    agent.prohibited_key_combos = agent._normalize_prohibited_key_combos(agent.options.prohibited_key_combos)
    result = agent._tool_press_key({"key": "ctrl+shift+s"})
    assert result["ok"] is False
    assert result["blocked"] is True
    assert result["key"] == "ctrl+shift+s"
    assert "prohibited by runtime configuration" in result["error"]
    assert "another allowed route" in result["hint"]
 def test_press_key_blocks_prohibited_combo_after_alias_normalization(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.options.prohibited_key_combos = {"meta+r"}
    agent.prohibited_key_combos = agent._normalize_prohibited_key_combos(agent.options.prohibited_key_combos)
    result = agent._tool_press_key({"key": "win+r"})
    assert result["ok"] is False
    assert result["blocked"] is True
    assert result["key"] == "win+r"
 def test_context_compaction_trigger_and_payload(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.objective = "Open settings app"
@@ -147,7 +348,596 @@ def test_context_compaction_trigger_and_payload(tmp_path: Path, monkeypatch) ->
    agent.last_screen_meta = {"width": 1280, "height": 720, "path": "C:/tmp/frame.png"}
    assert agent._should_compact_context() is True
-    compacted = agent._build_compacted_pending_input()
+    visual_message = agent._build_visual_message("Current screen", "data:image/png;base64,abc", agent.last_screen_meta)
    agent._register_visual_context_message(visual_message, agent.last_screen_meta, tool_name="see_screen")
    compacted = agent._build_compacted_pending_input("decay")
    assert len(compacted) == 2
-    assert "Context compaction activated" in compacted[0]["content"][0]["text"]
+    assert "Context compaction activated due to stale context decay." in compacted[0]["content"][0]["text"]
    assert "Open settings app" in compacted[0]["content"][0]["text"]
    assert "Treat prior reasoning as stale" in compacted[0]["content"][0]["text"]
    assert "Retained visual observations:" in compacted[0]["content"][0]["text"]
    assert "do not call see_screen again only because compaction happened" in compacted[0]["content"][0]["text"]
    assert "observe -> decide -> act -> verify" in compacted[0]["content"][0]["text"]
 def test_context_compaction_drops_function_call_outputs_from_rebased_input(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.objective = "Open settings app"
    visual_meta = {"path": "C:/tmp/frame.png"}
    visual_message = agent._build_visual_message("Current screen", "data:image/png;base64,abc", visual_meta)
    agent._register_visual_context_message(visual_message, visual_meta, tool_name="see_screen")
    compacted = agent._build_compacted_pending_input(
        "decay",
        carryover_items=[
            {"type": "function_call_output", "call_id": "call_123", "output": "{\"ok\": true}"},
            {"role": "user", "content": [{"type": "input_text", "text": "blocked hint"}]},
        ],
    )
    assert len(compacted) == 3
    assert compacted[1]["role"] == "user"
    assert compacted[1]["content"][0]["text"] == "blocked hint"
    assert all(item.get("type") != "function_call_output" for item in compacted)
 def test_visual_context_budget_keeps_only_latest_three_images(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.options.max_visual_context_images = 3
    captured_times = [
        "2026-05-30T10:00:03+00:00",
        "2026-05-30T10:00:01+00:00",
        "2026-05-30T10:00:04+00:00",
        "2026-05-30T10:00:02+00:00",
    ]
    for idx, captured_at in enumerate(captured_times):
        meta = {"path": f"C:/tmp/frame_{idx}.png", "captured_at": captured_at}
        message = agent._build_visual_message(f"frame {idx}", f"data:image/png;base64,{idx}", meta)
        agent._register_visual_context_message(message, meta, tool_name="see_screen")
    assert agent.visual_context_overflow_pending is True
    assert [entry["meta"]["path"] for entry in agent.visual_context_messages] == [
        "C:/tmp/frame_3.png",
        "C:/tmp/frame_0.png",
        "C:/tmp/frame_2.png",
    ]
 def test_compacted_input_uses_latest_visuals_by_capture_time(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.options.max_visual_context_images = 3
    agent.objective = "Verify the current app window"
    for idx, captured_at in enumerate(
        [
            "2026-05-30T10:00:04+00:00",
            "2026-05-30T10:00:01+00:00",
            "2026-05-30T10:00:03+00:00",
            "2026-05-30T10:00:02+00:00",
        ]
    ):
        meta = {"path": f"C:/tmp/frame_{idx}.png", "captured_at": captured_at}
        message = agent._build_visual_message(f"frame {idx}", f"data:image/png;base64,{idx}", meta)
        agent._register_visual_context_message(message, meta, tool_name="see_screen")
    compacted = agent._build_compacted_pending_input("visual_budget")
    visual_messages = [
        item
        for item in compacted
        if isinstance(item.get("content"), list)
        and any(part.get("type") == "input_image" for part in item["content"] if isinstance(part, dict))
    ]
    assert len(visual_messages) == 3
    assert [
        json.loads(message["content"][0]["text"].split("Metadata: ", 1)[1].split("\n", 1)[0])["path"]
        for message in visual_messages
    ] == [
        "C:/tmp/frame_3.png",
        "C:/tmp/frame_2.png",
        "C:/tmp/frame_0.png",
    ]
 def test_context_compaction_event_includes_visual_budget_reason_and_paths(tmp_path: Path, monkeypatch) -> None:
    events: list[dict[str, object]] = []
    agent = _build_agent(tmp_path, monkeypatch)
    agent.event_callback = events.append
    agent.step = 5
    agent.recent_tool_summaries = ["step=4 tool=enhance status=ok"]
    agent.visual_context_messages = [
        {"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/1.png"}},
        {"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/2.png"}},
        {"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/3.png"}},
    ]
    agent._emit_context_compacted("visual_budget")
    assert events[-1]["event_type"] == "context_compacted"
    payload = events[-1]["payload"]
    assert payload["rebuild_reason"] == "visual_budget"
    assert payload["visual_context_paths"] == ["C:/tmp/1.png", "C:/tmp/2.png", "C:/tmp/3.png"]
 def test_observation_loop_blocks_repeated_broad_reobservation(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.step_history = [
        {
            "step": 21,
            "tool_names": ["get_active_window", "see_screen"],
            "window_signature": "123|#32770|Save as",
            "window_summary": "Save as [#32770]",
            "had_visual": True,
        },
        {
            "step": 22,
            "tool_names": ["get_active_window", "see_screen"],
            "window_signature": "123|#32770|Save as",
            "window_summary": "Save as [#32770]",
            "had_visual": True,
        },
        {
            "step": 23,
            "tool_names": ["get_active_window", "see_screen"],
            "window_signature": "123|#32770|Save as",
            "window_summary": "Save as [#32770]",
            "had_visual": True,
        },
    ]
    blocked = agent._dispatch_tool("see_screen", {})
    assert blocked["ok"] is False
    assert blocked["blocked"] is True
    assert blocked["blocked_reason"] == "observation_loop"
    assert "unchanged foreground window" in blocked["error"]
    assert blocked["window_summary"] == "Save as [#32770]"
 def test_repeated_ambiguous_action_requires_verification_and_then_blocks(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    type_args = {"text": "repeat me"}
    first = agent._dispatch_tool("type", type_args)
    assert first["ok"] is True
    assert first["verification_required"] is True
    assert first["verification_channels"] == ["enhance", "get_active_window", "see_screen"]
    blocked_without_verification = agent._dispatch_tool("type", type_args)
    assert blocked_without_verification["blocked"] is True
    assert "see_screen" in blocked_without_verification["error"]
    assert agent._dispatch_tool("see_screen", {})["ok"] is True
    assert agent._dispatch_tool("type", type_args)["ok"] is True
    assert agent._dispatch_tool("see_screen", {})["ok"] is True
    assert agent._dispatch_tool("type", type_args)["ok"] is True
    assert agent._dispatch_tool("see_screen", {})["ok"] is True
    blocked_after_retry_budget = agent._dispatch_tool("type", type_args)
    assert blocked_after_retry_budget["blocked"] is True
    assert "3 time(s) on the same surface" in blocked_after_retry_budget["error"]
    assert agent._dispatch_tool("see_screen", {})["ok"] is True
    reset_attempt = agent._dispatch_tool("type", type_args)
    assert reset_attempt["ok"] is True
 def test_copy_shortcut_prefers_clipboard_verification(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    monkeypatch.setattr(
        agent,
        "_clipboard_get_metadata",
        lambda: {"has_text": True, "has_image": False, "available_formats": ["CF_UNICODETEXT"]},
    )
    monkeypatch.setattr(agent, "_clipboard_get_text", lambda: "copied")
    first = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
    assert first["ok"] is True
    assert first["verification_channels"] == ["clipboard_get"]
    blocked = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
    assert blocked["blocked"] is True
    assert "clipboard_get" in blocked["error"]
    observed = agent._dispatch_tool("clipboard_get", {})
    assert observed["ok"] is True
    assert observed["has_text"] is True
    second = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
    assert second["ok"] is True
 def test_execute_command_blocks_unrequested_recursive_file_search(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.objective = "Save the current note in Notepad"
    result = agent._tool_execute_command({"command": "Get-ChildItem -Recurse -Filter *.txt"})
    assert result["ok"] is False
    assert result["blocked"] is True
    assert "out of scope" in result["error"]
 def test_execute_command_allows_recursive_file_search_when_objective_requests_it(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.objective = "Find the saved text file path"
    called: dict[str, Any] = {}
    class _FakeProcess:
        returncode = 0
        def poll(self) -> int:
            return 0
        def communicate(self, timeout: int = 2):
            return ("ok", "")
    def fake_popen(*args, **kwargs):
        called["command"] = args[0]
        return _FakeProcess()
    monkeypatch.setattr(agent_module.subprocess, "Popen", fake_popen)
    result = agent._tool_execute_command({"command": "Get-ChildItem -Recurse -Filter *.txt"})
    assert result["ok"] is True
    assert called["command"] == "Get-ChildItem -Recurse -Filter *.txt"
 def test_execute_command_launch_requires_focus_verification(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    called: dict[str, Any] = {}
    class _FakeProcess:
        returncode = 0
        def poll(self) -> int:
            return 0
        def communicate(self, timeout: int = 2):
            return ("", "")
    def fake_popen(*args, **kwargs):
        called["command"] = args[0]
        return _FakeProcess()
    monkeypatch.setattr(agent_module.subprocess, "Popen", fake_popen)
    first = agent._dispatch_tool("execute_command", {"command": "start notepad"})
    assert first["ok"] is True
    assert first["background_launch_assumed"] is True
    assert first["focus_change_assumed"] is False
    assert first["verification_required"] is True
    assert first["verification_channels"] == ["get_active_window", "see_screen"]
    assert called["command"] == "start notepad"
    blocked = agent._dispatch_tool("execute_command", {"command": "start notepad"})
    assert blocked["blocked"] is True
    assert "get_active_window" in blocked["error"]
    observed = agent._dispatch_tool("get_active_window", {})
    assert observed["ok"] is True
    second = agent._dispatch_tool("execute_command", {"command": "start notepad"})
    assert second["ok"] is True
 def test_system_prompt_emphasizes_situational_awareness() -> None:
    prompt = agent_module.SYSTEM_PROMPT
    assert "Maintain a live mental model" in prompt
    assert "classify -> choose control channel -> execute one meaningful transition -> verify" in prompt
    assert "First classify, then act." in prompt
    assert "Use see_screen at a balanced cadence" in prompt
    assert "get_active_window" in prompt
    assert "detect_dialog" in prompt
    assert "dialog_set_filename" in prompt
    assert "list_ui_elements" in prompt
    assert "clipboard_get" in prompt
    assert "Do not invent new subgoals" in prompt
    assert "verify-and-finish" in prompt
    assert "data.observed_result" in prompt
    assert "Treat command-launched apps or URLs as background" in prompt
    assert "#32770" in prompt
    assert "secure desktop" in prompt.lower()
 def test_observation_loop_prompt_pushes_action_or_finish() -> None:
    prompt = agent_module.build_observation_loop_prompt("Save as [#32770]", repeated_steps=3)
    assert "same stable window for 3 step(s)" in prompt
    assert "Save as [#32770]" in prompt
    assert "Do not keep calling broad observation tools" in prompt
    assert "native window/dialog/element tool" in prompt
    assert "Use enhance only if a small or text-heavy control must be read before acting." in prompt
    assert "#32770 dialog" in prompt
 def test_finish_likely_prompt_pushes_verification_then_completion() -> None:
    prompt = agent_module.build_finish_likely_prompt(
        'Save dialog closed and focus returned to "todo-demo.txt - Notepad". | Command verification confirms "todo-demo.txt" exists.',
        prohibited_key_combos={"ctrl+shift+s"},
    )
    assert "objective is likely already satisfied" in prompt
    assert "todo-demo.txt - Notepad" in prompt
    assert "call see_screen" in prompt
    assert "then call task_complete" in prompt
    assert "Do not reopen menus" in prompt
    assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
 def test_initial_action_prompt_reinforces_observation_and_verification() -> None:
    prompt = agent_module.build_initial_action_prompt("Open calculator", {"ctrl+shift+s"})
    assert "JOB: Open calculator" in prompt
    assert "First classify the current UI state from the latest evidence." in prompt
    assert "Identify what changed since the last action or screen capture." in prompt
    assert "classify -> choose control channel -> execute one meaningful transition -> verify" in prompt
    assert "Prefer native window/dialog/element tools" in prompt
    assert "get_active_window plus detect_dialog" in prompt
    assert "click then see_screen" in prompt
    assert "Do not invent new subgoals" in prompt
    assert "Prefer non-visual verification when available" in prompt
    assert "wait_for_focus_change" in prompt
    assert "#32770 dialogs" in prompt
    assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
    assert "do not re-capture the screen just to reconfirm an obvious large input area" in prompt
    assert 'task_complete(return=..., data={"observed_result": ...})' in prompt
 def test_no_tool_prompt_recovers_by_reobserving() -> None:
    prompt = agent_module.build_no_tool_prompt({"ctrl+shift+s"})
    assert "Recover by re-observing the current desktop state instead of guessing." in prompt
    assert "Start by classifying the surface." in prompt
    assert "get_active_window" in prompt
    assert "detect_dialog" in prompt
    assert "clipboard_get" in prompt
    assert "native window/dialog/element tools" in prompt
    assert "Do not assume execute_command launches changed the foreground window" in prompt
    assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
    assert "If a modal, picker, or browser download/upload surface is likely" in prompt
 def test_blocked_action_prompt_reanchors_on_screen_state() -> None:
    prompt = agent_module.build_blocked_action_prompt("click", prohibited_key_combos={"ctrl+shift+s"})
    assert "The last action using click was blocked or unreliable." in prompt
    assert "Do not retry blindly." in prompt
    assert "classify the current surface" in prompt
    assert "detect_dialog" in prompt
    assert "dialog_set_filename" in prompt
    assert "get_active_window" in prompt
    assert "get_cursor_position before move_mouse or drag" in prompt
    assert "wait_for_focus_change" in prompt
    assert "secure desktop or UAC" in prompt
    assert "Switch strategy after the fresh classification" in prompt
    assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
    assert "native control instead of pixels" in prompt
 def test_tool_schemas_include_completion_and_desktop_awareness_guidance(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.prohibited_key_combos = {"ctrl+shift+s"}
    schemas = {tool["name"]: tool for tool in agent._tool_schemas()}
    assert "data.observed_result" in schemas["task_complete"]["description"]
    assert "before task_complete" in schemas["see_screen"]["description"]
    assert "text-heavy targets" in schemas["enhance"]["description"]
    assert "verify copy or cut results" in schemas["clipboard_get"]["description"]
    assert "pointer state matters" in schemas["get_cursor_position"]["description"]
    assert "verify focus and active app" in schemas["get_active_window"]["description"]
    assert "foreground focus" in schemas["execute_command"]["description"]
    assert "Prohibited for this run: ctrl+shift+s." in schemas["press_key"]["description"]
    assert "dialog classification" in schemas["get_active_window"]["description"]
    assert "visible top-level windows" in schemas["list_windows"]["description"]
    assert "#32770 or picker surface" in schemas["detect_dialog"]["description"]
    assert "filename or path field" in schemas["dialog_set_filename"]["description"]
    assert "native child controls" in schemas["list_ui_elements"]["description"]
 def test_tool_schemas_hide_optional_native_tools_when_mode_off(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.options.native_automation_mode = "off"
    schemas = {tool["name"]: tool for tool in agent._tool_schemas()}
    assert "get_active_window" in schemas
    assert "list_windows" not in schemas
    assert "detect_dialog" not in schemas
    assert "list_ui_elements" not in schemas
 def test_list_windows_returns_structured_surface_metadata(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    monkeypatch.setattr(
        agent,
        "_list_windows_info",
        lambda visible_only=True: [
            {
                "available": True,
                "hwnd": 111,
                "title": "Open",
                "class_name": "#32770",
                "executable_name": "notepad.exe",
                "surface_kind": "file_dialog",
                "dialog_kind": "file_open",
            }
        ],
    )
    monkeypatch.setattr(
        agent,
        "_get_active_window_info",
        lambda: {
            "available": True,
            "hwnd": 111,
            "title": "Open",
            "class_name": "#32770",
            "executable_name": "notepad.exe",
        },
    )
    result = agent._tool_list_windows({})
    assert result["ok"] is True
    assert result["count"] == 1
    assert result["surface_kind"] == "file_dialog"
    assert result["dialog_kind"] == "file_open"
    assert result["recommended_next_tools"][0] == "dialog_set_filename"
 def test_detect_dialog_returns_buttons_and_target_handle(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    monkeypatch.setattr(
        agent,
        "_find_dialog_info",
        lambda title_contains="": {
            "available": True,
            "hwnd": 222,
            "title": "Save as",
            "class_name": "#32770",
            "executable_name": "notepad.exe",
        },
    )
    monkeypatch.setattr(
        agent,
        "_get_active_window_info",
        lambda: {
            "available": True,
            "hwnd": 222,
            "title": "Save as",
            "class_name": "#32770",
            "executable_name": "notepad.exe",
        },
    )
    monkeypatch.setattr(
        agent,
        "_list_ui_elements_for_window",
        lambda hwnd, include_hidden=False: [
            {
                "handle": 10,
                "role": "button",
                "text": "Save",
                "target": {"type": "ui_element", "handle": 10, "window_handle": hwnd},
            }
        ],
    )
    result = agent._tool_detect_dialog({})
    assert result["ok"] is True
    assert result["dialog_kind"] == "file_save"
    assert result["target"]["type"] == "dialog"
    assert result["buttons"][0]["text"] == "Save"
 def test_notepad_save_pattern_enters_finish_likely_mode(tmp_path: Path, monkeypatch) -> None:
    events: list[dict[str, object]] = []
    agent = _build_agent(tmp_path, monkeypatch)
    agent.event_callback = events.append
    agent.objective = "Open Notepad, type a short to-do list, save it as todo-demo.txt in Documents"
    agent.finish_likely_state["target_filename"] = agent._infer_target_filename(agent.objective)
    agent.last_observed_window = {
        "available": True,
        "title": "Save as",
        "class_name": "#32770",
    }
    agent.step = 24
    window_result = agent._update_finish_likely_from_tool(
        "get_active_window",
        {},
        {
            "ok": True,
            "window": {
                "available": True,
                "title": "todo-demo.txt - Notepad",
                "class_name": "Notepad",
            },
        },
    )
    assert agent.finish_likely_state["active"] is False
    assert [item["kind"] for item in window_result["completion_evidence"]] == [
        "active_window_title_matches_target",
        "save_dialog_closed_to_target_window",
    ]
    agent.last_visual_signature = "stable-post-save"
    agent.step = 25
    command_result = agent._update_finish_likely_from_tool(
        "execute_command",
        {"command": "powershell -NoProfile -Command \"Test-Path ... todo-demo.txt\""},
        {
            "ok": True,
            "exit_code": 0,
            "stdout": r"C:\Users\paulw\Documents\todo-demo.txt",
        },
    )
    assert agent.finish_likely_state["active"] is True
    assert agent.finish_likely_state["summary"]
    assert command_result["finish_likely"]["target_filename"] == "todo-demo.txt"
    assert any(event["event_type"] == "completion_evidence" for event in events)
    assert any(event["event_type"] == "finish_likely" for event in events)
 def test_finish_likely_guard_blocks_reopening_menu_after_fresh_verification(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.objective = "Open Notepad, type a short to-do list, save it as todo-demo.txt in Documents"
    agent.finish_likely_state.update(
        {
            "active": True,
            "activated_at_step": 24,
            "target_filename": "todo-demo.txt",
            "summary": 'Save dialog closed and focus returned to "todo-demo.txt - Notepad". | Command verification confirms "todo-demo.txt" exists.',
            "fresh_verification_done": False,
            "verification_step": 0,
            "post_completion_visual_signature": "",
        }
    )
    agent.step = 25
    verify_result = agent._dispatch_tool("see_screen", {})
    assert verify_result["ok"] is True
    assert verify_result["finish_likely_verification_done"] is True
    assert agent.finish_likely_state["fresh_verification_done"] is True
    blocked = agent._dispatch_tool("press_key", {"key": "alt+f"})
    assert blocked["ok"] is False
    assert blocked["blocked"] is True
    assert blocked["blocked_reason"] == "finish_likely"
    assert "appears satisfied" in blocked["error"]
    assert "reopen menus" in blocked["hint"].lower()
 def test_dispatch_rejects_unknown_and_disabled_tools(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.disabled_tools = {"scroll"}
    assert agent._dispatch_tool("unknown_tool", {}) == {"ok": False, "error": "Unknown tool: unknown_tool"}
    assert agent._dispatch_tool("scroll", {}) == {"ok": False, "error": "Tool 'scroll' is disabled for this job."}
 def test_tool_schemas_filter_disabled_tools(tmp_path: Path, monkeypatch) -> None:
    agent = _build_agent(tmp_path, monkeypatch)
    agent.disabled_tools = {"scroll", "clipboard_get"}
    tool_names = {tool["name"] for tool in agent._tool_schemas()}
    assert "scroll" not in tool_names
    assert "clipboard_get" not in tool_names
    assert "click" in tool_names
    assert "task_complete" in tool_names
 def test_normalize_disabled_tools_rejects_invalid_and_required_names() -> None:
    with pytest.raises(ValueError, match="Unknown disabled tool"):
        agent_module.normalize_disabled_tools(["not_a_real_tool"])
    with pytest.raises(ValueError, match="Cannot disable required tool"):
        agent_module.normalize_disabled_tools(["task_complete"])
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -20,6 +20,7 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
        port=8787,
        runs_dir=tmp_path / "runs",
        db_path=tmp_path / "screenjob.db",
        prohibited_key_combos=("ctrl+shift+s",),
    )
    config.runs_dir.mkdir(parents=True, exist_ok=True)
@@ -71,3 +72,11 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
    assert payload["data"] == "file1.txt\nfile2.txt"
    assert captured_kwargs["options"].reasoning_effort == "medium"
    assert captured_kwargs["options"].screen_context_decay_steps == 4
    assert captured_kwargs["options"].max_visual_context_images == 3
    assert captured_kwargs["options"].native_automation_mode == "prefer"
    assert captured_kwargs["options"].dialog_timeout_seconds == 12.0
    assert captured_kwargs["options"].focus_timeout_seconds == 8.0
    assert captured_kwargs["options"].ui_element_timeout_seconds == 8.0
    assert captured_kwargs["options"].max_retries_per_surface == 3
    assert captured_kwargs["options"].pretty_logs is False
    assert captured_kwargs["options"].prohibited_key_combos == {"ctrl+shift+s"}
--- a/tests/test_desktop_overlay.py
+++ b/tests/test_desktop_overlay.py
@@ -0,0 +1,149 @@
 from __future__ import annotations
 import types
 from collections import deque
 from typing import Any
 from src.desktop_overlay import CompletionOverlayPayload, DesktopOverlayManager
 class _FakeWidget:
    def __init__(self, root: "_FakeTk", *, width: int = 360, height: int = 160) -> None:
        self._root = root
        self._width = width
        self._height = height
        self._exists = True
        self._after_ids: dict[str, tuple[int, Any]] = {}
    def withdraw(self) -> None:
        return None
    def overrideredirect(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def attributes(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def configure(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def pack(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def place(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def update_idletasks(self) -> None:
        return None
    def winfo_width(self) -> int:
        return self._width
    def winfo_height(self) -> int:
        return self._height
    def winfo_exists(self) -> bool:
        return self._exists
    def geometry(self, *_args: Any, **_kwargs: Any) -> None:
        return None
    def deiconify(self) -> None:
        return None
    def destroy(self) -> None:
        self._exists = False
    def after(self, delay_ms: int, callback: Any) -> str:
        after_id = self._root._schedule(delay_ms, callback)
        self._after_ids[after_id] = (delay_ms, callback)
        return after_id
    def after_cancel(self, after_id: str) -> None:
        self._after_ids.pop(after_id, None)
        self._root._cancel(after_id)
 class _FakeButton(_FakeWidget):
    def __init__(self, root: "_FakeTk", command: Any | None = None, **_kwargs: Any) -> None:
        super().__init__(root)
        self.command = command
 class _FakeTk(_FakeWidget):
    def __init__(self) -> None:
        super().__init__(self)
        self._events: deque[tuple[str, int, Any]] = deque()
        self._event_seq = 0
        self.scheduled_delays: list[int] = []
        self.cards: list[_FakeWidget] = []
    def withdraw(self) -> None:
        return None
    def winfo_screenwidth(self) -> int:
        return 1920
    def _schedule(self, delay_ms: int, callback: Any) -> str:
        after_id = f"after-{self._event_seq}"
        self._event_seq += 1
        self.scheduled_delays.append(delay_ms)
        self._events.append((after_id, delay_ms, callback))
        return after_id
    def _cancel(self, after_id: str) -> None:
        self._events = deque(event for event in self._events if event[0] != after_id)
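    def mainloop(self) -> None:
        # Drain scheduled callbacks in FIFO order; the iteration cap keeps a
        # self-rescheduling callback from hanging the test.
        iterations = 0
        while self._events and iterations < 20:
            _after_id, _delay_ms, callback = self._events.popleft()
            iterations += 1
            callback()
            # Stop as soon as any overlay card has been destroyed.
            if any(not card.winfo_exists() for card in self.cards):
                return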
 class _FakeTkModule(types.SimpleNamespace):
    def __init__(self, root: _FakeTk) -> None:
        super().__init__()
        self._root = root
    def Tk(self) -> _FakeTk:
        return self._root
    def Toplevel(self, _root: _FakeTk) -> _FakeWidget:
        card = _FakeWidget(self._root)
        self._root.cards.append(card)
        return card
    def Frame(self, root: _FakeWidget, **_kwargs: Any) -> _FakeWidget:
        return _FakeWidget(root._root)
    def Label(self, root: _FakeWidget, **_kwargs: Any) -> _FakeWidget:
        return _FakeWidget(root._root)
    def Button(self, root: _FakeWidget, command: Any | None = None, **_kwargs: Any) -> _FakeButton:
        return _FakeButton(root._root, command=command)
 def test_completion_overlay_auto_dismisses(monkeypatch: Any) -> None:
    root = _FakeTk()
    fake_tk = _FakeTkModule(root)
    monkeypatch.setitem(__import__("sys").modules, "tkinter", fake_tk)
    manager = DesktopOverlayManager(auto_dismiss_seconds=0.01)
    manager._queue.put(
        CompletionOverlayPayload(
            job_id="job-123",
            objective="Write a report",
            return_message="Finished",
            steps=5,
            elapsed_seconds=12.4,
        )
    )
    manager._ui_main()
    assert any(delay == 10 for delay in root.scheduled_delays)
    assert root.cards[0]._exists is False
--- a/tests/test_server_api.py
+++ b/tests/test_server_api.py
@@ -46,6 +46,13 @@ class FakeJobManager:
        click_pause: float = 0.10,
        reasoning_effort: str = "medium",
        screen_context_decay_steps: int = 4,
        max_visual_context_images: int = 3,
        native_automation_mode: str = "prefer",
        dialog_timeout_seconds: float = 12.0,
        focus_timeout_seconds: float = 8.0,
        ui_element_timeout_seconds: float = 8.0,
        max_retries_per_surface: int = 3,
        pretty_logs: bool = False,
        disabled_tools: list[str] | None = None,
        safety_override: bool = False,
        no_failsafe: bool = False,
@@ -69,6 +76,13 @@ class FakeJobManager:
            "click_pause": click_pause,
            "reasoning_effort": reasoning_effort,
            "screen_context_decay_steps": screen_context_decay_steps,
            "max_visual_context_images": max_visual_context_images,
            "native_automation_mode": native_automation_mode,
            "dialog_timeout_seconds": dialog_timeout_seconds,
            "focus_timeout_seconds": focus_timeout_seconds,
            "ui_element_timeout_seconds": ui_element_timeout_seconds,
            "max_retries_per_surface": max_retries_per_surface,
            "pretty_logs": pretty_logs,
            "no_failsafe": no_failsafe,
        }
        self._jobs[job_id] = {
@@ -293,6 +307,7 @@ def _build_app(tmp_path: Path, monkeypatch: Any, disable_ui: bool = False):
        port=8787,
        runs_dir=tmp_path / "runs",
        db_path=tmp_path / "screenjob_test.db",
        prohibited_key_combos=("ctrl+shift+s",),
    )
    config.runs_dir.mkdir(parents=True, exist_ok=True)
    app = server_module.create_app(config)
@@ -326,6 +341,13 @@ def test_create_job_returns_only_job_id_and_defaults_model(tmp_path: Path, monke
    assert manager.last_submit_payload["disabled_tools"] == ["click"]
    assert manager.last_submit_payload["reasoning_effort"] == "medium"
    assert manager.last_submit_payload["screen_context_decay_steps"] == 4
    assert manager.last_submit_payload["max_visual_context_images"] == 3
    assert manager.last_submit_payload["native_automation_mode"] == "prefer"
    assert manager.last_submit_payload["dialog_timeout_seconds"] == 12.0
    assert manager.last_submit_payload["focus_timeout_seconds"] == 8.0
    assert manager.last_submit_payload["ui_element_timeout_seconds"] == 8.0
    assert manager.last_submit_payload["max_retries_per_surface"] == 3
    assert manager.last_submit_payload["pretty_logs"] is False
    status_res = client.get(f"/api/jobs/{job_id}/status", headers=headers)
    assert status_res.status_code == 200
@@ -334,6 +356,36 @@ def test_create_job_returns_only_job_id_and_defaults_model(tmp_path: Path, monke
    assert "data" in status_res.json()["response"]
 def test_create_job_rejects_invalid_disabled_tool_names(tmp_path: Path, monkeypatch: Any) -> None:
    app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
    client = TestClient(app)
    headers = {"Authorization": "Bearer test_token"}
    response = client.post(
        "/api/jobs",
        headers=headers,
        json={"job": "Open amazon.de", "disabled_tools": ["not_a_real_tool"], "safety_override": True},
    )
    assert response.status_code == 400
    assert "Unknown disabled tool" in response.json()["detail"]
 def test_create_job_rejects_disabling_task_complete(tmp_path: Path, monkeypatch: Any) -> None:
    app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
    client = TestClient(app)
    headers = {"Authorization": "Bearer test_token"}
    response = client.post(
        "/api/jobs",
        headers=headers,
        json={"job": "Open amazon.de", "disabled_tools": ["task_complete"], "safety_override": True},
    )
    assert response.status_code == 400
    assert "Cannot disable required tool" in response.json()["detail"]
 def test_cancel_endpoint_and_events(tmp_path: Path, monkeypatch: Any) -> None:
    app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
    client = TestClient(app)
--- a/tests/test_task_manager.py
+++ b/tests/test_task_manager.py
@@ -0,0 +1,238 @@
 from __future__ import annotations
 import threading
 from pathlib import Path
 from typing import Any
 import src.task_manager as task_manager_module
 from src.config import AppConfig
 from src.models import AgentResult, RunArtifacts, UsageSummary
 from src.storage import HistoryDB
 from src.task_manager import JobManager
 class _OverlayRecorder:
    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []
    def show_completion(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)
 def _build_manager(tmp_path: Path, overlay_manager: _OverlayRecorder) -> tuple[JobManager, HistoryDB, AppConfig]:
    config = AppConfig(
        openai_api_key="test-key",
        screenjob_token="test-token",
        disable_ui=False,
        default_model="gpt-5.4-mini",
        safety_model="gpt-5.4-mini",
        host="127.0.0.1",
        port=8787,
        runs_dir=tmp_path / "runs",
        db_path=tmp_path / "screenjob.db",
    )
    db = HistoryDB(config.db_path)
    manager = JobManager(config=config, db=db, overlay_manager=overlay_manager)
    return manager, db, config
 def _artifacts(tmp_path: Path) -> RunArtifacts:
    root = tmp_path / "run_artifacts"
    return RunArtifacts(
        run_id="test_run",
        root_dir=root,
        logs_dir=root / "logs",
        shots_dir=root / "shots",
        enhance_dir=root / "enhanced",
        log_file=root / "logs" / "screenjob.log",
    )
 def _create_job(db: HistoryDB, job_id: str, objective: str) -> None:
    db.create_job(
        job_id=job_id,
        objective=objective,
        model="gpt-5.4-mini",
        created_at="2026-05-30T12:00:00+00:00",
        safety_override=True,
        disabled_tools=[],
    )
 def test_completed_job_triggers_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)
    job_id = "job_overlay_complete"
    objective = "Save todo-demo.txt in Documents"
    _create_job(db, job_id, objective)
    result = AgentResult(
        completed=True,
        result="Saved todo-demo.txt",
        return_message="Saved todo-demo.txt",
        data={"observed_result": "todo-demo.txt - Notepad is visible"},
        steps=11,
        started_at=100.0,
        ended_at=112.6,
        usage=UsageSummary(),
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (result, _artifacts(tmp_path)))
    manager._execute_job(
        job_id=job_id,
        objective=objective,
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )
    assert overlay.calls == [
        {
            "job_id": job_id,
            "objective": objective,
            "return_message": "Saved todo-demo.txt",
            "steps": 11,
            "elapsed_seconds": 12.599999999999994,
        }
    ]
    assert db.get_job(job_id)["status"] == "completed"
 def test_non_completed_jobs_do_not_trigger_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)
    failed_job_id = "job_overlay_failed"
    _create_job(db, failed_job_id, "Fail intentionally")
    failed_result = AgentResult(
        completed=False,
        result="Failure",
        return_message="Failure",
        data=None,
        steps=7,
        started_at=10.0,
        ended_at=18.0,
        usage=UsageSummary(),
        error="Failure",
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (failed_result, _artifacts(tmp_path)))
    manager._execute_job(
        job_id=failed_job_id,
        objective="Fail intentionally",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )
    cancelled_job_id = "job_overlay_cancelled"
    _create_job(db, cancelled_job_id, "Cancel intentionally")
    cancelled_result = AgentResult(
        completed=False,
        result="Cancelled",
        return_message="Cancelled",
        data=None,
        steps=4,
        started_at=20.0,
        ended_at=23.0,
        usage=UsageSummary(),
        error="Cancelled",
        cancelled=True,
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (cancelled_result, _artifacts(tmp_path)))
    manager._execute_job(
        job_id=cancelled_job_id,
        objective="Cancel intentionally",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )
    assert overlay.calls == []
 def test_rejected_job_does_not_trigger_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)
    job_id = "job_overlay_rejected"
    _create_job(db, job_id, "Do something unsafe")
    monkeypatch.setattr(task_manager_module, "create_openai_client", lambda *_args, **_kwargs: object())
    monkeypatch.setattr(
        task_manager_module,
        "assess_task_safety",
        lambda *_args, **_kwargs: (False, "Unsafe request", {"decision": "blocked"}),
    )
    manager._execute_job(
        job_id=job_id,
        objective="Do something unsafe",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=False,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )
    assert overlay.calls == []
    events = db.get_job_events(job_id)
    assert events[-1]["event_type"] == "job_rejected"
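The `elapsed_seconds` expectation in `test_completed_job_triggers_desktop_overlay` is pinned to `12.599999999999994`, which is what the floating-point subtraction `112.6 - 100.0` yields (assuming the manager computes `ended_at - started_at`, which this diff does not show). If that exact literal proves brittle, a tolerance-based comparison is one option. The sketch below is illustrative only; `elapsed` is a hypothetical helper, not a function from the repository.

```python
import math


def elapsed(started_at: float, ended_at: float) -> float:
    """Hypothetical helper: duration of a job in seconds."""
    return ended_at - started_at


value = elapsed(100.0, 112.6)

# Binary floating point cannot represent 112.6 exactly, so the difference
# is close to, but not equal to, the literal 12.6.
exact_match = value == 12.6

# A tolerance-based check accepts the small representation error.
close_match = math.isclose(value, 12.6, rel_tol=1e-9)
```

In a pytest suite, `pytest.approx(12.6)` plays a similar role and can be used as a dict value inside the expected `overlay.calls` list.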
--- a/tray_service_control.ps1
+++ b/tray_service_control.ps1
@@ -0,0 +1,53 @@
 [CmdletBinding()]
 param(
    [Parameter(Mandatory = $true)]
    [ValidateSet("start", "stop", "restart")]
    [string]$Action,
    [string]$ServiceName = "ScreenJobBackend"
 )
 Set-StrictMode -Version Latest
 $ErrorActionPreference = "Stop"
 function Wait-ForStatus {
    param(
        [Parameter(Mandatory = $true)]$Service,
        [Parameter(Mandatory = $true)][System.ServiceProcess.ServiceControllerStatus]$TargetStatus,
        [int]$TimeoutSeconds = 20
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        $Service.Refresh()
        if ($Service.Status -eq $TargetStatus) {
            return
        }
        Start-Sleep -Milliseconds 350
    }
    throw "Timed out waiting for service '$($Service.ServiceName)' to reach status '$TargetStatus'."
 }
 $service = Get-Service -Name $ServiceName -ErrorAction Stop
 switch ($Action) {
    "start" {
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Start-Service -Name $ServiceName -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
        }
    }
    "stop" {
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Stopped) {
            Stop-Service -Name $ServiceName -Force -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Stopped)
        }
    }
    "restart" {
        if ($service.Status -eq [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Restart-Service -Name $ServiceName -Force -ErrorAction Stop
        } else {
            Start-Service -Name $ServiceName -ErrorAction Stop
        }
        Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
    }
 }
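The deadline-polling pattern used by `Wait-ForStatus` can be exercised without a Windows service. The following is a rough Python sketch of the same idea, not a port of the script: `wait_for_status`, `get_status`, `clock`, and `sleep` are hypothetical names introduced here, and the injectable clock and sleep exist only to make the loop testable.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    timeout_seconds: float = 20.0,
    poll_seconds: float = 0.35,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it returns target or the deadline passes."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if get_status() == target:
            return
        sleep(poll_seconds)
    raise TimeoutError(f"Timed out waiting for status {target!r}.")


# Usage with a fake status source that reports "Running" on the third poll.
# The sleep function is replaced by list.append so no real time is spent.
statuses = iter(["StartPending", "StartPending", "Running"])
polls: list[float] = []
wait_for_status(lambda: next(statuses), "Running", sleep=polls.append)
```

Using a monotonic clock for the deadline avoids surprises if the wall clock is adjusted while waiting; the PowerShell script itself uses `Get-Date`.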
--- a/uninstall_backend_service.ps1
+++ b/uninstall_backend_service.ps1
@@ -0,0 +1,45 @@
 [CmdletBinding(SupportsShouldProcess = $true)]
 param(
    [switch]$AllUsers,
    [string]$ServiceName = "ScreenJobBackend"
 )
 Set-StrictMode -Version Latest
 $ErrorActionPreference = "Stop"
 $scriptDir = Split-Path -Parent $PSCommandPath
 $shortcutName = "ScreenJob Backend.lnk"
 $startupFolder = if ($AllUsers) {
    [Environment]::GetFolderPath("CommonStartup")
 } else {
    [Environment]::GetFolderPath("Startup")
 }
 $shortcutPath = Join-Path $startupFolder $shortcutName
 $service = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
 if ($null -ne $service) {
    if ($PSCmdlet.ShouldProcess($ServiceName, "Remove legacy Windows service")) {
        if ($service.Status -ne "Stopped") {
            Stop-Service -Name $ServiceName -Force -ErrorAction Stop
        }
        & sc.exe delete $ServiceName | Out-Null
        if ($LASTEXITCODE -ne 0) {
            throw "Failed to delete service '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
        }
        Write-Host "Removed legacy Windows service: $ServiceName"
    }
 }
 if (Test-Path -LiteralPath $shortcutPath) {
    if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove backend startup shortcut")) {
        Remove-Item -LiteralPath $shortcutPath -Force
        Write-Host "Removed backend startup shortcut: $shortcutPath"
    }
 } else {
    Write-Host "No backend startup shortcut found at: $shortcutPath"
 }
 Write-Host "Backend launcher uninstalled successfully." -ForegroundColor Green
Author	SHA1	Message	Date
Space-Banane	4123765aba	Commit remaining workspace updates [CI / test (push): failing after 8s]	2026-05-31 20:43:36 +02:00
Space-Banane	79c9e98842	Switch backend startup to interactive session	2026-05-31 20:43:36 +02:00
Luna	a521142b89	docs: add patience rule for rerunning jobs [CI / test (push): successful in 8s]	2026-05-31 18:35:35 +00:00
Space-Banane	880bfb1c70	Fix tray health detection and harden backend service startup [CI / test (push): successful in 7s]	2026-05-28 13:44:31 +02:00
Space-Banane	114ddd80d6	Add Windows service host and system tray controller [CI / test (push): successful in 7s]	2026-05-28 13:30:27 +02:00