Compare commits


16 Commits

Author SHA1 Message Date
Space-Banane
f0058d1057 feat: add support for Windows-only tools and enhance platform checks
Some checks failed
CI / test (push) Failing after 10s
2026-05-31 21:02:56 +02:00
Space-Banane
d514fe161c docs: update context compaction prompt with observe-decide-act-verify loop
Some checks failed
CI / test (push) Failing after 8s
2026-05-31 20:52:49 +02:00
Space-Banane
4123765aba Commit remaining workspace updates
Some checks failed
CI / test (push) Failing after 8s
2026-05-31 20:43:36 +02:00
Space-Banane
79c9e98842 Switch backend startup to interactive session 2026-05-31 20:43:36 +02:00
a521142b89 docs: add patience rule for rerunning jobs
All checks were successful
CI / test (push) Successful in 8s
2026-05-31 18:35:35 +00:00
Space-Banane
880bfb1c70 Fix tray health detection and harden backend service startup
All checks were successful
CI / test (push) Successful in 7s
2026-05-28 13:44:31 +02:00
Space-Banane
114ddd80d6 Add Windows service host and system tray controller
All checks were successful
CI / test (push) Successful in 7s
2026-05-28 13:30:27 +02:00
314311d8fc Merge pull request 'Add lightweight analytics dashboard' (#1) from feat/lightweight-dash into master
All checks were successful
CI / test (push) Successful in 7s
Reviewed-on: #1
2026-05-27 22:50:08 +02:00
Space-Banane
8126b57404 Add lightweight analytics dashboard
All checks were successful
CI / test (push) Successful in 7s
CI / test (pull_request) Successful in 7s
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-27 22:34:26 +02:00
Space-Banane
cceed18cf1 feat: (literally) "enhance" functionality with new parameters and improved image processing
All checks were successful
CI / test (push) Successful in 7s
2026-05-27 22:14:32 +02:00
Space-Banane
880468ef02 Mark completed P1 TODO items as done 2026-05-27 22:05:57 +02:00
Space-Banane
b05a7be668 Compact screenshot context every 4 steps by default 2026-05-27 22:04:15 +02:00
Space-Banane
0c019474af Default model reasoning effort to medium 2026-05-27 22:02:20 +02:00
Space-Banane
a8ef8ee552 Split monitor UI into separate HTML and JS assets
All checks were successful
CI / test (push) Successful in 7s
2026-05-27 22:01:06 +02:00
Space-Banane
111a1e84af feat: implement replay functionality with UI controls and backend support 2026-05-27 21:57:37 +02:00
Space-Banane
620fcc4aa6 removed slop 2026-05-27 21:53:32 +02:00
34 changed files with 7902 additions and 425 deletions

5
.gitignore vendored

@@ -20,3 +20,8 @@ screenjob.db
# IDE
.vscode/
.idea/
# Service host build/publish artifacts
service_host/**/bin/
service_host/**/obj/
service_host/publish/

107
README.md

@@ -1,7 +1,7 @@
# ScreenJob
ScreenJob is an autonomous desktop-and-terminal execution service.
It lets an LLM use controlled local tools (screen, click, type, shell) to complete GUI-heavy tasks on a real computer.
It lets an LLM use controlled local tools (screen, mouse, keyboard, clipboard, shell) to complete GUI-heavy tasks on a real computer.
## What It Solves
@@ -15,7 +15,8 @@ It lets an LLM use controlled local tools (screen, click, type, shell) to comple
## Core Features
- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `click`, `type`, `press_key`, `sleep`, `task_complete`)
- Hybrid control model: screenshot grounding plus Windows-native window, dialog, and UI-element helpers when available
- Tool-based agent loop (`execute_command`, `see_screen`, `enhance`, `list_windows`, `find_window`, `focus_window`, `close_window`, `wait_for_window`, `wait_for_focus_change`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `wait_for_dialog_close`, `list_ui_elements`, `invoke_ui_element`, `set_ui_element_value`, `select_ui_element`, `wait_for_ui_element`, `click`, `scroll`, `drag`, `move_mouse`, `type`, `press_key`, `clipboard_get`, `clipboard_set`, `get_cursor_position`, `get_active_window`, `sleep`, `task_complete`)
- Safety pre-check with override support
- Per-job tool disable list
- Live/final usage and cost estimates
@@ -109,6 +110,80 @@ Or use the PowerShell launcher:
.\start_backend.ps1
```
### Backend Startup
For screenshot-driven automation, start the backend in the logged-in user session.
That gives `pyautogui` access to the interactive desktop, which Windows services do not have.
If you previously installed the legacy service, remove it once from an elevated PowerShell session with `.\uninstall_backend_service.ps1`.
Install a sign-in launcher for the current user:
```powershell
.\install_backend_service.ps1
```
Install it for all users:
```powershell
.\install_backend_service.ps1 -AllUsers
```
Start it immediately after installing:
```powershell
.\install_backend_service.ps1 -StartNow
```
Remove the launcher:
```powershell
.\uninstall_backend_service.ps1
```
The launcher runs `start_backend.ps1` hidden via `start_backend_hidden.vbs`.
If you need to start the backend manually, run:
```powershell
.\start_backend.ps1
```
The legacy Windows service host remains in the tree for reference, but it is not the recommended path for GUI tasks.
### System Tray Icon (Windows)
Start tray icon now:
```powershell
powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1
```
Install startup shortcut (current user):
```powershell
.\install_tray_startup_shortcut.ps1
```
Install startup shortcut for all users:
```powershell
.\install_tray_startup_shortcut.ps1 -AllUsers
```
Remove startup shortcut:
```powershell
.\install_tray_startup_shortcut.ps1 -Remove
```
Tray menu actions:
- The service controls are for the legacy Windows service host.
- Refresh service status
- Start/Stop/Restart service (prompts for admin/UAC)
- Open dashboard URL from `.env` `SCREENJOB_HOST` / `SCREENJOB_PORT`
- Open service logs folder
- Exit tray icon process
Auth for all API routes:
- `Authorization: Bearer <SCREENJOB_TOKEN>`
@@ -123,6 +198,11 @@ Auth for all API routes:
{
"job": "run \"ls -a\" in C:/Users/username/Documents and return output",
"model": "gpt-5.4-mini",
"native_automation_mode": "prefer",
"dialog_timeout_seconds": 12,
"focus_timeout_seconds": 8,
"ui_element_timeout_seconds": 8,
"max_retries_per_surface": 3,
"disabled_tools": [],
"safety_override": false
}
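A minimal sketch of how a client might assemble the request for the payload above, assuming Python on the caller's side. The function name `build_job_request` and the `Content-Type` header are illustrative assumptions; the submit route is not shown in this hunk, so the sketch stops before sending anything.

```python
import json


def build_job_request(token: str, job: str, model: str = "gpt-5.4-mini") -> tuple[dict, bytes]:
    """Return (headers, body) for a job submission.

    Hypothetical helper: only the field names, their example values, and the
    Bearer auth scheme come from the README diff.
    """
    headers = {
        "Authorization": f"Bearer {token}",   # auth scheme documented in the README
        "Content-Type": "application/json",   # assumption: the API accepts JSON bodies
    }
    payload = {
        "job": job,
        "model": model,
        "native_automation_mode": "prefer",
        "dialog_timeout_seconds": 12,
        "focus_timeout_seconds": 8,
        "ui_element_timeout_seconds": 8,
        "max_retries_per_surface": 3,
        "disabled_tools": [],
        "safety_override": False,
    }
    return headers, json.dumps(payload).encode("utf-8")


headers, body = build_job_request("example-token", "open Notepad and type hello")
```

The returned pair can be handed to any HTTP client once the actual route is known.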
@@ -156,20 +236,39 @@ Each job payload includes:
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via `/ws`
- Analytics dashboards for success rate by objective category and daily averages
- Set `DISABLE_UI=true` to disable UI
### Analytics API
- `GET /api/analytics`
- Returns objective-category success rates plus average steps/cost over time
## Agent Instructions (Practical)
- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
- First classify the current Windows surface, then choose the control channel.
- Prefer native window/dialog/element tools for focus changes, file pickers, modal confirmations, and browser-owned dialogs when available.
- Use `see_screen` before UI interaction.
- Use `enhance` when text is unclear.
- Use `enhance` before clicking small/ambiguous targets; prefer `region="small"` for compact controls.
- Use `enhance` `mode="text"` for tiny labels/text, or `mode="ui"` for general UI.
- Optionally set `enhance` `scale` (2-6) for tighter zoom control.
- Use `list_windows`, `find_window`, `focus_window`, and `wait_for_focus_change` instead of blind Alt+Tab retries.
- Use `detect_dialog`, `dialog_set_filename`, `dialog_action`, and `wait_for_dialog_close` for native open/save/confirm flows.
- Use `list_ui_elements`, `invoke_ui_element`, `set_ui_element_value`, `select_ui_element`, and `wait_for_ui_element` when controls are exposed natively.
- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
- For shortcuts, use one `press_key` call with combo syntax (example: `win+r`).
- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
- Use `click` offsets via `offset_up/down/left/right`; set `button` and `click_count` there instead of inventing one-off click tools.
- Use `move_mouse` when you need hover-only behavior and `drag` for slider, selection, or window moves.
- Use `scroll` for vertical navigation; positive amounts scroll up and negative amounts scroll down.
- Use `clipboard_get` / `clipboard_set` for copy-paste workflows, `get_cursor_position` for cursor inspection, and `get_active_window` before interacting with uncertain focus.
- If native automation is unavailable or disabled, ScreenJob falls back to screenshots plus mouse/keyboard control and emits fallback events.
- When done, call:
- `task_complete(return="...", data=...)`
- Before `task_complete`, verify expected on-screen content with `see_screen` (and `enhance` if needed), and include an `observed_result` summary in `data`.
Per-job `disabled_tools` must match the built-in tool allowlist. `task_complete` cannot be disabled.
`data` should contain useful structured output for the requester (text, object, list, etc.).
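The allowlist rule for `disabled_tools` could be enforced roughly as follows. This is a sketch under stated assumptions: the real `normalize_disabled_tools` in the agent module is not part of this diff, so the name `normalize_disabled_tools_sketch`, the error messages, and the exact checks are hypothetical. The tool names are copied from the Core Features list.

```python
# Tool names taken from the README's Core Features list.
ALLOWED_TOOLS = frozenset({
    "execute_command", "see_screen", "enhance", "list_windows", "find_window",
    "focus_window", "close_window", "wait_for_window", "wait_for_focus_change",
    "detect_dialog", "dialog_action", "dialog_set_filename", "wait_for_dialog_close",
    "list_ui_elements", "invoke_ui_element", "set_ui_element_value",
    "select_ui_element", "wait_for_ui_element", "click", "scroll", "drag",
    "move_mouse", "type", "press_key", "clipboard_get", "clipboard_set",
    "get_cursor_position", "get_active_window", "sleep", "task_complete",
})
# The README states that task_complete cannot be disabled.
PROTECTED_TOOLS = frozenset({"task_complete"})


def normalize_disabled_tools_sketch(names) -> list[str]:
    """Trim, de-duplicate, and validate a disable list (hypothetical helper)."""
    cleaned = sorted({str(name).strip() for name in names if str(name).strip()})
    unknown = [name for name in cleaned if name not in ALLOWED_TOOLS]
    if unknown:
        raise ValueError(f"Unknown tool(s) in disabled_tools: {', '.join(unknown)}")
    protected = [name for name in cleaned if name in PROTECTED_TOOLS]
    if protected:
        raise ValueError(f"Tool(s) cannot be disabled: {', '.join(protected)}")
    return cleaned
```

Raising `ValueError` matches how the CLI in this change set reports a bad `--disable-tool` value through `parser.error`.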
## Verification

View File

@@ -6,8 +6,10 @@ ScreenJob lets an agent execute tasks that require a real desktop UI plus termin
## Main Features
- Hybrid control model: screenshot grounding plus Windows-native window/dialog/element helpers when available
- Screen perception (`see_screen`, `enhance`)
- Mouse/keyboard control (`click`, `type`, `press_key`)
- Native window/dialog control (`list_windows`, `find_window`, `focus_window`, `detect_dialog`, `dialog_action`, `dialog_set_filename`, `list_ui_elements`)
- Terminal execution (`execute_command`, `sleep`)
- Structured completion payload (`task_complete(return=..., data=...)`)
- Safety gate, auth, history, and live monitoring
@@ -37,12 +39,33 @@ Keyboard combo rule:
- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
Enhance-first click rule:
- Before clicking small buttons/icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels/text: use `mode="text"` to improve readability.
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
Windows-native routing rule:
- First classify whether the current surface is a normal app window, browser window, `#32770` dialog, Explorer file picker, or another system surface.
- Prefer native window/dialog/element tools for focus changes, save/open dialogs, modal confirmations, and exposed controls.
- Fall back to screenshots plus mouse/keyboard only when native automation is unavailable or the UI is custom-drawn.
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
- Use `see_screen` (and `enhance` if needed) for this check.
- Include a concise `observed_result` in `data` when completing the task.
Patience / rerun rule:
- If a job is still `running`, do not assume it is stuck just because it looks slow, repetitive, or token-heavy.
- Prefer waiting longer and checking for a final status/result before starting a replacement run.
- Only restart or replace a running job when there is clear evidence it is failed, irrecoverably stuck, or the user explicitly asks for a restart.
- If you do replace a run, say why in one short sentence and reference the specific blocker you observed.
## API Quick Reference
Base URL:

View File

@@ -0,0 +1,84 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
[switch]$Remove,
[switch]$AllUsers,
[switch]$StartNow
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
$scriptDir = Split-Path -Parent $PSCommandPath
$backendScript = Join-Path $scriptDir "start_backend.ps1"
$vbsLauncher = Join-Path $scriptDir "start_backend_hidden.vbs"
$shortcutName = "ScreenJob Backend.lnk"
if (-not (Test-Path -LiteralPath $backendScript)) {
throw "Backend launcher script not found: $backendScript"
}
if (-not (Test-Path -LiteralPath $vbsLauncher)) {
throw "Hidden backend launcher file not found: $vbsLauncher"
}
function Test-IsAdministrator {
$identity = [Security.Principal.WindowsIdentity]::GetCurrent()
$principal = New-Object Security.Principal.WindowsPrincipal($identity)
return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}
$legacyService = Get-Service -Name "ScreenJobBackend" -ErrorAction SilentlyContinue
if ($null -ne $legacyService) {
if (Test-IsAdministrator) {
if ($PSCmdlet.ShouldProcess("ScreenJobBackend", "Remove legacy Windows service")) {
if ($legacyService.Status -ne "Stopped") {
Stop-Service -Name "ScreenJobBackend" -Force -ErrorAction Stop
}
& sc.exe delete ScreenJobBackend | Out-Null
if ($LASTEXITCODE -ne 0) {
throw "Failed to delete legacy service 'ScreenJobBackend' (sc.exe exit code $LASTEXITCODE)."
}
Write-Host "Removed legacy Windows service: ScreenJobBackend"
}
} else {
Write-Warning "Legacy Windows service 'ScreenJobBackend' is still installed. Run uninstall_backend_service.ps1 from an elevated PowerShell session once to remove it."
}
}
$startupFolder = if ($AllUsers) {
[Environment]::GetFolderPath("CommonStartup")
} else {
[Environment]::GetFolderPath("Startup")
}
$shortcutPath = Join-Path $startupFolder $shortcutName
if ($Remove) {
if (Test-Path -LiteralPath $shortcutPath) {
if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove backend startup shortcut")) {
Remove-Item -LiteralPath $shortcutPath -Force
Write-Host "Removed backend startup shortcut: $shortcutPath"
}
} else {
Write-Host "No backend startup shortcut found at: $shortcutPath"
}
return
}
if ($PSCmdlet.ShouldProcess($shortcutPath, "Create backend startup shortcut")) {
$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut($shortcutPath)
$shortcut.TargetPath = "$env:SystemRoot\System32\wscript.exe"
$shortcut.Arguments = '"' + $vbsLauncher + '"'
$shortcut.WorkingDirectory = $scriptDir
$shortcut.Description = "Launch ScreenJob backend at sign-in in the current user session."
$shortcut.Save()
Write-Host "Created backend startup shortcut: $shortcutPath"
}
if ($StartNow) {
Start-Process -FilePath "$env:SystemRoot\System32\wscript.exe" -ArgumentList @($vbsLauncher) -WorkingDirectory $scriptDir | Out-Null
Write-Host "Started backend launcher now."
}

View File

@@ -0,0 +1,47 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
[switch]$Remove,
[switch]$AllUsers
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
$scriptDir = Split-Path -Parent $PSCommandPath
$vbsLauncher = Join-Path $scriptDir "start_screenjob_tray_hidden.vbs"
$shortcutName = "ScreenJob Tray.lnk"
if (-not (Test-Path -LiteralPath $vbsLauncher)) {
throw "Launcher file not found: $vbsLauncher"
}
$startupFolder = if ($AllUsers) {
[Environment]::GetFolderPath("CommonStartup")
} else {
[Environment]::GetFolderPath("Startup")
}
$shortcutPath = Join-Path $startupFolder $shortcutName
if ($Remove) {
if (Test-Path -LiteralPath $shortcutPath) {
if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove startup shortcut")) {
Remove-Item -LiteralPath $shortcutPath -Force
Write-Host "Removed startup shortcut: $shortcutPath"
}
} else {
Write-Host "No startup shortcut found at: $shortcutPath"
}
return
}
if ($PSCmdlet.ShouldProcess($shortcutPath, "Create startup shortcut")) {
$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut($shortcutPath)
$shortcut.TargetPath = "$env:SystemRoot\System32\wscript.exe"
$shortcut.Arguments = '"' + $vbsLauncher + '"'
$shortcut.WorkingDirectory = $scriptDir
$shortcut.Description = "Launch ScreenJob tray icon at sign-in."
$shortcut.Save()
Write-Host "Created startup shortcut: $shortcutPath"
}

307
screenjob_tray.ps1 Normal file

@@ -0,0 +1,307 @@
param(
[string]$ServiceName = "ScreenJobBackend"
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$controlScript = Join-Path $scriptDir "tray_service_control.ps1"
$logsDir = Join-Path $scriptDir "screenjob_runs\service"
$defaultHost = "127.0.0.1"
$defaultPort = "8787"
function Read-EnvConfig {
param([string]$EnvFilePath)
$result = @{}
if (-not (Test-Path -LiteralPath $EnvFilePath)) {
return $result
}
foreach ($line in Get-Content -Path $EnvFilePath) {
$trimmed = $line.Trim()
if ($trimmed.Length -eq 0 -or $trimmed.StartsWith("#")) {
continue
}
$parts = $trimmed.Split("=", 2)
if ($parts.Count -eq 2) {
$key = $parts[0].Trim()
$value = $parts[1].Trim()
if (($value.StartsWith('"') -and $value.EndsWith('"')) -or ($value.StartsWith("'") -and $value.EndsWith("'"))) {
$value = $value.Substring(1, $value.Length - 2)
}
$result[$key] = $value
}
}
return $result
}
function Get-ServiceStatusSafe {
param([string]$Name)
try {
$svc = Get-Service -Name $Name -ErrorAction Stop
return $svc.Status.ToString()
} catch {
return "NotInstalled"
}
}
function Invoke-ServiceActionElevated {
param(
[Parameter(Mandatory = $true)][string]$Action,
[Parameter(Mandatory = $true)][string]$Name
)
if (-not (Test-Path -LiteralPath $controlScript)) {
[System.Windows.Forms.MessageBox]::Show(
"Missing control script: $controlScript",
"ScreenJob Tray",
[System.Windows.Forms.MessageBoxButtons]::OK,
[System.Windows.Forms.MessageBoxIcon]::Error
) | Out-Null
return
}
$argList = @(
"-NoProfile",
"-ExecutionPolicy", "Bypass",
"-File", "`"$controlScript`"",
"-Action", $Action,
"-ServiceName", $Name
)
try {
Start-Process -FilePath "powershell.exe" -ArgumentList $argList -Verb RunAs -WindowStyle Hidden | Out-Null
} catch {
# User canceled UAC prompt or launch failed.
}
}
function Get-DashboardUrl {
$envFile = Join-Path $scriptDir ".env"
$envVars = Read-EnvConfig -EnvFilePath $envFile
$dashboardHost = $defaultHost
$dashboardPort = $defaultPort
if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
$dashboardHost = $envVars["SCREENJOB_HOST"]
}
if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
$dashboardPort = $envVars["SCREENJOB_PORT"]
}
$connectHost = Resolve-ConnectHost -ConfiguredHost $dashboardHost
return "http://{0}:{1}/" -f $connectHost, $dashboardPort
}
function Resolve-ConnectHost {
param([string]$ConfiguredHost)
if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
return "127.0.0.1"
}
switch ($ConfiguredHost.Trim().ToLowerInvariant()) {
"0.0.0.0" { return "127.0.0.1" }
"::" { return "127.0.0.1" }
"*" { return "127.0.0.1" }
default { return $ConfiguredHost }
}
}
function Get-HealthCheckHosts {
param([string]$ConfiguredHost)
if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
return @("127.0.0.1", "localhost")
}
$normalized = $ConfiguredHost.Trim().ToLowerInvariant()
switch ($normalized) {
"0.0.0.0" { return @("127.0.0.1", "localhost", "::1") }
"::" { return @("127.0.0.1", "localhost", "::1") }
"*" { return @("127.0.0.1", "localhost", "::1") }
default { return @($ConfiguredHost) }
}
}
function Test-TcpEndpoint {
param(
[Parameter(Mandatory = $true)][string]$HostName,
[Parameter(Mandatory = $true)][int]$Port,
[int]$TimeoutMs = 1200
)
$client = New-Object System.Net.Sockets.TcpClient
try {
$async = $client.BeginConnect($HostName, $Port, $null, $null)
$connected = $async.AsyncWaitHandle.WaitOne($TimeoutMs, $false)
if (-not $connected) {
return $false
}
$client.EndConnect($async) | Out-Null
return $true
} catch {
return $false
} finally {
$client.Dispose()
}
}
function Get-BackendReachability {
$envFile = Join-Path $scriptDir ".env"
$envVars = Read-EnvConfig -EnvFilePath $envFile
$configuredHost = $defaultHost
$configuredPort = $defaultPort
if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
$configuredHost = $envVars["SCREENJOB_HOST"]
}
if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
$configuredPort = $envVars["SCREENJOB_PORT"]
}
$portNumber = 8787
[void][int]::TryParse([string]$configuredPort, [ref]$portNumber)
$hostsToTry = Get-HealthCheckHosts -ConfiguredHost $configuredHost
foreach ($candidateHost in $hostsToTry) {
if (Test-TcpEndpoint -HostName $candidateHost -Port $portNumber) {
return $true
}
}
return $false
}
function Update-TrayState {
param(
[System.Windows.Forms.NotifyIcon]$NotifyIcon,
[System.Windows.Forms.ToolStripMenuItem]$StatusItem,
[string]$Name
)
$status = Get-ServiceStatusSafe -Name $Name
$isBackendReachable = Get-BackendReachability
$displayStatus = $status
if ($status -eq "Running" -and -not $isBackendReachable) {
$displayStatus = "Running (Backend Down)"
} elseif ($status -eq "Stopped" -and $isBackendReachable) {
$displayStatus = "Stopped (Backend Up)"
} elseif ($status -eq "NotInstalled" -and $isBackendReachable) {
$displayStatus = "NotInstalled (Backend Up)"
}
$StatusItem.Text = "Status: $displayStatus"
switch ($displayStatus) {
"Running" {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Information
}
"Stopped" {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Warning
}
default {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Error
}
}
$tooltip = "ScreenJob Backend: $displayStatus"
if ($tooltip.Length -gt 63) {
$tooltip = $tooltip.Substring(0, 63)
}
$NotifyIcon.Text = $tooltip
}
$appContext = New-Object System.Windows.Forms.ApplicationContext
$notifyIcon = New-Object System.Windows.Forms.NotifyIcon
$notifyIcon.Visible = $false
$menu = New-Object System.Windows.Forms.ContextMenuStrip
$statusItem = New-Object System.Windows.Forms.ToolStripMenuItem "Status: Unknown"
$statusItem.Enabled = $false
$refreshItem = New-Object System.Windows.Forms.ToolStripMenuItem "Refresh Status"
$refreshItem.Add_Click({
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
})
$startItem = New-Object System.Windows.Forms.ToolStripMenuItem "Start Service (Admin)"
$startItem.Add_Click({
Invoke-ServiceActionElevated -Action "start" -Name $ServiceName
})
$stopItem = New-Object System.Windows.Forms.ToolStripMenuItem "Stop Service (Admin)"
$stopItem.Add_Click({
Invoke-ServiceActionElevated -Action "stop" -Name $ServiceName
})
$restartItem = New-Object System.Windows.Forms.ToolStripMenuItem "Restart Service (Admin)"
$restartItem.Add_Click({
Invoke-ServiceActionElevated -Action "restart" -Name $ServiceName
})
$dashboardItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Dashboard"
$dashboardItem.Add_Click({
$url = Get-DashboardUrl
Start-Process $url | Out-Null
})
$logsItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Service Logs"
$logsItem.Add_Click({
if (-not (Test-Path -LiteralPath $logsDir)) {
New-Item -ItemType Directory -Path $logsDir -Force | Out-Null
}
Start-Process explorer.exe $logsDir | Out-Null
})
$openFolderItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Project Folder"
$openFolderItem.Add_Click({
Start-Process explorer.exe $scriptDir | Out-Null
})
$exitItem = New-Object System.Windows.Forms.ToolStripMenuItem "Exit Tray"
$exitItem.Add_Click({
$refreshTimer.Stop()
$notifyIcon.Visible = $false
$notifyIcon.Dispose()
$menu.Dispose()
$appContext.ExitThread()
})
[void]$menu.Items.Add($statusItem)
[void]$menu.Items.Add($refreshItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($startItem)
[void]$menu.Items.Add($stopItem)
[void]$menu.Items.Add($restartItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($dashboardItem)
[void]$menu.Items.Add($logsItem)
[void]$menu.Items.Add($openFolderItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($exitItem)
$notifyIcon.ContextMenuStrip = $menu
$notifyIcon.Visible = $true
$notifyIcon.Add_DoubleClick({
$url = Get-DashboardUrl
Start-Process $url | Out-Null
})
$refreshTimer = New-Object System.Windows.Forms.Timer
$refreshTimer.Interval = 5000
$refreshTimer.Add_Tick({
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
})
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
$refreshTimer.Start()
[System.Windows.Forms.Application]::Run($appContext)

View File

@@ -0,0 +1,138 @@
using System.Diagnostics;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace ScreenJob.WindowsServiceHost;
internal sealed class BackendProcessService : BackgroundService
{
private readonly ILogger<BackendProcessService> _logger;
private readonly ServiceOptions _options;
private readonly object _logLock = new();
private Process? _backendProcess;
private string _stdoutLogPath = string.Empty;
private string _stderrLogPath = string.Empty;
public BackendProcessService(ILogger<BackendProcessService> logger, ServiceOptions options)
{
_logger = logger;
_options = options;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
Directory.CreateDirectory(_options.LogDirectory);
_stdoutLogPath = Path.Combine(_options.LogDirectory, "backend-service.stdout.log");
_stderrLogPath = Path.Combine(_options.LogDirectory, "backend-service.stderr.log");
LogStdOut("Service host starting backend process.");
LogStdOut($"Script: {_options.BackendScriptPath}");
LogStdOut($"Working directory: {_options.WorkingDirectory}");
var powershellPath = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.Windows),
"System32",
"WindowsPowerShell",
"v1.0",
"powershell.exe");
var startInfo = new ProcessStartInfo
{
FileName = powershellPath,
Arguments = $"-NoProfile -ExecutionPolicy Bypass -File \"{_options.BackendScriptPath}\"",
WorkingDirectory = _options.WorkingDirectory,
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false,
CreateNoWindow = true
};
_backendProcess = new Process { StartInfo = startInfo };
if (!_backendProcess.Start())
{
throw new InvalidOperationException("Failed to start backend process.");
}
LogStdOut($"Backend process started with PID {_backendProcess.Id}.");
_logger.LogInformation("Backend process started with PID {Pid}.", _backendProcess.Id);
var stdoutPump = PumpStreamAsync(_backendProcess.StandardOutput, LogStdOut, stoppingToken);
var stderrPump = PumpStreamAsync(_backendProcess.StandardError, LogStdErr, stoppingToken);
try
{
await _backendProcess.WaitForExitAsync(stoppingToken);
var exitCode = _backendProcess.ExitCode;
LogStdErr($"Backend process exited unexpectedly with code {exitCode}.");
_logger.LogError("Backend process exited unexpectedly with code {ExitCode}.", exitCode);
Environment.ExitCode = exitCode == 0 ? 1 : exitCode;
throw new InvalidOperationException(
$"Backend process ended unexpectedly. Service host exit code: {Environment.ExitCode}.");
}
catch (OperationCanceledException)
{
LogStdOut("Service stop requested.");
}
finally
{
await Task.WhenAll(stdoutPump, stderrPump);
}
}
public override async Task StopAsync(CancellationToken cancellationToken)
{
if (_backendProcess is { HasExited: false })
{
try
{
LogStdOut("Stopping backend process.");
_backendProcess.Kill(entireProcessTree: true);
}
catch (Exception ex)
{
LogStdErr($"Failed to stop backend process cleanly: {ex.Message}");
_logger.LogError(ex, "Failed to stop backend process cleanly.");
}
}
await base.StopAsync(cancellationToken);
}
private async Task PumpStreamAsync(
StreamReader reader,
Action<string> sink,
CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
var line = await reader.ReadLineAsync();
if (line is null)
{
break;
}
sink(line);
}
}
private void LogStdOut(string message)
{
WriteLog(_stdoutLogPath, message);
}
private void LogStdErr(string message)
{
WriteLog(_stderrLogPath, message);
}
private void WriteLog(string path, string message)
{
var stamp = DateTimeOffset.Now.ToString("yyyy-MM-dd HH:mm:ss");
var line = $"[{stamp}] {message}{Environment.NewLine}";
lock (_logLock)
{
File.AppendAllText(path, line);
}
}
}
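
For readers less familiar with .NET hosting, the supervision pattern in `BackendProcessService` (start a child process, pump stdout and stderr into timestamped log files, treat any exit as a failure) looks roughly like this in Python. It is a sketch for illustration only: the function names are hypothetical and it omits the cancellation and process-tree kill handled by the C# service.

```python
import subprocess
import threading
from datetime import datetime
from pathlib import Path

_log_lock = threading.Lock()  # serializes appends, like _logLock in the C# class


def write_log(path: Path, message: str) -> None:
    """Append one timestamped line to a log file."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with _log_lock:
        with path.open("a", encoding="utf-8") as handle:
            handle.write(f"[{stamp}] {message}\n")


def pump(stream, path: Path) -> None:
    """Copy a child stream into a log file line by line until end of stream."""
    for line in stream:
        write_log(path, line.rstrip("\r\n"))


def run_and_log(command: list[str], log_dir: Path) -> int:
    """Run a command to completion, logging both streams; return its exit code."""
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / "backend-service.stdout.log"
    err_path = log_dir / "backend-service.stderr.log"
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    pumps = [
        threading.Thread(target=pump, args=(proc.stdout, out_path)),
        threading.Thread(target=pump, args=(proc.stderr, err_path)),
    ]
    for thread in pumps:
        thread.start()
    exit_code = proc.wait()
    for thread in pumps:
        thread.join()  # drain remaining output before reporting
    return exit_code


def service_exit_code(backend_exit_code: int) -> int:
    """A backend that exits at all is unexpected, so a clean 0 is reported as 1."""
    return 1 if backend_exit_code == 0 else backend_exit_code
```

Draining both pumps before returning mirrors the `finally` block that awaits `stdoutPump` and `stderrPump`, so trailing output is not lost.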

View File

@@ -0,0 +1,18 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using ScreenJob.WindowsServiceHost;
var options = ServiceOptions.Parse(args);
Host.CreateDefaultBuilder(args)
.UseWindowsService(serviceOptions =>
{
serviceOptions.ServiceName = "ScreenJobBackend";
})
.ConfigureServices(services =>
{
services.AddSingleton(options);
services.AddHostedService<BackendProcessService>();
})
.Build()
.Run();

View File

@@ -0,0 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk.Worker">
<PropertyGroup>
<TargetFramework>net10.0-windows</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<OutputType>Exe</OutputType>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.0" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,77 @@
namespace ScreenJob.WindowsServiceHost;
internal sealed record ServiceOptions(
string BackendScriptPath,
string WorkingDirectory,
string LogDirectory)
{
public static ServiceOptions Parse(string[] args)
{
var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
for (var i = 0; i < args.Length; i++)
{
var raw = args[i];
if (!raw.StartsWith("--", StringComparison.Ordinal))
{
continue;
}
var key = raw[2..];
if (string.IsNullOrWhiteSpace(key))
{
continue;
}
if (i + 1 < args.Length && !args[i + 1].StartsWith("--", StringComparison.Ordinal))
{
map[key] = args[++i];
}
else
{
map[key] = "true";
}
}
if (!map.TryGetValue("backend-script", out var backendScript) || string.IsNullOrWhiteSpace(backendScript))
{
throw new ArgumentException("Missing required argument: --backend-script <absolute-path-to-start_backend.ps1>.");
}
if (!Path.IsPathRooted(backendScript))
{
throw new ArgumentException("The --backend-script value must be an absolute path.");
}
if (!File.Exists(backendScript))
{
throw new FileNotFoundException("Backend script not found.", backendScript);
}
if (!map.TryGetValue("working-dir", out var workingDir) || string.IsNullOrWhiteSpace(workingDir))
{
workingDir = Path.GetDirectoryName(backendScript)
?? throw new ArgumentException("Could not resolve working directory from backend script path.");
}
if (!Path.IsPathRooted(workingDir))
{
throw new ArgumentException("The --working-dir value must be an absolute path.");
}
if (!map.TryGetValue("log-dir", out var logDir) || string.IsNullOrWhiteSpace(logDir))
{
logDir = Path.Combine(workingDir, "screenjob_runs", "service");
}
if (!Path.IsPathRooted(logDir))
{
throw new ArgumentException("The --log-dir value must be an absolute path.");
}
return new ServiceOptions(
Path.GetFullPath(backendScript),
Path.GetFullPath(workingDir),
Path.GetFullPath(logDir));
}
}
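
The flag-scanning loop in `ServiceOptions.Parse` can be sketched in Python as follows. This is an illustrative port with a hypothetical function name; lowercasing the key stands in for the case-insensitive dictionary used in the C# code, and the path validation that follows the loop is left out.

```python
def parse_flag_map(args: list[str]) -> dict[str, str]:
    """Collect `--key value` pairs; a flag without a value maps to "true"."""
    result: dict[str, str] = {}
    index = 0
    while index < len(args):
        raw = args[index]
        index += 1
        if not raw.startswith("--"):
            continue  # stray positional tokens are ignored
        key = raw[2:]
        if not key.strip():
            continue  # a bare "--" carries no key
        key = key.lower()  # stand-in for StringComparer.OrdinalIgnoreCase
        if index < len(args) and not args[index].startswith("--"):
            result[key] = args[index]  # next token is this flag's value
            index += 1
        else:
            result[key] = "true"  # next token is another flag, or there is none
    return result


options = parse_flag_map(
    ["--backend-script", "C:/x/start_backend.ps1", "--Log-Dir", "C:/logs"]
)
```

One consequence of this scheme is that a value beginning with `--` cannot be passed, because it would be read as the next flag.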

File diff suppressed because it is too large

View File

@@ -30,6 +30,7 @@ def main(argv: list[str] | None = None) -> int:
print(" OPENAI_API_KEY=...")
print(" SCREENJOB_TOKEN=...")
print(" DISABLE_UI=true|false (optional)")
print(" SCREENJOB_PROHIBITED_KEY_COMBOS=ctrl+shift+s,alt+f4 (optional)")
return 0
server.main()
return 0

View File

@@ -5,6 +5,7 @@ import json
import sys
from pathlib import Path
from .agent import normalize_disabled_tools
from .config import load_app_config
from .models import RuntimeOptions
from .runtime import create_openai_client, run_job
@@ -28,8 +29,67 @@ def build_parser() -> argparse.ArgumentParser:
parser.add_argument("--command-timeout", type=int, default=45, help="Timeout in seconds for execute_command.")
parser.add_argument("--type-interval", type=float, default=0.02, help="Seconds between typed characters.")
parser.add_argument("--click-pause", type=float, default=0.10, help="Mouse move duration before click.")
parser.add_argument(
"--reasoning-effort",
choices=["low", "medium", "high"],
default="medium",
help="Reasoning effort passed to the model.",
)
parser.add_argument(
"--screen-context-decay-steps",
type=int,
default=4,
help="Compact model context every N steps to decay old screenshots (0 disables).",
)
parser.add_argument(
"--max-visual-context-images",
type=int,
default=3,
help="Maximum screenshots/enhanced images retained in model-visible context during rebases.",
)
parser.add_argument(
"--native-automation-mode",
choices=["off", "prefer", "require_fallback"],
default="prefer",
help="How strongly the agent should prefer Windows-native automation helpers over pixel fallback.",
)
parser.add_argument(
"--dialog-timeout-seconds",
type=float,
default=12.0,
help="Timeout for dialog-oriented waits and retries.",
)
parser.add_argument(
"--focus-timeout-seconds",
type=float,
default=8.0,
help="Timeout for focus-change waits and verification.",
)
parser.add_argument(
"--ui-element-timeout-seconds",
type=float,
default=8.0,
help="Timeout for native UI element lookup waits.",
)
parser.add_argument(
"--max-retries-per-surface",
type=int,
default=3,
help="Maximum repeated retries on the same classified window/dialog surface before the agent must pivot.",
)
parser.add_argument(
"--pretty-logs",
action="store_true",
help="Emit expanded multi-line tool call/result logs for easier debugging.",
)
parser.add_argument("--disable-tool", action="append", default=[], help="Disable a tool by name.")
-parser.add_argument("--skip-safety-check", action="store_true", help="Bypass pre-flight safety check.")
+parser.add_argument(
+"--skip-safety-check",
+"--skip-safety-chec",
+dest="skip_safety_check",
+action="store_true",
+help="Bypass pre-flight safety check.",
+)
parser.add_argument("--no-failsafe", action="store_true", help="Disable PyAutoGUI fail-safe.")
return parser
@@ -45,7 +105,10 @@ def main(argv: list[str] | None = None) -> int:
return 2
model = args.model or config.default_model
-disabled_tools = sorted({str(x).strip() for x in args.disable_tool if str(x).strip()})
+try:
+disabled_tools = normalize_disabled_tools(args.disable_tool)
+except ValueError as exc:
+parser.error(str(exc))
if not args.skip_safety_check:
safety_client = create_openai_client(config.openai_api_key)
@@ -78,7 +141,17 @@ def main(argv: list[str] | None = None) -> int:
command_timeout=args.command_timeout,
type_interval=args.type_interval,
click_pause=args.click_pause,
reasoning_effort=args.reasoning_effort,
screen_context_decay_steps=max(0, int(args.screen_context_decay_steps)),
max_visual_context_images=max(0, int(args.max_visual_context_images)),
native_automation_mode=args.native_automation_mode,
dialog_timeout_seconds=max(0.5, float(args.dialog_timeout_seconds)),
focus_timeout_seconds=max(0.5, float(args.focus_timeout_seconds)),
ui_element_timeout_seconds=max(0.5, float(args.ui_element_timeout_seconds)),
max_retries_per_surface=max(1, int(args.max_retries_per_surface)),
pretty_logs=bool(args.pretty_logs),
disable_tools=set(disabled_tools),
prohibited_key_combos=set(config.prohibited_key_combos),
)
try:
result, artifacts = run_job(
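
The floors applied while building `RuntimeOptions` in the hunk above can be tried in isolation. The following is a minimal sketch, assuming a reduced parser that reproduces only three of the flags; `clamp_options` is a hypothetical helper introduced for illustration and is not a function from this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reduced parser: only three flags from the diff above are reproduced here.
    parser = argparse.ArgumentParser(prog="screenjob-sketch")
    parser.add_argument("--dialog-timeout-seconds", type=float, default=12.0)
    parser.add_argument("--max-retries-per-surface", type=int, default=3)
    parser.add_argument("--skip-safety-check", action="store_true")
    return parser


def clamp_options(args: argparse.Namespace) -> dict:
    # Hypothetical helper mirroring the max(...) floors used in main().
    return {
        "dialog_timeout_seconds": max(0.5, float(args.dialog_timeout_seconds)),
        "max_retries_per_surface": max(1, int(args.max_retries_per_surface)),
        "skip_safety_check": bool(args.skip_safety_check),
    }


# Out-of-range values are raised to the floor instead of being rejected.
args = build_parser().parse_args(
    ["--dialog-timeout-seconds", "0.1", "--max-retries-per-surface", "0", "--skip-safety-chec"]
)
options = clamp_options(args)
```

Note that the truncated `--skip-safety-chec` is accepted in this sketch without an explicit alias, because `argparse` matches unambiguous prefixes of long options by default (`allow_abbrev=True`). The explicit alias in the diff would presumably matter only if abbreviation were disabled or the prefix became ambiguous.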


@@ -14,6 +14,13 @@ def _env_bool(name: str, default: bool = False) -> bool:
return raw.strip().lower() in {"1", "true", "yes", "on"}
def _env_csv(name: str) -> list[str]:
raw = os.getenv(name)
if raw is None:
return []
return [item.strip() for item in raw.split(",") if item.strip()]
@dataclass(frozen=True)
class AppConfig:
openai_api_key: str
@@ -25,6 +32,7 @@ class AppConfig:
port: int
runs_dir: Path
db_path: Path
prohibited_key_combos: tuple[str, ...] = ()
def load_app_config(cwd: Path) -> AppConfig:
@@ -38,6 +46,7 @@ def load_app_config(cwd: Path) -> AppConfig:
runs_dir = cwd / "screenjob_runs"
db_path = cwd / "screenjob.db"
disable_ui = _env_bool("DISABLE_UI", default=False)
prohibited_key_combos = tuple(_env_csv("SCREENJOB_PROHIBITED_KEY_COMBOS"))
return AppConfig(
openai_api_key=openai_api_key,
screenjob_token=screenjob_token,
@@ -48,5 +57,5 @@ def load_app_config(cwd: Path) -> AppConfig:
port=port,
runs_dir=runs_dir,
db_path=db_path,
prohibited_key_combos=prohibited_key_combos,
)
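
How `SCREENJOB_PROHIBITED_KEY_COMBOS` is turned into the `prohibited_key_combos` tuple can be shown with a standalone copy of the CSV helper. This is a minimal sketch: `env_csv` mirrors `_env_csv` from the hunk above, and the environment value is an illustrative assumption, not a recommended setting.

```python
import os


def env_csv(name: str) -> list[str]:
    # Same shape as _env_csv in the diff: split on commas, trim, drop empties.
    raw = os.getenv(name)
    if raw is None:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]


# Illustrative value only; stray spaces and empty entries are discarded.
os.environ["SCREENJOB_PROHIBITED_KEY_COMBOS"] = " ctrl+shift+s, ,alt+f4,"
combos = tuple(env_csv("SCREENJOB_PROHIBITED_KEY_COMBOS"))
```

The helper does not change letter case or validate key names. Whether the combos are normalized further before matching is not visible in this diff.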

272
src/desktop_overlay.py Normal file

@@ -0,0 +1,272 @@
from __future__ import annotations
import logging
import os
import queue
import threading
from dataclasses import dataclass
from typing import Any
@dataclass(frozen=True)
class CompletionOverlayPayload:
job_id: str
objective: str
return_message: str
steps: int
elapsed_seconds: float
class DesktopOverlayManager:
def __init__(self, logger: logging.Logger | None = None, *, auto_dismiss_seconds: float = 10.0) -> None:
self.logger = logger or logging.getLogger("screenjob.overlay")
self._queue: queue.Queue[CompletionOverlayPayload] = queue.Queue()
self._thread: threading.Thread | None = None
self._lock = threading.Lock()
self._ready = threading.Event()
self._disabled = False
self._warned = False
self._auto_dismiss_ms = max(0, int(round(float(auto_dismiss_seconds) * 1000)))
def show_completion(
self,
*,
job_id: str,
objective: str,
return_message: str,
steps: int,
elapsed_seconds: float,
) -> None:
if os.name != "nt":
self._disable_once("Desktop completion HUD is only enabled on Windows.")
return
if not self._ensure_thread():
return
self._queue.put(
CompletionOverlayPayload(
job_id=job_id,
objective=objective,
return_message=return_message,
steps=max(0, int(steps)),
elapsed_seconds=max(0.0, float(elapsed_seconds)),
)
)
def _ensure_thread(self) -> bool:
with self._lock:
if self._disabled:
return False
if self._thread is None or not self._thread.is_alive():
self._ready.clear()
self._thread = threading.Thread(target=self._ui_main, name="screenjob-overlay", daemon=True)
self._thread.start()
self._ready.wait(timeout=2.0)
return not self._disabled
def _disable_once(self, reason: str) -> None:
with self._lock:
self._disabled = True
already_warned = self._warned
self._warned = True
self._ready.set()
if not already_warned:
self.logger.warning("%s Overlay notifications disabled.", reason)
def _format_elapsed(self, elapsed_seconds: float) -> str:
total_seconds = max(0, int(round(elapsed_seconds)))
minutes, seconds = divmod(total_seconds, 60)
hours, minutes = divmod(minutes, 60)
if hours:
return f"{hours}h {minutes}m {seconds}s"
if minutes:
return f"{minutes}m {seconds}s"
return f"{seconds}s"
def _shorten(self, text: str, limit: int) -> str:
raw = " ".join(str(text or "").split())
if len(raw) <= limit:
return raw
return raw[: max(0, limit - 1)].rstrip() + "..."
def _ui_main(self) -> None:
try:
import tkinter as tk
except Exception as exc: # noqa: BLE001
self._disable_once(f"tkinter is unavailable ({type(exc).__name__}: {exc}).")
return
try:
root = tk.Tk()
root.withdraw()
root.update_idletasks()
except Exception as exc: # noqa: BLE001
self._disable_once(f"Desktop overlay could not initialize ({type(exc).__name__}: {exc}).")
return
cards: list[dict[str, Any]] = []
self._ready.set()
def reposition() -> None:
screen_width = root.winfo_screenwidth()
top = 24
for entry in cards:
window = entry["window"]
if not bool(window.winfo_exists()):
continue
window.update_idletasks()
width = max(320, int(window.winfo_width() or 360))
height = max(120, int(window.winfo_height() or 160))
left = max(12, screen_width - width - 24)
window.geometry(f"{width}x{height}+{left}+{top}")
top += height + 16
def dismiss(window: Any) -> None:
for index, entry in enumerate(list(cards)):
if entry["window"] is window:
after_id = entry.get("after_id")
if after_id is not None:
try:
window.after_cancel(after_id)
except Exception: # noqa: BLE001
pass
cards.pop(index)
break
try:
if bool(window.winfo_exists()):
window.destroy()
except Exception: # noqa: BLE001
pass
if cards:
reposition()
def add_card(payload: CompletionOverlayPayload) -> None:
card = tk.Toplevel(root)
card.withdraw()
card.overrideredirect(True)
card.attributes("-topmost", True)
card.configure(bg="#0f172a")
frame = tk.Frame(card, bg="#0f172a", highlightthickness=1, highlightbackground="#22c55e", bd=0)
frame.pack(fill="both", expand=True)
close_button = tk.Button(
frame,
text="×",
command=lambda win=card: dismiss(win),
bg="#0f172a",
fg="#cbd5e1",
activebackground="#111827",
activeforeground="#ffffff",
relief="flat",
borderwidth=0,
font=("Segoe UI", 14, "bold"),
padx=6,
pady=0,
)
close_button.place(relx=1.0, x=-8, y=6, anchor="ne")
header = tk.Label(
frame,
text="Completed",
bg="#0f172a",
fg="#86efac",
font=("Segoe UI", 10, "bold"),
anchor="w",
)
header.pack(fill="x", padx=14, pady=(12, 2))
title = tk.Label(
frame,
text=self._shorten(payload.objective, 72) or "Job complete",
bg="#0f172a",
fg="#f8fafc",
font=("Segoe UI", 11, "bold"),
justify="left",
wraplength=320,
anchor="w",
)
title.pack(fill="x", padx=14)
job_row = tk.Label(
frame,
text=f"Job {payload.job_id}",
bg="#0f172a",
fg="#94a3b8",
font=("Segoe UI", 9),
justify="left",
anchor="w",
)
job_row.pack(fill="x", padx=14, pady=(2, 8))
message = tk.Label(
frame,
text=self._shorten(payload.return_message, 180) or "Task completed.",
bg="#0f172a",
fg="#e2e8f0",
font=("Segoe UI", 9),
justify="left",
wraplength=320,
anchor="w",
)
message.pack(fill="x", padx=14)
footer = tk.Label(
frame,
text=f"{payload.steps} step(s) | {self._format_elapsed(payload.elapsed_seconds)}",
bg="#0f172a",
fg="#94a3b8",
font=("Segoe UI", 9),
justify="left",
anchor="w",
)
footer.pack(fill="x", padx=14, pady=(10, 12))
after_id = None
if self._auto_dismiss_ms > 0:
after_id = card.after(self._auto_dismiss_ms, lambda win=card: dismiss(win))
cards.insert(0, {"window": card, "after_id": after_id})
while len(cards) > 3:
stale = cards.pop()
try:
stale_after_id = stale.get("after_id")
if stale_after_id is not None:
stale["window"].after_cancel(stale_after_id)
stale["window"].destroy()
except Exception: # noqa: BLE001
pass
card.update_idletasks()
reposition()
card.deiconify()
def pump_queue() -> None:
try:
while True:
add_card(self._queue.get_nowait())
except queue.Empty:
pass
try:
root.after(120, pump_queue)
except Exception: # noqa: BLE001
self._disable_once("Desktop overlay event loop stopped unexpectedly.")
pump_queue()
try:
root.mainloop()
except Exception as exc: # noqa: BLE001
self._disable_once(f"Desktop overlay main loop failed ({type(exc).__name__}: {exc}).")
_overlay_singleton: DesktopOverlayManager | None = None
_overlay_lock = threading.Lock()
def get_desktop_overlay_manager(logger: logging.Logger | None = None) -> DesktopOverlayManager:
global _overlay_singleton
with _overlay_lock:
if _overlay_singleton is None:
_overlay_singleton = DesktopOverlayManager(logger=logger)
elif logger is not None:
_overlay_singleton.logger = logger
return _overlay_singleton
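
The hand-off between `show_completion` (producer) and `pump_queue`/`add_card` (consumer) can be modeled without tkinter. The sketch below is an assumption-laden reduction: plain job-id strings stand in for the Toplevel windows, and `drain` and `push_card` are hypothetical names that only imitate the queue draining and the three-card cap from the file above.

```python
import queue

MAX_CARDS = 3  # matches the `while len(cards) > 3` cap in add_card


def drain(pending: queue.Queue) -> list:
    # Non-blocking drain, like pump_queue: take everything currently queued.
    items = []
    try:
        while True:
            items.append(pending.get_nowait())
    except queue.Empty:
        pass
    return items


def push_card(cards: list, job_id: str) -> list:
    # Newest card goes on top; the oldest cards beyond the cap are evicted.
    cards.insert(0, job_id)
    evicted = []
    while len(cards) > MAX_CARDS:
        evicted.append(cards.pop())
    return evicted


pending = queue.Queue()
for job_id in ("job_a", "job_b", "job_c", "job_d"):
    pending.put(job_id)

cards = []
evicted = []
for job_id in drain(pending):
    evicted.extend(push_card(cards, job_id))
```

In the real manager the drain runs on the UI thread every 120 ms via `root.after`, so worker threads only ever touch the thread-safe queue and never a tkinter object.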


@@ -58,4 +58,14 @@ class RuntimeOptions:
command_timeout: int = 45
type_interval: float = 0.02
click_pause: float = 0.10
reasoning_effort: str = "medium"
screen_context_decay_steps: int = 4
max_visual_context_images: int = 3
native_automation_mode: str = "prefer"
dialog_timeout_seconds: float = 12.0
focus_timeout_seconds: float = 8.0
ui_element_timeout_seconds: float = 8.0
max_retries_per_surface: int = 3
pretty_logs: bool = False
disable_tools: set[str] | None = None
prohibited_key_combos: set[str] | None = None


@@ -12,10 +12,12 @@ from fastapi.responses import FileResponse
from fastapi.responses import HTMLResponse, JSONResponse
from pydantic import BaseModel, Field
from .agent import normalize_disabled_tools
from .config import AppConfig, load_app_config
from .storage import HistoryDB
from .task_manager import JobManager
-from .ui import monitoring_page_html
+from .ui import monitoring_js_path, monitoring_page_html
from .utils import utc_now_iso
class CreateJobRequest(BaseModel):
@@ -25,11 +27,195 @@ class CreateJobRequest(BaseModel):
command_timeout: int = Field(45, ge=1, le=600)
type_interval: float = Field(0.02, ge=0.0, le=1.0)
click_pause: float = Field(0.10, ge=0.0, le=2.0)
reasoning_effort: str = Field("medium", pattern="^(low|medium|high)$")
screen_context_decay_steps: int = Field(4, ge=0, le=50)
max_visual_context_images: int = Field(3, ge=0, le=12)
native_automation_mode: str = Field("prefer", pattern="^(off|prefer|require_fallback)$")
dialog_timeout_seconds: float = Field(12.0, ge=0.5, le=120.0)
focus_timeout_seconds: float = Field(8.0, ge=0.5, le=120.0)
ui_element_timeout_seconds: float = Field(8.0, ge=0.5, le=120.0)
max_retries_per_surface: int = Field(3, ge=1, le=10)
pretty_logs: bool = False
disabled_tools: list[str] = Field(default_factory=list)
safety_override: bool = False
no_failsafe: bool = False
def _safe_int(value: Any) -> int | None:
try:
return int(value)
except Exception: # noqa: BLE001
return None
def _safe_text(value: Any, limit: int = 180) -> str:
text = str(value or "").strip()
if len(text) <= limit:
return text
return f"{text[:limit]}..."
def _resolve_artifact_path(artifacts_dir: Path | None, path_raw: Any) -> Path | None:
if artifacts_dir is None:
return None
text = str(path_raw or "").strip()
if not text:
return None
candidate = Path(text).resolve()
try:
candidate.relative_to(artifacts_dir)
except ValueError:
return None
return candidate
def _extract_replay_action(
event: dict[str, Any],
pending_tool_args: dict[tuple[int, str], list[dict[str, Any]]],
) -> dict[str, Any] | None:
event_type = str(event.get("event_type") or "")
payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
step = int(event.get("step") or 0)
ts = str(event.get("ts") or "")
event_id = int(event.get("id") or 0)
if event_type == "tool_called":
tool = str(payload.get("tool") or "").strip()
args = payload.get("args") if isinstance(payload.get("args"), dict) else {}
if tool:
pending_tool_args.setdefault((step, tool), []).append(args)
action: dict[str, Any] = {
"ts": ts,
"step": step,
"event_id": event_id,
"kind": "tool_called",
"tool": tool,
"label": f"Call: {tool}" if tool else "Tool call",
}
if tool == "click":
coord = args.get("coordinate") if isinstance(args, dict) else None
if isinstance(coord, dict):
x = _safe_int(coord.get("x"))
y = _safe_int(coord.get("y"))
if x is not None and y is not None:
action["requested_click"] = {"x": x, "y": y}
action["label"] = f"Call: click ({x}, {y})"
elif tool == "type":
text = _safe_text((args or {}).get("text"), 120)
if text:
action["text_preview"] = text
action["label"] = f"Call: type \"{text}\""
return action
if event_type == "tool_result":
tool = str(payload.get("tool") or "").strip()
result = payload.get("result") if isinstance(payload.get("result"), dict) else {}
matching_args: dict[str, Any] = {}
key = (step, tool)
queued = pending_tool_args.get(key) or []
if queued:
matching_args = queued.pop(0)
if not queued:
pending_tool_args.pop(key, None)
action = {
"ts": ts,
"step": step,
"event_id": event_id,
"kind": "tool_result",
"tool": tool,
"ok": bool(result.get("ok")),
"label": f"Result: {tool}",
}
if tool == "click":
clicked = result.get("clicked") if isinstance(result.get("clicked"), dict) else {}
x = _safe_int(clicked.get("x"))
y = _safe_int(clicked.get("y"))
if x is not None and y is not None:
action["click"] = {"x": x, "y": y}
action["label"] = f"Clicked ({x}, {y})" if bool(result.get("ok")) else f"Click failed ({x}, {y})"
elif tool == "type":
text = _safe_text((matching_args or {}).get("text"), 120)
typed_length = _safe_int(result.get("typed_length"))
if typed_length is not None:
action["typed_length"] = typed_length
if text:
action["text_preview"] = text
action["label"] = f"Typed \"{text}\""
elif tool == "press_key":
key_name = _safe_text(result.get("key"), 80)
if key_name:
action["label"] = f"Pressed {key_name}"
elif tool == "execute_command":
command = _safe_text((matching_args or {}).get("command"), 140)
if command:
action["command_preview"] = command
action["label"] = f"Command: {command}"
return action
return None
def _build_replay_payload(job_id: str, job: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
artifacts_dir_raw = str(job.get("artifacts_dir") or "").strip()
artifacts_dir = Path(artifacts_dir_raw).resolve() if artifacts_dir_raw else None
pending_tool_args: dict[tuple[int, str], list[dict[str, Any]]] = {}
buffered_actions: list[dict[str, Any]] = []
frames: list[dict[str, Any]] = []
for event in events:
action = _extract_replay_action(event, pending_tool_args)
if action is not None:
buffered_actions.append(action)
if str(event.get("event_type") or "") != "visual_update":
continue
payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
image_meta = payload.get("image_meta") if isinstance(payload.get("image_meta"), dict) else {}
resolved = _resolve_artifact_path(artifacts_dir, image_meta.get("path"))
if resolved is None or not resolved.exists() or not resolved.is_file():
continue
width = _safe_int(image_meta.get("width"))
height = _safe_int(image_meta.get("height"))
if width is None or height is None:
size = image_meta.get("screen_size") if isinstance(image_meta.get("screen_size"), dict) else {}
width = _safe_int(size.get("width"))
height = _safe_int(size.get("height"))
is_fullscreen = (
str(payload.get("kind") or "") == "see_screen"
and bool(image_meta.get("grid"))
and isinstance(width, int)
and isinstance(height, int)
and width > 0
and height > 0
)
frames.append(
{
"frame_index": len(frames),
"event_id": int(event.get("id") or 0),
"ts": str(event.get("ts") or ""),
"step": int(event.get("step") or 0),
"kind": str(payload.get("kind") or "visual_update"),
"image_path": str(resolved),
"image_meta": image_meta,
"screen_size": {"width": width, "height": height} if width and height else None,
"is_fullscreen": is_fullscreen,
"overlays": buffered_actions,
}
)
buffered_actions = []
return {
"job_id": job_id,
"total_events": len(events),
"total_frames": len(frames),
"frames": frames,
"trailing_events": buffered_actions,
}
class _WebSocketHub:
def __init__(self) -> None:
self._connections: set[WebSocket] = set()
@@ -119,6 +305,8 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
@app.post("/api/jobs")
def create_job(payload: CreateJobRequest, _: None = Depends(require_token)) -> dict[str, str]:
try:
disabled_tools = normalize_disabled_tools(payload.disabled_tools)
job_id = manager.submit_job(
objective=payload.job,
model=payload.model,
@@ -126,10 +314,21 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
command_timeout=payload.command_timeout,
type_interval=payload.type_interval,
click_pause=payload.click_pause,
-disabled_tools=payload.disabled_tools,
+reasoning_effort=payload.reasoning_effort,
+screen_context_decay_steps=payload.screen_context_decay_steps,
+max_visual_context_images=payload.max_visual_context_images,
+native_automation_mode=payload.native_automation_mode,
+dialog_timeout_seconds=payload.dialog_timeout_seconds,
+focus_timeout_seconds=payload.focus_timeout_seconds,
+ui_element_timeout_seconds=payload.ui_element_timeout_seconds,
+max_retries_per_surface=payload.max_retries_per_surface,
+pretty_logs=payload.pretty_logs,
+disabled_tools=disabled_tools,
safety_override=payload.safety_override,
no_failsafe=payload.no_failsafe,
)
except ValueError as exc:
raise HTTPException(status_code=400, detail=str(exc)) from exc
return {"job_id": job_id}
@app.get("/api/jobs")
@@ -161,6 +360,18 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
raise HTTPException(status_code=404, detail="Job not found")
return {"events": manager.get_events(job_id, limit=limit)}
@app.get("/api/jobs/{job_id}/replay")
def get_job_replay(
job_id: str,
limit: int = Query(default=5000, ge=1, le=5000),
_: None = Depends(require_token),
) -> dict[str, Any]:
job = manager.get_job(job_id)
if job is None:
raise HTTPException(status_code=404, detail="Job not found")
events = manager.get_events(job_id, limit=limit)
return _build_replay_payload(job_id, job, events)
@app.post("/api/jobs/{job_id}/cancel")
def cancel_job(job_id: str, _: None = Depends(require_token)) -> dict[str, Any]:
job = manager.get_job(job_id)
@@ -195,11 +406,21 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
def stats(_: None = Depends(require_token)) -> dict[str, Any]:
return manager.stats()
@app.get("/api/analytics")
def analytics(_: None = Depends(require_token)) -> dict[str, Any]:
payload = manager.analytics()
payload["generated_at"] = utc_now_iso()
return payload
if not app_config.disable_ui:
@app.get("/", response_class=HTMLResponse)
def ui_root() -> str:
return monitoring_page_html(device_hostname=device_hostname)
@app.get("/ui/monitoring.js")
def ui_monitoring_js() -> FileResponse:
return FileResponse(str(monitoring_js_path()), media_type="application/javascript")
@app.websocket("/ws")
async def ws_endpoint(websocket: WebSocket, token: str = Query(default="")) -> None:
if not token or not secrets.compare_digest(token, app_config.screenjob_token):
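
The replay endpoint only exposes images that live under the job's artifacts directory. The containment check used by `_resolve_artifact_path` can be sketched on its own as follows; `resolve_inside` is a hypothetical stand-in for that helper, and the file names are made up for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_inside(base_dir: Path, path_raw: object) -> Optional[Path]:
    # Resolve the candidate, then require it to sit under base_dir.
    text = str(path_raw or "").strip()
    if not text:
        return None
    candidate = Path(text).resolve()
    try:
        candidate.relative_to(base_dir)
    except ValueError:
        # Raised when candidate is not located below base_dir.
        return None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    inside = resolve_inside(base, str(base / "shots" / "step_001.png"))
    escaped = resolve_inside(base, str(base / ".." / "secrets.txt"))
    empty = resolve_inside(base, "  ")
```

Because `..` segments are collapsed by `resolve()` before the comparison, a traversal attempt yields `None`. One thing to keep in mind with this approach is that a relative `path_raw` resolves against the process working directory, not against the artifacts directory, so stored image paths are assumed to be absolute.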


@@ -7,6 +7,39 @@ from pathlib import Path
from typing import Any
_TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
_CATEGORY_RULES: tuple[tuple[str, tuple[str, ...]], ...] = (
(
"Browser / web",
("browser", "website", "webpage", "chrome", "url", "amazon", "google", "login", "shopping", "checkout", "orders"),
),
(
"Files / terminal",
("file", "folder", "directory", "terminal", "shell", "command", "cli", "script", "git", "repo", "install", "pip", "npm", "powershell", "bash"),
),
(
"Writing / docs",
("write", "summary", "summarize", "document", "docs", "report", "email", "message", "readme", "markdown", "note", "proposal"),
),
(
"Data / analysis",
("data", "analysis", "analyze", "csv", "spreadsheet", "sheet", "table", "chart", "dashboard", "metric", "metrics", "sql"),
),
(
"Development / ops",
("code", "bug", "fix", "test", "debug", "api", "backend", "frontend", "database", "deploy", "docker", "service", "build"),
),
)
def _objective_category(objective: str) -> str:
text = objective.lower()
for category, keywords in _CATEGORY_RULES:
if any(keyword in text for keyword in keywords):
return category
return "Other"
class HistoryDB:
def __init__(self, db_path: Path) -> None:
self.db_path = db_path
@@ -184,6 +217,131 @@ class HistoryDB:
).fetchone()
return dict(totals) if totals else {}
def analytics(self) -> dict[str, Any]:
with self._connect() as conn:
rows = conn.execute(
"""
SELECT job_id, objective, status, steps, estimated_cost_usd, created_at
FROM jobs
ORDER BY created_at ASC, job_id ASC
"""
).fetchall()
total_jobs = 0
finished_jobs = 0
completed_jobs = 0
failed_jobs = 0
cancelled_jobs = 0
steps_sum = 0
steps_count = 0
cost_sum = 0.0
cost_count = 0
by_category: dict[str, dict[str, Any]] = {}
by_day: dict[str, dict[str, Any]] = {}
def _bucket(target: dict[str, dict[str, Any]], key: str) -> dict[str, Any]:
bucket = target.setdefault(
key,
{
"label": key,
"total_jobs": 0,
"finished_jobs": 0,
"completed_jobs": 0,
"failed_jobs": 0,
"cancelled_jobs": 0,
"steps_sum": 0,
"steps_count": 0,
"cost_sum": 0.0,
"cost_count": 0,
},
)
return bucket
for row in rows:
total_jobs += 1
status = str(row["status"] or "")
finished = status in _TERMINAL_STATUSES
completed = status == "completed"
objective = str(row["objective"] or "")
category = _objective_category(objective)
created_at = str(row["created_at"] or "")
day = created_at[:10] if len(created_at) >= 10 else created_at or "unknown"
category_bucket = _bucket(by_category, category)
day_bucket = _bucket(by_day, day)
for bucket in (category_bucket, day_bucket):
bucket["total_jobs"] += 1
if not finished:
continue
finished_jobs += 1
if completed:
completed_jobs += 1
elif status == "failed":
failed_jobs += 1
elif status == "cancelled":
cancelled_jobs += 1
steps = row["steps"]
if steps is not None:
step_value = int(steps)
steps_sum += step_value
steps_count += 1
for bucket in (category_bucket, day_bucket):
bucket["steps_sum"] += step_value
bucket["steps_count"] += 1
estimated_cost = row["estimated_cost_usd"]
if estimated_cost is not None:
cost_value = float(estimated_cost)
cost_sum += cost_value
cost_count += 1
for bucket in (category_bucket, day_bucket):
bucket["cost_sum"] += cost_value
bucket["cost_count"] += 1
for bucket in (category_bucket, day_bucket):
bucket["finished_jobs"] += 1
if completed:
bucket["completed_jobs"] += 1
elif status == "failed":
bucket["failed_jobs"] += 1
elif status == "cancelled":
bucket["cancelled_jobs"] += 1
def _finalize(bucket: dict[str, Any]) -> dict[str, Any]:
finished = bucket["finished_jobs"]
return {
"label": bucket["label"],
"total_jobs": bucket["total_jobs"],
"finished_jobs": finished,
"completed_jobs": bucket["completed_jobs"],
"failed_jobs": bucket["failed_jobs"],
"cancelled_jobs": bucket["cancelled_jobs"],
"success_rate": round((bucket["completed_jobs"] / finished) * 100, 2) if finished else 0.0,
"avg_steps": round(bucket["steps_sum"] / bucket["steps_count"], 2) if bucket["steps_count"] else None,
"avg_cost_usd": round(bucket["cost_sum"] / bucket["cost_count"], 6) if bucket["cost_count"] else None,
}
category_rows = [_finalize(bucket) for bucket in by_category.values()]
category_rows.sort(key=lambda item: (-item["success_rate"], item["label"]))
day_rows = [_finalize(bucket) for bucket in by_day.values()]
day_rows.sort(key=lambda item: item["label"])
return {
"total_jobs": total_jobs,
"finished_jobs": finished_jobs,
"completed_jobs": completed_jobs,
"failed_jobs": failed_jobs,
"cancelled_jobs": cancelled_jobs,
"success_rate": round((completed_jobs / finished_jobs) * 100, 2) if finished_jobs else 0.0,
"avg_steps": round(steps_sum / steps_count, 2) if steps_count else None,
"avg_cost_usd": round(cost_sum / cost_count, 6) if cost_count else None,
"by_category": category_rows,
"timeline": day_rows,
}
def _row_to_job(self, row: sqlite3.Row) -> dict[str, Any]:
disabled_tools: list[str] = []
try:
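
The keyword bucketing and success-rate arithmetic behind `analytics()` can be illustrated with a reduced rule table. This is a sketch under stated assumptions: `RULES` keeps only a few keywords from `_CATEGORY_RULES`, and `success_rate` is a hypothetical helper that restates the rounding expression from `_finalize`.

```python
RULES = (
    ("Browser / web", ("browser", "website", "url", "login")),
    ("Files / terminal", ("file", "folder", "terminal", "git")),
)


def objective_category(objective: str) -> str:
    # First rule with any keyword contained in the lowercased text wins.
    text = objective.lower()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def success_rate(completed: int, finished: int) -> float:
    # Percentage of finished jobs that completed; 0.0 when nothing finished.
    return round((completed / finished) * 100, 2) if finished else 0.0


first = objective_category("Open the website and download a file")
substring = objective_category("Show my digital photos")  # "git" inside "digital"
fallback = objective_category("Play some music")
rate = success_rate(2, 3)
```

Two properties follow from this design: rule order decides ties (an objective mentioning both a website and a file lands in the browser bucket), and matching is by substring rather than by word, so short keywords can match inside unrelated words.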


@@ -8,7 +8,9 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable
from .agent import normalize_disabled_tools
from .config import AppConfig
from .desktop_overlay import DesktopOverlayManager, get_desktop_overlay_manager
from .models import RuntimeOptions
from .runtime import create_openai_client, run_job
from .safety import assess_task_safety
@@ -32,10 +34,12 @@ class JobManager:
config: AppConfig,
db: HistoryDB,
broadcast: Callable[[dict[str, Any]], None] | None = None,
overlay_manager: DesktopOverlayManager | None = None,
) -> None:
self.config = config
self.db = db
self.broadcast = broadcast
self.overlay_manager = overlay_manager or get_desktop_overlay_manager()
self._running: dict[str, _RunningJob] = {}
self._lock = threading.Lock()
@@ -48,6 +52,15 @@ class JobManager:
command_timeout: int = 45,
type_interval: float = 0.02,
click_pause: float = 0.10,
reasoning_effort: str = "medium",
screen_context_decay_steps: int = 4,
max_visual_context_images: int = 3,
native_automation_mode: str = "prefer",
dialog_timeout_seconds: float = 12.0,
focus_timeout_seconds: float = 8.0,
ui_element_timeout_seconds: float = 8.0,
max_retries_per_surface: int = 3,
pretty_logs: bool = False,
disabled_tools: list[str] | None = None,
safety_override: bool = False,
no_failsafe: bool = False,
@@ -55,7 +68,7 @@ class JobManager:
job_id = f"job_{int(time.time())}_{uuid.uuid4().hex[:8]}"
created_at = utc_now_iso()
selected_model = (model or self.config.default_model).strip() or self.config.default_model
-disabled = sorted({tool.strip() for tool in (disabled_tools or []) if tool.strip()})
+disabled = normalize_disabled_tools(disabled_tools)
self.db.create_job(
job_id=job_id,
objective=objective,
@@ -93,6 +106,15 @@ class JobManager:
"command_timeout": command_timeout,
"type_interval": type_interval,
"click_pause": click_pause,
"reasoning_effort": reasoning_effort,
"screen_context_decay_steps": screen_context_decay_steps,
"max_visual_context_images": max_visual_context_images,
"native_automation_mode": native_automation_mode,
"dialog_timeout_seconds": dialog_timeout_seconds,
"focus_timeout_seconds": focus_timeout_seconds,
"ui_element_timeout_seconds": ui_element_timeout_seconds,
"max_retries_per_surface": max_retries_per_surface,
"pretty_logs": pretty_logs,
"no_failsafe": no_failsafe,
"cancel_event": cancel_event,
},
@@ -121,6 +143,15 @@ class JobManager:
command_timeout: int,
type_interval: float,
click_pause: float,
reasoning_effort: str,
screen_context_decay_steps: int,
max_visual_context_images: int,
native_automation_mode: str,
dialog_timeout_seconds: float,
focus_timeout_seconds: float,
ui_element_timeout_seconds: float,
max_retries_per_surface: int,
pretty_logs: bool,
no_failsafe: bool,
cancel_event: threading.Event,
) -> None:
@@ -218,7 +249,17 @@ class JobManager:
command_timeout=command_timeout,
type_interval=type_interval,
click_pause=click_pause,
reasoning_effort=reasoning_effort,
screen_context_decay_steps=max(0, int(screen_context_decay_steps)),
max_visual_context_images=max(0, int(max_visual_context_images)),
native_automation_mode=str(native_automation_mode or "prefer").strip().lower() or "prefer",
dialog_timeout_seconds=max(0.5, float(dialog_timeout_seconds)),
focus_timeout_seconds=max(0.5, float(focus_timeout_seconds)),
ui_element_timeout_seconds=max(0.5, float(ui_element_timeout_seconds)),
max_retries_per_surface=max(1, int(max_retries_per_surface)),
pretty_logs=bool(pretty_logs),
disable_tools=set(disabled_tools),
prohibited_key_combos=set(self.config.prohibited_key_combos),
)
try:
result, artifacts = run_job(
@@ -289,6 +330,14 @@ class JobManager:
},
},
)
if status == "completed":
self.overlay_manager.show_completion(
job_id=job_id,
objective=objective,
return_message=result.return_message,
steps=result.steps,
elapsed_seconds=max(0.0, float(result.ended_at - result.started_at)),
)
with self._lock:
self._running.pop(job_id, None)
@@ -343,6 +392,9 @@ class JobManager:
stats["live_running_threads"] = sum(1 for job in self._running.values() if job.thread.is_alive())
return stats
def analytics(self) -> dict[str, Any]:
return self.db.analytics()
def _normalize_job_payload(self, job: dict[str, Any]) -> dict[str, Any]:
response = job.get("response")
if not isinstance(response, dict):

310
src/ui.py

@@ -1,307 +1,19 @@
from __future__ import annotations
from html import escape
from pathlib import Path
_UI_DIR = Path(__file__).resolve().parent / "ui_assets"
_HTML_TEMPLATE_PATH = _UI_DIR / "monitoring.html"
_JS_PATH = _UI_DIR / "monitoring.js"
def monitoring_page_html(device_hostname: str = "") -> str:
host_suffix = f" ({escape(device_hostname)})" if device_hostname else ""
return """<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>ScreenJob Monitor</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-slate-950 text-slate-100 min-h-screen">
<div class="max-w-7xl mx-auto p-4 md:p-8 space-y-6">
<header class="flex flex-col gap-3 md:flex-row md:items-center md:justify-between">
<div>
<h1 class="text-2xl md:text-3xl font-bold tracking-tight">ScreenJob Monitor<span class="text-slate-400 text-base md:text-lg font-medium">__MONITOR_HOST__</span></h1>
<p class="text-slate-400 text-sm">Read-only monitoring for active and historical tasks.</p>
</div>
<div class="flex flex-col md:flex-row gap-2 md:items-center">
<input id="tokenInput" type="password" placeholder="SCREENJOB_TOKEN" class="bg-slate-900 border border-slate-700 rounded px-3 py-2 text-sm w-72" />
<button id="saveTokenBtn" class="bg-cyan-500 hover:bg-cyan-400 text-slate-950 font-semibold px-4 py-2 rounded">Connect</button>
</div>
</header>
+html = _HTML_TEMPLATE_PATH.read_text(encoding="utf-8")
+return html.replace("__MONITOR_HOST__", host_suffix)
<section class="grid grid-cols-2 md:grid-cols-6 gap-3" id="stats"></section>
<section class="grid grid-cols-1 lg:grid-cols-5 gap-4">
<div class="lg:col-span-2 bg-slate-900/70 border border-slate-800 rounded-xl p-4">
<div class="flex items-center justify-between mb-3">
<h2 class="font-semibold">Jobs</h2>
<button id="refreshBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Refresh</button>
</div>
<div id="jobList" class="space-y-2 max-h-[62vh] overflow-auto"></div>
</div>
<div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<h2 class="font-semibold">Job Detail</h2>
<pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre>
<h3 class="font-semibold text-sm">Latest Visual</h3>
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<img id="latestVisual" alt="Latest visual update" class="max-h-[24vh] w-full object-contain rounded" />
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Live Events</h3>
<label for="eventsViewToggle" class="flex items-center gap-2 text-xs text-slate-300 cursor-pointer select-none">
<span>Raw</span>
<input id="eventsViewToggle" type="checkbox" class="accent-cyan-400 h-4 w-4" />
<span>Beautiful</span>
</label>
</div>
<div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div>
</div>
</section>
</div>
<script>
const tokenInput = document.getElementById("tokenInput");
const saveTokenBtn = document.getElementById("saveTokenBtn");
const refreshBtn = document.getElementById("refreshBtn");
const jobListEl = document.getElementById("jobList");
const jobDetailEl = document.getElementById("jobDetail");
const eventsEl = document.getElementById("events");
const statsEl = document.getElementById("stats");
const latestVisualEl = document.getElementById("latestVisual");
const eventsViewToggle = document.getElementById("eventsViewToggle");
const state = {
token: localStorage.getItem("screenjob_token") || "",
jobs: [],
selectedJobId: null,
ws: null,
wsReconnectTimer: null,
eventsViewMode: localStorage.getItem("screenjob_events_view_mode") === "beautiful" ? "beautiful" : "raw"
};
const manuallyClosedSockets = new WeakSet();
tokenInput.value = state.token;
function authHeaders() {
return { "Authorization": "Bearer " + state.token };
}
async function api(path, opts = {}) {
if (!state.token) throw new Error("Token required");
const headers = Object.assign({}, authHeaders(), opts.headers || {});
const response = await fetch(path, Object.assign({}, opts, { headers }));
if (!response.ok) throw new Error(await response.text());
return response.json();
}
function renderStats(stats) {
const cards = [
["Total Jobs", stats.total_jobs || 0],
["Running", stats.running_jobs || 0],
["Completed", stats.completed_jobs || 0],
["Failed", stats.failed_jobs || 0],
["Cancelled", stats.cancelled_jobs || 0],
["Total Cost (USD)", Number(stats.total_estimated_cost || 0).toFixed(4)]
];
statsEl.innerHTML = cards.map(([name, val]) => `
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-3">
<div class="text-slate-400 text-xs">${name}</div>
<div class="text-lg font-semibold">${val}</div>
</div>
`).join("");
}
function renderJobs() {
jobListEl.innerHTML = state.jobs.map((job) => {
const active = job.job_id === state.selectedJobId;
return `
<button data-job-id="${job.job_id}" class="w-full text-left p-3 rounded border ${active ? "border-cyan-400 bg-slate-800" : "border-slate-800 bg-slate-950"} hover:bg-slate-800">
<div class="flex items-center justify-between">
<span class="font-medium">${job.job_id}</span>
<span class="text-xs px-2 py-0.5 rounded bg-slate-700">${job.status}</span>
</div>
<div class="text-xs text-slate-400 mt-1">${job.model}</div>
<div class="text-xs text-slate-300 mt-1 line-clamp-2">${job.objective}</div>
<div class="text-xs text-slate-500 mt-1">$${Number((job.usage && job.usage.estimated_cost_usd) || 0).toFixed(6)}</div>
</button>
`;
}).join("");
for (const btn of jobListEl.querySelectorAll("button[data-job-id]")) {
btn.addEventListener("click", () => {
state.selectedJobId = btn.getAttribute("data-job-id");
renderJobs();
refreshJobDetail();
});
}
}
function pushEventLine(obj) {
if (!obj || !obj.job_id || !obj.event_type) return;
const line = document.createElement("div");
const ts = obj.ts || "-";
const step = (obj.step ?? "-");
if (state.eventsViewMode === "raw") {
line.className = "border-b border-slate-800 pb-1";
line.textContent = `[${ts}] ${obj.job_id} step=${step} ${obj.event_type} ${JSON.stringify(obj.payload || {})}`;
} else {
const typeColors = {
info: "bg-sky-900/50 text-sky-200 border border-sky-800",
warning: "bg-amber-900/40 text-amber-200 border border-amber-800",
error: "bg-rose-900/40 text-rose-200 border border-rose-800",
visual_update: "bg-emerald-900/40 text-emerald-200 border border-emerald-800",
tool_call: "bg-violet-900/40 text-violet-200 border border-violet-800",
tool_result: "bg-indigo-900/40 text-indigo-200 border border-indigo-800"
};
const dt = new Date(ts);
const tsText = Number.isNaN(dt.getTime()) ? ts : dt.toLocaleString();
const payload = obj.payload || {};
line.className = "rounded-lg border border-slate-800 bg-slate-900/80 p-2 space-y-2";
const header = document.createElement("div");
header.className = "flex flex-wrap items-center gap-2";
const typePill = document.createElement("span");
typePill.className = `px-2 py-0.5 rounded text-[10px] font-semibold ${typeColors[obj.event_type] || "bg-slate-800 text-slate-200 border border-slate-700"}`;
typePill.textContent = obj.event_type;
const stepPill = document.createElement("span");
stepPill.className = "px-2 py-0.5 rounded text-[10px] bg-slate-800 text-slate-300 border border-slate-700";
stepPill.textContent = `step ${step}`;
const tsSpan = document.createElement("span");
tsSpan.className = "text-[10px] text-slate-400";
tsSpan.textContent = tsText;
header.appendChild(typePill);
header.appendChild(stepPill);
header.appendChild(tsSpan);
const jobLine = document.createElement("div");
jobLine.className = "text-[11px] text-slate-300 font-medium";
jobLine.textContent = obj.job_id;
const body = document.createElement("pre");
body.className = "bg-slate-950 border border-slate-800 rounded p-2 text-[11px] text-slate-200 overflow-auto";
body.textContent = JSON.stringify(payload, null, 2);
line.appendChild(header);
line.appendChild(jobLine);
line.appendChild(body);
}
eventsEl.prepend(line);
while (eventsEl.childNodes.length > 400) {
eventsEl.removeChild(eventsEl.lastChild);
}
}
function scheduleWsReconnect() {
if (state.wsReconnectTimer || !state.token) return;
state.wsReconnectTimer = setTimeout(() => {
state.wsReconnectTimer = null;
connectWs();
}, 1200);
}
function updateLatestVisualFromEvent(ev) {
if (!ev || ev.event_type !== "visual_update") return;
if (!state.selectedJobId || ev.job_id !== state.selectedJobId) return;
const imagePath = ev.payload && ev.payload.image_meta && ev.payload.image_meta.path;
if (!imagePath) return;
const q = encodeURIComponent(imagePath);
latestVisualEl.src = `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
async function refreshJobs() {
const payload = await api("/api/jobs?limit=100");
state.jobs = payload.jobs || [];
if (!state.selectedJobId && state.jobs.length > 0) state.selectedJobId = state.jobs[0].job_id;
renderJobs();
}
async function refreshStats() {
const payload = await api("/api/stats");
renderStats(payload);
}
async function refreshJobDetail() {
if (!state.selectedJobId) return;
const [job, events] = await Promise.all([
api(`/api/jobs/${state.selectedJobId}`),
api(`/api/jobs/${state.selectedJobId}/events?limit=120`)
]);
jobDetailEl.textContent = JSON.stringify(job, null, 2);
eventsEl.innerHTML = "";
const list = (events.events || []).slice().reverse();
for (const ev of list) pushEventLine(ev);
const visual = list.find((ev) => ev.event_type === "visual_update");
if (visual) updateLatestVisualFromEvent(visual);
}
function connectWs() {
if (!state.token) return;
if (state.ws && (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING)) {
return;
}
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws?token=${encodeURIComponent(state.token)}`);
state.ws = ws;
ws.onmessage = async (event) => {
try {
const payload = JSON.parse(event.data);
if (!payload || payload.event_type === "connected") return;
pushEventLine(payload);
updateLatestVisualFromEvent(payload);
if (!state.selectedJobId || payload.job_id === state.selectedJobId) {
await refreshJobDetail();
}
await refreshJobs();
await refreshStats();
} catch (err) {
console.error(err);
}
};
ws.onclose = () => {
if (state.ws === ws) state.ws = null;
if (manuallyClosedSockets.has(ws)) {
manuallyClosedSockets.delete(ws);
return;
}
scheduleWsReconnect();
};
}
async function fullRefresh() {
await refreshJobs();
await refreshStats();
await refreshJobDetail();
}
async function connect() {
state.token = tokenInput.value.trim();
localStorage.setItem("screenjob_token", state.token);
if (state.ws) {
manuallyClosedSockets.add(state.ws);
try { state.ws.close(); } catch (_) {}
state.ws = null;
}
if (state.wsReconnectTimer) {
clearTimeout(state.wsReconnectTimer);
state.wsReconnectTimer = null;
}
await fullRefresh();
connectWs();
}
function syncEventsViewToggle() {
eventsViewToggle.checked = state.eventsViewMode === "beautiful";
}
saveTokenBtn.addEventListener("click", () => connect().catch((err) => alert(err.message)));
refreshBtn.addEventListener("click", () => fullRefresh().catch((err) => alert(err.message)));
eventsViewToggle.addEventListener("change", () => {
state.eventsViewMode = eventsViewToggle.checked ? "beautiful" : "raw";
localStorage.setItem("screenjob_events_view_mode", state.eventsViewMode);
refreshJobDetail().catch((err) => alert(err.message));
});
syncEventsViewToggle();
if (state.token) connect().catch(() => {});
</script>
</body>
</html>
""".replace("__MONITOR_HOST__", host_suffix)
def monitoring_js_path() -> Path:
return _JS_PATH


@@ -0,0 +1,106 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>ScreenJob Monitor</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-slate-950 text-slate-100 min-h-screen">
<div class="max-w-7xl mx-auto p-4 md:p-8 space-y-6">
<header class="flex flex-col gap-3 md:flex-row md:items-center md:justify-between">
<div>
<h1 class="text-2xl md:text-3xl font-bold tracking-tight">ScreenJob Monitor<span class="text-slate-400 text-base md:text-lg font-medium">__MONITOR_HOST__</span></h1>
<p class="text-slate-400 text-sm">Read-only monitoring for active and historical tasks.</p>
</div>
<div class="flex flex-col md:flex-row gap-2 md:items-center">
<input id="tokenInput" type="password" placeholder="SCREENJOB_TOKEN" class="bg-slate-900 border border-slate-700 rounded px-3 py-2 text-sm w-72" />
<button id="saveTokenBtn" class="bg-cyan-500 hover:bg-cyan-400 text-slate-950 font-semibold px-4 py-2 rounded">Connect</button>
</div>
</header>
<section class="grid grid-cols-2 md:grid-cols-6 gap-3" id="stats"></section>
<section class="space-y-3">
<div class="flex items-center justify-between gap-3">
<h2 class="font-semibold">Analytics</h2>
<div id="analyticsMeta" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsSummary" class="grid grid-cols-2 md:grid-cols-4 gap-3"></div>
<div class="grid grid-cols-1 xl:grid-cols-2 gap-4">
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<div class="flex items-center justify-between gap-3">
<h3 class="font-semibold text-sm">Success by Objective Category</h3>
<div id="analyticsCategorySummary" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsCategories" class="space-y-3"></div>
</div>
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<div class="flex items-center justify-between gap-3">
<h3 class="font-semibold text-sm">Avg Steps / Cost Over Time</h3>
<div id="analyticsTrendSummary" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsTrends" class="space-y-4"></div>
</div>
</div>
</section>
<section class="grid grid-cols-1 lg:grid-cols-5 gap-4">
<div class="lg:col-span-2 bg-slate-900/70 border border-slate-800 rounded-xl p-4">
<div class="flex items-center justify-between mb-3">
<h2 class="font-semibold">Jobs</h2>
<button id="refreshBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Refresh</button>
</div>
<div id="jobList" class="space-y-2 max-h-[62vh] overflow-auto"></div>
</div>
<div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<h2 class="font-semibold">Job Detail</h2>
<pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre>
<h3 class="font-semibold text-sm">Latest Visual</h3>
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<img id="latestVisual" alt="Latest visual update" class="max-h-[24vh] w-full object-contain rounded" />
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Replay</h3>
<div id="replayStatus" class="text-[11px] text-slate-400">No replay loaded.</div>
</div>
<div class="flex flex-wrap items-center gap-2">
<button id="replayPlayBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Play</button>
<button id="replayPrevBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Prev</button>
<button id="replayNextBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Next</button>
<label class="text-xs text-slate-300 flex items-center gap-1">
Speed
<select id="replaySpeed" class="bg-slate-900 border border-slate-700 rounded px-1 py-0.5">
<option value="0.5">0.5x</option>
<option value="1" selected>1.0x</option>
<option value="1.5">1.5x</option>
<option value="2">2.0x</option>
</select>
</label>
</div>
<input id="replaySeek" type="range" min="0" max="0" value="0" class="w-full accent-cyan-400" />
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<div class="relative w-full min-h-[180px] bg-black/40 rounded">
<img id="replayVisual" alt="Replay frame" class="max-h-[30vh] w-full object-contain rounded" />
<svg id="replayOverlay" class="absolute inset-0 w-full h-full pointer-events-none" preserveAspectRatio="xMidYMid meet"></svg>
</div>
<div id="replayFrameMeta" class="text-[11px] text-slate-400 mt-2"></div>
<div id="replayFrameEvents" class="mt-2 space-y-1"></div>
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Live Events</h3>
<label for="eventsViewToggle" class="flex items-center gap-2 text-xs text-slate-300 cursor-pointer select-none">
<span>Raw</span>
<input id="eventsViewToggle" type="checkbox" class="accent-cyan-400 h-4 w-4" />
<span>Beautiful</span>
</label>
</div>
<div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div>
</div>
</section>
</div>
<script src="/ui/monitoring.js"></script>
</body>
</html>

src/ui_assets/monitoring.js Normal file

@@ -0,0 +1,625 @@
const tokenInput = document.getElementById("tokenInput");
const saveTokenBtn = document.getElementById("saveTokenBtn");
const refreshBtn = document.getElementById("refreshBtn");
const jobListEl = document.getElementById("jobList");
const jobDetailEl = document.getElementById("jobDetail");
const eventsEl = document.getElementById("events");
const statsEl = document.getElementById("stats");
const latestVisualEl = document.getElementById("latestVisual");
const eventsViewToggle = document.getElementById("eventsViewToggle");
const replayVisualEl = document.getElementById("replayVisual");
const replayOverlayEl = document.getElementById("replayOverlay");
const replayFrameMetaEl = document.getElementById("replayFrameMeta");
const replayFrameEventsEl = document.getElementById("replayFrameEvents");
const replayStatusEl = document.getElementById("replayStatus");
const replayPlayBtn = document.getElementById("replayPlayBtn");
const replayPrevBtn = document.getElementById("replayPrevBtn");
const replayNextBtn = document.getElementById("replayNextBtn");
const replaySpeedEl = document.getElementById("replaySpeed");
const replaySeekEl = document.getElementById("replaySeek");
const analyticsMetaEl = document.getElementById("analyticsMeta");
const analyticsSummaryEl = document.getElementById("analyticsSummary");
const analyticsCategorySummaryEl = document.getElementById("analyticsCategorySummary");
const analyticsCategoriesEl = document.getElementById("analyticsCategories");
const analyticsTrendSummaryEl = document.getElementById("analyticsTrendSummary");
const analyticsTrendsEl = document.getElementById("analyticsTrends");
const state = {
token: localStorage.getItem("screenjob_token") || "",
jobs: [],
selectedJobId: null,
ws: null,
wsReconnectTimer: null,
eventsViewMode: localStorage.getItem("screenjob_events_view_mode") === "beautiful" ? "beautiful" : "raw",
replay: {
frames: [],
trailingEvents: [],
frameIndex: 0,
isPlaying: false,
speed: 1,
timer: null
}
};
const manuallyClosedSockets = new WeakSet();
const analyticsRefreshEvents = new Set(["job_finished", "job_failed", "job_rejected"]);
tokenInput.value = state.token;
function authHeaders() {
return { "Authorization": "Bearer " + state.token };
}
async function api(path, opts = {}) {
if (!state.token) throw new Error("Token required");
const headers = Object.assign({}, authHeaders(), opts.headers || {});
const response = await fetch(path, Object.assign({}, opts, { headers }));
if (!response.ok) throw new Error(await response.text());
return response.json();
}
function renderStats(stats) {
const cards = [
["Total Jobs", stats.total_jobs || 0],
["Running", stats.running_jobs || 0],
["Completed", stats.completed_jobs || 0],
["Failed", stats.failed_jobs || 0],
["Cancelled", stats.cancelled_jobs || 0],
["Total Cost (USD)", Number(stats.total_estimated_cost || 0).toFixed(4)]
];
statsEl.innerHTML = cards.map(([name, val]) => `
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-3">
<div class="text-slate-400 text-xs">${name}</div>
<div class="text-lg font-semibold">${val}</div>
</div>
`).join("");
}
function escapeHtml(value) {
return String(value ?? "").replace(/[&<>"']/g, (ch) => ({
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
'"': "&quot;",
"'": "&#39;"
})[ch]);
}
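// A minimal sketch of how the escaping helper above might be applied before
// interpolating untrusted job fields into an innerHTML template. The helper is
// copied under a hypothetical name (escapeHtmlSketch) so the snippet runs on its
// own; the sample job record is invented for illustration.

```javascript
// Standalone copy of the helper, renamed so it does not clash with the original.
function escapeHtmlSketch(value) {
  return String(value ?? "").replace(/[&<>"']/g, (ch) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  })[ch]);
}

// Hypothetical job whose objective contains markup that must not be executed.
const sampleJob = { job_id: "job-1", objective: '<img src=x onerror="alert(1)">' };

// The escaped value is inert text when assigned through innerHTML.
const safeRow = `<div>${escapeHtmlSketch(sampleJob.objective)}</div>`;

// null and undefined collapse to an empty string instead of "null"/"undefined".
const emptyCell = escapeHtmlSketch(null);
```

// Escaping at the point of interpolation keeps the template readable while
// ensuring every dynamic field is treated as text.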
function formatNumber(value, digits = 2) {
const num = Number(value);
return Number.isFinite(num) ? num.toFixed(digits) : "—";
}
function formatCurrency(value, digits = 6) {
const num = Number(value);
return Number.isFinite(num) ? `$${num.toFixed(digits)}` : "—";
}
function formatPercent(value) {
const num = Number(value);
return Number.isFinite(num) ? `${num.toFixed(1)}%` : "—";
}
function formatDateLabel(value) {
const dt = new Date(value);
if (Number.isNaN(dt.getTime())) return String(value || "—");
return dt.toLocaleDateString(undefined, { month: "short", day: "numeric" });
}
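// A small sketch of how the formatters above behave on edge-case inputs. The
// helpers are copied under hypothetical names so the snippet is self-contained.
// One detail worth noting: Number(null) is 0, so a null value formats as a
// number rather than the dash placeholder, which is presumably why
// renderAnalytics checks for null before calling these helpers.

```javascript
const DASH = "\u2014"; // same placeholder character the dashboard prints

// Standalone copies of the helpers, renamed to avoid clashing with the originals.
function formatNumberSketch(value, digits = 2) {
  const num = Number(value);
  return Number.isFinite(num) ? num.toFixed(digits) : DASH;
}
function formatCurrencySketch(value, digits = 6) {
  const num = Number(value);
  return Number.isFinite(num) ? `$${num.toFixed(digits)}` : DASH;
}

const formatted = {
  steps: formatNumberSketch(3.14159, 1),   // finite number, one decimal place
  missing: formatNumberSketch(undefined),  // Number(undefined) is NaN, so the placeholder is used
  nullValue: formatNumberSketch(null),     // Number(null) is 0, so this is NOT the placeholder
  cost: formatCurrencySketch(0.5),         // six decimal places by default
};
```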
function renderMetricCard(label, value) {
return `
<div class="bg-slate-950 border border-slate-800 rounded-xl p-3">
<div class="text-[11px] uppercase tracking-wide text-slate-400">${escapeHtml(label)}</div>
<div class="text-xl font-semibold mt-1">${escapeHtml(value)}</div>
</div>
`;
}
function renderLineChart(title, points, options = {}) {
const color = options.color || "#22d3ee";
const valueLabel = options.valueLabel || "";
const sourcePoints = Array.isArray(points)
? points.filter((point) => Number.isFinite(Number(point.value)))
: [];
if (!sourcePoints.length) {
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3">
<div class="flex items-center justify-between gap-3">
<div>
<div class="text-xs text-slate-400">${escapeHtml(title)}</div>
<div class="text-sm text-slate-200 font-semibold">No data yet</div>
</div>
</div>
</div>
`;
}
const width = 640;
const height = 220;
const margin = { top: 20, right: 18, bottom: 34, left: 44 };
const values = sourcePoints.map((point) => Number(point.value));
const minValue = Math.min(...values);
const maxValue = Math.max(...values);
const span = maxValue - minValue || 1;
const chartWidth = width - margin.left - margin.right;
const chartHeight = height - margin.top - margin.bottom;
const xStep = sourcePoints.length > 1 ? chartWidth / (sourcePoints.length - 1) : 0;
const coords = sourcePoints.map((point, index) => ({
x: margin.left + (index * xStep),
y: margin.top + ((maxValue - Number(point.value)) / span) * chartHeight,
}));
const linePath = coords.map((point, index) => `${index === 0 ? "M" : "L"} ${point.x} ${point.y}`).join(" ");
const baseline = height - margin.bottom;
const midIndex = Math.floor(sourcePoints.length / 2);
const xLabels = [
{ index: 0, label: sourcePoints[0].label },
{ index: midIndex, label: sourcePoints[midIndex].label },
{ index: sourcePoints.length - 1, label: sourcePoints[sourcePoints.length - 1].label },
].filter((item, index, array) => item.label && array.findIndex((candidate) => candidate.index === item.index) === index);
const minLabel = options.formatValue ? options.formatValue(minValue) : formatNumber(minValue, 2);
const maxLabel = options.formatValue ? options.formatValue(maxValue) : formatNumber(maxValue, 2);
const latest = sourcePoints[sourcePoints.length - 1];
const latestValue = options.formatValue ? options.formatValue(latest.value) : formatNumber(latest.value, 2);
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3 space-y-2">
<div class="flex items-center justify-between gap-3">
<div>
<div class="text-xs text-slate-400">${escapeHtml(title)}</div>
<div class="text-sm text-slate-200 font-semibold">${escapeHtml(latestValue)}${valueLabel ? ` <span class="text-slate-500 font-normal">${escapeHtml(valueLabel)}</span>` : ""}</div>
</div>
<div class="text-[11px] text-slate-400 text-right">
<div>${escapeHtml(sourcePoints.length)} points</div>
<div>${escapeHtml(minLabel)} - ${escapeHtml(maxLabel)}</div>
</div>
</div>
<svg viewBox="0 0 ${width} ${height}" class="w-full h-56">
${Array.from({ length: 4 }, (_, idx) => {
const y = margin.top + (chartHeight / 3) * idx;
return `<line x1="${margin.left}" y1="${y}" x2="${width - margin.right}" y2="${y}" stroke="rgba(51, 65, 85, 0.7)" stroke-width="1" />`;
}).join("")}
<line x1="${margin.left}" y1="${baseline}" x2="${width - margin.right}" y2="${baseline}" stroke="rgba(71, 85, 105, 0.8)" stroke-width="1.5" />
<path d="${linePath}" fill="none" stroke="${color}" stroke-width="3" stroke-linecap="round" stroke-linejoin="round" />
${coords.map((point) => `
<circle cx="${point.x}" cy="${point.y}" r="4.5" fill="${color}" />
`).join("")}
<text x="${margin.left - 8}" y="${margin.top + 4}" text-anchor="end" class="fill-slate-400 text-[10px]">${escapeHtml(maxLabel)}</text>
<text x="${margin.left - 8}" y="${baseline}" text-anchor="end" class="fill-slate-400 text-[10px]">${escapeHtml(minLabel)}</text>
${xLabels.map((item) => `
<text x="${coords[item.index].x}" y="${height - 10}" text-anchor="middle" class="fill-slate-500 text-[10px]">${escapeHtml(formatDateLabel(item.label))}</text>
`).join("")}
</svg>
</div>
`;
}
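// A sketch of the value-to-pixel mapping used by renderLineChart, extracted
// into a hypothetical pure function (lineChartCoords) so it can be checked
// without a DOM. The width, height, and margins mirror the constants above;
// the sample series is invented.

```javascript
// Map a series of numbers to SVG coordinates: the largest value sits at the
// top margin, the smallest at the bottom of the plot area.
function lineChartCoords(values, width = 640, height = 220,
                         margin = { top: 20, right: 18, bottom: 34, left: 44 }) {
  const minValue = Math.min(...values);
  const maxValue = Math.max(...values);
  const span = maxValue - minValue || 1; // avoid dividing by zero on a flat series
  const chartWidth = width - margin.left - margin.right;
  const chartHeight = height - margin.top - margin.bottom;
  const xStep = values.length > 1 ? chartWidth / (values.length - 1) : 0;
  return values.map((value, index) => ({
    x: margin.left + index * xStep,
    y: margin.top + ((maxValue - value) / span) * chartHeight,
  }));
}

const demoCoords = lineChartCoords([1, 3, 2]);

// Same "M x y L x y ..." path construction as in renderLineChart.
const demoPath = demoCoords
  .map((p, i) => `${i === 0 ? "M" : "L"} ${p.x} ${p.y}`)
  .join(" ");

// A single point has no horizontal step and lands at the top-left of the plot area.
const singlePoint = lineChartCoords([5])[0];
```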
function renderAnalytics(payload) {
const analytics = payload || {};
const categories = Array.isArray(analytics.by_category) ? analytics.by_category : [];
const timeline = Array.isArray(analytics.timeline) ? analytics.timeline : [];
const finishedCategories = categories.filter((row) => Number(row.finished_jobs || 0) > 0);
if (analyticsMetaEl) {
analyticsMetaEl.textContent = analytics.generated_at
? `Updated ${new Date(analytics.generated_at).toLocaleString()}`
: "Historical snapshot";
}
analyticsSummaryEl.innerHTML = [
renderMetricCard("Finished Jobs", analytics.finished_jobs || 0),
renderMetricCard("Success Rate", formatPercent(analytics.success_rate)),
renderMetricCard("Avg Steps", formatNumber(analytics.avg_steps, 1)),
renderMetricCard("Avg Cost", formatCurrency(analytics.avg_cost_usd)),
].join("");
analyticsCategorySummaryEl.textContent = finishedCategories.length
? `${finishedCategories.length} categories`
: "No finished jobs yet";
if (finishedCategories.length) {
analyticsCategoriesEl.innerHTML = finishedCategories.map((row) => {
const successRate = Number(row.success_rate || 0);
const completed = Number(row.completed_jobs || 0);
const finished = Number(row.finished_jobs || 0);
const total = Number(row.total_jobs || 0);
const avgSteps = row.avg_steps == null ? "—" : formatNumber(row.avg_steps, 1);
const avgCost = row.avg_cost_usd == null ? "—" : formatCurrency(row.avg_cost_usd);
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3 space-y-2">
<div class="flex items-start justify-between gap-3">
<div>
<div class="font-medium">${escapeHtml(row.label || "Other")}</div>
<div class="text-[11px] text-slate-400">${finished} finished · ${completed} completed · ${total} total</div>
</div>
<div class="text-right">
<div class="text-base font-semibold">${formatPercent(successRate)}</div>
<div class="text-[11px] text-slate-500">success rate</div>
</div>
</div>
<div class="h-2 rounded bg-slate-800 overflow-hidden">
<div class="h-full rounded bg-cyan-400" style="width: ${Math.max(0, Math.min(successRate, 100))}%"></div>
</div>
<div class="grid grid-cols-2 gap-2 text-[11px] text-slate-300">
<div>Avg steps: ${escapeHtml(avgSteps)}</div>
<div>Avg cost: ${escapeHtml(avgCost)}</div>
</div>
</div>
`;
}).join("");
} else {
analyticsCategoriesEl.innerHTML = `
<div class="rounded-lg border border-dashed border-slate-800 bg-slate-950/70 p-4 text-sm text-slate-400">
No finished jobs yet.
</div>
`;
}
analyticsTrendSummaryEl.textContent = timeline.length ? `${timeline.length} days` : "No daily data yet";
analyticsTrendsEl.innerHTML = [
renderLineChart("Average steps per day", timeline.map((row) => ({ label: row.label, value: row.avg_steps })), { color: "#38bdf8" }),
renderLineChart("Average cost per day", timeline.map((row) => ({ label: row.label, value: row.avg_cost_usd })), {
color: "#34d399",
valueLabel: "USD",
formatValue: (value) => formatCurrency(value),
}),
].join("");
}
function renderJobs() {
jobListEl.innerHTML = state.jobs.map((job) => {
const active = job.job_id === state.selectedJobId;
return `
<button data-job-id="${escapeHtml(job.job_id)}" class="w-full text-left p-3 rounded border ${active ? "border-cyan-400 bg-slate-800" : "border-slate-800 bg-slate-950"} hover:bg-slate-800">
<div class="flex items-center justify-between">
<span class="font-medium">${escapeHtml(job.job_id)}</span>
<span class="text-xs px-2 py-0.5 rounded bg-slate-700">${escapeHtml(job.status)}</span>
</div>
<div class="text-xs text-slate-400 mt-1">${escapeHtml(job.model)}</div>
<div class="text-xs text-slate-300 mt-1 line-clamp-2">${escapeHtml(job.objective)}</div>
<div class="text-xs text-slate-500 mt-1">$${Number((job.usage && job.usage.estimated_cost_usd) || 0).toFixed(6)}</div>
</button>
`;
}).join("");
for (const btn of jobListEl.querySelectorAll("button[data-job-id]")) {
btn.addEventListener("click", () => {
state.selectedJobId = btn.getAttribute("data-job-id");
renderJobs();
refreshJobDetail().catch((err) => console.error(err));
});
}
}
function pushEventLine(obj) {
if (!obj || !obj.job_id || !obj.event_type) return;
const line = document.createElement("div");
const ts = obj.ts || "-";
const step = (obj.step ?? "-");
if (state.eventsViewMode === "raw") {
line.className = "border-b border-slate-800 pb-1";
line.textContent = `[${ts}] ${obj.job_id} step=${step} ${obj.event_type} ${JSON.stringify(obj.payload || {})}`;
} else {
const typeColors = {
info: "bg-sky-900/50 text-sky-200 border border-sky-800",
warning: "bg-amber-900/40 text-amber-200 border border-amber-800",
error: "bg-rose-900/40 text-rose-200 border border-rose-800",
visual_update: "bg-emerald-900/40 text-emerald-200 border border-emerald-800",
tool_call: "bg-violet-900/40 text-violet-200 border border-violet-800",
tool_result: "bg-indigo-900/40 text-indigo-200 border border-indigo-800"
};
const dt = new Date(ts);
const tsText = Number.isNaN(dt.getTime()) ? ts : dt.toLocaleString();
const payload = obj.payload || {};
line.className = "rounded-lg border border-slate-800 bg-slate-900/80 p-2 space-y-2";
const header = document.createElement("div");
header.className = "flex flex-wrap items-center gap-2";
const typePill = document.createElement("span");
typePill.className = `px-2 py-0.5 rounded text-[10px] font-semibold ${typeColors[obj.event_type] || "bg-slate-800 text-slate-200 border border-slate-700"}`;
typePill.textContent = obj.event_type;
const stepPill = document.createElement("span");
stepPill.className = "px-2 py-0.5 rounded text-[10px] bg-slate-800 text-slate-300 border border-slate-700";
stepPill.textContent = `step ${step}`;
const tsSpan = document.createElement("span");
tsSpan.className = "text-[10px] text-slate-400";
tsSpan.textContent = tsText;
header.appendChild(typePill);
header.appendChild(stepPill);
header.appendChild(tsSpan);
const jobLine = document.createElement("div");
jobLine.className = "text-[11px] text-slate-300 font-medium";
jobLine.textContent = obj.job_id;
const body = document.createElement("pre");
body.className = "bg-slate-950 border border-slate-800 rounded p-2 text-[11px] text-slate-200 overflow-auto";
body.textContent = JSON.stringify(payload, null, 2);
line.appendChild(header);
line.appendChild(jobLine);
line.appendChild(body);
}
eventsEl.prepend(line);
while (eventsEl.childNodes.length > 400) {
eventsEl.removeChild(eventsEl.lastChild);
}
}
function clearReplayTimer() {
if (state.replay.timer) {
clearTimeout(state.replay.timer);
state.replay.timer = null;
}
}
function stopReplay() {
state.replay.isPlaying = false;
clearReplayTimer();
replayPlayBtn.textContent = "Play";
}
function replayImageSrc(path) {
const q = encodeURIComponent(path || "");
return `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
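// A sketch of the artifact URL construction above as a hypothetical pure
// function (artifactUrlSketch), to show what encodeURIComponent does to the
// path and token. The job id, path, and token values are invented.

```javascript
// Build the artifact URL the same way replayImageSrc does, but with the job id
// and token passed in explicitly instead of read from shared state.
function artifactUrlSketch(jobId, path, token) {
  const q = encodeURIComponent(path || "");
  return `/api/jobs/${jobId}/artifact?path=${q}&token=${encodeURIComponent(token)}`;
}

// Slashes, spaces, and ampersands are percent-encoded so they cannot be
// mistaken for query-string delimiters.
const demoUrl = artifactUrlSketch("job-1", "shots/step 3.png", "a&b");

// A missing path falls back to an empty query value.
const emptyPathUrl = artifactUrlSketch("job-1", undefined, "t");
```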
function renderReplayOverlay(frame) {
replayOverlayEl.innerHTML = "";
const size = frame && frame.screen_size;
if (!frame || !frame.is_fullscreen || !size || !size.width || !size.height) {
replayOverlayEl.removeAttribute("viewBox");
return;
}
replayOverlayEl.setAttribute("viewBox", `0 0 ${size.width} ${size.height}`);
const overlayEvents = Array.isArray(frame.overlays) ? frame.overlays : [];
const points = overlayEvents.filter((ev) => ev && ev.kind === "tool_result" && ev.tool === "click" && ev.click);
for (const ev of points) {
const x = Number(ev.click.x);
const y = Number(ev.click.y);
if (!Number.isFinite(x) || !Number.isFinite(y)) continue;
const halo = document.createElementNS("http://www.w3.org/2000/svg", "circle");
halo.setAttribute("cx", String(x));
halo.setAttribute("cy", String(y));
halo.setAttribute("r", "14");
halo.setAttribute("fill", "rgba(14, 165, 233, 0.22)");
halo.setAttribute("stroke", "#38bdf8");
halo.setAttribute("stroke-width", "2");
const dot = document.createElementNS("http://www.w3.org/2000/svg", "circle");
dot.setAttribute("cx", String(x));
dot.setAttribute("cy", String(y));
dot.setAttribute("r", "4");
dot.setAttribute("fill", "#38bdf8");
replayOverlayEl.appendChild(halo);
replayOverlayEl.appendChild(dot);
}
}
function renderReplayFrameEvents(frame) {
replayFrameEventsEl.innerHTML = "";
if (!frame) return;
const events = Array.isArray(frame.overlays) ? frame.overlays : [];
const shown = events.slice(-8);
for (const ev of shown) {
const row = document.createElement("div");
row.className = "text-[11px] rounded border border-slate-800 bg-slate-900/80 px-2 py-1";
row.textContent = ev.label || `${ev.kind || "event"} ${ev.tool || ""}`.trim();
replayFrameEventsEl.appendChild(row);
}
if (!shown.length) {
const empty = document.createElement("div");
empty.className = "text-[11px] text-slate-500";
empty.textContent = "No overlay events for this frame.";
replayFrameEventsEl.appendChild(empty);
}
}
function setReplayFrame(index) {
const frames = state.replay.frames;
if (!frames.length) {
replayVisualEl.removeAttribute("src");
replayOverlayEl.innerHTML = "";
replayFrameMetaEl.textContent = "No replay frames.";
replaySeekEl.value = "0";
replaySeekEl.max = "0";
replayStatusEl.textContent = "No replay loaded.";
return;
}
const bounded = Math.max(0, Math.min(index, frames.length - 1));
state.replay.frameIndex = bounded;
const frame = frames[bounded];
replayVisualEl.src = replayImageSrc(frame.image_path);
replayFrameMetaEl.textContent = `Frame ${bounded + 1}/${frames.length} | step ${frame.step} | ${frame.kind} | ${frame.ts}`;
replaySeekEl.max = String(Math.max(0, frames.length - 1));
replaySeekEl.value = String(bounded);
replayStatusEl.textContent = state.replay.isPlaying ? "Playing replay." : "Replay ready.";
renderReplayOverlay(frame);
renderReplayFrameEvents(frame);
}
function advanceReplay() {
const frames = state.replay.frames;
if (!state.replay.isPlaying || !frames.length) return;
if (state.replay.frameIndex >= frames.length - 1) {
stopReplay();
setReplayFrame(frames.length - 1);
replayStatusEl.textContent = "Replay finished.";
return;
}
setReplayFrame(state.replay.frameIndex + 1);
clearReplayTimer();
const delayMs = Math.max(120, Math.round(700 / (state.replay.speed || 1)));
state.replay.timer = setTimeout(advanceReplay, delayMs);
}
function toggleReplayPlay() {
if (!state.replay.frames.length) return;
if (state.replay.isPlaying) {
stopReplay();
setReplayFrame(state.replay.frameIndex);
return;
}
state.replay.isPlaying = true;
replayPlayBtn.textContent = "Pause";
replayStatusEl.textContent = "Playing replay.";
advanceReplay();
}
function resetReplay(payload) {
stopReplay();
const replayPayload = payload || {};
state.replay.frames = Array.isArray(replayPayload.frames) ? replayPayload.frames : [];
state.replay.trailingEvents = Array.isArray(replayPayload.trailing_events) ? replayPayload.trailing_events : [];
state.replay.frameIndex = 0;
setReplayFrame(0);
}
function scheduleWsReconnect() {
if (state.wsReconnectTimer || !state.token) return;
state.wsReconnectTimer = setTimeout(() => {
state.wsReconnectTimer = null;
connectWs();
}, 1200);
}
function updateLatestVisualFromEvent(ev) {
if (!ev || ev.event_type !== "visual_update") return;
if (!state.selectedJobId || ev.job_id !== state.selectedJobId) return;
const imagePath = ev.payload && ev.payload.image_meta && ev.payload.image_meta.path;
if (!imagePath) return;
const q = encodeURIComponent(imagePath);
latestVisualEl.src = `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
async function refreshJobs() {
const payload = await api("/api/jobs?limit=100");
state.jobs = payload.jobs || [];
if (!state.selectedJobId && state.jobs.length > 0) state.selectedJobId = state.jobs[0].job_id;
renderJobs();
}
async function refreshStats() {
const payload = await api("/api/stats");
renderStats(payload);
}
async function refreshAnalytics() {
const payload = await api("/api/analytics");
renderAnalytics(payload);
}
async function refreshJobDetail() {
if (!state.selectedJobId) return;
const [job, events, replay] = await Promise.all([
api(`/api/jobs/${state.selectedJobId}`),
api(`/api/jobs/${state.selectedJobId}/events?limit=120`),
api(`/api/jobs/${state.selectedJobId}/replay?limit=5000`)
]);
jobDetailEl.textContent = JSON.stringify(job, null, 2);
eventsEl.innerHTML = "";
const list = (events.events || []).slice().reverse();
for (const ev of list) pushEventLine(ev);
const visual = list.find((ev) => ev.event_type === "visual_update");
if (visual) updateLatestVisualFromEvent(visual);
resetReplay(replay);
}
function connectWs() {
if (!state.token) return;
if (state.ws && (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING)) {
return;
}
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws?token=${encodeURIComponent(state.token)}`);
state.ws = ws;
ws.onmessage = async (event) => {
try {
const payload = JSON.parse(event.data);
if (!payload || payload.event_type === "connected") return;
pushEventLine(payload);
updateLatestVisualFromEvent(payload);
if (!state.selectedJobId || payload.job_id === state.selectedJobId) {
await refreshJobDetail();
}
await refreshJobs();
await refreshStats();
if (analyticsRefreshEvents.has(payload.event_type)) {
await refreshAnalytics();
}
} catch (err) {
console.error(err);
}
};
ws.onclose = () => {
if (state.ws === ws) state.ws = null;
if (manuallyClosedSockets.has(ws)) {
manuallyClosedSockets.delete(ws);
return;
}
scheduleWsReconnect();
};
}
async function fullRefresh() {
await refreshJobs();
await refreshStats();
await refreshAnalytics();
await refreshJobDetail();
}
async function connect() {
state.token = tokenInput.value.trim();
localStorage.setItem("screenjob_token", state.token);
if (state.ws) {
manuallyClosedSockets.add(state.ws);
try { state.ws.close(); } catch (_) {}
state.ws = null;
}
if (state.wsReconnectTimer) {
clearTimeout(state.wsReconnectTimer);
state.wsReconnectTimer = null;
}
await fullRefresh();
connectWs();
}
function syncEventsViewToggle() {
eventsViewToggle.checked = state.eventsViewMode === "beautiful";
}
saveTokenBtn.addEventListener("click", () => connect().catch((err) => alert(err.message)));
refreshBtn.addEventListener("click", () => fullRefresh().catch((err) => alert(err.message)));
eventsViewToggle.addEventListener("change", () => {
state.eventsViewMode = eventsViewToggle.checked ? "beautiful" : "raw";
localStorage.setItem("screenjob_events_view_mode", state.eventsViewMode);
refreshJobDetail().catch((err) => alert(err.message));
});
replayPlayBtn.addEventListener("click", () => toggleReplayPlay());
replayPrevBtn.addEventListener("click", () => {
stopReplay();
setReplayFrame(state.replay.frameIndex - 1);
});
replayNextBtn.addEventListener("click", () => {
stopReplay();
setReplayFrame(state.replay.frameIndex + 1);
});
replaySpeedEl.addEventListener("change", () => {
const speed = Number(replaySpeedEl.value);
state.replay.speed = Number.isFinite(speed) && speed > 0 ? speed : 1;
if (state.replay.isPlaying) {
clearReplayTimer();
advanceReplay();
}
});
replaySeekEl.addEventListener("input", () => {
stopReplay();
setReplayFrame(Number(replaySeekEl.value || 0));
});
syncEventsViewToggle();
resetReplay(null);
if (state.token) connect().catch(() => {});
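
The replay stepping above rests on two small calculations: clamping the requested frame index into range, and turning the playback speed into a per-frame delay with a lower bound. The following Python sketch restates them outside the browser so they can be checked in isolation. The function names `bounded_frame_index` and `replay_delay_ms` are hypothetical and do not exist in the repository; the constants 700 and 120 are taken from `advanceReplay`.

```python
import math


def bounded_frame_index(index: int, frame_count: int) -> int:
    """Clamp index into [0, frame_count - 1], like Math.max(0, Math.min(index, n - 1))."""
    if frame_count <= 0:
        raise ValueError("no frames to index")
    return max(0, min(index, frame_count - 1))


def replay_delay_ms(speed: float, base_ms: int = 700, floor_ms: int = 120) -> int:
    """Delay between frames: base_ms / speed, rounded, never below floor_ms."""
    # Assumption: any missing, zero, negative or non-finite speed falls back to 1,
    # which is what the dashboard's speed handler stores in that case.
    valid = bool(speed) and math.isfinite(speed) and speed > 0
    effective = speed if valid else 1.0
    # floor(x + 0.5) matches JavaScript's Math.round for positive values.
    return max(floor_ms, math.floor(base_ms / effective + 0.5))
```

At speed 1 the delay is 700 ms, at speed 2 it is 350 ms, and anything faster than roughly 5.8x is held at the 120 ms floor, so very high speed settings do not make the timer fire faster than the image can load.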


@@ -15,10 +15,76 @@ function Test-EnvVarLine {
return [bool](Select-String -Path $FilePath -Pattern ("^\s*" + [regex]::Escape($Name) + "=") -Quiet)
}
-if (-not (Get-Command python -ErrorAction SilentlyContinue)) {
-throw "Python was not found in PATH. Install Python 3.11+ and retry."
function Resolve-PythonExecutable {
$venvPython = Join-Path $scriptDir ".venv\Scripts\python.exe"
if (Test-Path -LiteralPath $venvPython) {
return $venvPython
}
$pythonCmd = Get-Command python -ErrorAction SilentlyContinue
if ($null -ne $pythonCmd -and (Test-Path -LiteralPath $pythonCmd.Source)) {
return $pythonCmd.Source
}
$candidatePyLaunchers = @()
$pyFromPath = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $pyFromPath -and (Test-Path -LiteralPath $pyFromPath.Source)) {
$candidatePyLaunchers += $pyFromPath.Source
}
$candidatePyLaunchers += "C:\Windows\py.exe"
if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
$repoUserHome = $Matches[0]
$candidatePyLaunchers += (Join-Path $repoUserHome "AppData\Local\Programs\Python\Launcher\py.exe")
}
foreach ($pyLauncher in ($candidatePyLaunchers | Select-Object -Unique)) {
if (-not (Test-Path -LiteralPath $pyLauncher)) {
continue
}
try {
$resolved = (& $pyLauncher -3 -c "import sys; print(sys.executable)" 2>$null | Select-Object -Last 1).Trim()
if ($resolved -and (Test-Path -LiteralPath $resolved)) {
return $resolved
}
} catch {
continue
}
}
$candidatePythonPaths = @()
if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
$repoUserHome = $Matches[0]
$pythonBase = Join-Path $repoUserHome "AppData\Local\Programs\Python"
if (Test-Path -LiteralPath $pythonBase) {
$candidatePythonPaths += (Get-ChildItem -LiteralPath $pythonBase -Directory -ErrorAction SilentlyContinue |
Sort-Object Name -Descending |
ForEach-Object { Join-Path $_.FullName "python.exe" })
}
}
$candidatePythonPaths += @(
"C:\Python314\python.exe",
"C:\Python313\python.exe",
"C:\Python312\python.exe",
"C:\Python311\python.exe",
"C:\Program Files\Python314\python.exe",
"C:\Program Files\Python313\python.exe",
"C:\Program Files\Python312\python.exe",
"C:\Program Files\Python311\python.exe"
)
foreach ($candidate in ($candidatePythonPaths | Select-Object -Unique)) {
if (Test-Path -LiteralPath $candidate) {
return $candidate
}
}
throw "Python was not found. Install Python 3.11+ system-wide, or create .venv in the repo root."
}
$pythonExe = Resolve-PythonExecutable
$envFile = Join-Path $scriptDir ".env"
if (-not (Test-Path -LiteralPath $envFile)) {
Write-Warning ".env was not found at $envFile. Server startup may fail if required vars are missing."
@@ -31,5 +97,5 @@ if (-not (Test-Path -LiteralPath $envFile)) {
}
}
-Write-Host "Starting ScreenJob backend on configured host/port..." -ForegroundColor Cyan
-python main.py server
+Write-Host "Starting ScreenJob backend with Python: $pythonExe" -ForegroundColor Cyan
+& $pythonExe main.py server
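
The interpreter lookup in this script follows a fixed precedence: repo-local `.venv` first, then `python` on PATH, then a list of well-known install locations. The Python sketch below illustrates that ordering only. It is not part of the repository, it leaves out the `py` launcher probing step, and the names `resolve_python`, `fallbacks` and `which` are hypothetical; `which` is injectable purely so the lookup can be exercised without touching the real PATH.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, Optional


def resolve_python(
    repo_root: Path,
    fallbacks: Iterable[Path] = (),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Path:
    """Return the first usable interpreter, preferring the repo-local venv."""
    # 1. A virtual environment inside the repo wins over anything system-wide.
    venv_python = repo_root / ".venv" / "Scripts" / "python.exe"
    if venv_python.is_file():
        return venv_python
    # 2. Otherwise take whatever `python` resolves to on PATH.
    on_path = which("python")
    if on_path:
        return Path(on_path)
    # 3. Finally probe the candidate install locations in the given order.
    for candidate in fallbacks:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "Python was not found. Install Python 3.11+ or create .venv in the repo root."
    )
```

Checking the venv before PATH means a service or scheduled task that starts with a minimal environment still picks up the project's own dependencies, which appears to be the motivation for the change.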

11
start_backend_hidden.vbs Normal file

@@ -0,0 +1,11 @@
Option Explicit
Dim shell, fso, scriptDir, psScript, command
Set shell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")
scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)
psScript = """" & fso.BuildPath(scriptDir, "start_backend.ps1") & """"
command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -WindowStyle Hidden -STA -File " & psScript
shell.Run command, 0, False


@@ -0,0 +1,11 @@
Option Explicit
Dim shell, fso, scriptDir, psScript, command
Set shell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")
scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)
psScript = """" & fso.BuildPath(scriptDir, "screenjob_tray.ps1") & """"
command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -WindowStyle Hidden -STA -File " & psScript
shell.Run command, 0, False


@@ -1,8 +1,11 @@
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import Any
import pytest
from PIL import Image
import src.agent as agent_module
@@ -15,8 +18,12 @@ class _DummyPyAutoGUI:
def __init__(self) -> None:
self.last_move_to: tuple[int, int] | None = None
-self.last_click: tuple[int, int] | None = None
+self.last_move_duration: float | None = None
+self.last_click: dict[str, object] | None = None
self.last_hotkey: tuple[str, ...] | None = None
self.last_drag_to: dict[str, object] | None = None
self.last_scroll: int | None = None
self.current_position: tuple[int, int] = (640, 360)
def screenshot(self) -> Image.Image:
return Image.new("RGB", (1280, 720), color=(24, 24, 24))
@@ -26,9 +33,26 @@ class _DummyPyAutoGUI:
def moveTo(self, x: int, y: int, duration: float = 0.0) -> None: # noqa: N802
self.last_move_to = (x, y)
self.last_move_duration = duration
self.current_position = (x, y)
-def click(self, x: int, y: int) -> None:
-self.last_click = (x, y)
+def click(
+self,
+x: int,
+y: int,
+clicks: int = 1,
+interval: float = 0.0,
+button: str = "left",
+) -> None:
+self.last_click = {"x": x, "y": y, "clicks": clicks, "interval": interval, "button": button}
+self.current_position = (x, y)
def dragTo(self, x: int, y: int, duration: float = 0.0, button: str = "left") -> None: # noqa: N802
self.last_drag_to = {"x": x, "y": y, "duration": duration, "button": button}
self.current_position = (x, y)
def scroll(self, amount: int) -> None:
self.last_scroll = amount
def write(self, _: str, interval: float = 0.0) -> None:
return None
@@ -39,6 +63,10 @@ class _DummyPyAutoGUI:
def hotkey(self, *keys: str) -> None:
self.last_hotkey = tuple(keys)
def position(self):
x, y = self.current_position
return type("Point", (), {"x": x, "y": y})()
def _build_agent(tmp_path: Path, monkeypatch) -> agent_module.ScreenJobAgent:
dummy_gui = _DummyPyAutoGUI()
@@ -84,11 +112,193 @@ def test_click_supports_directional_offsets(tmp_path: Path, monkeypatch) -> None
"offset_up": "2px",
"offset_right": 7,
"offset": {"x": 3, "y": 4},
"button": "right",
"click_count": 2,
"interval_seconds": "0.5s",
"duration_seconds": "0.2s",
"sleep_after_seconds": 0,
}
)
assert click_result["ok"] is True
assert click_result["clicked"] == {"x": 110, "y": 102}
assert click_result["button"] == "right"
assert click_result["click_count"] == 2
assert click_result["interval_seconds"] == 0.5
assert click_result["duration_seconds"] == 0.2
assert agent_module.pyautogui.last_click == {
"x": 110,
"y": 102,
"clicks": 2,
"interval": 0.5,
"button": "right",
}
def test_scroll_supports_direction_and_amount(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_scroll(
{
"amount": 8,
"direction": "down",
"coordinate": {"x": 1400, "y": -5},
"sleep_after_seconds": 0,
}
)
assert result["ok"] is True
assert result["amount"] == -8
assert result["direction"] == "down"
assert result["moved_to"] == {"x": 1279, "y": 0}
assert agent_module.pyautogui.last_scroll == -8
def test_drag_translates_coordinates_and_button(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_drag(
{
"start_coordinate": {"x": -10, "y": 100},
"end_coordinate": {"x": 1285, "y": 800},
"button": "middle",
"duration_seconds": "0.3s",
"sleep_after_seconds": 0,
}
)
assert result["ok"] is True
assert result["from"] == {"x": 0, "y": 100}
assert result["to"] == {"x": 1279, "y": 719}
assert result["button"] == "middle"
assert result["duration_seconds"] == 0.3
assert agent_module.pyautogui.last_drag_to == {
"x": 1279,
"y": 719,
"duration": 0.3,
"button": "middle",
}
def test_move_mouse_clamps_target_coordinate(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_move_mouse({"coordinate": {"x": 1500, "y": -5}, "duration_seconds": "0.4s"})
assert result["ok"] is True
assert result["moved_to"] == {"x": 1279, "y": 0}
assert result["duration_seconds"] == 0.4
assert agent_module.pyautogui.last_move_to == (1279, 0)
def test_clipboard_get_and_set_round_trip(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
state = {"text": ""}
monkeypatch.setattr(agent, "_clipboard_set_text", lambda text: state.__setitem__("text", text))
monkeypatch.setattr(agent, "_clipboard_get_text", lambda: state["text"])
monkeypatch.setattr(
agent,
"_clipboard_get_metadata",
lambda: {"has_text": bool(state["text"]), "has_image": True, "available_formats": ["CF_UNICODETEXT", "CF_DIB"]},
)
set_result = agent._tool_clipboard_set({"text": "hello clipboard"})
get_result = agent._tool_clipboard_get({})
assert set_result["ok"] is True
assert set_result["length"] == 15
assert get_result["ok"] is True
assert get_result["text"] == "hello clipboard"
assert get_result["length"] == 15
assert get_result["has_text"] is True
assert get_result["has_image"] is True
assert get_result["available_formats"] == ["CF_UNICODETEXT", "CF_DIB"]
def test_clipboard_set_falls_back_to_powershell_when_native_path_fails(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
state = {"text": ""}
def fail_native(_: str) -> None:
raise OSError("[WinError 6] The handle is invalid.")
def shell_fallback(text: str) -> None:
state["text"] = text
monkeypatch.setattr(agent, "_clipboard_set_text", fail_native)
monkeypatch.setattr(agent, "_clipboard_set_text_via_shell", shell_fallback)
result = agent._tool_clipboard_set({"text": "Example Domain"})
assert result["ok"] is True
assert result["used_shell_fallback"] is True
assert "WinError 6" in result["native_error"]
assert state["text"] == "Example Domain"
def test_get_cursor_position_returns_current_mouse_location(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent_module.pyautogui.current_position = (321, 654)
result = agent._tool_get_cursor_position({})
assert result["ok"] is True
assert result["position"] == {"x": 321, "y": 654}
def test_get_active_window_returns_metadata_shape(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
monkeypatch.setattr(
agent,
"_get_active_window_info",
lambda: {
"available": True,
"hwnd": 1234,
"title": "Settings",
"class_name": "ApplicationFrameWindow",
"thread_id": 44,
"process_id": 77,
"is_visible": True,
"rect": {"left": 10, "top": 20, "right": 410, "bottom": 320, "width": 400, "height": 300},
},
)
result = agent._tool_get_active_window({})
assert result["ok"] is True
assert result["window"]["title"] == "Settings"
assert result["window"]["rect"]["width"] == 400
def test_enhance_defaults_to_small_ui_preset(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_enhance({"coordinate": {"x": 100, "y": 120}})
assert result["ok"] is True
meta = result["meta"]
assert meta["region"] == "small"
assert meta["mode"] == "ui"
assert meta["scale"] == 4
assert Path(meta["path"]).exists()
assert meta["target_pixel"]["x"] >= 0
assert meta["target_pixel"]["y"] >= 0
def test_enhance_supports_text_mode_and_scale_clamp(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_enhance(
{
"coordinate": {"x": -99, "y": 9999},
"region": "medium",
"mode": "text",
"scale": 99,
}
)
assert result["ok"] is True
meta = result["meta"]
assert meta["region"] == "medium"
assert meta["mode"] == "text"
assert meta["scale"] == 6
assert meta["requested_coord"] == {"x": -99, "y": 9999}
assert meta["source_coord"] == {"x": 0, "y": 719}
assert Path(meta["path"]).exists()
def test_press_key_supports_hotkey_combo(tmp_path: Path, monkeypatch) -> None:
@@ -98,3 +308,653 @@ def test_press_key_supports_hotkey_combo(tmp_path: Path, monkeypatch) -> None:
assert result["key"] == "win+r"
assert result["message"] == "Key combo executed."
assert agent_module.pyautogui.last_hotkey == ("win", "r")
def test_press_key_blocks_prohibited_combo(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.options.prohibited_key_combos = {"ctrl+shift+s"}
agent.prohibited_key_combos = agent._normalize_prohibited_key_combos(agent.options.prohibited_key_combos)
result = agent._tool_press_key({"key": "ctrl+shift+s"})
assert result["ok"] is False
assert result["blocked"] is True
assert result["key"] == "ctrl+shift+s"
assert "prohibited by runtime configuration" in result["error"]
assert "another allowed route" in result["hint"]
def test_press_key_blocks_prohibited_combo_after_alias_normalization(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.options.prohibited_key_combos = {"meta+r"}
agent.prohibited_key_combos = agent._normalize_prohibited_key_combos(agent.options.prohibited_key_combos)
result = agent._tool_press_key({"key": "win+r"})
assert result["ok"] is False
assert result["blocked"] is True
assert result["key"] == "win+r"
def test_context_compaction_trigger_and_payload(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Open settings app"
agent.previous_response_id = "resp_123"
agent.step = 4
agent.last_context_compact_step = 0
agent.options.screen_context_decay_steps = 4
agent.recent_tool_summaries = ["step=1 tool=see_screen status=ok"]
agent.last_screen_data_url = "data:image/png;base64,abc"
agent.last_screen_meta = {"width": 1280, "height": 720, "path": "C:/tmp/frame.png"}
assert agent._should_compact_context() is True
visual_message = agent._build_visual_message("Current screen", "data:image/png;base64,abc", agent.last_screen_meta)
agent._register_visual_context_message(visual_message, agent.last_screen_meta, tool_name="see_screen")
compacted = agent._build_compacted_pending_input("decay")
assert len(compacted) == 2
assert "Context compaction activated due to stale context decay." in compacted[0]["content"][0]["text"]
assert "Open settings app" in compacted[0]["content"][0]["text"]
assert "Treat prior reasoning as stale" in compacted[0]["content"][0]["text"]
assert "Retained visual observations:" in compacted[0]["content"][0]["text"]
assert "do not call see_screen again only because compaction happened" in compacted[0]["content"][0]["text"]
assert "observe -> decide -> act -> verify" in compacted[0]["content"][0]["text"]
def test_context_compaction_drops_function_call_outputs_from_rebased_input(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Open settings app"
visual_meta = {"path": "C:/tmp/frame.png"}
visual_message = agent._build_visual_message("Current screen", "data:image/png;base64,abc", visual_meta)
agent._register_visual_context_message(visual_message, visual_meta, tool_name="see_screen")
compacted = agent._build_compacted_pending_input(
"decay",
carryover_items=[
{"type": "function_call_output", "call_id": "call_123", "output": "{\"ok\": true}"},
{"role": "user", "content": [{"type": "input_text", "text": "blocked hint"}]},
],
)
assert len(compacted) == 3
assert compacted[1]["role"] == "user"
assert compacted[1]["content"][0]["text"] == "blocked hint"
assert all(item.get("type") != "function_call_output" for item in compacted)
def test_visual_context_budget_keeps_only_latest_three_images(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.options.max_visual_context_images = 3
captured_times = [
"2026-05-30T10:00:03+00:00",
"2026-05-30T10:00:01+00:00",
"2026-05-30T10:00:04+00:00",
"2026-05-30T10:00:02+00:00",
]
for idx, captured_at in enumerate(captured_times):
meta = {"path": f"C:/tmp/frame_{idx}.png", "captured_at": captured_at}
message = agent._build_visual_message(f"frame {idx}", f"data:image/png;base64,{idx}", meta)
agent._register_visual_context_message(message, meta, tool_name="see_screen")
assert agent.visual_context_overflow_pending is True
assert [entry["meta"]["path"] for entry in agent.visual_context_messages] == [
"C:/tmp/frame_3.png",
"C:/tmp/frame_0.png",
"C:/tmp/frame_2.png",
]
def test_compacted_input_uses_latest_visuals_by_capture_time(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.options.max_visual_context_images = 3
agent.objective = "Verify the current app window"
for idx, captured_at in enumerate(
[
"2026-05-30T10:00:04+00:00",
"2026-05-30T10:00:01+00:00",
"2026-05-30T10:00:03+00:00",
"2026-05-30T10:00:02+00:00",
]
):
meta = {"path": f"C:/tmp/frame_{idx}.png", "captured_at": captured_at}
message = agent._build_visual_message(f"frame {idx}", f"data:image/png;base64,{idx}", meta)
agent._register_visual_context_message(message, meta, tool_name="see_screen")
compacted = agent._build_compacted_pending_input("visual_budget")
visual_messages = [
item
for item in compacted
if isinstance(item.get("content"), list)
and any(part.get("type") == "input_image" for part in item["content"] if isinstance(part, dict))
]
assert len(visual_messages) == 3
assert [
json.loads(message["content"][0]["text"].split("Metadata: ", 1)[1].split("\n", 1)[0])["path"]
for message in visual_messages
] == [
"C:/tmp/frame_3.png",
"C:/tmp/frame_2.png",
"C:/tmp/frame_0.png",
]
def test_context_compaction_event_includes_visual_budget_reason_and_paths(tmp_path: Path, monkeypatch) -> None:
events: list[dict[str, object]] = []
agent = _build_agent(tmp_path, monkeypatch)
agent.event_callback = events.append
agent.step = 5
agent.recent_tool_summaries = ["step=4 tool=enhance status=ok"]
agent.visual_context_messages = [
{"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/1.png"}},
{"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/2.png"}},
{"message": {"role": "user", "content": []}, "meta": {"path": "C:/tmp/3.png"}},
]
agent._emit_context_compacted("visual_budget")
assert events[-1]["event_type"] == "context_compacted"
payload = events[-1]["payload"]
assert payload["rebuild_reason"] == "visual_budget"
assert payload["visual_context_paths"] == ["C:/tmp/1.png", "C:/tmp/2.png", "C:/tmp/3.png"]
def test_observation_loop_blocks_repeated_broad_reobservation(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.step_history = [
{
"step": 21,
"tool_names": ["get_active_window", "see_screen"],
"window_signature": "123|#32770|Save as",
"window_summary": "Save as [#32770]",
"had_visual": True,
},
{
"step": 22,
"tool_names": ["get_active_window", "see_screen"],
"window_signature": "123|#32770|Save as",
"window_summary": "Save as [#32770]",
"had_visual": True,
},
{
"step": 23,
"tool_names": ["get_active_window", "see_screen"],
"window_signature": "123|#32770|Save as",
"window_summary": "Save as [#32770]",
"had_visual": True,
},
]
blocked = agent._dispatch_tool("see_screen", {})
assert blocked["ok"] is False
assert blocked["blocked"] is True
assert blocked["blocked_reason"] == "observation_loop"
assert "unchanged foreground window" in blocked["error"]
assert blocked["window_summary"] == "Save as [#32770]"
def test_repeated_ambiguous_action_requires_verification_and_then_blocks(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
type_args = {"text": "repeat me"}
first = agent._dispatch_tool("type", type_args)
assert first["ok"] is True
assert first["verification_required"] is True
assert first["verification_channels"] == ["enhance", "get_active_window", "see_screen"]
blocked_without_verification = agent._dispatch_tool("type", type_args)
assert blocked_without_verification["blocked"] is True
assert "see_screen" in blocked_without_verification["error"]
assert agent._dispatch_tool("see_screen", {})["ok"] is True
assert agent._dispatch_tool("type", type_args)["ok"] is True
assert agent._dispatch_tool("see_screen", {})["ok"] is True
assert agent._dispatch_tool("type", type_args)["ok"] is True
assert agent._dispatch_tool("see_screen", {})["ok"] is True
blocked_after_retry_budget = agent._dispatch_tool("type", type_args)
assert blocked_after_retry_budget["blocked"] is True
assert "3 time(s) on the same surface" in blocked_after_retry_budget["error"]
assert agent._dispatch_tool("see_screen", {})["ok"] is True
reset_attempt = agent._dispatch_tool("type", type_args)
assert reset_attempt["ok"] is True
def test_copy_shortcut_prefers_clipboard_verification(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
monkeypatch.setattr(
agent,
"_clipboard_get_metadata",
lambda: {"has_text": True, "has_image": False, "available_formats": ["CF_UNICODETEXT"]},
)
monkeypatch.setattr(agent, "_clipboard_get_text", lambda: "copied")
first = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
assert first["ok"] is True
assert first["verification_channels"] == ["clipboard_get"]
blocked = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
assert blocked["blocked"] is True
assert "clipboard_get" in blocked["error"]
observed = agent._dispatch_tool("clipboard_get", {})
assert observed["ok"] is True
assert observed["has_text"] is True
second = agent._dispatch_tool("press_key", {"key": "ctrl+c"})
assert second["ok"] is True
def test_execute_command_blocks_unrequested_recursive_file_search(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Save the current note in Notepad"
result = agent._tool_execute_command({"command": "Get-ChildItem -Recurse -Filter *.txt"})
assert result["ok"] is False
assert result["blocked"] is True
assert "out of scope" in result["error"]
def test_execute_command_allows_recursive_file_search_when_objective_requests_it(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Find the saved text file path"
called: dict[str, Any] = {}
class _FakeProcess:
returncode = 0
def poll(self) -> int:
return 0
def communicate(self, timeout: int = 2):
return ("ok", "")
def fake_popen(*args, **kwargs):
called["command"] = args[0]
return _FakeProcess()
monkeypatch.setattr(agent_module.subprocess, "Popen", fake_popen)
result = agent._tool_execute_command({"command": "Get-ChildItem -Recurse -Filter *.txt"})
assert result["ok"] is True
assert called["command"] == "Get-ChildItem -Recurse -Filter *.txt"
def test_execute_command_launch_requires_focus_verification(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
called: dict[str, Any] = {}
class _FakeProcess:
returncode = 0
def poll(self) -> int:
return 0
def communicate(self, timeout: int = 2):
return ("", "")
def fake_popen(*args, **kwargs):
called["command"] = args[0]
return _FakeProcess()
monkeypatch.setattr(agent_module.subprocess, "Popen", fake_popen)
first = agent._dispatch_tool("execute_command", {"command": "start notepad"})
assert first["ok"] is True
assert first["background_launch_assumed"] is True
assert first["focus_change_assumed"] is False
assert first["verification_required"] is True
assert first["verification_channels"] == ["get_active_window", "see_screen"]
assert called["command"] == "start notepad"
blocked = agent._dispatch_tool("execute_command", {"command": "start notepad"})
assert blocked["blocked"] is True
assert "get_active_window" in blocked["error"]
observed = agent._dispatch_tool("get_active_window", {})
assert observed["ok"] is True
second = agent._dispatch_tool("execute_command", {"command": "start notepad"})
assert second["ok"] is True
def test_system_prompt_emphasizes_situational_awareness() -> None:
prompt = agent_module.SYSTEM_PROMPT
assert "Maintain a live mental model" in prompt
assert "classify -> choose control channel -> execute one meaningful transition -> verify" in prompt
assert "First classify, then act." in prompt
assert "Use see_screen at a balanced cadence" in prompt
assert "get_active_window" in prompt
assert "detect_dialog" in prompt
assert "dialog_set_filename" in prompt
assert "list_ui_elements" in prompt
assert "clipboard_get" in prompt
assert "Do not invent new subgoals" in prompt
assert "verify-and-finish" in prompt
assert "data.observed_result" in prompt
assert "Treat command-launched apps or URLs as background" in prompt
assert "#32770" in prompt
assert "secure desktop" in prompt.lower()
def test_observation_loop_prompt_pushes_action_or_finish() -> None:
prompt = agent_module.build_observation_loop_prompt("Save as [#32770]", repeated_steps=3)
assert "same stable window for 3 step(s)" in prompt
assert "Save as [#32770]" in prompt
assert "Do not keep calling broad observation tools" in prompt
assert "native window/dialog/element tool" in prompt
assert "Use enhance only if a small or text-heavy control must be read before acting." in prompt
assert "#32770 dialog" in prompt
def test_finish_likely_prompt_pushes_verification_then_completion() -> None:
prompt = agent_module.build_finish_likely_prompt(
'Save dialog closed and focus returned to "todo-demo.txt - Notepad". | Command verification confirms "todo-demo.txt" exists.',
prohibited_key_combos={"ctrl+shift+s"},
)
assert "objective is likely already satisfied" in prompt
assert "todo-demo.txt - Notepad" in prompt
assert "call see_screen" in prompt
assert "then call task_complete" in prompt
assert "Do not reopen menus" in prompt
assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
def test_initial_action_prompt_reinforces_observation_and_verification() -> None:
prompt = agent_module.build_initial_action_prompt("Open calculator", {"ctrl+shift+s"})
assert "JOB: Open calculator" in prompt
assert "First classify the current UI state from the latest evidence." in prompt
assert "Identify what changed since the last action or screen capture." in prompt
assert "classify -> choose control channel -> execute one meaningful transition -> verify" in prompt
assert "Prefer native window/dialog/element tools" in prompt
assert "get_active_window plus detect_dialog" in prompt
assert "click then see_screen" in prompt
assert "Do not invent new subgoals" in prompt
assert "Prefer non-visual verification when available" in prompt
assert "wait_for_focus_change" in prompt
assert "#32770 dialogs" in prompt
assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
assert "do not re-capture the screen just to reconfirm an obvious large input area" in prompt
assert 'task_complete(return=..., data={"observed_result": ...})' in prompt
def test_no_tool_prompt_recovers_by_reobserving() -> None:
prompt = agent_module.build_no_tool_prompt({"ctrl+shift+s"})
assert "Recover by re-observing the current desktop state instead of guessing." in prompt
assert "Start by classifying the surface." in prompt
assert "get_active_window" in prompt
assert "detect_dialog" in prompt
assert "clipboard_get" in prompt
assert "native window/dialog/element tools" in prompt
assert "Do not assume execute_command launches changed the foreground window" in prompt
assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
assert "If a modal, picker, or browser download/upload surface is likely" in prompt
def test_blocked_action_prompt_reanchors_on_screen_state() -> None:
prompt = agent_module.build_blocked_action_prompt("click", prohibited_key_combos={"ctrl+shift+s"})
assert "The last action using click was blocked or unreliable." in prompt
assert "Do not retry blindly." in prompt
assert "classify the current surface" in prompt
assert "detect_dialog" in prompt
assert "dialog_set_filename" in prompt
assert "get_active_window" in prompt
assert "get_cursor_position before move_mouse or drag" in prompt
assert "wait_for_focus_change" in prompt
assert "secure desktop or UAC" in prompt
assert "Switch strategy after the fresh classification" in prompt
assert "Prohibited key combos for this run: ctrl+shift+s." in prompt
assert "native control instead of pixels" in prompt
def test_tool_schemas_include_completion_and_desktop_awareness_guidance(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.prohibited_key_combos = {"ctrl+shift+s"}
schemas = {tool["name"]: tool for tool in agent._tool_schemas()}
assert "data.observed_result" in schemas["task_complete"]["description"]
assert "before task_complete" in schemas["see_screen"]["description"]
assert "text-heavy targets" in schemas["enhance"]["description"]
assert "verify copy or cut results" in schemas["clipboard_get"]["description"]
assert "pointer state matters" in schemas["get_cursor_position"]["description"]
assert "verify focus and active app" in schemas["get_active_window"]["description"]
assert "foreground focus" in schemas["execute_command"]["description"]
assert "Prohibited for this run: ctrl+shift+s." in schemas["press_key"]["description"]
assert "dialog classification" in schemas["get_active_window"]["description"]
assert "visible top-level windows" in schemas["list_windows"]["description"]
assert "#32770 or picker surface" in schemas["detect_dialog"]["description"]
assert "filename or path field" in schemas["dialog_set_filename"]["description"]
assert "native child controls" in schemas["list_ui_elements"]["description"]
def test_tool_schemas_hide_optional_native_tools_when_mode_off(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.options.native_automation_mode = "off"
schemas = {tool["name"]: tool for tool in agent._tool_schemas()}
assert "get_active_window" in schemas
assert "list_windows" not in schemas
assert "detect_dialog" not in schemas
assert "list_ui_elements" not in schemas
def test_tool_schemas_hide_windows_only_tools_on_non_windows_host(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
monkeypatch.setattr(agent_module.sys, "platform", "linux")
schemas = {tool["name"]: tool for tool in agent._tool_schemas()}
assert "get_active_window" not in schemas
assert "list_windows" not in schemas
assert "detect_dialog" not in schemas
assert "list_ui_elements" not in schemas
result = agent._dispatch_tool("get_active_window", {})
assert result["ok"] is False
assert result["error"] == "Tool 'get_active_window' is only available on Windows."
def test_list_windows_returns_structured_surface_metadata(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
monkeypatch.setattr(
agent,
"_list_windows_info",
lambda visible_only=True: [
{
"available": True,
"hwnd": 111,
"title": "Open",
"class_name": "#32770",
"executable_name": "notepad.exe",
"surface_kind": "file_dialog",
"dialog_kind": "file_open",
}
],
)
monkeypatch.setattr(
agent,
"_get_active_window_info",
lambda: {
"available": True,
"hwnd": 111,
"title": "Open",
"class_name": "#32770",
"executable_name": "notepad.exe",
},
)
result = agent._tool_list_windows({})
assert result["ok"] is True
assert result["count"] == 1
assert result["surface_kind"] == "file_dialog"
assert result["dialog_kind"] == "file_open"
assert result["recommended_next_tools"][0] == "dialog_set_filename"
def test_detect_dialog_returns_buttons_and_target_handle(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
monkeypatch.setattr(
agent,
"_find_dialog_info",
lambda title_contains="": {
"available": True,
"hwnd": 222,
"title": "Save as",
"class_name": "#32770",
"executable_name": "notepad.exe",
},
)
monkeypatch.setattr(
agent,
"_get_active_window_info",
lambda: {
"available": True,
"hwnd": 222,
"title": "Save as",
"class_name": "#32770",
"executable_name": "notepad.exe",
},
)
monkeypatch.setattr(
agent,
"_list_ui_elements_for_window",
lambda hwnd, include_hidden=False: [
{
"handle": 10,
"role": "button",
"text": "Save",
"target": {"type": "ui_element", "handle": 10, "window_handle": hwnd},
}
],
)
result = agent._tool_detect_dialog({})
assert result["ok"] is True
assert result["dialog_kind"] == "file_save"
assert result["target"]["type"] == "dialog"
assert result["buttons"][0]["text"] == "Save"
def test_notepad_save_pattern_enters_finish_likely_mode(tmp_path: Path, monkeypatch) -> None:
events: list[dict[str, object]] = []
agent = _build_agent(tmp_path, monkeypatch)
agent.event_callback = events.append
agent.objective = "Open Notepad, type a short to-do list, save it as todo-demo.txt in Documents"
agent.finish_likely_state["target_filename"] = agent._infer_target_filename(agent.objective)
agent.last_observed_window = {
"available": True,
"title": "Save as",
"class_name": "#32770",
}
agent.step = 24
window_result = agent._update_finish_likely_from_tool(
"get_active_window",
{},
{
"ok": True,
"window": {
"available": True,
"title": "todo-demo.txt - Notepad",
"class_name": "Notepad",
},
},
)
assert agent.finish_likely_state["active"] is False
assert [item["kind"] for item in window_result["completion_evidence"]] == [
"active_window_title_matches_target",
"save_dialog_closed_to_target_window",
]
agent.last_visual_signature = "stable-post-save"
agent.step = 25
command_result = agent._update_finish_likely_from_tool(
"execute_command",
{"command": "powershell -NoProfile -Command \"Test-Path ... todo-demo.txt\""},
{
"ok": True,
"exit_code": 0,
"stdout": r"C:\Users\paulw\Documents\todo-demo.txt",
},
)
assert agent.finish_likely_state["active"] is True
assert agent.finish_likely_state["summary"]
assert command_result["finish_likely"]["target_filename"] == "todo-demo.txt"
assert any(event["event_type"] == "completion_evidence" for event in events)
assert any(event["event_type"] == "finish_likely" for event in events)
def test_finish_likely_guard_blocks_reopening_menu_after_fresh_verification(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Open Notepad, type a short to-do list, save it as todo-demo.txt in Documents"
agent.finish_likely_state.update(
{
"active": True,
"activated_at_step": 24,
"target_filename": "todo-demo.txt",
"summary": 'Save dialog closed and focus returned to "todo-demo.txt - Notepad". | Command verification confirms "todo-demo.txt" exists.',
"fresh_verification_done": False,
"verification_step": 0,
"post_completion_visual_signature": "",
}
)
agent.step = 25
verify_result = agent._dispatch_tool("see_screen", {})
assert verify_result["ok"] is True
assert verify_result["finish_likely_verification_done"] is True
assert agent.finish_likely_state["fresh_verification_done"] is True
blocked = agent._dispatch_tool("press_key", {"key": "alt+f"})
assert blocked["ok"] is False
assert blocked["blocked"] is True
assert blocked["blocked_reason"] == "finish_likely"
assert "appears satisfied" in blocked["error"]
assert "reopen menus" in blocked["hint"].lower()
def test_dispatch_rejects_unknown_and_disabled_tools(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.disabled_tools = {"scroll"}
assert agent._dispatch_tool("unknown_tool", {}) == {"ok": False, "error": "Unknown tool: unknown_tool"}
assert agent._dispatch_tool("scroll", {}) == {"ok": False, "error": "Tool 'scroll' is disabled for this job."}
def test_tool_schemas_filter_disabled_tools(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.disabled_tools = {"scroll", "clipboard_get"}
tool_names = {tool["name"] for tool in agent._tool_schemas()}
assert "scroll" not in tool_names
assert "clipboard_get" not in tool_names
assert "click" in tool_names
assert "task_complete" in tool_names
def test_normalize_disabled_tools_rejects_invalid_and_required_names() -> None:
with pytest.raises(ValueError, match="Unknown disabled tool"):
agent_module.normalize_disabled_tools(["not_a_real_tool"])
with pytest.raises(ValueError, match="Cannot disable required tool"):
agent_module.normalize_disabled_tools(["task_complete"])
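
The two `pytest.raises` checks above pin down the contract of `normalize_disabled_tools`: unknown names and required tools are both rejected with a `ValueError`. A minimal sketch of a validator with that contract follows. The tool sets `KNOWN_TOOLS` and `REQUIRED_TOOLS`, the function name, and the sorted, de-duplicated return value are assumptions for illustration, not the project's actual implementation.

```python
from typing import Iterable, List, Optional

# Hypothetical tool registry; the real project defines its own list.
KNOWN_TOOLS = frozenset({"click", "scroll", "clipboard_get", "see_screen", "task_complete"})
# Hypothetical set of tools a job can never run without.
REQUIRED_TOOLS = frozenset({"task_complete"})


def normalize_disabled_tools_sketch(names: Optional[Iterable[str]]) -> List[str]:
    """Validate tool names and return them lower-cased, de-duplicated, and sorted."""
    normalized = set()
    for raw in names or ():
        name = str(raw).strip().lower()
        if not name:
            continue  # ignore blank entries
        if name not in KNOWN_TOOLS:
            raise ValueError(f"Unknown disabled tool: {name}")
        if name in REQUIRED_TOOLS:
            raise ValueError(f"Cannot disable required tool: {name}")
        normalized.add(name)
    return sorted(normalized)
```

Raising on the first bad name keeps the error message specific, which is what the `match=` patterns in the tests rely on.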

@@ -20,6 +20,7 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
port=8787,
runs_dir=tmp_path / "runs",
db_path=tmp_path / "screenjob.db",
prohibited_key_combos=("ctrl+shift+s",),
)
config.runs_dir.mkdir(parents=True, exist_ok=True)
@@ -29,7 +30,10 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
def fake_assess_task_safety(*_args, **_kwargs):
return True, "safe", {"safe": True}
captured_kwargs: dict[str, Any] = {}
def fake_run_job(*_args, **_kwargs):
captured_kwargs.update(_kwargs)
result = AgentResult(
completed=True,
result="Done",
@@ -66,3 +70,13 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
assert payload["response"]["data"] == "file1.txt\nfile2.txt"
assert payload["return"] == "Task completed successfully"
assert payload["data"] == "file1.txt\nfile2.txt"
assert captured_kwargs["options"].reasoning_effort == "medium"
assert captured_kwargs["options"].screen_context_decay_steps == 4
assert captured_kwargs["options"].max_visual_context_images == 3
assert captured_kwargs["options"].native_automation_mode == "prefer"
assert captured_kwargs["options"].dialog_timeout_seconds == 12.0
assert captured_kwargs["options"].focus_timeout_seconds == 8.0
assert captured_kwargs["options"].ui_element_timeout_seconds == 8.0
assert captured_kwargs["options"].max_retries_per_surface == 3
assert captured_kwargs["options"].pretty_logs is False
assert captured_kwargs["options"].prohibited_key_combos == {"ctrl+shift+s"}

@@ -0,0 +1,149 @@
from __future__ import annotations
import types
from collections import deque
from typing import Any
from src.desktop_overlay import CompletionOverlayPayload, DesktopOverlayManager
class _FakeWidget:
def __init__(self, root: "_FakeTk", *, width: int = 360, height: int = 160) -> None:
self._root = root
self._width = width
self._height = height
self._exists = True
self._after_ids: dict[str, tuple[int, Any]] = {}
def withdraw(self) -> None:
return None
def overrideredirect(self, *_args: Any, **_kwargs: Any) -> None:
return None
def attributes(self, *_args: Any, **_kwargs: Any) -> None:
return None
def configure(self, *_args: Any, **_kwargs: Any) -> None:
return None
def pack(self, *_args: Any, **_kwargs: Any) -> None:
return None
def place(self, *_args: Any, **_kwargs: Any) -> None:
return None
def update_idletasks(self) -> None:
return None
def winfo_width(self) -> int:
return self._width
def winfo_height(self) -> int:
return self._height
def winfo_exists(self) -> bool:
return self._exists
def geometry(self, *_args: Any, **_kwargs: Any) -> None:
return None
def deiconify(self) -> None:
return None
def destroy(self) -> None:
self._exists = False
def after(self, delay_ms: int, callback: Any) -> str:
after_id = self._root._schedule(delay_ms, callback)
self._after_ids[after_id] = (delay_ms, callback)
return after_id
def after_cancel(self, after_id: str) -> None:
self._after_ids.pop(after_id, None)
self._root._cancel(after_id)
class _FakeButton(_FakeWidget):
def __init__(self, root: "_FakeTk", command: Any | None = None, **_kwargs: Any) -> None:
super().__init__(root)
self.command = command
class _FakeTk(_FakeWidget):
def __init__(self) -> None:
super().__init__(self)
self._events: deque[tuple[str, int, Any]] = deque()
self._event_seq = 0
self.scheduled_delays: list[int] = []
self.cards: list[_FakeWidget] = []
def withdraw(self) -> None:
return None
def winfo_screenwidth(self) -> int:
return 1920
def _schedule(self, delay_ms: int, callback: Any) -> str:
after_id = f"after-{self._event_seq}"
self._event_seq += 1
self.scheduled_delays.append(delay_ms)
self._events.append((after_id, delay_ms, callback))
return after_id
def _cancel(self, after_id: str) -> None:
self._events = deque(event for event in self._events if event[0] != after_id)
def mainloop(self) -> None:
iterations = 0
while self._events and iterations < 20:
after_id, _delay_ms, callback = self._events.popleft()
iterations += 1
callback()
if any(not card.winfo_exists() for card in self.cards):
return
class _FakeTkModule(types.SimpleNamespace):
def __init__(self, root: _FakeTk) -> None:
super().__init__()
self._root = root
def Tk(self) -> _FakeTk:
return self._root
def Toplevel(self, _root: _FakeTk) -> _FakeWidget:
card = _FakeWidget(self._root)
self._root.cards.append(card)
return card
def Frame(self, root: _FakeWidget, **_kwargs: Any) -> _FakeWidget:
return _FakeWidget(root._root)
def Label(self, root: _FakeWidget, **_kwargs: Any) -> _FakeWidget:
return _FakeWidget(root._root)
def Button(self, root: _FakeWidget, command: Any | None = None, **_kwargs: Any) -> _FakeButton:
return _FakeButton(root._root, command=command)
def test_completion_overlay_auto_dismisses(monkeypatch: Any) -> None:
root = _FakeTk()
fake_tk = _FakeTkModule(root)
monkeypatch.setitem(__import__("sys").modules, "tkinter", fake_tk)
manager = DesktopOverlayManager(auto_dismiss_seconds=0.01)
manager._queue.put(
CompletionOverlayPayload(
job_id="job-123",
objective="Write a report",
return_message="Finished",
steps=5,
elapsed_seconds=12.4,
)
)
manager._ui_main()
assert any(delay == 10 for delay in root.scheduled_delays)
assert root.cards[0]._exists is False
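
The fake Tk root above replaces real timers with a queue that `mainloop` drains in order, which is what lets the auto-dismiss test finish without sleeping. The same idea in isolation, as a sketch with hypothetical names (`FakeScheduler`, `run_pending`) that are not part of the diff:

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class FakeScheduler:
    """Records `after`-style callbacks and runs them in FIFO order, ignoring real time."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[str, int, Callable[[], None]]] = deque()
        self._seq = 0
        self.scheduled_delays: List[int] = []

    def after(self, delay_ms: int, callback: Callable[[], None]) -> str:
        # Hand out a unique id so the callback can be cancelled later.
        after_id = f"after-{self._seq}"
        self._seq += 1
        self.scheduled_delays.append(delay_ms)
        self._events.append((after_id, delay_ms, callback))
        return after_id

    def after_cancel(self, after_id: str) -> None:
        # Drop the matching event; unknown ids are ignored.
        self._events = deque(event for event in self._events if event[0] != after_id)

    def run_pending(self, max_iterations: int = 20) -> int:
        # Run queued callbacks in insertion order, with a cap against runaway rescheduling.
        ran = 0
        while self._events and ran < max_iterations:
            _after_id, _delay_ms, callback = self._events.popleft()
            callback()
            ran += 1
        return ran
```

Because delays are recorded rather than honored, a test can still assert on the requested delay (as the `delay == 10` check does) while staying deterministic.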

@@ -9,6 +9,24 @@ import src.server as server_module
from src.config import AppConfig
_TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
def _objective_category(objective: str) -> str:
text = objective.lower()
if any(keyword in text for keyword in ("browser", "website", "amazon", "google", "login", "shopping", "checkout", "orders")):
return "Browser / web"
if any(keyword in text for keyword in ("file", "folder", "directory", "terminal", "shell", "command", "cli", "script", "git", "repo", "install", "pip", "npm")):
return "Files / terminal"
if any(keyword in text for keyword in ("write", "summary", "document", "docs", "report", "email", "message", "readme", "markdown")):
return "Writing / docs"
if any(keyword in text for keyword in ("data", "analysis", "csv", "spreadsheet", "sheet", "table", "chart", "dashboard", "metric", "sql")):
return "Data / analysis"
if any(keyword in text for keyword in ("code", "bug", "fix", "test", "debug", "api", "backend", "frontend", "database", "deploy", "docker", "service", "build")):
return "Development / ops"
return "Other"
class FakeJobManager:
def __init__(self, *, config: AppConfig, db: Any, broadcast: Any = None) -> None:
self.config = config
@@ -26,6 +44,15 @@ class FakeJobManager:
command_timeout: int = 45,
type_interval: float = 0.02,
click_pause: float = 0.10,
reasoning_effort: str = "medium",
screen_context_decay_steps: int = 4,
max_visual_context_images: int = 3,
native_automation_mode: str = "prefer",
dialog_timeout_seconds: float = 12.0,
focus_timeout_seconds: float = 8.0,
ui_element_timeout_seconds: float = 8.0,
max_retries_per_surface: int = 3,
pretty_logs: bool = False,
disabled_tools: list[str] | None = None,
safety_override: bool = False,
no_failsafe: bool = False,
@@ -33,6 +60,11 @@ class FakeJobManager:
self._counter += 1
job_id = f"job_fake_{self._counter:03d}"
selected_model = (model or self.config.default_model).strip()
artifacts_dir = (self.config.runs_dir / f"run_{job_id}").resolve()
artifacts_dir.mkdir(parents=True, exist_ok=True)
screenshot_path = artifacts_dir / "screen_step_001.png"
screenshot_path.write_bytes(b"not-a-real-png")
created_at = f"2026-05-27T00:00:{self._counter:02d}Z"
self.last_submit_payload = {
"objective": objective,
"model": selected_model,
@@ -42,6 +74,15 @@ class FakeJobManager:
"command_timeout": command_timeout,
"type_interval": type_interval,
"click_pause": click_pause,
"reasoning_effort": reasoning_effort,
"screen_context_decay_steps": screen_context_decay_steps,
"max_visual_context_images": max_visual_context_images,
"native_automation_mode": native_automation_mode,
"dialog_timeout_seconds": dialog_timeout_seconds,
"focus_timeout_seconds": focus_timeout_seconds,
"ui_element_timeout_seconds": ui_element_timeout_seconds,
"max_retries_per_surface": max_retries_per_surface,
"pretty_logs": pretty_logs,
"no_failsafe": no_failsafe,
}
self._jobs[job_id] = {
@@ -49,6 +90,10 @@ class FakeJobManager:
"objective": objective,
"model": selected_model,
"status": "running",
"created_at": created_at,
"started_at": created_at,
"ended_at": None,
"steps": 1,
"result": "Running",
"response": {"return": "Running", "data": None},
"return": "Running",
@@ -61,7 +106,7 @@ class FakeJobManager:
"total_tokens": 14,
"estimated_cost_usd": 0.0001,
},
- "artifacts_dir": str(self.config.runs_dir.resolve()),
+ "artifacts_dir": str(artifacts_dir),
}
self._events[job_id] = [
{
@@ -70,7 +115,47 @@ class FakeJobManager:
"ts": "2026-05-27T00:00:00Z",
"step": 1,
"event_type": "tool_called",
- "payload": {"tool": "execute_command"},
+ "payload": {"tool": "click", "args": {"coordinate": {"x": 320, "y": 180}}},
},
{
"id": 2,
"job_id": job_id,
"ts": "2026-05-27T00:00:01Z",
"step": 1,
"event_type": "tool_result",
"payload": {"tool": "click", "result": {"ok": True, "clicked": {"x": 322, "y": 182}}},
},
{
"id": 3,
"job_id": job_id,
"ts": "2026-05-27T00:00:02Z",
"step": 1,
"event_type": "tool_called",
"payload": {"tool": "type", "args": {"text": "hello world"}},
},
{
"id": 4,
"job_id": job_id,
"ts": "2026-05-27T00:00:03Z",
"step": 1,
"event_type": "tool_result",
"payload": {"tool": "type", "result": {"ok": True, "typed_length": 11}},
},
{
"id": 5,
"job_id": job_id,
"ts": "2026-05-27T00:00:04Z",
"step": 1,
"event_type": "visual_update",
"payload": {
"kind": "see_screen",
"image_meta": {
"path": str(screenshot_path),
"width": 1920,
"height": 1080,
"grid": True,
},
},
}
]
return job_id
@@ -101,6 +186,114 @@ class FakeJobManager:
"live_running_threads": 0,
}
def analytics(self) -> dict[str, Any]:
by_category: dict[str, dict[str, Any]] = {}
by_day: dict[str, dict[str, Any]] = {}
def bucket(target: dict[str, dict[str, Any]], key: str) -> dict[str, Any]:
return target.setdefault(
key,
{
"label": key,
"total_jobs": 0,
"finished_jobs": 0,
"completed_jobs": 0,
"failed_jobs": 0,
"cancelled_jobs": 0,
"steps_sum": 0,
"steps_count": 0,
"cost_sum": 0.0,
"cost_count": 0,
},
)
total_jobs = 0
finished_jobs = 0
completed_jobs = 0
failed_jobs = 0
cancelled_jobs = 0
steps_sum = 0
steps_count = 0
cost_sum = 0.0
cost_count = 0
for job in self._jobs.values():
total_jobs += 1
status = str(job.get("status") or "")
finished = status in _TERMINAL_STATUSES
category = _objective_category(str(job.get("objective") or ""))
day = str(job.get("created_at") or "")[:10] or "unknown"
category_bucket = bucket(by_category, category)
day_bucket = bucket(by_day, day)
for item in (category_bucket, day_bucket):
item["total_jobs"] += 1
if not finished:
continue
finished_jobs += 1
if status == "completed":
completed_jobs += 1
elif status == "failed":
failed_jobs += 1
elif status == "cancelled":
cancelled_jobs += 1
steps_raw = job.get("steps")
if steps_raw is not None:
steps = int(steps_raw)
steps_sum += steps
steps_count += 1
for item in (category_bucket, day_bucket):
item["steps_sum"] += steps
item["steps_count"] += 1
estimated_cost_raw = (job.get("usage") or {}).get("estimated_cost_usd")
if estimated_cost_raw is not None:
estimated_cost = float(estimated_cost_raw)
cost_sum += estimated_cost
cost_count += 1
for item in (category_bucket, day_bucket):
item["cost_sum"] += estimated_cost
item["cost_count"] += 1
for item in (category_bucket, day_bucket):
item["finished_jobs"] += 1
if status == "completed":
item["completed_jobs"] += 1
elif status == "failed":
item["failed_jobs"] += 1
elif status == "cancelled":
item["cancelled_jobs"] += 1
def finalize(item: dict[str, Any]) -> dict[str, Any]:
finished = item["finished_jobs"]
return {
"label": item["label"],
"total_jobs": item["total_jobs"],
"finished_jobs": finished,
"completed_jobs": item["completed_jobs"],
"failed_jobs": item["failed_jobs"],
"cancelled_jobs": item["cancelled_jobs"],
"success_rate": round((item["completed_jobs"] / finished) * 100, 2) if finished else 0.0,
"avg_steps": round(item["steps_sum"] / item["steps_count"], 2) if item["steps_count"] else None,
"avg_cost_usd": round(item["cost_sum"] / item["cost_count"], 6) if item["cost_count"] else None,
}
return {
"total_jobs": total_jobs,
"finished_jobs": finished_jobs,
"completed_jobs": completed_jobs,
"failed_jobs": failed_jobs,
"cancelled_jobs": cancelled_jobs,
"success_rate": round((completed_jobs / finished_jobs) * 100, 2) if finished_jobs else 0.0,
"avg_steps": round(steps_sum / steps_count, 2) if steps_count else None,
"avg_cost_usd": round(cost_sum / cost_count, 6) if cost_count else None,
"by_category": sorted((finalize(item) for item in by_category.values()), key=lambda item: (-item["success_rate"], item["label"])),
"timeline": sorted((finalize(item) for item in by_day.values()), key=lambda item: item["label"]),
}
def _build_app(tmp_path: Path, monkeypatch: Any, disable_ui: bool = False):
monkeypatch.setattr(server_module, "JobManager", FakeJobManager)
@@ -114,6 +307,7 @@ def _build_app(tmp_path: Path, monkeypatch: Any, disable_ui: bool = False):
port=8787,
runs_dir=tmp_path / "runs",
db_path=tmp_path / "screenjob_test.db",
prohibited_key_combos=("ctrl+shift+s",),
)
config.runs_dir.mkdir(parents=True, exist_ok=True)
app = server_module.create_app(config)
@@ -145,6 +339,15 @@ def test_create_job_returns_only_job_id_and_defaults_model(tmp_path: Path, monke
manager = app.state.manager
assert manager.last_submit_payload["model"] == "gpt-5.4-mini"
assert manager.last_submit_payload["disabled_tools"] == ["click"]
assert manager.last_submit_payload["reasoning_effort"] == "medium"
assert manager.last_submit_payload["screen_context_decay_steps"] == 4
assert manager.last_submit_payload["max_visual_context_images"] == 3
assert manager.last_submit_payload["native_automation_mode"] == "prefer"
assert manager.last_submit_payload["dialog_timeout_seconds"] == 12.0
assert manager.last_submit_payload["focus_timeout_seconds"] == 8.0
assert manager.last_submit_payload["ui_element_timeout_seconds"] == 8.0
assert manager.last_submit_payload["max_retries_per_surface"] == 3
assert manager.last_submit_payload["pretty_logs"] is False
status_res = client.get(f"/api/jobs/{job_id}/status", headers=headers)
assert status_res.status_code == 200
@@ -153,6 +356,36 @@ def test_create_job_returns_only_job_id_and_defaults_model(tmp_path: Path, monke
assert "data" in status_res.json()["response"]
def test_create_job_rejects_invalid_disabled_tool_names(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
response = client.post(
"/api/jobs",
headers=headers,
json={"job": "Open amazon.de", "disabled_tools": ["not_a_real_tool"], "safety_override": True},
)
assert response.status_code == 400
assert "Unknown disabled tool" in response.json()["detail"]
def test_create_job_rejects_disabling_task_complete(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
response = client.post(
"/api/jobs",
headers=headers,
json={"job": "Open amazon.de", "disabled_tools": ["task_complete"], "safety_override": True},
)
assert response.status_code == 400
assert "Cannot disable required tool" in response.json()["detail"]
def test_cancel_endpoint_and_events(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
client = TestClient(app)
@@ -174,12 +407,122 @@ def test_cancel_endpoint_and_events(tmp_path: Path, monkeypatch: Any) -> None:
assert status_after["data"] is None
def test_replay_endpoint_builds_frames_and_overlays(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
create = client.post("/api/jobs", headers=headers, json={"job": "Replay test"})
job_id = create.json()["job_id"]
replay = client.get(f"/api/jobs/{job_id}/replay?limit=200", headers=headers)
assert replay.status_code == 200
payload = replay.json()
assert payload["job_id"] == job_id
assert payload["total_frames"] == 1
frame = payload["frames"][0]
assert frame["kind"] == "see_screen"
assert frame["is_fullscreen"] is True
labels = [item.get("label", "") for item in frame["overlays"]]
assert any("click" in text.lower() for text in labels)
assert any("typed" in text.lower() for text in labels)
def test_replay_endpoint_skips_visual_paths_outside_artifacts(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
manager = app.state.manager
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
create = client.post("/api/jobs", headers=headers, json={"job": "Replay path check"})
job_id = create.json()["job_id"]
manager._events[job_id].append(
{
"id": 999,
"job_id": job_id,
"ts": "2026-05-27T00:01:00Z",
"step": 2,
"event_type": "visual_update",
"payload": {
"kind": "see_screen",
"image_meta": {
"path": str((tmp_path / "outside.png").resolve()),
"width": 100,
"height": 100,
"grid": True,
},
},
}
)
replay = client.get(f"/api/jobs/{job_id}/replay?limit=500", headers=headers)
assert replay.status_code == 200
payload = replay.json()
assert payload["total_frames"] == 1
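
The test above expects the replay endpoint to drop a `visual_update` whose image path lies outside the job's artifacts directory. One way such a containment check could look, as a sketch: the helper name `is_inside_artifacts` is hypothetical, the server's real check may differ, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def is_inside_artifacts(candidate: str, artifacts_dir: Path) -> bool:
    """Return True only if `candidate` resolves to a location under `artifacts_dir`."""
    root = artifacts_dir.resolve()
    try:
        # Resolving first collapses ".." segments and symlinks before comparing.
        resolved = Path(candidate).resolve()
    except (OSError, RuntimeError):
        return False
    return resolved.is_relative_to(root)
```

Comparing resolved paths rather than raw strings is what prevents a path such as `run_x/../outside.png` from passing a naive prefix check.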
def test_analytics_endpoint_groups_by_category_and_time(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
manager = app.state.manager
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
browser_completed = client.post("/api/jobs", headers=headers, json={"job": "Open amazon.de and checkout"}).json()["job_id"]
browser_failed = client.post("/api/jobs", headers=headers, json={"job": "Open website and login"}).json()["job_id"]
terminal_completed = client.post("/api/jobs", headers=headers, json={"job": "Run a shell command to inspect files"}).json()["job_id"]
manager._jobs[browser_completed].update(
status="completed",
ended_at="2026-05-27T00:10:00Z",
steps=4,
created_at="2026-05-27T00:00:01Z",
usage={**manager._jobs[browser_completed]["usage"], "estimated_cost_usd": 0.12},
)
manager._jobs[browser_failed].update(
status="failed",
ended_at="2026-05-28T00:10:00Z",
steps=6,
created_at="2026-05-28T00:00:01Z",
usage={**manager._jobs[browser_failed]["usage"], "estimated_cost_usd": 0.24},
)
manager._jobs[terminal_completed].update(
status="completed",
ended_at="2026-05-28T00:15:00Z",
steps=10,
created_at="2026-05-28T00:00:02Z",
usage={**manager._jobs[terminal_completed]["usage"], "estimated_cost_usd": 0.05},
)
analytics = client.get("/api/analytics", headers=headers)
assert analytics.status_code == 200
payload = analytics.json()
assert payload["total_jobs"] == 3
assert payload["finished_jobs"] == 3
assert payload["completed_jobs"] == 2
assert payload["failed_jobs"] == 1
assert payload["success_rate"] == 66.67
assert payload["avg_steps"] == 6.67
assert payload["avg_cost_usd"] == 0.136667
browser = next(row for row in payload["by_category"] if row["label"] == "Browser / web")
terminal = next(row for row in payload["by_category"] if row["label"] == "Files / terminal")
assert browser["finished_jobs"] == 2
assert browser["success_rate"] == 50.0
assert browser["avg_steps"] == 5.0
assert terminal["success_rate"] == 100.0
assert [row["label"] for row in payload["timeline"]] == ["2026-05-27", "2026-05-28"]
def test_ui_toggle(tmp_path: Path, monkeypatch: Any) -> None:
app_enabled, _ = _build_app(tmp_path / "enabled", monkeypatch, disable_ui=False)
client_enabled = TestClient(app_enabled)
root_enabled = client_enabled.get("/")
assert root_enabled.status_code == 200
assert "ScreenJob Monitor" in root_enabled.text
assert "Success by Objective Category" in root_enabled.text
js_enabled = client_enabled.get("/ui/monitoring.js")
assert js_enabled.status_code == 200
assert "const tokenInput" in js_enabled.text
app_disabled, _ = _build_app(tmp_path / "disabled", monkeypatch, disable_ui=True)
client_disabled = TestClient(app_disabled)

@@ -72,3 +72,55 @@ def test_storage_response_fallback_uses_result_when_json_missing(tmp_path: Path)
assert job is not None
assert job["response"]["return"] == "Legacy result string"
assert job["response"]["data"] is None
def test_history_db_analytics_groups_by_category_and_day(tmp_path: Path) -> None:
db = HistoryDB(tmp_path / "screenjob_test_analytics.db")
db.create_job(
job_id="job_browser_ok",
objective="Open amazon.de and checkout",
model="gpt-5.4-mini",
created_at="2026-05-27T00:00:01Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_browser_ok", status="completed", steps=4, estimated_cost_usd=0.12)
db.create_job(
job_id="job_browser_fail",
objective="Open website and login",
model="gpt-5.4-mini",
created_at="2026-05-28T00:00:01Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_browser_fail", status="failed", steps=6, estimated_cost_usd=0.24)
db.create_job(
job_id="job_terminal_ok",
objective="Run a shell command to inspect files",
model="gpt-5.4-mini",
created_at="2026-05-28T00:00:02Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_terminal_ok", status="completed", steps=10, estimated_cost_usd=0.05)
analytics = db.analytics()
assert analytics["total_jobs"] == 3
assert analytics["finished_jobs"] == 3
assert analytics["completed_jobs"] == 2
assert analytics["failed_jobs"] == 1
assert analytics["success_rate"] == 66.67
assert analytics["avg_steps"] == 6.67
assert analytics["avg_cost_usd"] == 0.136667
browser = next(row for row in analytics["by_category"] if row["label"] == "Browser / web")
terminal = next(row for row in analytics["by_category"] if row["label"] == "Files / terminal")
assert browser["finished_jobs"] == 2
assert browser["success_rate"] == 50.0
assert browser["avg_steps"] == 5.0
assert terminal["success_rate"] == 100.0
assert [row["label"] for row in analytics["timeline"]] == ["2026-05-27", "2026-05-28"]
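
The expected totals in the analytics tests follow directly from the three fixture jobs (steps 4, 6, 10; costs 0.12, 0.24, 0.05; two of three completed). A short check of that arithmetic, using the same rounding as the fake manager's `analytics` method; the dictionary layout here is simplified for illustration:

```python
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

# The three fixture jobs, reduced to the fields that feed the aggregates.
jobs = [
    {"status": "completed", "steps": 4, "cost": 0.12},
    {"status": "failed", "steps": 6, "cost": 0.24},
    {"status": "completed", "steps": 10, "cost": 0.05},
]

finished = [job for job in jobs if job["status"] in TERMINAL_STATUSES]
completed = sum(1 for job in finished if job["status"] == "completed")

# Success rate is a percentage of finished jobs, rounded to two decimals.
success_rate = round((completed / len(finished)) * 100, 2)
# Averages are taken over finished jobs only.
avg_steps = round(sum(job["steps"] for job in finished) / len(finished), 2)
avg_cost_usd = round(sum(job["cost"] for job in finished) / len(finished), 6)

print(success_rate, avg_steps, avg_cost_usd)  # 66.67 6.67 0.136667
```

The per-category figures work the same way: the two browser jobs give one completion out of two finished (50.0) and an average of (4 + 6) / 2 = 5.0 steps.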

tests/test_task_manager.py (new file, 238 lines added)

@@ -0,0 +1,238 @@
from __future__ import annotations
import threading
from pathlib import Path
from typing import Any
import src.task_manager as task_manager_module
from src.config import AppConfig
from src.models import AgentResult, RunArtifacts, UsageSummary
from src.storage import HistoryDB
from src.task_manager import JobManager
class _OverlayRecorder:
def __init__(self) -> None:
self.calls: list[dict[str, Any]] = []
def show_completion(self, **kwargs: Any) -> None:
self.calls.append(kwargs)
def _build_manager(tmp_path: Path, overlay_manager: _OverlayRecorder) -> tuple[JobManager, HistoryDB, AppConfig]:
config = AppConfig(
openai_api_key="test-key",
screenjob_token="test-token",
disable_ui=False,
default_model="gpt-5.4-mini",
safety_model="gpt-5.4-mini",
host="127.0.0.1",
port=8787,
runs_dir=tmp_path / "runs",
db_path=tmp_path / "screenjob.db",
)
db = HistoryDB(config.db_path)
manager = JobManager(config=config, db=db, overlay_manager=overlay_manager)
return manager, db, config
def _artifacts(tmp_path: Path) -> RunArtifacts:
root = tmp_path / "run_artifacts"
return RunArtifacts(
run_id="test_run",
root_dir=root,
logs_dir=root / "logs",
shots_dir=root / "shots",
enhance_dir=root / "enhanced",
log_file=root / "logs" / "screenjob.log",
)
def _create_job(db: HistoryDB, job_id: str, objective: str) -> None:
db.create_job(
job_id=job_id,
objective=objective,
model="gpt-5.4-mini",
created_at="2026-05-30T12:00:00+00:00",
safety_override=True,
disabled_tools=[],
)


def test_completed_job_triggers_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)
    job_id = "job_overlay_complete"
    objective = "Save todo-demo.txt in Documents"
    _create_job(db, job_id, objective)
    result = AgentResult(
        completed=True,
        result="Saved todo-demo.txt",
        return_message="Saved todo-demo.txt",
        data={"observed_result": "todo-demo.txt - Notepad is visible"},
        steps=11,
        started_at=100.0,
        ended_at=112.6,
        usage=UsageSummary(),
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (result, _artifacts(tmp_path)))

    manager._execute_job(
        job_id=job_id,
        objective=objective,
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )

    assert overlay.calls == [
        {
            "job_id": job_id,
            "objective": objective,
            "return_message": "Saved todo-demo.txt",
            "steps": 11,
            "elapsed_seconds": 12.599999999999994,
        }
    ]
    assert db.get_job(job_id)["status"] == "completed"


def test_non_completed_jobs_do_not_trigger_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)

    failed_job_id = "job_overlay_failed"
    _create_job(db, failed_job_id, "Fail intentionally")
    failed_result = AgentResult(
        completed=False,
        result="Failure",
        return_message="Failure",
        data=None,
        steps=7,
        started_at=10.0,
        ended_at=18.0,
        usage=UsageSummary(),
        error="Failure",
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (failed_result, _artifacts(tmp_path)))
    manager._execute_job(
        job_id=failed_job_id,
        objective="Fail intentionally",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )

    cancelled_job_id = "job_overlay_cancelled"
    _create_job(db, cancelled_job_id, "Cancel intentionally")
    cancelled_result = AgentResult(
        completed=False,
        result="Cancelled",
        return_message="Cancelled",
        data=None,
        steps=4,
        started_at=20.0,
        ended_at=23.0,
        usage=UsageSummary(),
        error="Cancelled",
        cancelled=True,
    )
    monkeypatch.setattr(task_manager_module, "run_job", lambda **_kwargs: (cancelled_result, _artifacts(tmp_path)))
    manager._execute_job(
        job_id=cancelled_job_id,
        objective="Cancel intentionally",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=True,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )

    assert overlay.calls == []


def test_rejected_job_does_not_trigger_desktop_overlay(tmp_path: Path, monkeypatch) -> None:
    overlay = _OverlayRecorder()
    manager, db, _config = _build_manager(tmp_path, overlay)
    job_id = "job_overlay_rejected"
    _create_job(db, job_id, "Do something unsafe")
    monkeypatch.setattr(task_manager_module, "create_openai_client", lambda *_args, **_kwargs: object())
    monkeypatch.setattr(
        task_manager_module,
        "assess_task_safety",
        lambda *_args, **_kwargs: (False, "Unsafe request", {"decision": "blocked"}),
    )

    manager._execute_job(
        job_id=job_id,
        objective="Do something unsafe",
        model="gpt-5.4-mini",
        disabled_tools=[],
        safety_override=False,
        max_steps=60,
        command_timeout=45,
        type_interval=0.02,
        click_pause=0.10,
        reasoning_effort="medium",
        screen_context_decay_steps=4,
        max_visual_context_images=3,
        native_automation_mode="prefer",
        dialog_timeout_seconds=12.0,
        focus_timeout_seconds=8.0,
        ui_element_timeout_seconds=8.0,
        max_retries_per_surface=3,
        pretty_logs=False,
        no_failsafe=False,
        cancel_event=threading.Event(),
    )

    assert overlay.calls == []
    events = db.get_job_events(job_id)
    assert events[-1]["event_type"] == "job_rejected"

13
todo.md

@@ -4,21 +4,20 @@
- [Bug] Enforce single active desktop-control run (or a strict queue) so concurrent jobs cannot fight over the same mouse/keyboard/screen session.
- [Bug] Fix run artifact collisions in `setup_artifacts()` (`run_id` is second-granularity, so two jobs in the same second can share/overwrite the same directory).
- [Bug] Remove global logger handler clobbering in `setup_logger()` (`logging.getLogger("screenjob").handlers.clear()` breaks concurrent runs and can redirect logs to the wrong file).
- [Bug] More consistent clicks and more uses of enhance images.
- [x] More consistent clicks and more uses of enhance images.
## P1
- [x] Move ui.py into a separate html file and js file.
- [x] Think harder using effort "medium" by default.
- [x] Decay old screenshots after 3 to 5 steps to save (1) tokens and (2) brain fuck in the agents.
- [Bug] Validate `disabled_tools` against an allowlist and disallow disabling critical completion flow (`task_complete`) to avoid guaranteed step-limit failures.
- [Bug] Improve `execute_command` cancellation/timeout handling to terminate full process trees, not only the parent shell process.
- [Bug] Reduce API/UI token leakage risk by moving away from query-string token usage for websocket/artifact access where possible.
- [Idea] Add per-token rate limiting and request size limits (objective length + payload bounds) for API hardening.
## P2
- [Bug] Fix UI event style mapping mismatch (`tool_called` events are emitted, but UI color map expects `tool_call`).
- [Idea] Reduce monitoring UI backend load by throttling websocket-triggered refreshes and avoiding full job/event re-fetch on every event.
- [Idea] Add cursor-based pagination for jobs/events instead of large fixed limits.
- [Idea] Support offline/self-hosted UI assets (bundle Tailwind instead of CDN dependency).
- [Idea] Add retention controls/pruning for old runs, screenshots, and DB rows.
## P3
- [Idea] Add Replay Mode; Ability to replay a session by reconstructing the screen from screenshots and overlaying tool calls and click and type events.
- [Idea] Add lightweight analytics dashboards (success rate by objective category, avg steps/cost over time).
- [x] Add Replay Mode; Ability to replay a session by reconstructing the screen from screenshots and overlaying tool calls and click and type events.
- [x] Add lightweight analytics dashboards (success rate by objective category, avg steps/cost over time).
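
Review note on the still-open `setup_artifacts()` collision item above: a minimal sketch of one way to avoid two jobs in the same second sharing a directory, assuming the run directory is named after `run_id`. The helper names `make_run_id` and `create_run_dir` are hypothetical and do not appear in the repository; the actual `setup_artifacts()` signature is not shown in this diff.

```python
from __future__ import annotations

import time
import uuid
from pathlib import Path


def make_run_id(now: float | None = None) -> str:
    """Return a second-granularity UTC timestamp plus a random suffix.

    Hypothetical helper: the random suffix keeps ids distinct even when
    two jobs start within the same second.
    """
    stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(now))
    return f"{stamp}_{uuid.uuid4().hex[:8]}"


def create_run_dir(runs_dir: Path, now: float | None = None) -> Path:
    """Create a fresh run directory and fail loudly if it already exists."""
    root = runs_dir / make_run_id(now)
    # exist_ok=False turns a silent overwrite into a visible FileExistsError.
    root.mkdir(parents=True, exist_ok=False)
    return root
```

Passing `exist_ok=False` is the part that matters most here: even if the id scheme changes, a collision then raises instead of letting two runs write into the same artifacts.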

53
tray_service_control.ps1 Normal file

@@ -0,0 +1,53 @@
[CmdletBinding()]
param(
    [ValidateSet("start", "stop", "restart")]
    [string]$Action,
    [string]$ServiceName = "ScreenJobBackend"
)

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

function Wait-ForStatus {
    param(
        [Parameter(Mandatory = $true)]$Service,
        [Parameter(Mandatory = $true)][System.ServiceProcess.ServiceControllerStatus]$TargetStatus,
        [int]$TimeoutSeconds = 20
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        $Service.Refresh()
        if ($Service.Status -eq $TargetStatus) {
            return
        }
        Start-Sleep -Milliseconds 350
    }
    throw "Timed out waiting for service '$($Service.ServiceName)' to reach status '$TargetStatus'."
}

$service = Get-Service -Name $ServiceName -ErrorAction Stop

switch ($Action) {
    "start" {
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Start-Service -Name $ServiceName -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
        }
    }
    "stop" {
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Stopped) {
            Stop-Service -Name $ServiceName -Force -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Stopped)
        }
    }
    "restart" {
        if ($service.Status -eq [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Restart-Service -Name $ServiceName -Force -ErrorAction Stop
        } else {
            Start-Service -Name $ServiceName -ErrorAction Stop
        }
        Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
    }
}


@@ -0,0 +1,45 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
    [switch]$AllUsers,
    [string]$ServiceName = "ScreenJobBackend"
)

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

$scriptDir = Split-Path -Parent $PSCommandPath
$shortcutName = "ScreenJob Backend.lnk"
$startupFolder = if ($AllUsers) {
    [Environment]::GetFolderPath("CommonStartup")
} else {
    [Environment]::GetFolderPath("Startup")
}
$shortcutPath = Join-Path $startupFolder $shortcutName

$service = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if ($null -ne $service) {
    if ($PSCmdlet.ShouldProcess($ServiceName, "Remove legacy Windows service")) {
        if ($service.Status -ne "Stopped") {
            Stop-Service -Name $ServiceName -Force -ErrorAction Stop
        }
        & sc.exe delete $ServiceName | Out-Null
        if ($LASTEXITCODE -ne 0) {
            throw "Failed to delete service '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
        }
        Write-Host "Removed legacy Windows service: $ServiceName"
    }
}

if (Test-Path -LiteralPath $shortcutPath) {
    if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove backend startup shortcut")) {
        Remove-Item -LiteralPath $shortcutPath -Force
        Write-Host "Removed backend startup shortcut: $shortcutPath"
    }
} else {
    Write-Host "No backend startup shortcut found at: $shortcutPath"
}

Write-Host "Backend launcher uninstalled successfully." -ForegroundColor Green