Compare commits

11 Commits

Author SHA1 Message Date
Space-Banane
880bfb1c70 Fix tray health detection and harden backend service startup
All checks were successful
CI / test (push) Successful in 7s
2026-05-28 13:44:31 +02:00
Space-Banane
114ddd80d6 Add Windows service host and system tray controller
All checks were successful
CI / test (push) Successful in 7s
2026-05-28 13:30:27 +02:00
314311d8fc Merge pull request 'Add lightweight analytics dashboard' (#1) from feat/lightweight-dash into master
All checks were successful
CI / test (push) Successful in 7s
Reviewed-on: #1
2026-05-27 22:50:08 +02:00
Space-Banane
8126b57404 Add lightweight analytics dashboard
All checks were successful
CI / test (push) Successful in 7s
CI / test (pull_request) Successful in 7s
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-27 22:34:26 +02:00
Space-Banane
cceed18cf1 feat: (literally) "enhance" functionality with new parameters and improved image processing
All checks were successful
CI / test (push) Successful in 7s
2026-05-27 22:14:32 +02:00
Space-Banane
880468ef02 Mark completed P1 TODO items as done
2026-05-27 22:05:57 +02:00
Space-Banane
b05a7be668 Compact screenshot context every 4 steps by default
2026-05-27 22:04:15 +02:00
Space-Banane
0c019474af Default model reasoning effort to medium
2026-05-27 22:02:20 +02:00
Space-Banane
a8ef8ee552 Split monitor UI into separate HTML and JS assets
All checks were successful
CI / test (push) Successful in 7s
2026-05-27 22:01:06 +02:00
Space-Banane
111a1e84af feat: implement replay functionality with UI controls and backend support
2026-05-27 21:57:37 +02:00
Space-Banane
620fcc4aa6 removed slop
2026-05-27 21:53:32 +02:00
28 changed files with 2752 additions and 333 deletions

.gitignore (vendored, 5 changed lines)

@@ -20,3 +20,8 @@ screenjob.db
# IDE
.vscode/
.idea/
# Service host build/publish artifacts
service_host/**/bin/
service_host/**/obj/
service_host/publish/


@@ -109,6 +109,77 @@ Or use the PowerShell launcher:
.\start_backend.ps1
```
### Windows Service
Run these from an elevated PowerShell session (Run as Administrator):
Requires .NET SDK 10+ (installer publishes a native service host executable).
Install and start at boot:
```powershell
.\install_backend_service.ps1 -ForceReinstall -StartAfterInstall -DelayedAutoStart
```
Check status:
```powershell
Get-Service -Name ScreenJobBackend
```
Stop/start manually:
```powershell
Stop-Service -Name ScreenJobBackend
Start-Service -Name ScreenJobBackend
```
Uninstall:
```powershell
.\uninstall_backend_service.ps1
```
Service logs are written to:
```text
screenjob_runs/service/backend-service.stdout.log
screenjob_runs/service/backend-service.stderr.log
```
### System Tray Icon (Windows)
Start tray icon now:
```powershell
powershell -NoProfile -ExecutionPolicy Bypass -STA -File .\screenjob_tray.ps1
```
Install startup shortcut (current user):
```powershell
.\install_tray_startup_shortcut.ps1
```
Install startup shortcut for all users:
```powershell
.\install_tray_startup_shortcut.ps1 -AllUsers
```
Remove startup shortcut:
```powershell
.\install_tray_startup_shortcut.ps1 -Remove
```
Tray menu actions:
- Refresh service status
- Start/Stop/Restart service (prompts for admin/UAC)
- Open dashboard URL from `.env` `SCREENJOB_HOST` / `SCREENJOB_PORT`
- Open service logs folder
- Exit tray icon process
Auth for all API routes:
- `Authorization: Bearer <SCREENJOB_TOKEN>`
@@ -156,13 +227,21 @@ Each job payload includes:
- Read-only dashboard (no run controls)
- Requires token input
- Live updates via `/ws`
- Analytics dashboards for success rate by objective category and daily averages
- Set `DISABLE_UI=true` to disable UI
### Analytics API
- `GET /api/analytics`
- Returns objective-category success rates plus average steps/cost over time
## Agent Instructions (Practical)
- Prefer `execute_command` for deterministic actions (opening URLs, filesystem checks).
- Use `see_screen` before UI interaction.
-- Use `enhance` when text is unclear.
+- Use `enhance` before clicking small/ambiguous targets; prefer `region="small"` for compact controls.
- Use `enhance` `mode="text"` for tiny labels/text, or `mode="ui"` for general UI.
- Optionally set `enhance` `scale` (2-6) for tighter zoom control.
- Use `press_key` for non-text keys (Enter, Tab, arrows, Escape).
- For shortcuts, use one `press_key` call with combo syntax (example: `win+r`).
- Use `click` offsets via `offset_up/down/left/right` and optional `sleep_after_seconds`.
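The `GET /api/analytics` route and the Bearer-token rule from the README hunk above can be exercised with a small client. This is a minimal sketch, assuming the backend listens on `127.0.0.1:8787` (the defaults used by the tray script in this change set) and returns JSON. `build_analytics_request` and `fetch_analytics` are hypothetical helper names, and the response shape is not specified anywhere in this diff.

```python
import json
import urllib.request
from typing import Any


def build_analytics_request(
    token: str, host: str = "127.0.0.1", port: int = 8787
) -> urllib.request.Request:
    """Build an authenticated GET request for the analytics route."""
    url = f"http://{host}:{port}/api/analytics"
    # Every API route expects "Authorization: Bearer <SCREENJOB_TOKEN>".
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_analytics(
    token: str, host: str = "127.0.0.1", port: int = 8787, timeout: float = 5.0
) -> Any:
    """Send the request and decode the body as JSON (assumed format)."""
    request = build_analytics_request(token, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_analytics(token)` requires a running backend; `build_analytics_request` can be inspected offline to confirm the URL and header.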


@@ -37,6 +37,14 @@ Keyboard combo rule:
- For shortcuts, use one `press_key` call with combo syntax, for example: `win+r`, `ctrl+shift+esc`.
- Do not split modifier combos into separate calls.
Enhance-first click rule:
- Before clicking small buttons/icons, dense UI, or ambiguous targets, call `enhance` first.
- Preferred preset for tiny controls: `enhance(coordinate, region="small", mode="ui")`.
- For tiny labels/text: use `mode="text"` to improve readability.
- Optional zoom control: set `scale` from `2` to `6` (defaults are tuned by region).
- After checking the enhanced image, click using the same target coordinate (or a small directional offset if needed).
Verification rule:
- Before `task_complete`, verify actual on-screen content matches the expected outcome.
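To make the `region` and `scale` options in the enhance-first rule concrete, the sketch below computes the crop box and output size for a given coordinate. The half-widths (96/160/240), default scales (4/3/2), and the 2 to 6 clamp are taken from the `_tool_enhance` changes later in this diff. `enhance_geometry` is a hypothetical name, and the `clamp(value, low, high)` signature is an assumption because the real helper is not shown.

```python
REGION_HALF = {"small": 96, "medium": 160, "large": 240}
DEFAULT_SCALE = {"small": 4, "medium": 3, "large": 2}


def clamp(value: int, low: int, high: int) -> int:
    """Assumed behavior of the agent's clamp helper."""
    return max(low, min(high, value))


def enhance_geometry(x, y, width, height, region="small", scale=None):
    """Return (crop_box, scale, output_size) for an enhance call."""
    if region not in REGION_HALF:
        region = "small"  # unknown presets fall back to the smallest region
    chosen = scale if scale and scale > 0 else DEFAULT_SCALE[region]
    chosen = clamp(chosen, 2, 6)

    # Keep the requested point on screen before building the crop box.
    source_x = clamp(x, 0, max(0, width - 1))
    source_y = clamp(y, 0, max(0, height - 1))
    half = REGION_HALF[region]
    left = clamp(source_x - half, 0, width - 1)
    top = clamp(source_y - half, 0, height - 1)
    right = clamp(source_x + half, left + 1, width)
    bottom = clamp(source_y + half, top + 1, height)

    box = (left, top, right, bottom)
    output_size = ((right - left) * chosen, (bottom - top) * chosen)
    return box, chosen, output_size
```

Near a screen edge the crop box shrinks instead of shifting, so the enhanced image is not always centered on the requested coordinate.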

install_backend_service.ps1 (normal file, 125 changed lines)

@@ -0,0 +1,125 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
[string]$ServiceName = "ScreenJobBackend",
[string]$DisplayName = "ScreenJob Backend",
[string]$Description = "Runs the ScreenJob backend (start_backend.ps1) as a Windows service.",
[ValidateSet("Automatic", "Manual", "Disabled")]
[string]$StartupType = "Automatic",
[switch]$DelayedAutoStart,
[switch]$ForceReinstall,
[switch]$StartAfterInstall
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
function Test-IsAdministrator {
$identity = [Security.Principal.WindowsIdentity]::GetCurrent()
$principal = New-Object Security.Principal.WindowsPrincipal($identity)
return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}
if (-not (Test-IsAdministrator)) {
throw "Run this script from an elevated PowerShell session (Run as Administrator)."
}
$scriptDir = Split-Path -Parent $PSCommandPath
$backendScript = Join-Path $scriptDir "start_backend.ps1"
if (-not (Test-Path -LiteralPath $backendScript)) {
throw "Backend launcher script not found: $backendScript"
}
$projectFile = Join-Path $scriptDir "service_host\ScreenJob.WindowsServiceHost\ScreenJob.WindowsServiceHost.csproj"
if (-not (Test-Path -LiteralPath $projectFile)) {
throw "Windows service host project not found: $projectFile"
}
$dotnetCmd = Get-Command dotnet -ErrorAction SilentlyContinue
if ($null -eq $dotnetCmd) {
throw "dotnet SDK was not found in PATH. Install .NET SDK 10+ and retry."
}
$publishDir = Join-Path $scriptDir "service_host\publish"
$serviceExe = Join-Path $publishDir "ScreenJob.WindowsServiceHost.exe"
$logDir = Join-Path $scriptDir "screenjob_runs\service"
$existingService = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if ($null -ne $existingService) {
if (-not $ForceReinstall) {
throw "Service '$ServiceName' already exists. Re-run with -ForceReinstall to replace it."
}
if ($PSCmdlet.ShouldProcess($ServiceName, "Remove existing service")) {
if ($existingService.Status -ne "Stopped") {
Stop-Service -Name $ServiceName -Force -ErrorAction Stop
}
& sc.exe delete $ServiceName | Out-Null
if ($LASTEXITCODE -ne 0) {
throw "Failed to delete existing service '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
}
$deadline = (Get-Date).AddSeconds(15)
while ((Get-Date) -lt $deadline) {
$stillThere = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if ($null -eq $stillThere) {
break
}
Start-Sleep -Milliseconds 300
}
}
}
if ($PSCmdlet.ShouldProcess($projectFile, "Publish Windows service host")) {
if (Test-Path -LiteralPath $serviceExe) {
Remove-Item -LiteralPath $serviceExe -Force -ErrorAction SilentlyContinue
}
& $dotnetCmd.Source publish `
$projectFile `
-c Release `
-r win-x64 `
--self-contained false `
-p:PublishSingleFile=true `
-o $publishDir
if ($LASTEXITCODE -ne 0) {
throw "dotnet publish failed with exit code $LASTEXITCODE."
}
}
if (-not (Test-Path -LiteralPath $serviceExe)) {
throw "Published service executable not found: $serviceExe"
}
$binaryPath = "`"$serviceExe`" --backend-script `"$backendScript`" --working-dir `"$scriptDir`" --log-dir `"$logDir`""
if ($PSCmdlet.ShouldProcess($ServiceName, "Create service")) {
New-Service `
-Name $ServiceName `
-BinaryPathName $binaryPath `
-DisplayName $DisplayName `
-Description $Description `
-StartupType $StartupType
if ($StartupType -eq "Automatic" -and $DelayedAutoStart) {
& sc.exe config $ServiceName start= delayed-auto | Out-Null
if ($LASTEXITCODE -ne 0) {
throw "Failed to enable delayed auto-start for '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
}
}
# Restart on first/second/subsequent failure after 5 seconds.
& sc.exe failure $ServiceName reset= 86400 actions= restart/5000/restart/5000/restart/5000 | Out-Null
if ($LASTEXITCODE -ne 0) {
throw "Failed to configure failure actions for '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
}
if ($StartAfterInstall) {
Start-Service -Name $ServiceName -ErrorAction Stop
}
}
Write-Host "Service '$ServiceName' installed successfully." -ForegroundColor Green
Write-Host "Check status with: Get-Service -Name $ServiceName"
Write-Host "View logs in: $logDir"
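The removal step in this script polls `Get-Service` for up to 15 seconds after `sc.exe delete`, but the loop does not report whether the service was actually gone when the deadline passed. The following is a minimal sketch of the same poll-until-deadline pattern that returns the outcome. `wait_until` and its parameters are hypothetical names, and the example is written in Python purely for illustration, not as a replacement for the PowerShell loop.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_seconds: float = 15.0,
    interval_seconds: float = 0.3,
) -> bool:
    """Poll predicate until it returns True or the deadline passes.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

A caller can then fail loudly on `False` instead of continuing to the create step with a service that is still marked for deletion.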


@@ -0,0 +1,47 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
[switch]$Remove,
[switch]$AllUsers
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
$scriptDir = Split-Path -Parent $PSCommandPath
$vbsLauncher = Join-Path $scriptDir "start_screenjob_tray_hidden.vbs"
$shortcutName = "ScreenJob Tray.lnk"
if (-not (Test-Path -LiteralPath $vbsLauncher)) {
throw "Launcher file not found: $vbsLauncher"
}
$startupFolder = if ($AllUsers) {
[Environment]::GetFolderPath("CommonStartup")
} else {
[Environment]::GetFolderPath("Startup")
}
$shortcutPath = Join-Path $startupFolder $shortcutName
if ($Remove) {
if (Test-Path -LiteralPath $shortcutPath) {
if ($PSCmdlet.ShouldProcess($shortcutPath, "Remove startup shortcut")) {
Remove-Item -LiteralPath $shortcutPath -Force
Write-Host "Removed startup shortcut: $shortcutPath"
}
} else {
Write-Host "No startup shortcut found at: $shortcutPath"
}
return
}
if ($PSCmdlet.ShouldProcess($shortcutPath, "Create startup shortcut")) {
$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut($shortcutPath)
$shortcut.TargetPath = "$env:SystemRoot\System32\wscript.exe"
$shortcut.Arguments = '"' + $vbsLauncher + '"'
$shortcut.WorkingDirectory = $scriptDir
$shortcut.Description = "Launch ScreenJob tray icon at sign-in."
$shortcut.Save()
Write-Host "Created startup shortcut: $shortcutPath"
}

screenjob_tray.ps1 (normal file, 307 changed lines)

@@ -0,0 +1,307 @@
param(
[string]$ServiceName = "ScreenJobBackend"
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$controlScript = Join-Path $scriptDir "tray_service_control.ps1"
$logsDir = Join-Path $scriptDir "screenjob_runs\service"
$defaultHost = "127.0.0.1"
$defaultPort = "8787"
function Read-EnvConfig {
param([string]$EnvFilePath)
$result = @{}
if (-not (Test-Path -LiteralPath $EnvFilePath)) {
return $result
}
foreach ($line in Get-Content -Path $EnvFilePath) {
$trimmed = $line.Trim()
if ($trimmed.Length -eq 0 -or $trimmed.StartsWith("#")) {
continue
}
$parts = $trimmed.Split("=", 2)
if ($parts.Count -eq 2) {
$key = $parts[0].Trim()
$value = $parts[1].Trim()
if (($value.StartsWith('"') -and $value.EndsWith('"')) -or ($value.StartsWith("'") -and $value.EndsWith("'"))) {
$value = $value.Substring(1, $value.Length - 2)
}
$result[$key] = $value
}
}
return $result
}
function Get-ServiceStatusSafe {
param([string]$Name)
try {
$svc = Get-Service -Name $Name -ErrorAction Stop
return $svc.Status.ToString()
} catch {
return "NotInstalled"
}
}
function Invoke-ServiceActionElevated {
param(
[Parameter(Mandatory = $true)][string]$Action,
[Parameter(Mandatory = $true)][string]$Name
)
if (-not (Test-Path -LiteralPath $controlScript)) {
[System.Windows.Forms.MessageBox]::Show(
"Missing control script: $controlScript",
"ScreenJob Tray",
[System.Windows.Forms.MessageBoxButtons]::OK,
[System.Windows.Forms.MessageBoxIcon]::Error
) | Out-Null
return
}
$argList = @(
"-NoProfile",
"-ExecutionPolicy", "Bypass",
"-File", "`"$controlScript`"",
"-Action", $Action,
"-ServiceName", $Name
)
try {
Start-Process -FilePath "powershell.exe" -ArgumentList $argList -Verb RunAs -WindowStyle Hidden | Out-Null
} catch {
# User canceled UAC prompt or launch failed.
}
}
function Get-DashboardUrl {
$envFile = Join-Path $scriptDir ".env"
$envVars = Read-EnvConfig -EnvFilePath $envFile
$dashboardHost = $defaultHost
$dashboardPort = $defaultPort
if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
$dashboardHost = $envVars["SCREENJOB_HOST"]
}
if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
$dashboardPort = $envVars["SCREENJOB_PORT"]
}
$connectHost = Resolve-ConnectHost -ConfiguredHost $dashboardHost
return "http://{0}:{1}/" -f $connectHost, $dashboardPort
}
function Resolve-ConnectHost {
param([string]$ConfiguredHost)
if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
return "127.0.0.1"
}
switch ($ConfiguredHost.Trim().ToLowerInvariant()) {
"0.0.0.0" { return "127.0.0.1" }
"::" { return "127.0.0.1" }
"*" { return "127.0.0.1" }
default { return $ConfiguredHost }
}
}
function Get-HealthCheckHosts {
param([string]$ConfiguredHost)
if ([string]::IsNullOrWhiteSpace($ConfiguredHost)) {
return @("127.0.0.1", "localhost")
}
$normalized = $ConfiguredHost.Trim().ToLowerInvariant()
switch ($normalized) {
"0.0.0.0" { return @("127.0.0.1", "localhost", "::1") }
"::" { return @("127.0.0.1", "localhost", "::1") }
"*" { return @("127.0.0.1", "localhost", "::1") }
default { return @($ConfiguredHost) }
}
}
function Test-TcpEndpoint {
param(
[Parameter(Mandatory = $true)][string]$HostName,
[Parameter(Mandatory = $true)][int]$Port,
[int]$TimeoutMs = 1200
)
$client = New-Object System.Net.Sockets.TcpClient
try {
$async = $client.BeginConnect($HostName, $Port, $null, $null)
$connected = $async.AsyncWaitHandle.WaitOne($TimeoutMs, $false)
if (-not $connected) {
return $false
}
$client.EndConnect($async) | Out-Null
return $true
} catch {
return $false
} finally {
$client.Dispose()
}
}
function Get-BackendReachability {
$envFile = Join-Path $scriptDir ".env"
$envVars = Read-EnvConfig -EnvFilePath $envFile
$configuredHost = $defaultHost
$configuredPort = $defaultPort
if ($envVars.ContainsKey("SCREENJOB_HOST") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_HOST"])) {
$configuredHost = $envVars["SCREENJOB_HOST"]
}
if ($envVars.ContainsKey("SCREENJOB_PORT") -and -not [string]::IsNullOrWhiteSpace($envVars["SCREENJOB_PORT"])) {
$configuredPort = $envVars["SCREENJOB_PORT"]
}
$portNumber = 8787
[void][int]::TryParse([string]$configuredPort, [ref]$portNumber)
$hostsToTry = Get-HealthCheckHosts -ConfiguredHost $configuredHost
foreach ($candidateHost in $hostsToTry) {
if (Test-TcpEndpoint -HostName $candidateHost -Port $portNumber) {
return $true
}
}
return $false
}
function Update-TrayState {
param(
[System.Windows.Forms.NotifyIcon]$NotifyIcon,
[System.Windows.Forms.ToolStripMenuItem]$StatusItem,
[string]$Name
)
$status = Get-ServiceStatusSafe -Name $Name
$isBackendReachable = Get-BackendReachability
$displayStatus = $status
if ($status -eq "Running" -and -not $isBackendReachable) {
$displayStatus = "Running (Backend Down)"
} elseif ($status -eq "Stopped" -and $isBackendReachable) {
$displayStatus = "Stopped (Backend Up)"
} elseif ($status -eq "NotInstalled" -and $isBackendReachable) {
$displayStatus = "NotInstalled (Backend Up)"
}
$StatusItem.Text = "Status: $displayStatus"
switch ($displayStatus) {
"Running" {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Information
}
"Stopped" {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Warning
}
default {
$NotifyIcon.Icon = [System.Drawing.SystemIcons]::Error
}
}
$tooltip = "ScreenJob Backend: $displayStatus"
if ($tooltip.Length -gt 63) {
$tooltip = $tooltip.Substring(0, 63)
}
$NotifyIcon.Text = $tooltip
}
$appContext = New-Object System.Windows.Forms.ApplicationContext
$notifyIcon = New-Object System.Windows.Forms.NotifyIcon
$notifyIcon.Visible = $false
$menu = New-Object System.Windows.Forms.ContextMenuStrip
$statusItem = New-Object System.Windows.Forms.ToolStripMenuItem "Status: Unknown"
$statusItem.Enabled = $false
$refreshItem = New-Object System.Windows.Forms.ToolStripMenuItem "Refresh Status"
$refreshItem.Add_Click({
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
})
$startItem = New-Object System.Windows.Forms.ToolStripMenuItem "Start Service (Admin)"
$startItem.Add_Click({
Invoke-ServiceActionElevated -Action "start" -Name $ServiceName
})
$stopItem = New-Object System.Windows.Forms.ToolStripMenuItem "Stop Service (Admin)"
$stopItem.Add_Click({
Invoke-ServiceActionElevated -Action "stop" -Name $ServiceName
})
$restartItem = New-Object System.Windows.Forms.ToolStripMenuItem "Restart Service (Admin)"
$restartItem.Add_Click({
Invoke-ServiceActionElevated -Action "restart" -Name $ServiceName
})
$dashboardItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Dashboard"
$dashboardItem.Add_Click({
$url = Get-DashboardUrl
Start-Process $url | Out-Null
})
$logsItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Service Logs"
$logsItem.Add_Click({
if (-not (Test-Path -LiteralPath $logsDir)) {
New-Item -ItemType Directory -Path $logsDir -Force | Out-Null
}
Start-Process explorer.exe $logsDir | Out-Null
})
$openFolderItem = New-Object System.Windows.Forms.ToolStripMenuItem "Open Project Folder"
$openFolderItem.Add_Click({
Start-Process explorer.exe $scriptDir | Out-Null
})
$exitItem = New-Object System.Windows.Forms.ToolStripMenuItem "Exit Tray"
$exitItem.Add_Click({
$refreshTimer.Stop()
$notifyIcon.Visible = $false
$notifyIcon.Dispose()
$menu.Dispose()
$appContext.ExitThread()
})
[void]$menu.Items.Add($statusItem)
[void]$menu.Items.Add($refreshItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($startItem)
[void]$menu.Items.Add($stopItem)
[void]$menu.Items.Add($restartItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($dashboardItem)
[void]$menu.Items.Add($logsItem)
[void]$menu.Items.Add($openFolderItem)
[void]$menu.Items.Add((New-Object System.Windows.Forms.ToolStripSeparator))
[void]$menu.Items.Add($exitItem)
$notifyIcon.ContextMenuStrip = $menu
$notifyIcon.Visible = $true
$notifyIcon.Add_DoubleClick({
$url = Get-DashboardUrl
Start-Process $url | Out-Null
})
$refreshTimer = New-Object System.Windows.Forms.Timer
$refreshTimer.Interval = 5000
$refreshTimer.Add_Tick({
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
})
Update-TrayState -NotifyIcon $notifyIcon -StatusItem $statusItem -Name $ServiceName
$refreshTimer.Start()
[System.Windows.Forms.Application]::Run($appContext)
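The tray script's dashboard URL and health check depend on three small steps: parsing `.env`, mapping wildcard bind addresses to loopback, and probing the TCP port. Here is a rough Python rendering of that logic, under the assumption that it mirrors `Read-EnvConfig`, `Resolve-ConnectHost`, `Get-DashboardUrl`, and `Test-TcpEndpoint`. All function names below are hypothetical, and this is not code from the repository.

```python
import socket

WILDCARD_HOSTS = {"0.0.0.0", "::", "*"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        trimmed = line.strip()
        if not trimmed or trimmed.startswith("#"):
            continue
        key, separator, value = trimmed.partition("=")
        if not separator:
            continue  # no '=' on this line
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key.strip()] = value
    return result


def resolve_connect_host(configured_host: str) -> str:
    """Map wildcard bind addresses to a host a client can connect to."""
    normalized = (configured_host or "").strip().lower()
    if not normalized or normalized in WILDCARD_HOSTS:
        return "127.0.0.1"
    return configured_host


def dashboard_url(env: dict[str, str], default_host: str = "127.0.0.1",
                  default_port: str = "8787") -> str:
    host = env.get("SCREENJOB_HOST", "").strip() or default_host
    port = env.get("SCREENJOB_PORT", "").strip() or default_port
    return f"http://{resolve_connect_host(host)}:{port}/"


def tcp_reachable(host: str, port: int, timeout_seconds: float = 1.2) -> bool:
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

A successful TCP connect only shows that something is listening on the port; it does not confirm that the listener is the ScreenJob backend.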


@@ -0,0 +1,138 @@
using System.Diagnostics;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace ScreenJob.WindowsServiceHost;
internal sealed class BackendProcessService : BackgroundService
{
private readonly ILogger<BackendProcessService> _logger;
private readonly ServiceOptions _options;
private readonly object _logLock = new();
private Process? _backendProcess;
private string _stdoutLogPath = string.Empty;
private string _stderrLogPath = string.Empty;
public BackendProcessService(ILogger<BackendProcessService> logger, ServiceOptions options)
{
_logger = logger;
_options = options;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
Directory.CreateDirectory(_options.LogDirectory);
_stdoutLogPath = Path.Combine(_options.LogDirectory, "backend-service.stdout.log");
_stderrLogPath = Path.Combine(_options.LogDirectory, "backend-service.stderr.log");
LogStdOut("Service host starting backend process.");
LogStdOut($"Script: {_options.BackendScriptPath}");
LogStdOut($"Working directory: {_options.WorkingDirectory}");
var powershellPath = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.Windows),
"System32",
"WindowsPowerShell",
"v1.0",
"powershell.exe");
var startInfo = new ProcessStartInfo
{
FileName = powershellPath,
Arguments = $"-NoProfile -ExecutionPolicy Bypass -File \"{_options.BackendScriptPath}\"",
WorkingDirectory = _options.WorkingDirectory,
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false,
CreateNoWindow = true
};
_backendProcess = new Process { StartInfo = startInfo };
if (!_backendProcess.Start())
{
throw new InvalidOperationException("Failed to start backend process.");
}
LogStdOut($"Backend process started with PID {_backendProcess.Id}.");
_logger.LogInformation("Backend process started with PID {Pid}.", _backendProcess.Id);
var stdoutPump = PumpStreamAsync(_backendProcess.StandardOutput, LogStdOut, stoppingToken);
var stderrPump = PumpStreamAsync(_backendProcess.StandardError, LogStdErr, stoppingToken);
try
{
await _backendProcess.WaitForExitAsync(stoppingToken);
var exitCode = _backendProcess.ExitCode;
LogStdErr($"Backend process exited unexpectedly with code {exitCode}.");
_logger.LogError("Backend process exited unexpectedly with code {ExitCode}.", exitCode);
Environment.ExitCode = exitCode == 0 ? 1 : exitCode;
throw new InvalidOperationException(
$"Backend process ended unexpectedly. Service host exit code: {Environment.ExitCode}.");
}
catch (OperationCanceledException)
{
LogStdOut("Service stop requested.");
}
finally
{
await Task.WhenAll(stdoutPump, stderrPump);
}
}
public override async Task StopAsync(CancellationToken cancellationToken)
{
if (_backendProcess is { HasExited: false })
{
try
{
LogStdOut("Stopping backend process.");
_backendProcess.Kill(entireProcessTree: true);
}
catch (Exception ex)
{
LogStdErr($"Failed to stop backend process cleanly: {ex.Message}");
_logger.LogError(ex, "Failed to stop backend process cleanly.");
}
}
await base.StopAsync(cancellationToken);
}
private async Task PumpStreamAsync(
StreamReader reader,
Action<string> sink,
CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
var line = await reader.ReadLineAsync();
if (line is null)
{
break;
}
sink(line);
}
}
private void LogStdOut(string message)
{
WriteLog(_stdoutLogPath, message);
}
private void LogStdErr(string message)
{
WriteLog(_stderrLogPath, message);
}
private void WriteLog(string path, string message)
{
var stamp = DateTimeOffset.Now.ToString("yyyy-MM-dd HH:mm:ss");
var line = $"[{stamp}] {message}{Environment.NewLine}";
lock (_logLock)
{
File.AppendAllText(path, line);
}
}
}
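The core of `BackendProcessService` is a supervise-and-log loop: start a child process, pump stdout and stderr into separate timestamped log files under a shared lock, and surface the exit code. The Python sketch below illustrates that pattern only. `run_and_log` is a hypothetical name, the log file names are copied from the C# code, and stop or cancellation handling is deliberately left out.

```python
import subprocess
import threading
from datetime import datetime
from pathlib import Path


def run_and_log(command: list[str], log_dir: str) -> int:
    """Run command, copy its output streams to log files, return exit code."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    stdout_log = directory / "backend-service.stdout.log"
    stderr_log = directory / "backend-service.stderr.log"
    lock = threading.Lock()

    def write_log(path: Path, message: str) -> None:
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        # One lock guards both files so concurrent writes do not interleave.
        with lock:
            with open(path, "a", encoding="utf-8") as handle:
                handle.write(f"[{stamp}] {message}\n")

    def pump(stream, path: Path) -> None:
        # Iteration ends when the child closes the stream.
        for line in stream:
            write_log(path, line.rstrip("\r\n"))

    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    pumps = [
        threading.Thread(target=pump, args=(process.stdout, stdout_log)),
        threading.Thread(target=pump, args=(process.stderr, stderr_log)),
    ]
    for thread in pumps:
        thread.start()
    exit_code = process.wait()
    for thread in pumps:
        thread.join()  # drain remaining output before returning
    return exit_code
```

Reading both streams concurrently matters: if only one pipe were drained, a chatty child could fill the other pipe's buffer and block.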


@@ -0,0 +1,18 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using ScreenJob.WindowsServiceHost;
var options = ServiceOptions.Parse(args);
Host.CreateDefaultBuilder(args)
.UseWindowsService(serviceOptions =>
{
serviceOptions.ServiceName = "ScreenJobBackend";
})
.ConfigureServices(services =>
{
services.AddSingleton(options);
services.AddHostedService<BackendProcessService>();
})
.Build()
.Run();


@@ -0,0 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk.Worker">
<PropertyGroup>
<TargetFramework>net10.0-windows</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<OutputType>Exe</OutputType>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.0" />
</ItemGroup>
</Project>


@@ -0,0 +1,77 @@
namespace ScreenJob.WindowsServiceHost;
internal sealed record ServiceOptions(
string BackendScriptPath,
string WorkingDirectory,
string LogDirectory)
{
public static ServiceOptions Parse(string[] args)
{
var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
for (var i = 0; i < args.Length; i++)
{
var raw = args[i];
if (!raw.StartsWith("--", StringComparison.Ordinal))
{
continue;
}
var key = raw[2..];
if (string.IsNullOrWhiteSpace(key))
{
continue;
}
if (i + 1 < args.Length && !args[i + 1].StartsWith("--", StringComparison.Ordinal))
{
map[key] = args[++i];
}
else
{
map[key] = "true";
}
}
if (!map.TryGetValue("backend-script", out var backendScript) || string.IsNullOrWhiteSpace(backendScript))
{
throw new ArgumentException("Missing required argument: --backend-script <absolute-path-to-start_backend.ps1>.");
}
if (!Path.IsPathRooted(backendScript))
{
throw new ArgumentException("The --backend-script value must be an absolute path.");
}
if (!File.Exists(backendScript))
{
throw new FileNotFoundException("Backend script not found.", backendScript);
}
if (!map.TryGetValue("working-dir", out var workingDir) || string.IsNullOrWhiteSpace(workingDir))
{
workingDir = Path.GetDirectoryName(backendScript)
?? throw new ArgumentException("Could not resolve working directory from backend script path.");
}
if (!Path.IsPathRooted(workingDir))
{
throw new ArgumentException("The --working-dir value must be an absolute path.");
}
if (!map.TryGetValue("log-dir", out var logDir) || string.IsNullOrWhiteSpace(logDir))
{
logDir = Path.Combine(workingDir, "screenjob_runs", "service");
}
if (!Path.IsPathRooted(logDir))
{
throw new ArgumentException("The --log-dir value must be an absolute path.");
}
return new ServiceOptions(
Path.GetFullPath(backendScript),
Path.GetFullPath(workingDir),
Path.GetFullPath(logDir));
}
}
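The flag-parsing loop in `ServiceOptions.Parse` accepts `--key value` pairs, treats a flag with no following value as `"true"`, and ignores stray tokens. A compact Python sketch of the same rules follows. `parse_flags` is a hypothetical name, and lowercasing keys is an approximation of the C# dictionary's `OrdinalIgnoreCase` comparer.

```python
def parse_flags(args: list[str]) -> dict[str, str]:
    """Parse '--key value' pairs; a bare '--key' maps to 'true'."""
    flags: dict[str, str] = {}
    index = 0
    while index < len(args):
        raw = args[index]
        index += 1
        if not raw.startswith("--"):
            continue  # stray token that is not a flag
        key = raw[2:]
        if not key.strip():
            continue  # a bare "--" carries no key
        if index < len(args) and not args[index].startswith("--"):
            flags[key.lower()] = args[index]  # consume the value
            index += 1
        else:
            flags[key.lower()] = "true"
    return flags
```

One consequence of this design is that a value beginning with `--` can never be passed to a flag, which is harmless for the absolute paths this host expects.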


@@ -9,7 +9,7 @@ import traceback
from typing import Any, Callable
from openai import OpenAI
-from PIL import Image, ImageEnhance, ImageFilter, ImageOps
+from PIL import Image, ImageDraw, ImageEnhance, ImageFilter, ImageOps
from .models import AgentResult, RunArtifacts, RuntimeOptions, UsageSummary
from .pricing import estimate_cost_usd
@@ -34,7 +34,8 @@ Rules:
- launching apps or running terminal checks
3) For UI tasks, inspect with see_screen before clicking/typing.
4) Coordinates are absolute screen pixels (x, y) from top-left.
-5) Use enhance(coordinate) when text/UI is unclear.
+5) Use enhance before risky clicks: small buttons/icons, dense UI, or when target confidence is below high.
5a) For tiny controls use enhance(coordinate, region="small", mode="ui"). For tiny text use mode="text".
6) For keyboard-heavy interactions, prefer press_key for special keys.
6a) For key combinations, call press_key once with combo syntax (example: "win+r", "ctrl+shift+esc"). Do not split modifier combos across separate calls.
7) You may call multiple tools in one step. If needed, do click then sleep.
@@ -76,11 +77,14 @@ class ScreenJobAgent:
self.final_data: Any | None = None
self.previous_response_id: str | None = None
self.usage = UsageSummary()
self.objective = ""
self.last_screen_data_url: str | None = None
self.last_screen_meta: dict[str, Any] | None = None
self.click_history: list[tuple[int, int, float]] = []
self.disabled_tools = {tool.strip() for tool in (options.disable_tools or set()) if tool.strip()}
self.recent_tool_summaries: list[str] = []
self.last_context_compact_step = 0
def _emit(self, event_type: str, payload: dict[str, Any]) -> None:
if self.event_callback is None:
@@ -192,7 +196,10 @@ class ScreenJobAgent:
{
"type": "function",
"name": "enhance",
-"description": "Create enhanced zoom around a coordinate for readability.",
+"description": (
+"Create enhanced zoom around a coordinate for readability and precise targeting. "
+"Prefer this before clicking tiny or ambiguous UI targets."
+),
"parameters": {
"type": "object",
"properties": {
@@ -204,7 +211,19 @@ class ScreenJobAgent:
},
"required": ["x", "y"],
"additionalProperties": False,
-}
+},
"region": {
"type": "string",
"enum": ["small", "medium", "large"],
},
"mode": {
"type": "string",
"enum": ["ui", "text"],
},
"scale": {
"type": ["integer", "string"],
"description": "Zoom factor from 2 to 6. Defaults by region.",
},
},
"required": ["coordinate"],
"additionalProperties": False,
@@ -352,6 +371,23 @@ class ScreenJobAgent:
sec = max_seconds
return sec
def _parse_int(self, value: Any, default: int = 0) -> int:
if value is None:
return default
if isinstance(value, bool):
return int(value)
if isinstance(value, int):
return value
if isinstance(value, float):
return int(round(value))
text = str(value).strip()
if not text:
return default
try:
return int(float(text))
except Exception: # noqa: BLE001
return default
def _tool_see_screen(self, _: dict[str, Any]) -> dict[str, Any]:
image, meta = self._capture_screen(with_grid=True)
out_path = self.artifacts.shots_dir / f"screen_step_{self.step:03d}.png"
@@ -369,34 +405,106 @@ class ScreenJobAgent:
def _tool_enhance(self, args: dict[str, Any]) -> dict[str, Any]:
coord = args.get("coordinate") or {}
-x = int(coord.get("x", 0))
-y = int(coord.get("y", 0))
+requested_x = self._parse_int(coord.get("x", 0), default=0)
+requested_y = self._parse_int(coord.get("y", 0), default=0)
region = str(args.get("region", "small") or "small").strip().lower()
mode = str(args.get("mode", "ui") or "ui").strip().lower()
if region not in {"small", "medium", "large"}:
region = "small"
if mode not in {"ui", "text"}:
mode = "ui"
region_half_by_preset = {
"small": 96,
"medium": 160,
"large": 240,
}
default_scale_by_region = {
"small": 4,
"medium": 3,
"large": 2,
}
raw_scale = self._parse_int(args.get("scale"), default=0)
scale = raw_scale if raw_scale > 0 else default_scale_by_region[region]
scale = clamp(scale, 2, 6)
base, base_meta = self._capture_screen(with_grid=False) base, base_meta = self._capture_screen(with_grid=False)
width, height = base.size width, height = base.size
region_half = 180 source_x = clamp(requested_x, 0, max(0, width - 1))
left = clamp(x - region_half, 0, width - 1) source_y = clamp(requested_y, 0, max(0, height - 1))
top = clamp(y - region_half, 0, height - 1) region_half = region_half_by_preset[region]
right = clamp(x + region_half, left + 1, width) left = clamp(source_x - region_half, 0, width - 1)
bottom = clamp(y + region_half, top + 1, height) top = clamp(source_y - region_half, 0, height - 1)
right = clamp(source_x + region_half, left + 1, width)
bottom = clamp(source_y + region_half, top + 1, height)
crop = base.crop((left, top, right, bottom)) crop = base.crop((left, top, right, bottom))
upscaled = crop.resize((crop.width * 2, crop.height * 2), Image.Resampling.BICUBIC) out_w = max(2, crop.width * scale)
enhanced = ImageOps.autocontrast(upscaled) out_h = max(2, crop.height * scale)
enhanced = ImageEnhance.Sharpness(enhanced).enhance(2.0) upscaled = crop.resize((out_w, out_h), Image.Resampling.LANCZOS)
enhanced = ImageEnhance.Contrast(enhanced).enhance(1.25)
enhanced = enhanced.filter(ImageFilter.UnsharpMask(radius=1.8, percent=180, threshold=2))
out_path = self.artifacts.enhance_dir / f"enhance_step_{self.step:03d}_{x}_{y}.png" if mode == "text":
text_view = ImageOps.grayscale(upscaled)
text_view = ImageOps.autocontrast(text_view, cutoff=1)
text_view = ImageOps.equalize(text_view)
text_view = ImageEnhance.Contrast(text_view).enhance(1.35)
text_view = ImageEnhance.Sharpness(text_view).enhance(2.1)
processed = text_view.filter(ImageFilter.UnsharpMask(radius=1.2, percent=160, threshold=1)).convert("RGB")
else:
ui_view = ImageOps.autocontrast(upscaled, cutoff=1)
ui_view = ImageEnhance.Contrast(ui_view).enhance(1.2)
ui_view = ImageEnhance.Sharpness(ui_view).enhance(1.8)
processed = ui_view.filter(ImageFilter.UnsharpMask(radius=1.4, percent=150, threshold=2)).convert("RGB")
edges = upscaled.convert("L").filter(ImageFilter.FIND_EDGES)
edges = ImageOps.autocontrast(edges, cutoff=4)
edge_overlay = ImageOps.colorize(edges, black=(0, 0, 0), white=(60, 220, 255))
enhanced = Image.blend(processed, edge_overlay, alpha=0.18)
cx = clamp((source_x - left) * scale, 0, max(0, enhanced.width - 1))
cy = clamp((source_y - top) * scale, 0, max(0, enhanced.height - 1))
draw = ImageDraw.Draw(enhanced)
draw.rectangle([0, 0, enhanced.width - 1, enhanced.height - 1], outline=(255, 80, 80), width=2)
ring_radius = max(10, int(6 * scale / 2))
arm_len = max(14, int(9 * scale / 2))
gap = max(4, int(2 * scale / 2))
line_width = max(2, int(scale / 2))
draw.ellipse(
[cx - ring_radius, cy - ring_radius, cx + ring_radius, cy + ring_radius],
outline=(255, 80, 80),
width=line_width,
)
draw.line([(max(0, cx - arm_len), cy), (max(0, cx - gap), cy)], fill=(255, 80, 80), width=line_width)
draw.line(
[(min(enhanced.width - 1, cx + gap), cy), (min(enhanced.width - 1, cx + arm_len), cy)],
fill=(255, 80, 80),
width=line_width,
)
draw.line([(cx, max(0, cy - arm_len)), (cx, max(0, cy - gap))], fill=(255, 80, 80), width=line_width)
draw.line(
[(cx, min(enhanced.height - 1, cy + gap)), (cx, min(enhanced.height - 1, cy + arm_len))],
fill=(255, 80, 80),
width=line_width,
)
out_path = self.artifacts.enhance_dir / (
f"enhance_step_{self.step:03d}_{source_x}_{source_y}_{region}_{mode}_x{scale}.png"
)
self._save_image(enhanced, out_path) self._save_image(enhanced, out_path)
data_url = image_to_data_url(enhanced, "PNG") data_url = image_to_data_url(enhanced, "PNG")
meta = { meta = {
"captured_at": utc_now_iso(), "captured_at": utc_now_iso(),
"source_coord": {"x": x, "y": y}, "requested_coord": {"x": requested_x, "y": requested_y},
"source_coord": {"x": source_x, "y": source_y},
"source_box": {"left": left, "top": top, "right": right, "bottom": bottom}, "source_box": {"left": left, "top": top, "right": right, "bottom": bottom},
"scale": 2, "region": region,
"mode": mode,
"scale": scale,
"path": str(out_path.resolve()), "path": str(out_path.resolve()),
"size": {"width": enhanced.width, "height": enhanced.height},
"target_pixel": {"x": cx, "y": cy},
"screen_size": {"width": width, "height": height}, "screen_size": {"width": width, "height": height},
"base_capture_meta": base_meta, "base_capture_meta": base_meta,
} }
@@ -628,6 +736,9 @@ class ScreenJobAgent:
return {"_raw": raw} return {"_raw": raw}
def _call_model(self, input_items: list[dict[str, Any]]) -> Any: def _call_model(self, input_items: list[dict[str, Any]]) -> Any:
effort = str(self.options.reasoning_effort or "medium").strip().lower()
if effort not in {"low", "medium", "high"}:
effort = "medium"
return self.client.responses.create( return self.client.responses.create(
model=self.options.model, model=self.options.model,
instructions=SYSTEM_PROMPT, instructions=SYSTEM_PROMPT,
@@ -636,9 +747,85 @@ class ScreenJobAgent:
previous_response_id=self.previous_response_id, previous_response_id=self.previous_response_id,
parallel_tool_calls=True, parallel_tool_calls=True,
max_tool_calls=8, max_tool_calls=8,
reasoning={"effort": effort},
) )
def _record_tool_summary(self, tool_name: str, result: dict[str, Any]) -> None:
ok = bool(result.get("ok"))
status = "ok" if ok else "fail"
summary = f"step={self.step} tool={tool_name} status={status}"
if tool_name == "click":
clicked = result.get("clicked") if isinstance(result.get("clicked"), dict) else {}
x = clicked.get("x")
y = clicked.get("y")
if isinstance(x, int) and isinstance(y, int):
summary = f"{summary} at=({x},{y})"
elif tool_name == "type":
typed_length = int(result.get("typed_length", 0) or 0)
summary = f"{summary} typed_length={typed_length}"
elif tool_name == "press_key":
key = str(result.get("key") or "").strip()
if key:
summary = f"{summary} key={key}"
elif tool_name == "execute_command":
exit_code = result.get("exit_code")
if exit_code is not None:
summary = f"{summary} exit_code={exit_code}"
elif tool_name in {"see_screen", "enhance"}:
meta = result.get("meta") if isinstance(result.get("meta"), dict) else {}
path = str(meta.get("path") or result.get("path") or "").strip()
if path:
summary = f"{summary} image={path}"
if not ok:
error_text = str(result.get("error") or "").strip()
if error_text:
summary = f"{summary} error={error_text[:140]}"
self.recent_tool_summaries.append(summary)
self.recent_tool_summaries = self.recent_tool_summaries[-20:]
def _should_compact_context(self) -> bool:
interval = max(0, int(self.options.screen_context_decay_steps or 0))
if interval <= 0:
return False
if self.previous_response_id is None:
return False
return (self.step - self.last_context_compact_step) >= interval
def _build_compacted_pending_input(self) -> list[dict[str, Any]]:
recent = self.recent_tool_summaries[-8:]
lines = "\n".join(f"- {line}" for line in recent) if recent else "- No recent tool activity."
content = (
"Context compaction activated to decay stale screenshots and reduce token usage.\n"
f"JOB: {self.objective}\n"
f"Current step: {self.step}\n"
"Recent tool activity:\n"
f"{lines}\n"
"Continue execution from the latest screen state. "
"Use tools only, and finish with task_complete when done."
)
compacted_input: list[dict[str, Any]] = [
{
"role": "user",
"content": [
{
"type": "input_text",
"text": content,
}
],
}
]
if self.last_screen_data_url and self.last_screen_meta:
compacted_input.append(
self._build_visual_message(
"Current screen after context compaction",
self.last_screen_data_url,
self.last_screen_meta,
)
)
return compacted_input
def run(self, job: str) -> AgentResult: def run(self, job: str) -> AgentResult:
self.objective = job
started_at = time.time() started_at = time.time()
self.logger.info("Starting run_id=%s model=%s", self.artifacts.run_id, self.options.model) self.logger.info("Starting run_id=%s model=%s", self.artifacts.run_id, self.options.model)
self.logger.info("Job: %s", job) self.logger.info("Job: %s", job)
@@ -648,6 +835,8 @@ class ScreenJobAgent:
{ {
"run_id": self.artifacts.run_id, "run_id": self.artifacts.run_id,
"model": self.options.model, "model": self.options.model,
"reasoning_effort": self.options.reasoning_effort,
"screen_context_decay_steps": self.options.screen_context_decay_steps,
"objective": job, "objective": job,
"disabled_tools": sorted(self.disabled_tools), "disabled_tools": sorted(self.disabled_tools),
}, },
@@ -664,6 +853,8 @@ class ScreenJobAgent:
f"JOB: {job}\n" f"JOB: {job}\n"
"You are in an action loop. Prefer execute_command for deterministic actions. " "You are in an action loop. Prefer execute_command for deterministic actions. "
"For modifier shortcuts, use a single press_key combo (example: win+r). " "For modifier shortcuts, use a single press_key combo (example: win+r). "
"Before clicking tiny buttons/icons or dense UI areas, call enhance first "
"(use region='small'; use mode='text' for tiny text labels). "
"You can return multiple tool calls in one step (example: click then sleep). " "You can return multiple tool calls in one step (example: click then sleep). "
"When done call task_complete(return=..., data=...). " "When done call task_complete(return=..., data=...). "
"Before task_complete, verify the screen content is what was expected " "Before task_complete, verify the screen content is what was expected "
@@ -692,6 +883,19 @@ class ScreenJobAgent:
self.step += 1 self.step += 1
self.logger.info("---- Agent step %d/%d ----", self.step, self.options.max_steps) self.logger.info("---- Agent step %d/%d ----", self.step, self.options.max_steps)
self._emit("step_started", {"step": self.step, "max_steps": self.options.max_steps}) self._emit("step_started", {"step": self.step, "max_steps": self.options.max_steps})
if self._should_compact_context():
self.previous_response_id = None
pending_input = self._build_compacted_pending_input()
self.last_context_compact_step = self.step
self.logger.info("Compacted model context at step %d.", self.step)
self._emit(
"context_compacted",
{
"step": self.step,
"decay_steps": self.options.screen_context_decay_steps,
"recent_tool_summaries": self.recent_tool_summaries[-8:],
},
)
try: try:
response = self._call_model(pending_input) response = self._call_model(pending_input)
self._register_usage(response) self._register_usage(response)
@@ -720,6 +924,8 @@ class ScreenJobAgent:
"text": ( "text": (
"No function call was returned. Continue by using tools. " "No function call was returned. Continue by using tools. "
"Use one press_key call for key combos like win+r. " "Use one press_key call for key combos like win+r. "
"Prefer enhance before clicking small/unclear targets "
"(region='small', mode='ui' or 'text'). "
"You may call multiple tools in one step. " "You may call multiple tools in one step. "
"Before task_complete, verify expected screen content with see_screen/enhance " "Before task_complete, verify expected screen content with see_screen/enhance "
"and include observed_result in data. " "and include observed_result in data. "
@@ -763,6 +969,7 @@ class ScreenJobAgent:
name, name,
json.dumps(result, ensure_ascii=False)[:2500], json.dumps(result, ensure_ascii=False)[:2500],
) )
self._record_tool_summary(name, result)
self._emit("tool_result", {"step": self.step, "tool": name, "result": result}) self._emit("tool_result", {"step": self.step, "tool": name, "result": result})
next_input.append( next_input.append(
{ {
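
Review note: the crop and crosshair arithmetic in `_tool_enhance` above can be checked in isolation. The following is a minimal sketch, not the project's code. `clamp` is assumed to take `(value, low, high)` as inferred from its call sites, and `enhance_geometry` is a hypothetical name introduced only for this illustration.

```python
# Sketch of the geometry used by the new enhance tool (no imaging required).
# Assumption: clamp(value, low, high) limits value to the inclusive range.

REGION_HALF = {"small": 96, "medium": 160, "large": 240}
DEFAULT_SCALE = {"small": 4, "medium": 3, "large": 2}


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def enhance_geometry(x: int, y: int, width: int, height: int,
                     region: str = "small", scale: int = 0) -> dict:
    """Hypothetical helper: returns the crop box, zoom factor and target pixel."""
    if region not in REGION_HALF:
        region = "small"
    # A non-positive scale falls back to the per-region default, then 2..6.
    scale = clamp(scale if scale > 0 else DEFAULT_SCALE[region], 2, 6)

    # The requested point is first pulled onto the screen.
    sx = clamp(x, 0, max(0, width - 1))
    sy = clamp(y, 0, max(0, height - 1))

    half = REGION_HALF[region]
    left = clamp(sx - half, 0, width - 1)
    top = clamp(sy - half, 0, height - 1)
    right = clamp(sx + half, left + 1, width)
    bottom = clamp(sy + half, top + 1, height)

    out_w = max(2, (right - left) * scale)
    out_h = max(2, (bottom - top) * scale)
    # Crosshair position inside the upscaled crop.
    cx = clamp((sx - left) * scale, 0, max(0, out_w - 1))
    cy = clamp((sy - top) * scale, 0, max(0, out_h - 1))
    return {
        "box": (left, top, right, bottom),
        "scale": scale,
        "size": (out_w, out_h),
        "target": (cx, cy),
    }
```

Near a screen edge the crop box shrinks instead of shifting, so the crosshair is generally off-center in the output image. That is presumably why the diff reports `target_pixel` in the metadata.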


@@ -28,6 +28,18 @@ def build_parser() -> argparse.ArgumentParser:
parser.add_argument("--command-timeout", type=int, default=45, help="Timeout in seconds for execute_command.") parser.add_argument("--command-timeout", type=int, default=45, help="Timeout in seconds for execute_command.")
parser.add_argument("--type-interval", type=float, default=0.02, help="Seconds between typed characters.") parser.add_argument("--type-interval", type=float, default=0.02, help="Seconds between typed characters.")
parser.add_argument("--click-pause", type=float, default=0.10, help="Mouse move duration before click.") parser.add_argument("--click-pause", type=float, default=0.10, help="Mouse move duration before click.")
parser.add_argument(
"--reasoning-effort",
choices=["low", "medium", "high"],
default="medium",
help="Reasoning effort passed to the model.",
)
parser.add_argument(
"--screen-context-decay-steps",
type=int,
default=4,
help="Compact model context every N steps to decay old screenshots (0 disables).",
)
parser.add_argument("--disable-tool", action="append", default=[], help="Disable a tool by name.") parser.add_argument("--disable-tool", action="append", default=[], help="Disable a tool by name.")
parser.add_argument("--skip-safety-check", action="store_true", help="Bypass pre-flight safety check.") parser.add_argument("--skip-safety-check", action="store_true", help="Bypass pre-flight safety check.")
parser.add_argument("--no-failsafe", action="store_true", help="Disable PyAutoGUI fail-safe.") parser.add_argument("--no-failsafe", action="store_true", help="Disable PyAutoGUI fail-safe.")
@@ -78,6 +90,8 @@ def main(argv: list[str] | None = None) -> int:
command_timeout=args.command_timeout, command_timeout=args.command_timeout,
type_interval=args.type_interval, type_interval=args.type_interval,
click_pause=args.click_pause, click_pause=args.click_pause,
reasoning_effort=args.reasoning_effort,
screen_context_decay_steps=max(0, int(args.screen_context_decay_steps)),
disable_tools=set(disabled_tools), disable_tools=set(disabled_tools),
) )
try: try:
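
Review note: to see which steps the `--screen-context-decay-steps` setting would compact, the check in `_should_compact_context` can be simulated. This is a rough sketch under stated assumptions: `last_context_compact_step` starts at 0, and every model call succeeds and yields a response id. Neither is visible in this diff. `compaction_steps` is a hypothetical name.

```python
def compaction_steps(max_steps: int, interval: int) -> list[int]:
    """Return the step numbers at which context would be compacted."""
    interval = max(0, int(interval or 0))
    compacted: list[int] = []
    last_compact_step = 0          # assumed initial value
    has_previous_response = False  # no response id before the first call

    for step in range(1, max_steps + 1):
        due = (step - last_compact_step) >= interval
        if interval > 0 and has_previous_response and due:
            compacted.append(step)
            last_compact_step = step
        # Assumption: the model call in this step returns a response id,
        # including right after a compaction reset.
        has_previous_response = True
    return compacted


print(compaction_steps(10, 4))
print(compaction_steps(10, 0))
```

Under these assumptions the default of 4 compacts at steps 4 and 8 of a 10-step run, and 0 or a negative value never compacts.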


@@ -58,4 +58,6 @@ class RuntimeOptions:
command_timeout: int = 45 command_timeout: int = 45
type_interval: float = 0.02 type_interval: float = 0.02
click_pause: float = 0.10 click_pause: float = 0.10
reasoning_effort: str = "medium"
screen_context_decay_steps: int = 4
disable_tools: set[str] | None = None disable_tools: set[str] | None = None


@@ -15,7 +15,8 @@ from pydantic import BaseModel, Field
from .config import AppConfig, load_app_config from .config import AppConfig, load_app_config
from .storage import HistoryDB from .storage import HistoryDB
from .task_manager import JobManager from .task_manager import JobManager
from .ui import monitoring_page_html from .ui import monitoring_js_path, monitoring_page_html
from .utils import utc_now_iso
class CreateJobRequest(BaseModel): class CreateJobRequest(BaseModel):
@@ -25,11 +26,188 @@ class CreateJobRequest(BaseModel):
command_timeout: int = Field(45, ge=1, le=600) command_timeout: int = Field(45, ge=1, le=600)
type_interval: float = Field(0.02, ge=0.0, le=1.0) type_interval: float = Field(0.02, ge=0.0, le=1.0)
click_pause: float = Field(0.10, ge=0.0, le=2.0) click_pause: float = Field(0.10, ge=0.0, le=2.0)
reasoning_effort: str = Field("medium", pattern="^(low|medium|high)$")
screen_context_decay_steps: int = Field(4, ge=0, le=50)
disabled_tools: list[str] = Field(default_factory=list) disabled_tools: list[str] = Field(default_factory=list)
safety_override: bool = False safety_override: bool = False
no_failsafe: bool = False no_failsafe: bool = False
def _safe_int(value: Any) -> int | None:
try:
return int(value)
except Exception: # noqa: BLE001
return None
def _safe_text(value: Any, limit: int = 180) -> str:
text = str(value or "").strip()
if len(text) <= limit:
return text
return f"{text[:limit]}..."
def _resolve_artifact_path(artifacts_dir: Path | None, path_raw: Any) -> Path | None:
if artifacts_dir is None:
return None
text = str(path_raw or "").strip()
if not text:
return None
candidate = Path(text).resolve()
try:
candidate.relative_to(artifacts_dir)
except ValueError:
return None
return candidate
def _extract_replay_action(
event: dict[str, Any],
pending_tool_args: dict[tuple[int, str], list[dict[str, Any]]],
) -> dict[str, Any] | None:
event_type = str(event.get("event_type") or "")
payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
step = int(event.get("step") or 0)
ts = str(event.get("ts") or "")
event_id = int(event.get("id") or 0)
if event_type == "tool_called":
tool = str(payload.get("tool") or "").strip()
args = payload.get("args") if isinstance(payload.get("args"), dict) else {}
if tool:
pending_tool_args.setdefault((step, tool), []).append(args)
action: dict[str, Any] = {
"ts": ts,
"step": step,
"event_id": event_id,
"kind": "tool_called",
"tool": tool,
"label": f"Call: {tool}" if tool else "Tool call",
}
if tool == "click":
coord = args.get("coordinate") if isinstance(args, dict) else None
if isinstance(coord, dict):
x = _safe_int(coord.get("x"))
y = _safe_int(coord.get("y"))
if x is not None and y is not None:
action["requested_click"] = {"x": x, "y": y}
action["label"] = f"Call: click ({x}, {y})"
elif tool == "type":
text = _safe_text((args or {}).get("text"), 120)
if text:
action["text_preview"] = text
action["label"] = f"Call: type \"{text}\""
return action
if event_type == "tool_result":
tool = str(payload.get("tool") or "").strip()
result = payload.get("result") if isinstance(payload.get("result"), dict) else {}
matching_args: dict[str, Any] = {}
key = (step, tool)
queued = pending_tool_args.get(key) or []
if queued:
matching_args = queued.pop(0)
if not queued:
pending_tool_args.pop(key, None)
action = {
"ts": ts,
"step": step,
"event_id": event_id,
"kind": "tool_result",
"tool": tool,
"ok": bool(result.get("ok")),
"label": f"Result: {tool}",
}
if tool == "click":
clicked = result.get("clicked") if isinstance(result.get("clicked"), dict) else {}
x = _safe_int(clicked.get("x"))
y = _safe_int(clicked.get("y"))
if x is not None and y is not None:
action["click"] = {"x": x, "y": y}
action["label"] = f"Clicked ({x}, {y})" if bool(result.get("ok")) else f"Click failed ({x}, {y})"
elif tool == "type":
text = _safe_text((matching_args or {}).get("text"), 120)
typed_length = _safe_int(result.get("typed_length"))
if typed_length is not None:
action["typed_length"] = typed_length
if text:
action["text_preview"] = text
action["label"] = f"Typed \"{text}\""
elif tool == "press_key":
key_name = _safe_text(result.get("key"), 80)
if key_name:
action["label"] = f"Pressed {key_name}"
elif tool == "execute_command":
command = _safe_text((matching_args or {}).get("command"), 140)
if command:
action["command_preview"] = command
action["label"] = f"Command: {command}"
return action
return None
def _build_replay_payload(job_id: str, job: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
artifacts_dir_raw = str(job.get("artifacts_dir") or "").strip()
artifacts_dir = Path(artifacts_dir_raw).resolve() if artifacts_dir_raw else None
pending_tool_args: dict[tuple[int, str], list[dict[str, Any]]] = {}
buffered_actions: list[dict[str, Any]] = []
frames: list[dict[str, Any]] = []
for event in events:
action = _extract_replay_action(event, pending_tool_args)
if action is not None:
buffered_actions.append(action)
if str(event.get("event_type") or "") != "visual_update":
continue
payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
image_meta = payload.get("image_meta") if isinstance(payload.get("image_meta"), dict) else {}
resolved = _resolve_artifact_path(artifacts_dir, image_meta.get("path"))
if resolved is None or not resolved.exists() or not resolved.is_file():
continue
width = _safe_int(image_meta.get("width"))
height = _safe_int(image_meta.get("height"))
if width is None or height is None:
size = image_meta.get("screen_size") if isinstance(image_meta.get("screen_size"), dict) else {}
width = _safe_int(size.get("width"))
height = _safe_int(size.get("height"))
is_fullscreen = (
str(payload.get("kind") or "") == "see_screen"
and bool(image_meta.get("grid"))
and isinstance(width, int)
and isinstance(height, int)
and width > 0
and height > 0
)
frames.append(
{
"frame_index": len(frames),
"event_id": int(event.get("id") or 0),
"ts": str(event.get("ts") or ""),
"step": int(event.get("step") or 0),
"kind": str(payload.get("kind") or "visual_update"),
"image_path": str(resolved),
"image_meta": image_meta,
"screen_size": {"width": width, "height": height} if width and height else None,
"is_fullscreen": is_fullscreen,
"overlays": buffered_actions,
}
)
buffered_actions = []
return {
"job_id": job_id,
"total_events": len(events),
"total_frames": len(frames),
"frames": frames,
"trailing_events": buffered_actions,
}
class _WebSocketHub: class _WebSocketHub:
def __init__(self) -> None: def __init__(self) -> None:
self._connections: set[WebSocket] = set() self._connections: set[WebSocket] = set()
@@ -126,6 +304,8 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
command_timeout=payload.command_timeout, command_timeout=payload.command_timeout,
type_interval=payload.type_interval, type_interval=payload.type_interval,
click_pause=payload.click_pause, click_pause=payload.click_pause,
reasoning_effort=payload.reasoning_effort,
screen_context_decay_steps=payload.screen_context_decay_steps,
disabled_tools=payload.disabled_tools, disabled_tools=payload.disabled_tools,
safety_override=payload.safety_override, safety_override=payload.safety_override,
no_failsafe=payload.no_failsafe, no_failsafe=payload.no_failsafe,
@@ -161,6 +341,18 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
raise HTTPException(status_code=404, detail="Job not found") raise HTTPException(status_code=404, detail="Job not found")
return {"events": manager.get_events(job_id, limit=limit)} return {"events": manager.get_events(job_id, limit=limit)}
@app.get("/api/jobs/{job_id}/replay")
def get_job_replay(
job_id: str,
limit: int = Query(default=5000, ge=1, le=5000),
_: None = Depends(require_token),
) -> dict[str, Any]:
job = manager.get_job(job_id)
if job is None:
raise HTTPException(status_code=404, detail="Job not found")
events = manager.get_events(job_id, limit=limit)
return _build_replay_payload(job_id, job, events)
@app.post("/api/jobs/{job_id}/cancel") @app.post("/api/jobs/{job_id}/cancel")
def cancel_job(job_id: str, _: None = Depends(require_token)) -> dict[str, Any]: def cancel_job(job_id: str, _: None = Depends(require_token)) -> dict[str, Any]:
job = manager.get_job(job_id) job = manager.get_job(job_id)
@@ -195,11 +387,21 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
def stats(_: None = Depends(require_token)) -> dict[str, Any]: def stats(_: None = Depends(require_token)) -> dict[str, Any]:
return manager.stats() return manager.stats()
@app.get("/api/analytics")
def analytics(_: None = Depends(require_token)) -> dict[str, Any]:
payload = manager.analytics()
payload["generated_at"] = utc_now_iso()
return payload
if not app_config.disable_ui: if not app_config.disable_ui:
@app.get("/", response_class=HTMLResponse) @app.get("/", response_class=HTMLResponse)
def ui_root() -> str: def ui_root() -> str:
return monitoring_page_html(device_hostname=device_hostname) return monitoring_page_html(device_hostname=device_hostname)
@app.get("/ui/monitoring.js")
def ui_monitoring_js() -> FileResponse:
return FileResponse(str(monitoring_js_path()), media_type="application/javascript")
@app.websocket("/ws") @app.websocket("/ws")
async def ws_endpoint(websocket: WebSocket, token: str = Query(default="")) -> None: async def ws_endpoint(websocket: WebSocket, token: str = Query(default="")) -> None:
if not token or not secrets.compare_digest(token, app_config.screenjob_token): if not token or not secrets.compare_digest(token, app_config.screenjob_token):
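
Review note: `_extract_replay_action` pairs each `tool_result` with the arguments of the earliest unmatched `tool_called` event for the same `(step, tool)` key. The sketch below shows that first-in, first-out pairing on a simplified, flattened event shape. The shape and the name `pair_calls_with_results` are assumptions for illustration, not the stored event schema.

```python
from collections import defaultdict, deque
from typing import Any


def pair_calls_with_results(events: list[dict[str, Any]]) -> list[tuple]:
    """Match results to call arguments per (step, tool), oldest call first."""
    pending: dict[tuple[int, str], deque] = defaultdict(deque)
    paired: list[tuple] = []
    for event in events:
        key = (event["step"], event["tool"])
        if event["type"] == "tool_called":
            pending[key].append(event.get("args", {}))
        elif event["type"] == "tool_result":
            # A result without a recorded call falls back to empty args.
            args = pending[key].popleft() if pending[key] else {}
            paired.append((event["step"], event["tool"], args, bool(event.get("ok"))))
    return paired


events = [
    {"type": "tool_called", "step": 3, "tool": "type", "args": {"text": "hello"}},
    {"type": "tool_called", "step": 3, "tool": "type", "args": {"text": "world"}},
    {"type": "tool_result", "step": 3, "tool": "type", "ok": True},
    {"type": "tool_result", "step": 3, "tool": "type", "ok": True},
    {"type": "tool_result", "step": 4, "tool": "click", "ok": False},
]
for row in pair_calls_with_results(events):
    print(row)
```

The pairing depends on results arriving in the same order as calls within a step. If the same tool ran twice in one step and its results were recorded out of order, the previews would be swapped.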


@@ -7,6 +7,39 @@ from pathlib import Path
from typing import Any from typing import Any
_TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
_CATEGORY_RULES: tuple[tuple[str, tuple[str, ...]], ...] = (
(
"Browser / web",
("browser", "website", "webpage", "chrome", "url", "amazon", "google", "login", "shopping", "checkout", "orders"),
),
(
"Files / terminal",
("file", "folder", "directory", "terminal", "shell", "command", "cli", "script", "git", "repo", "install", "pip", "npm", "powershell", "bash"),
),
(
"Writing / docs",
("write", "summary", "summarize", "document", "docs", "report", "email", "message", "readme", "markdown", "note", "proposal"),
),
(
"Data / analysis",
("data", "analysis", "analyze", "csv", "spreadsheet", "sheet", "table", "chart", "dashboard", "metric", "metrics", "sql"),
),
(
"Development / ops",
("code", "bug", "fix", "test", "debug", "api", "backend", "frontend", "database", "deploy", "docker", "service", "build"),
),
)
def _objective_category(objective: str) -> str:
text = objective.lower()
for category, keywords in _CATEGORY_RULES:
if any(keyword in text for keyword in keywords):
return category
return "Other"
class HistoryDB: class HistoryDB:
def __init__(self, db_path: Path) -> None: def __init__(self, db_path: Path) -> None:
self.db_path = db_path self.db_path = db_path
@@ -184,6 +217,131 @@ class HistoryDB:
).fetchone() ).fetchone()
return dict(totals) if totals else {} return dict(totals) if totals else {}
def analytics(self) -> dict[str, Any]:
with self._connect() as conn:
rows = conn.execute(
"""
SELECT job_id, objective, status, steps, estimated_cost_usd, created_at
FROM jobs
ORDER BY created_at ASC, job_id ASC
"""
).fetchall()
total_jobs = 0
finished_jobs = 0
completed_jobs = 0
failed_jobs = 0
cancelled_jobs = 0
steps_sum = 0
steps_count = 0
cost_sum = 0.0
cost_count = 0
by_category: dict[str, dict[str, Any]] = {}
by_day: dict[str, dict[str, Any]] = {}
def _bucket(target: dict[str, dict[str, Any]], key: str) -> dict[str, Any]:
bucket = target.setdefault(
key,
{
"label": key,
"total_jobs": 0,
"finished_jobs": 0,
"completed_jobs": 0,
"failed_jobs": 0,
"cancelled_jobs": 0,
"steps_sum": 0,
"steps_count": 0,
"cost_sum": 0.0,
"cost_count": 0,
},
)
return bucket
for row in rows:
total_jobs += 1
status = str(row["status"] or "")
finished = status in _TERMINAL_STATUSES
completed = status == "completed"
objective = str(row["objective"] or "")
category = _objective_category(objective)
created_at = str(row["created_at"] or "")
day = created_at[:10] if len(created_at) >= 10 else created_at or "unknown"
category_bucket = _bucket(by_category, category)
day_bucket = _bucket(by_day, day)
for bucket in (category_bucket, day_bucket):
bucket["total_jobs"] += 1
if not finished:
continue
finished_jobs += 1
if completed:
completed_jobs += 1
elif status == "failed":
failed_jobs += 1
elif status == "cancelled":
cancelled_jobs += 1
steps = row["steps"]
if steps is not None:
step_value = int(steps)
steps_sum += step_value
steps_count += 1
for bucket in (category_bucket, day_bucket):
bucket["steps_sum"] += step_value
bucket["steps_count"] += 1
estimated_cost = row["estimated_cost_usd"]
if estimated_cost is not None:
cost_value = float(estimated_cost)
cost_sum += cost_value
cost_count += 1
for bucket in (category_bucket, day_bucket):
bucket["cost_sum"] += cost_value
bucket["cost_count"] += 1
for bucket in (category_bucket, day_bucket):
bucket["finished_jobs"] += 1
if completed:
bucket["completed_jobs"] += 1
elif status == "failed":
bucket["failed_jobs"] += 1
elif status == "cancelled":
bucket["cancelled_jobs"] += 1
def _finalize(bucket: dict[str, Any]) -> dict[str, Any]:
finished = bucket["finished_jobs"]
return {
"label": bucket["label"],
"total_jobs": bucket["total_jobs"],
"finished_jobs": finished,
"completed_jobs": bucket["completed_jobs"],
"failed_jobs": bucket["failed_jobs"],
"cancelled_jobs": bucket["cancelled_jobs"],
"success_rate": round((bucket["completed_jobs"] / finished) * 100, 2) if finished else 0.0,
"avg_steps": round(bucket["steps_sum"] / bucket["steps_count"], 2) if bucket["steps_count"] else None,
"avg_cost_usd": round(bucket["cost_sum"] / bucket["cost_count"], 6) if bucket["cost_count"] else None,
}
category_rows = [_finalize(bucket) for bucket in by_category.values()]
category_rows.sort(key=lambda item: (-item["success_rate"], item["label"]))
day_rows = [_finalize(bucket) for bucket in by_day.values()]
day_rows.sort(key=lambda item: item["label"])
return {
"total_jobs": total_jobs,
"finished_jobs": finished_jobs,
"completed_jobs": completed_jobs,
"failed_jobs": failed_jobs,
"cancelled_jobs": cancelled_jobs,
"success_rate": round((completed_jobs / finished_jobs) * 100, 2) if finished_jobs else 0.0,
"avg_steps": round(steps_sum / steps_count, 2) if steps_count else None,
"avg_cost_usd": round(cost_sum / cost_count, 6) if cost_count else None,
"by_category": category_rows,
"timeline": day_rows,
}
def _row_to_job(self, row: sqlite3.Row) -> dict[str, Any]: def _row_to_job(self, row: sqlite3.Row) -> dict[str, Any]:
disabled_tools: list[str] = [] disabled_tools: list[str] = []
try: try:
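
Review note: `_objective_category` matches keywords as plain substrings, so short keywords such as `file`, `git` or `pip` also match inside longer words. The sketch below reproduces that behavior on a two-rule subset of `_CATEGORY_RULES` and contrasts it with a word-boundary variant. The variant is a hypothetical alternative offered for comparison and is not part of this change.

```python
import re

# Subset of the rules from the diff, keywords copied verbatim.
RULES = (
    ("Browser / web", ("browser", "url", "login")),
    ("Files / terminal", ("file", "git", "pip")),
)


def substring_category(objective: str) -> str:
    """Same matching strategy as the diff: first rule with any substring hit."""
    text = objective.lower()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def word_category(objective: str) -> str:
    """Hypothetical variant: keywords must match as whole words."""
    text = objective.lower()
    for category, keywords in RULES:
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        if re.search(pattern, text):
            return category
    return "Other"


print(substring_category("Open the profile page"))
print(word_category("Open the profile page"))
```

Rule order also matters in both versions: an objective that mentions both a browser and a file is counted under "Browser / web" only, because the first matching rule wins.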


@@ -48,6 +48,8 @@ class JobManager:
command_timeout: int = 45, command_timeout: int = 45,
type_interval: float = 0.02, type_interval: float = 0.02,
click_pause: float = 0.10, click_pause: float = 0.10,
reasoning_effort: str = "medium",
screen_context_decay_steps: int = 4,
disabled_tools: list[str] | None = None, disabled_tools: list[str] | None = None,
safety_override: bool = False, safety_override: bool = False,
no_failsafe: bool = False, no_failsafe: bool = False,
@@ -93,6 +95,8 @@ class JobManager:
"command_timeout": command_timeout, "command_timeout": command_timeout,
"type_interval": type_interval, "type_interval": type_interval,
"click_pause": click_pause, "click_pause": click_pause,
"reasoning_effort": reasoning_effort,
"screen_context_decay_steps": screen_context_decay_steps,
"no_failsafe": no_failsafe, "no_failsafe": no_failsafe,
"cancel_event": cancel_event, "cancel_event": cancel_event,
}, },
@@ -121,6 +125,8 @@ class JobManager:
command_timeout: int, command_timeout: int,
type_interval: float, type_interval: float,
click_pause: float, click_pause: float,
reasoning_effort: str,
screen_context_decay_steps: int,
no_failsafe: bool, no_failsafe: bool,
cancel_event: threading.Event, cancel_event: threading.Event,
) -> None: ) -> None:
@@ -218,6 +224,8 @@ class JobManager:
command_timeout=command_timeout, command_timeout=command_timeout,
type_interval=type_interval, type_interval=type_interval,
click_pause=click_pause, click_pause=click_pause,
reasoning_effort=reasoning_effort,
screen_context_decay_steps=max(0, int(screen_context_decay_steps)),
disable_tools=set(disabled_tools), disable_tools=set(disabled_tools),
) )
try: try:
@@ -343,6 +351,9 @@ class JobManager:
stats["live_running_threads"] = sum(1 for job in self._running.values() if job.thread.is_alive()) stats["live_running_threads"] = sum(1 for job in self._running.values() if job.thread.is_alive())
return stats return stats
def analytics(self) -> dict[str, Any]:
return self.db.analytics()
def _normalize_job_payload(self, job: dict[str, Any]) -> dict[str, Any]: def _normalize_job_payload(self, job: dict[str, Any]) -> dict[str, Any]:
response = job.get("response") response = job.get("response")
if not isinstance(response, dict): if not isinstance(response, dict):

310
src/ui.py

@@ -1,307 +1,19 @@
from __future__ import annotations from __future__ import annotations
from html import escape from html import escape
from pathlib import Path
_UI_DIR = Path(__file__).resolve().parent / "ui_assets"
_HTML_TEMPLATE_PATH = _UI_DIR / "monitoring.html"
_JS_PATH = _UI_DIR / "monitoring.js"
def monitoring_page_html(device_hostname: str = "") -> str: def monitoring_page_html(device_hostname: str = "") -> str:
host_suffix = f" ({escape(device_hostname)})" if device_hostname else "" host_suffix = f" ({escape(device_hostname)})" if device_hostname else ""
return """<!doctype html> html = _HTML_TEMPLATE_PATH.read_text(encoding="utf-8")
<html lang="en"> return html.replace("__MONITOR_HOST__", host_suffix)
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>ScreenJob Monitor</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-slate-950 text-slate-100 min-h-screen">
<div class="max-w-7xl mx-auto p-4 md:p-8 space-y-6">
<header class="flex flex-col gap-3 md:flex-row md:items-center md:justify-between">
<div>
<h1 class="text-2xl md:text-3xl font-bold tracking-tight">ScreenJob Monitor<span class="text-slate-400 text-base md:text-lg font-medium">__MONITOR_HOST__</span></h1>
<p class="text-slate-400 text-sm">Read-only monitoring for active and historical tasks.</p>
</div>
<div class="flex flex-col md:flex-row gap-2 md:items-center">
<input id="tokenInput" type="password" placeholder="SCREENJOB_TOKEN" class="bg-slate-900 border border-slate-700 rounded px-3 py-2 text-sm w-72" />
<button id="saveTokenBtn" class="bg-cyan-500 hover:bg-cyan-400 text-slate-950 font-semibold px-4 py-2 rounded">Connect</button>
</div>
</header>
<section class="grid grid-cols-2 md:grid-cols-6 gap-3" id="stats"></section>
def monitoring_js_path() -> Path:
return _JS_PATH
<section class="grid grid-cols-1 lg:grid-cols-5 gap-4">
<div class="lg:col-span-2 bg-slate-900/70 border border-slate-800 rounded-xl p-4">
<div class="flex items-center justify-between mb-3">
<h2 class="font-semibold">Jobs</h2>
<button id="refreshBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Refresh</button>
</div>
<div id="jobList" class="space-y-2 max-h-[62vh] overflow-auto"></div>
</div>
<div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<h2 class="font-semibold">Job Detail</h2>
<pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre>
<h3 class="font-semibold text-sm">Latest Visual</h3>
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<img id="latestVisual" alt="Latest visual update" class="max-h-[24vh] w-full object-contain rounded" />
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Live Events</h3>
<label for="eventsViewToggle" class="flex items-center gap-2 text-xs text-slate-300 cursor-pointer select-none">
<span>Raw</span>
<input id="eventsViewToggle" type="checkbox" class="accent-cyan-400 h-4 w-4" />
<span>Beautiful</span>
</label>
</div>
<div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div>
</div>
</section>
</div>
<script>
const tokenInput = document.getElementById("tokenInput");
const saveTokenBtn = document.getElementById("saveTokenBtn");
const refreshBtn = document.getElementById("refreshBtn");
const jobListEl = document.getElementById("jobList");
const jobDetailEl = document.getElementById("jobDetail");
const eventsEl = document.getElementById("events");
const statsEl = document.getElementById("stats");
const latestVisualEl = document.getElementById("latestVisual");
const eventsViewToggle = document.getElementById("eventsViewToggle");
const state = {
token: localStorage.getItem("screenjob_token") || "",
jobs: [],
selectedJobId: null,
ws: null,
wsReconnectTimer: null,
eventsViewMode: localStorage.getItem("screenjob_events_view_mode") === "beautiful" ? "beautiful" : "raw"
};
const manuallyClosedSockets = new WeakSet();
tokenInput.value = state.token;
function authHeaders() {
return { "Authorization": "Bearer " + state.token };
}
async function api(path, opts = {}) {
if (!state.token) throw new Error("Token required");
const headers = Object.assign({}, authHeaders(), opts.headers || {});
const response = await fetch(path, Object.assign({}, opts, { headers }));
if (!response.ok) throw new Error(await response.text());
return response.json();
}
function renderStats(stats) {
const cards = [
["Total Jobs", stats.total_jobs || 0],
["Running", stats.running_jobs || 0],
["Completed", stats.completed_jobs || 0],
["Failed", stats.failed_jobs || 0],
["Cancelled", stats.cancelled_jobs || 0],
["Total Cost (USD)", Number(stats.total_estimated_cost || 0).toFixed(4)]
];
statsEl.innerHTML = cards.map(([name, val]) => `
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-3">
<div class="text-slate-400 text-xs">${name}</div>
<div class="text-lg font-semibold">${val}</div>
</div>
`).join("");
}
function renderJobs() {
jobListEl.innerHTML = state.jobs.map((job) => {
const active = job.job_id === state.selectedJobId;
return `
<button data-job-id="${job.job_id}" class="w-full text-left p-3 rounded border ${active ? "border-cyan-400 bg-slate-800" : "border-slate-800 bg-slate-950"} hover:bg-slate-800">
<div class="flex items-center justify-between">
<span class="font-medium">${job.job_id}</span>
<span class="text-xs px-2 py-0.5 rounded bg-slate-700">${job.status}</span>
</div>
<div class="text-xs text-slate-400 mt-1">${job.model}</div>
<div class="text-xs text-slate-300 mt-1 line-clamp-2">${job.objective}</div>
<div class="text-xs text-slate-500 mt-1">$${Number((job.usage && job.usage.estimated_cost_usd) || 0).toFixed(6)}</div>
</button>
`;
}).join("");
for (const btn of jobListEl.querySelectorAll("button[data-job-id]")) {
btn.addEventListener("click", () => {
state.selectedJobId = btn.getAttribute("data-job-id");
renderJobs();
refreshJobDetail();
});
}
}
function pushEventLine(obj) {
if (!obj || !obj.job_id || !obj.event_type) return;
const line = document.createElement("div");
const ts = obj.ts || "-";
const step = (obj.step ?? "-");
if (state.eventsViewMode === "raw") {
line.className = "border-b border-slate-800 pb-1";
line.textContent = `[${ts}] ${obj.job_id} step=${step} ${obj.event_type} ${JSON.stringify(obj.payload || {})}`;
} else {
const typeColors = {
info: "bg-sky-900/50 text-sky-200 border border-sky-800",
warning: "bg-amber-900/40 text-amber-200 border border-amber-800",
error: "bg-rose-900/40 text-rose-200 border border-rose-800",
visual_update: "bg-emerald-900/40 text-emerald-200 border border-emerald-800",
tool_call: "bg-violet-900/40 text-violet-200 border border-violet-800",
tool_result: "bg-indigo-900/40 text-indigo-200 border border-indigo-800"
};
const dt = new Date(ts);
const tsText = Number.isNaN(dt.getTime()) ? ts : dt.toLocaleString();
const payload = obj.payload || {};
line.className = "rounded-lg border border-slate-800 bg-slate-900/80 p-2 space-y-2";
const header = document.createElement("div");
header.className = "flex flex-wrap items-center gap-2";
const typePill = document.createElement("span");
typePill.className = `px-2 py-0.5 rounded text-[10px] font-semibold ${typeColors[obj.event_type] || "bg-slate-800 text-slate-200 border border-slate-700"}`;
typePill.textContent = obj.event_type;
const stepPill = document.createElement("span");
stepPill.className = "px-2 py-0.5 rounded text-[10px] bg-slate-800 text-slate-300 border border-slate-700";
stepPill.textContent = `step ${step}`;
const tsSpan = document.createElement("span");
tsSpan.className = "text-[10px] text-slate-400";
tsSpan.textContent = tsText;
header.appendChild(typePill);
header.appendChild(stepPill);
header.appendChild(tsSpan);
const jobLine = document.createElement("div");
jobLine.className = "text-[11px] text-slate-300 font-medium";
jobLine.textContent = obj.job_id;
const body = document.createElement("pre");
body.className = "bg-slate-950 border border-slate-800 rounded p-2 text-[11px] text-slate-200 overflow-auto";
body.textContent = JSON.stringify(payload, null, 2);
line.appendChild(header);
line.appendChild(jobLine);
line.appendChild(body);
}
eventsEl.prepend(line);
while (eventsEl.childNodes.length > 400) {
eventsEl.removeChild(eventsEl.lastChild);
}
}
function scheduleWsReconnect() {
if (state.wsReconnectTimer || !state.token) return;
state.wsReconnectTimer = setTimeout(() => {
state.wsReconnectTimer = null;
connectWs();
}, 1200);
}
function updateLatestVisualFromEvent(ev) {
if (!ev || ev.event_type !== "visual_update") return;
if (!state.selectedJobId || ev.job_id !== state.selectedJobId) return;
const imagePath = ev.payload && ev.payload.image_meta && ev.payload.image_meta.path;
if (!imagePath) return;
const q = encodeURIComponent(imagePath);
latestVisualEl.src = `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
async function refreshJobs() {
const payload = await api("/api/jobs?limit=100");
state.jobs = payload.jobs || [];
if (!state.selectedJobId && state.jobs.length > 0) state.selectedJobId = state.jobs[0].job_id;
renderJobs();
}
async function refreshStats() {
const payload = await api("/api/stats");
renderStats(payload);
}
async function refreshJobDetail() {
if (!state.selectedJobId) return;
const [job, events] = await Promise.all([
api(`/api/jobs/${state.selectedJobId}`),
api(`/api/jobs/${state.selectedJobId}/events?limit=120`)
]);
jobDetailEl.textContent = JSON.stringify(job, null, 2);
eventsEl.innerHTML = "";
const list = (events.events || []).slice().reverse();
for (const ev of list) pushEventLine(ev);
const visual = list.find((ev) => ev.event_type === "visual_update");
if (visual) updateLatestVisualFromEvent(visual);
}
function connectWs() {
if (!state.token) return;
if (state.ws && (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING)) {
return;
}
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws?token=${encodeURIComponent(state.token)}`);
state.ws = ws;
ws.onmessage = async (event) => {
try {
const payload = JSON.parse(event.data);
if (!payload || payload.event_type === "connected") return;
pushEventLine(payload);
updateLatestVisualFromEvent(payload);
if (!state.selectedJobId || payload.job_id === state.selectedJobId) {
await refreshJobDetail();
}
await refreshJobs();
await refreshStats();
} catch (err) {
console.error(err);
}
};
ws.onclose = () => {
if (state.ws === ws) state.ws = null;
if (manuallyClosedSockets.has(ws)) {
manuallyClosedSockets.delete(ws);
return;
}
scheduleWsReconnect();
};
}
async function fullRefresh() {
await refreshJobs();
await refreshStats();
await refreshJobDetail();
}
async function connect() {
state.token = tokenInput.value.trim();
localStorage.setItem("screenjob_token", state.token);
if (state.ws) {
manuallyClosedSockets.add(state.ws);
try { state.ws.close(); } catch (_) {}
state.ws = null;
}
if (state.wsReconnectTimer) {
clearTimeout(state.wsReconnectTimer);
state.wsReconnectTimer = null;
}
await fullRefresh();
connectWs();
}
function syncEventsViewToggle() {
eventsViewToggle.checked = state.eventsViewMode === "beautiful";
}
saveTokenBtn.addEventListener("click", () => connect().catch((err) => alert(err.message)));
refreshBtn.addEventListener("click", () => fullRefresh().catch((err) => alert(err.message)));
eventsViewToggle.addEventListener("change", () => {
state.eventsViewMode = eventsViewToggle.checked ? "beautiful" : "raw";
localStorage.setItem("screenjob_events_view_mode", state.eventsViewMode);
refreshJobDetail().catch((err) => alert(err.message));
});
syncEventsViewToggle();
if (state.token) connect().catch(() => {});
</script>
</body>
</html>
""".replace("__MONITOR_HOST__", host_suffix)


@@ -0,0 +1,106 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>ScreenJob Monitor</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-slate-950 text-slate-100 min-h-screen">
<div class="max-w-7xl mx-auto p-4 md:p-8 space-y-6">
<header class="flex flex-col gap-3 md:flex-row md:items-center md:justify-between">
<div>
<h1 class="text-2xl md:text-3xl font-bold tracking-tight">ScreenJob Monitor<span class="text-slate-400 text-base md:text-lg font-medium">__MONITOR_HOST__</span></h1>
<p class="text-slate-400 text-sm">Read-only monitoring for active and historical tasks.</p>
</div>
<div class="flex flex-col md:flex-row gap-2 md:items-center">
<input id="tokenInput" type="password" placeholder="SCREENJOB_TOKEN" class="bg-slate-900 border border-slate-700 rounded px-3 py-2 text-sm w-72" />
<button id="saveTokenBtn" class="bg-cyan-500 hover:bg-cyan-400 text-slate-950 font-semibold px-4 py-2 rounded">Connect</button>
</div>
</header>
<section class="grid grid-cols-2 md:grid-cols-6 gap-3" id="stats"></section>
<section class="space-y-3">
<div class="flex items-center justify-between gap-3">
<h2 class="font-semibold">Analytics</h2>
<div id="analyticsMeta" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsSummary" class="grid grid-cols-2 md:grid-cols-4 gap-3"></div>
<div class="grid grid-cols-1 xl:grid-cols-2 gap-4">
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<div class="flex items-center justify-between gap-3">
<h3 class="font-semibold text-sm">Success by Objective Category</h3>
<div id="analyticsCategorySummary" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsCategories" class="space-y-3"></div>
</div>
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<div class="flex items-center justify-between gap-3">
<h3 class="font-semibold text-sm">Avg Steps / Cost Over Time</h3>
<div id="analyticsTrendSummary" class="text-[11px] text-slate-400"></div>
</div>
<div id="analyticsTrends" class="space-y-4"></div>
</div>
</div>
</section>
<section class="grid grid-cols-1 lg:grid-cols-5 gap-4">
<div class="lg:col-span-2 bg-slate-900/70 border border-slate-800 rounded-xl p-4">
<div class="flex items-center justify-between mb-3">
<h2 class="font-semibold">Jobs</h2>
<button id="refreshBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Refresh</button>
</div>
<div id="jobList" class="space-y-2 max-h-[62vh] overflow-auto"></div>
</div>
<div class="lg:col-span-3 bg-slate-900/70 border border-slate-800 rounded-xl p-4 space-y-3">
<h2 class="font-semibold">Job Detail</h2>
<pre id="jobDetail" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[24vh]"></pre>
<h3 class="font-semibold text-sm">Latest Visual</h3>
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<img id="latestVisual" alt="Latest visual update" class="max-h-[24vh] w-full object-contain rounded" />
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Replay</h3>
<div id="replayStatus" class="text-[11px] text-slate-400">No replay loaded.</div>
</div>
<div class="flex flex-wrap items-center gap-2">
<button id="replayPlayBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Play</button>
<button id="replayPrevBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Prev</button>
<button id="replayNextBtn" class="text-xs bg-slate-800 px-2 py-1 rounded">Next</button>
<label class="text-xs text-slate-300 flex items-center gap-1">
Speed
<select id="replaySpeed" class="bg-slate-900 border border-slate-700 rounded px-1 py-0.5">
<option value="0.5">0.5x</option>
<option value="1" selected>1.0x</option>
<option value="1.5">1.5x</option>
<option value="2">2.0x</option>
</select>
</label>
</div>
<input id="replaySeek" type="range" min="0" max="0" value="0" class="w-full accent-cyan-400" />
<div class="bg-slate-950 border border-slate-800 rounded p-2">
<div class="relative w-full min-h-[180px] bg-black/40 rounded">
<img id="replayVisual" alt="Replay frame" class="max-h-[30vh] w-full object-contain rounded" />
<svg id="replayOverlay" class="absolute inset-0 w-full h-full pointer-events-none" preserveAspectRatio="xMidYMid meet"></svg>
</div>
<div id="replayFrameMeta" class="text-[11px] text-slate-400 mt-2"></div>
<div id="replayFrameEvents" class="mt-2 space-y-1"></div>
</div>
<div class="flex items-center justify-between">
<h3 class="font-semibold text-sm">Live Events</h3>
<label for="eventsViewToggle" class="flex items-center gap-2 text-xs text-slate-300 cursor-pointer select-none">
<span>Raw</span>
<input id="eventsViewToggle" type="checkbox" class="accent-cyan-400 h-4 w-4" />
<span>Beautiful</span>
</label>
</div>
<div id="events" class="bg-slate-950 border border-slate-800 rounded p-3 text-xs overflow-auto max-h-[36vh] space-y-1"></div>
</div>
</section>
</div>
<script src="/ui/monitoring.js"></script>
</body>
</html>

625
src/ui_assets/monitoring.js Normal file

@@ -0,0 +1,625 @@
const tokenInput = document.getElementById("tokenInput");
const saveTokenBtn = document.getElementById("saveTokenBtn");
const refreshBtn = document.getElementById("refreshBtn");
const jobListEl = document.getElementById("jobList");
const jobDetailEl = document.getElementById("jobDetail");
const eventsEl = document.getElementById("events");
const statsEl = document.getElementById("stats");
const latestVisualEl = document.getElementById("latestVisual");
const eventsViewToggle = document.getElementById("eventsViewToggle");
const replayVisualEl = document.getElementById("replayVisual");
const replayOverlayEl = document.getElementById("replayOverlay");
const replayFrameMetaEl = document.getElementById("replayFrameMeta");
const replayFrameEventsEl = document.getElementById("replayFrameEvents");
const replayStatusEl = document.getElementById("replayStatus");
const replayPlayBtn = document.getElementById("replayPlayBtn");
const replayPrevBtn = document.getElementById("replayPrevBtn");
const replayNextBtn = document.getElementById("replayNextBtn");
const replaySpeedEl = document.getElementById("replaySpeed");
const replaySeekEl = document.getElementById("replaySeek");
const analyticsMetaEl = document.getElementById("analyticsMeta");
const analyticsSummaryEl = document.getElementById("analyticsSummary");
const analyticsCategorySummaryEl = document.getElementById("analyticsCategorySummary");
const analyticsCategoriesEl = document.getElementById("analyticsCategories");
const analyticsTrendSummaryEl = document.getElementById("analyticsTrendSummary");
const analyticsTrendsEl = document.getElementById("analyticsTrends");
const state = {
token: localStorage.getItem("screenjob_token") || "",
jobs: [],
selectedJobId: null,
ws: null,
wsReconnectTimer: null,
eventsViewMode: localStorage.getItem("screenjob_events_view_mode") === "beautiful" ? "beautiful" : "raw",
replay: {
frames: [],
trailingEvents: [],
frameIndex: 0,
isPlaying: false,
speed: 1,
timer: null
}
};
const manuallyClosedSockets = new WeakSet();
const analyticsRefreshEvents = new Set(["job_finished", "job_failed", "job_rejected"]);
tokenInput.value = state.token;
function authHeaders() {
return { "Authorization": "Bearer " + state.token };
}
async function api(path, opts = {}) {
if (!state.token) throw new Error("Token required");
const headers = Object.assign({}, authHeaders(), opts.headers || {});
const response = await fetch(path, Object.assign({}, opts, { headers }));
if (!response.ok) throw new Error(await response.text());
return response.json();
}
function renderStats(stats) {
const cards = [
["Total Jobs", stats.total_jobs || 0],
["Running", stats.running_jobs || 0],
["Completed", stats.completed_jobs || 0],
["Failed", stats.failed_jobs || 0],
["Cancelled", stats.cancelled_jobs || 0],
["Total Cost (USD)", Number(stats.total_estimated_cost || 0).toFixed(4)]
];
statsEl.innerHTML = cards.map(([name, val]) => `
<div class="bg-slate-900/70 border border-slate-800 rounded-xl p-3">
<div class="text-slate-400 text-xs">${name}</div>
<div class="text-lg font-semibold">${val}</div>
</div>
`).join("");
}
function escapeHtml(value) {
return String(value ?? "").replace(/[&<>"']/g, (ch) => ({
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
'"': "&quot;",
"'": "&#39;"
})[ch]);
}
function formatNumber(value, digits = 2) {
const num = Number(value);
return Number.isFinite(num) ? num.toFixed(digits) : "—";
}
function formatCurrency(value, digits = 6) {
const num = Number(value);
return Number.isFinite(num) ? `$${num.toFixed(digits)}` : "—";
}
function formatPercent(value) {
const num = Number(value);
return Number.isFinite(num) ? `${num.toFixed(1)}%` : "—";
}
function formatDateLabel(value) {
const dt = new Date(value);
if (Number.isNaN(dt.getTime())) return String(value || "—");
return dt.toLocaleDateString(undefined, { month: "short", day: "numeric" });
}
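
Review note on the formatters above: `Number(null)` evaluates to `0`, so `formatNumber(null)` returns `"0.00"` instead of the placeholder. The sketch below shows one possible null-aware variant. `formatNumberSafe` and its `placeholder` parameter are hypothetical names used for illustration and are not part of this commit.

```javascript
// Hypothetical variant of formatNumber (not in this commit): treats null,
// undefined, and the empty string as "missing" rather than as zero.
function formatNumberSafe(value, digits = 2, placeholder = "—") {
  if (value == null || value === "") return placeholder;
  const num = Number(value);
  // Non-numeric input such as "abc" becomes NaN and also falls back.
  return Number.isFinite(num) ? num.toFixed(digits) : placeholder;
}

console.log(formatNumberSafe(null));    // placeholder, not "0.00"
console.log(formatNumberSafe(3, 1));    // "3.0"
```

Whether a missing average should render as the placeholder or as zero is a product decision; the category rows in `renderAnalytics` already guard with `== null`, which suggests the placeholder may be the intended behavior.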
function renderMetricCard(label, value) {
return `
<div class="bg-slate-950 border border-slate-800 rounded-xl p-3">
<div class="text-[11px] uppercase tracking-wide text-slate-400">${escapeHtml(label)}</div>
<div class="text-xl font-semibold mt-1">${escapeHtml(value)}</div>
</div>
`;
}
function renderLineChart(title, points, options = {}) {
const color = options.color || "#22d3ee";
const valueLabel = options.valueLabel || "";
const sourcePoints = Array.isArray(points)
? points.filter((point) => Number.isFinite(Number(point.value)))
: [];
if (!sourcePoints.length) {
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3">
<div class="flex items-center justify-between gap-3">
<div>
<div class="text-xs text-slate-400">${escapeHtml(title)}</div>
<div class="text-sm text-slate-200 font-semibold">No data yet</div>
</div>
</div>
</div>
`;
}
const width = 640;
const height = 220;
const margin = { top: 20, right: 18, bottom: 34, left: 44 };
const values = sourcePoints.map((point) => Number(point.value));
const minValue = Math.min(...values);
const maxValue = Math.max(...values);
const span = maxValue - minValue || 1;
const chartWidth = width - margin.left - margin.right;
const chartHeight = height - margin.top - margin.bottom;
const xStep = sourcePoints.length > 1 ? chartWidth / (sourcePoints.length - 1) : 0;
const coords = sourcePoints.map((point, index) => ({
x: margin.left + (index * xStep),
y: margin.top + ((maxValue - Number(point.value)) / span) * chartHeight,
}));
const linePath = coords.map((point, index) => `${index === 0 ? "M" : "L"} ${point.x} ${point.y}`).join(" ");
const baseline = height - margin.bottom;
const midIndex = Math.floor(sourcePoints.length / 2);
const xLabels = [
{ index: 0, label: sourcePoints[0].label },
{ index: midIndex, label: sourcePoints[midIndex].label },
{ index: sourcePoints.length - 1, label: sourcePoints[sourcePoints.length - 1].label },
].filter((item, index, array) => item.label && array.findIndex((candidate) => candidate.index === item.index) === index);
const minLabel = options.formatValue ? options.formatValue(minValue) : formatNumber(minValue, 2);
const maxLabel = options.formatValue ? options.formatValue(maxValue) : formatNumber(maxValue, 2);
const latest = sourcePoints[sourcePoints.length - 1];
const latestValue = options.formatValue ? options.formatValue(latest.value) : formatNumber(latest.value, 2);
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3 space-y-2">
<div class="flex items-center justify-between gap-3">
<div>
<div class="text-xs text-slate-400">${escapeHtml(title)}</div>
<div class="text-sm text-slate-200 font-semibold">${escapeHtml(latestValue)}${valueLabel ? ` <span class="text-slate-500 font-normal">${escapeHtml(valueLabel)}</span>` : ""}</div>
</div>
<div class="text-[11px] text-slate-400 text-right">
<div>${escapeHtml(sourcePoints.length)} points</div>
<div>${escapeHtml(minLabel)} - ${escapeHtml(maxLabel)}</div>
</div>
</div>
<svg viewBox="0 0 ${width} ${height}" class="w-full h-56">
${Array.from({ length: 4 }, (_, idx) => {
const y = margin.top + (chartHeight / 3) * idx;
return `<line x1="${margin.left}" y1="${y}" x2="${width - margin.right}" y2="${y}" stroke="rgba(51, 65, 85, 0.7)" stroke-width="1" />`;
}).join("")}
<line x1="${margin.left}" y1="${baseline}" x2="${width - margin.right}" y2="${baseline}" stroke="rgba(71, 85, 105, 0.8)" stroke-width="1.5" />
<path d="${linePath}" fill="none" stroke="${color}" stroke-width="3" stroke-linecap="round" stroke-linejoin="round" />
${coords.map((point) => `
<circle cx="${point.x}" cy="${point.y}" r="4.5" fill="${color}" />
`).join("")}
<text x="${margin.left - 8}" y="${margin.top + 4}" text-anchor="end" class="fill-slate-400 text-[10px]">${escapeHtml(maxLabel)}</text>
<text x="${margin.left - 8}" y="${baseline}" text-anchor="end" class="fill-slate-400 text-[10px]">${escapeHtml(minLabel)}</text>
${xLabels.map((item) => `
<text x="${coords[item.index].x}" y="${height - 10}" text-anchor="middle" class="fill-slate-500 text-[10px]">${escapeHtml(formatDateLabel(item.label))}</text>
`).join("")}
</svg>
</div>
`;
}
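
Review note: the vertical scaling inside `renderLineChart` can be checked in isolation. The sketch below extracts the same arithmetic into a pure function. `scaleToChart` is a hypothetical helper, not part of `monitoring.js`, and the default `height` and margins are copied from the constants used above.

```javascript
// Hypothetical helper (not in this commit): maps values to SVG y-coordinates
// using the same formula as renderLineChart, so the largest value sits at
// the top margin and the smallest at the baseline.
function scaleToChart(values, height = 220, margin = { top: 20, bottom: 34 }) {
  const nums = values.map(Number).filter(Number.isFinite);
  if (!nums.length) return [];
  const minValue = Math.min(...nums);
  const maxValue = Math.max(...nums);
  // When every value is equal the span would be 0; fall back to 1 so the
  // division is safe and all points land on the top margin.
  const span = maxValue - minValue || 1;
  const chartHeight = height - margin.top - margin.bottom;
  return nums.map((v) => margin.top + ((maxValue - v) / span) * chartHeight);
}

console.log(scaleToChart([1, 2, 3])); // [186, 103, 20]
```

As in the original filter, a `null` value passes `Number.isFinite(Number(...))` because `Number(null)` is `0`, so days without data would be plotted at zero unless they are dropped before this step.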
function renderAnalytics(payload) {
const analytics = payload || {};
const categories = Array.isArray(analytics.by_category) ? analytics.by_category : [];
const timeline = Array.isArray(analytics.timeline) ? analytics.timeline : [];
const finishedCategories = categories.filter((row) => Number(row.finished_jobs || 0) > 0);
if (analyticsMetaEl) {
analyticsMetaEl.textContent = analytics.generated_at
? `Updated ${new Date(analytics.generated_at).toLocaleString()}`
: "Historical snapshot";
}
analyticsSummaryEl.innerHTML = [
renderMetricCard("Finished Jobs", analytics.finished_jobs || 0),
renderMetricCard("Success Rate", formatPercent(analytics.success_rate)),
renderMetricCard("Avg Steps", formatNumber(analytics.avg_steps, 1)),
renderMetricCard("Avg Cost", formatCurrency(analytics.avg_cost_usd)),
].join("");
analyticsCategorySummaryEl.textContent = finishedCategories.length
? `${finishedCategories.length} categories`
: "No finished jobs yet";
if (finishedCategories.length) {
analyticsCategoriesEl.innerHTML = finishedCategories.map((row) => {
const successRate = Number(row.success_rate || 0);
const completed = Number(row.completed_jobs || 0);
const finished = Number(row.finished_jobs || 0);
const total = Number(row.total_jobs || 0);
const avgSteps = row.avg_steps == null ? "—" : formatNumber(row.avg_steps, 1);
const avgCost = row.avg_cost_usd == null ? "—" : formatCurrency(row.avg_cost_usd);
return `
<div class="rounded-lg border border-slate-800 bg-slate-950/70 p-3 space-y-2">
<div class="flex items-start justify-between gap-3">
<div>
<div class="font-medium">${escapeHtml(row.label || "Other")}</div>
<div class="text-[11px] text-slate-400">${finished} finished · ${completed} completed · ${total} total</div>
</div>
<div class="text-right">
<div class="text-base font-semibold">${formatPercent(successRate)}</div>
<div class="text-[11px] text-slate-500">success rate</div>
</div>
</div>
<div class="h-2 rounded bg-slate-800 overflow-hidden">
<div class="h-full rounded bg-cyan-400" style="width: ${Math.max(0, Math.min(successRate, 100))}%"></div>
</div>
<div class="grid grid-cols-2 gap-2 text-[11px] text-slate-300">
<div>Avg steps: ${escapeHtml(avgSteps)}</div>
<div>Avg cost: ${escapeHtml(avgCost)}</div>
</div>
</div>
`;
}).join("");
} else {
analyticsCategoriesEl.innerHTML = `
<div class="rounded-lg border border-dashed border-slate-800 bg-slate-950/70 p-4 text-sm text-slate-400">
No finished jobs yet.
</div>
`;
}
analyticsTrendSummaryEl.textContent = timeline.length ? `${timeline.length} days` : "No daily data yet";
analyticsTrendsEl.innerHTML = [
renderLineChart("Average steps per day", timeline.map((row) => ({ label: row.label, value: row.avg_steps })), { color: "#38bdf8" }),
renderLineChart("Average cost per day", timeline.map((row) => ({ label: row.label, value: row.avg_cost_usd })), {
color: "#34d399",
valueLabel: "USD",
formatValue: (value) => formatCurrency(value),
}),
].join("");
}
function renderJobs() {
jobListEl.innerHTML = state.jobs.map((job) => {
const active = job.job_id === state.selectedJobId;
return `
<button data-job-id="${job.job_id}" class="w-full text-left p-3 rounded border ${active ? "border-cyan-400 bg-slate-800" : "border-slate-800 bg-slate-950"} hover:bg-slate-800">
<div class="flex items-center justify-between">
<span class="font-medium">${job.job_id}</span>
<span class="text-xs px-2 py-0.5 rounded bg-slate-700">${job.status}</span>
</div>
<div class="text-xs text-slate-400 mt-1">${job.model}</div>
<div class="text-xs text-slate-300 mt-1 line-clamp-2">${job.objective}</div>
<div class="text-xs text-slate-500 mt-1">$${Number((job.usage && job.usage.estimated_cost_usd) || 0).toFixed(6)}</div>
</button>
`;
}).join("");
for (const btn of jobListEl.querySelectorAll("button[data-job-id]")) {
btn.addEventListener("click", () => {
state.selectedJobId = btn.getAttribute("data-job-id");
renderJobs();
refreshJobDetail();
});
}
}
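
Review note: `renderJobs` interpolates `job.job_id`, `job.status`, `job.model`, and `job.objective` into `innerHTML` without escaping, while the analytics renderers in this file route values through `escapeHtml`. The sketch below shows what the same treatment could look like for a job row, assuming the objective may contain arbitrary user-supplied text. `jobRowHtml` is a hypothetical helper and the markup is simplified; `escapeHtml` is repeated from this file only so the block is self-contained.

```javascript
// Copy of the escapeHtml helper defined earlier in monitoring.js.
function escapeHtml(value) {
  return String(value ?? "").replace(/[&<>"']/g, (ch) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  })[ch]);
}

// Hypothetical helper (not in this commit): builds a simplified job row with
// every job-supplied field escaped before it reaches innerHTML.
function jobRowHtml(job) {
  return [
    `<button data-job-id="${escapeHtml(job.job_id)}">`,
    `<span>${escapeHtml(job.job_id)}</span>`,
    `<span>${escapeHtml(job.status)}</span>`,
    `<div>${escapeHtml(job.model)}</div>`,
    `<div>${escapeHtml(job.objective)}</div>`,
    `</button>`
  ].join("");
}

console.log(jobRowHtml({ job_id: "job-1", status: "running", model: "m", objective: "<b>hi</b>" }));
```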
function pushEventLine(obj) {
if (!obj || !obj.job_id || !obj.event_type) return;
const line = document.createElement("div");
const ts = obj.ts || "-";
const step = (obj.step ?? "-");
if (state.eventsViewMode === "raw") {
line.className = "border-b border-slate-800 pb-1";
line.textContent = `[${ts}] ${obj.job_id} step=${step} ${obj.event_type} ${JSON.stringify(obj.payload || {})}`;
} else {
const typeColors = {
info: "bg-sky-900/50 text-sky-200 border border-sky-800",
warning: "bg-amber-900/40 text-amber-200 border border-amber-800",
error: "bg-rose-900/40 text-rose-200 border border-rose-800",
visual_update: "bg-emerald-900/40 text-emerald-200 border border-emerald-800",
tool_call: "bg-violet-900/40 text-violet-200 border border-violet-800",
tool_result: "bg-indigo-900/40 text-indigo-200 border border-indigo-800"
};
const dt = new Date(ts);
const tsText = Number.isNaN(dt.getTime()) ? ts : dt.toLocaleString();
const payload = obj.payload || {};
line.className = "rounded-lg border border-slate-800 bg-slate-900/80 p-2 space-y-2";
const header = document.createElement("div");
header.className = "flex flex-wrap items-center gap-2";
const typePill = document.createElement("span");
typePill.className = `px-2 py-0.5 rounded text-[10px] font-semibold ${typeColors[obj.event_type] || "bg-slate-800 text-slate-200 border border-slate-700"}`;
typePill.textContent = obj.event_type;
const stepPill = document.createElement("span");
stepPill.className = "px-2 py-0.5 rounded text-[10px] bg-slate-800 text-slate-300 border border-slate-700";
stepPill.textContent = `step ${step}`;
const tsSpan = document.createElement("span");
tsSpan.className = "text-[10px] text-slate-400";
tsSpan.textContent = tsText;
header.appendChild(typePill);
header.appendChild(stepPill);
header.appendChild(tsSpan);
const jobLine = document.createElement("div");
jobLine.className = "text-[11px] text-slate-300 font-medium";
jobLine.textContent = obj.job_id;
const body = document.createElement("pre");
body.className = "bg-slate-950 border border-slate-800 rounded p-2 text-[11px] text-slate-200 overflow-auto";
body.textContent = JSON.stringify(payload, null, 2);
line.appendChild(header);
line.appendChild(jobLine);
line.appendChild(body);
}
eventsEl.prepend(line);
while (eventsEl.childNodes.length > 400) {
eventsEl.removeChild(eventsEl.lastChild);
}
}
function clearReplayTimer() {
if (state.replay.timer) {
clearTimeout(state.replay.timer);
state.replay.timer = null;
}
}
function stopReplay() {
state.replay.isPlaying = false;
clearReplayTimer();
replayPlayBtn.textContent = "Play";
}
function replayImageSrc(path) {
const q = encodeURIComponent(path || "");
return `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
function renderReplayOverlay(frame) {
replayOverlayEl.innerHTML = "";
const size = frame && frame.screen_size;
if (!frame || !frame.is_fullscreen || !size || !size.width || !size.height) {
replayOverlayEl.removeAttribute("viewBox");
return;
}
replayOverlayEl.setAttribute("viewBox", `0 0 ${size.width} ${size.height}`);
const overlayEvents = Array.isArray(frame.overlays) ? frame.overlays : [];
const points = overlayEvents.filter((ev) => ev && ev.kind === "tool_result" && ev.tool === "click" && ev.click);
for (const ev of points) {
const x = Number(ev.click.x);
const y = Number(ev.click.y);
if (!Number.isFinite(x) || !Number.isFinite(y)) continue;
const halo = document.createElementNS("http://www.w3.org/2000/svg", "circle");
halo.setAttribute("cx", String(x));
halo.setAttribute("cy", String(y));
halo.setAttribute("r", "14");
halo.setAttribute("fill", "rgba(14, 165, 233, 0.22)");
halo.setAttribute("stroke", "#38bdf8");
halo.setAttribute("stroke-width", "2");
const dot = document.createElementNS("http://www.w3.org/2000/svg", "circle");
dot.setAttribute("cx", String(x));
dot.setAttribute("cy", String(y));
dot.setAttribute("r", "4");
dot.setAttribute("fill", "#38bdf8");
replayOverlayEl.appendChild(halo);
replayOverlayEl.appendChild(dot);
}
}
function renderReplayFrameEvents(frame) {
replayFrameEventsEl.innerHTML = "";
if (!frame) return;
const events = Array.isArray(frame.overlays) ? frame.overlays : [];
const shown = events.slice(-8);
for (const ev of shown) {
const row = document.createElement("div");
row.className = "text-[11px] rounded border border-slate-800 bg-slate-900/80 px-2 py-1";
row.textContent = ev.label || `${ev.kind || "event"} ${ev.tool || ""}`.trim();
replayFrameEventsEl.appendChild(row);
}
if (!shown.length) {
const empty = document.createElement("div");
empty.className = "text-[11px] text-slate-500";
empty.textContent = "No overlay events for this frame.";
replayFrameEventsEl.appendChild(empty);
}
}
function setReplayFrame(index) {
const frames = state.replay.frames;
if (!frames.length) {
replayVisualEl.removeAttribute("src");
replayOverlayEl.innerHTML = "";
replayFrameMetaEl.textContent = "No replay frames.";
replaySeekEl.value = "0";
replaySeekEl.max = "0";
replayStatusEl.textContent = "No replay loaded.";
return;
}
const bounded = Math.max(0, Math.min(index, frames.length - 1));
state.replay.frameIndex = bounded;
const frame = frames[bounded];
replayVisualEl.src = replayImageSrc(frame.image_path);
replayFrameMetaEl.textContent = `Frame ${bounded + 1}/${frames.length} | step ${frame.step} | ${frame.kind} | ${frame.ts}`;
replaySeekEl.max = String(Math.max(0, frames.length - 1));
replaySeekEl.value = String(bounded);
replayStatusEl.textContent = state.replay.isPlaying ? "Playing replay." : "Replay ready.";
renderReplayOverlay(frame);
renderReplayFrameEvents(frame);
}
function advanceReplay() {
const frames = state.replay.frames;
if (!state.replay.isPlaying || !frames.length) return;
if (state.replay.frameIndex >= frames.length - 1) {
stopReplay();
setReplayFrame(frames.length - 1);
replayStatusEl.textContent = "Replay finished.";
return;
}
setReplayFrame(state.replay.frameIndex + 1);
clearReplayTimer();
const delayMs = Math.max(120, Math.round(700 / (state.replay.speed || 1)));
state.replay.timer = setTimeout(advanceReplay, delayMs);
}
function toggleReplayPlay() {
if (!state.replay.frames.length) return;
if (state.replay.isPlaying) {
stopReplay();
setReplayFrame(state.replay.frameIndex);
return;
}
state.replay.isPlaying = true;
replayPlayBtn.textContent = "Pause";
replayStatusEl.textContent = "Playing replay.";
advanceReplay();
}
function resetReplay(payload) {
stopReplay();
const replayPayload = payload || {};
state.replay.frames = Array.isArray(replayPayload.frames) ? replayPayload.frames : [];
state.replay.trailingEvents = Array.isArray(replayPayload.trailing_events) ? replayPayload.trailing_events : [];
state.replay.frameIndex = 0;
setReplayFrame(0);
}
function scheduleWsReconnect() {
if (state.wsReconnectTimer || !state.token) return;
state.wsReconnectTimer = setTimeout(() => {
state.wsReconnectTimer = null;
connectWs();
}, 1200);
}
function updateLatestVisualFromEvent(ev) {
if (!ev || ev.event_type !== "visual_update") return;
if (!state.selectedJobId || ev.job_id !== state.selectedJobId) return;
const imagePath = ev.payload && ev.payload.image_meta && ev.payload.image_meta.path;
if (!imagePath) return;
const q = encodeURIComponent(imagePath);
latestVisualEl.src = `/api/jobs/${state.selectedJobId}/artifact?path=${q}&token=${encodeURIComponent(state.token)}`;
}
async function refreshJobs() {
const payload = await api("/api/jobs?limit=100");
state.jobs = payload.jobs || [];
if (!state.selectedJobId && state.jobs.length > 0) state.selectedJobId = state.jobs[0].job_id;
renderJobs();
}
async function refreshStats() {
const payload = await api("/api/stats");
renderStats(payload);
}
async function refreshAnalytics() {
const payload = await api("/api/analytics");
renderAnalytics(payload);
}
async function refreshJobDetail() {
if (!state.selectedJobId) return;
const [job, events, replay] = await Promise.all([
api(`/api/jobs/${state.selectedJobId}`),
api(`/api/jobs/${state.selectedJobId}/events?limit=120`),
api(`/api/jobs/${state.selectedJobId}/replay?limit=5000`)
]);
jobDetailEl.textContent = JSON.stringify(job, null, 2);
eventsEl.innerHTML = "";
const list = (events.events || []).slice().reverse();
for (const ev of list) pushEventLine(ev);
const visual = list.find((ev) => ev.event_type === "visual_update");
if (visual) updateLatestVisualFromEvent(visual);
resetReplay(replay);
}
function connectWs() {
if (!state.token) return;
if (state.ws && (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING)) {
return;
}
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws?token=${encodeURIComponent(state.token)}`);
state.ws = ws;
ws.onmessage = async (event) => {
try {
const payload = JSON.parse(event.data);
if (!payload || payload.event_type === "connected") return;
pushEventLine(payload);
updateLatestVisualFromEvent(payload);
if (!state.selectedJobId || payload.job_id === state.selectedJobId) {
await refreshJobDetail();
}
await refreshJobs();
await refreshStats();
if (analyticsRefreshEvents.has(payload.event_type)) {
await refreshAnalytics();
}
} catch (err) {
console.error(err);
}
};
ws.onclose = () => {
if (state.ws === ws) state.ws = null;
if (manuallyClosedSockets.has(ws)) {
manuallyClosedSockets.delete(ws);
return;
}
scheduleWsReconnect();
};
}
async function fullRefresh() {
await refreshJobs();
await refreshStats();
await refreshAnalytics();
await refreshJobDetail();
}
async function connect() {
state.token = tokenInput.value.trim();
localStorage.setItem("screenjob_token", state.token);
if (state.ws) {
manuallyClosedSockets.add(state.ws);
try { state.ws.close(); } catch (_) {}
state.ws = null;
}
if (state.wsReconnectTimer) {
clearTimeout(state.wsReconnectTimer);
state.wsReconnectTimer = null;
}
await fullRefresh();
connectWs();
}
function syncEventsViewToggle() {
eventsViewToggle.checked = state.eventsViewMode === "beautiful";
}
saveTokenBtn.addEventListener("click", () => connect().catch((err) => alert(err.message)));
refreshBtn.addEventListener("click", () => fullRefresh().catch((err) => alert(err.message)));
eventsViewToggle.addEventListener("change", () => {
state.eventsViewMode = eventsViewToggle.checked ? "beautiful" : "raw";
localStorage.setItem("screenjob_events_view_mode", state.eventsViewMode);
refreshJobDetail().catch((err) => alert(err.message));
});
replayPlayBtn.addEventListener("click", () => toggleReplayPlay());
replayPrevBtn.addEventListener("click", () => {
stopReplay();
setReplayFrame(state.replay.frameIndex - 1);
});
replayNextBtn.addEventListener("click", () => {
stopReplay();
setReplayFrame(state.replay.frameIndex + 1);
});
replaySpeedEl.addEventListener("change", () => {
const speed = Number(replaySpeedEl.value);
state.replay.speed = Number.isFinite(speed) && speed > 0 ? speed : 1;
if (state.replay.isPlaying) {
clearReplayTimer();
advanceReplay();
}
});
replaySeekEl.addEventListener("input", () => {
stopReplay();
setReplayFrame(Number(replaySeekEl.value || 0));
});
syncEventsViewToggle();
resetReplay(null);
if (state.token) connect().catch(() => {});
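
The replay pacing in `advanceReplay` above comes down to one formula: a 700 ms base interval divided by the speed multiplier, with a 120 ms lower bound. The Python sketch below restates that arithmetic for illustration only. `replay_delay_ms` is a hypothetical name that does not exist in the repo, and `math.floor(x + 0.5)` stands in for JavaScript's `Math.round` because Python's built-in `round` rounds halves to even.

```python
import math


def replay_delay_ms(speed: float, base_ms: int = 700, floor_ms: int = 120) -> int:
    """Hypothetical mirror of the delay between replay frames in the UI."""
    # The UI falls back to 1x when the stored speed is falsy (0, None, NaN).
    if not speed or speed != speed:
        speed = 1
    # math.floor(x + 0.5) approximates JavaScript's Math.round for positive x.
    rounded = math.floor(base_ms / speed + 0.5)
    # Never schedule frames faster than the lower bound.
    return max(floor_ms, rounded)


for speed in (0.5, 1, 2, 4, 8):
    print(speed, replay_delay_ms(speed))
```

Under these assumptions, any speed of roughly 6x or more hits the 120 ms bound, so higher multipliers would not play any faster.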


@@ -15,10 +15,76 @@ function Test-EnvVarLine {
return [bool](Select-String -Path $FilePath -Pattern ("^\s*" + [regex]::Escape($Name) + "=") -Quiet)
}
-if (-not (Get-Command python -ErrorAction SilentlyContinue)) {
-throw "Python was not found in PATH. Install Python 3.11+ and retry."
function Resolve-PythonExecutable {
$venvPython = Join-Path $scriptDir ".venv\Scripts\python.exe"
if (Test-Path -LiteralPath $venvPython) {
return $venvPython
}
$pythonCmd = Get-Command python -ErrorAction SilentlyContinue
if ($null -ne $pythonCmd -and (Test-Path -LiteralPath $pythonCmd.Source)) {
return $pythonCmd.Source
}
$candidatePyLaunchers = @()
$pyFromPath = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $pyFromPath -and (Test-Path -LiteralPath $pyFromPath.Source)) {
$candidatePyLaunchers += $pyFromPath.Source
}
$candidatePyLaunchers += "C:\Windows\py.exe"
if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
$repoUserHome = $Matches[0]
$candidatePyLaunchers += (Join-Path $repoUserHome "AppData\Local\Programs\Python\Launcher\py.exe")
}
foreach ($pyLauncher in ($candidatePyLaunchers | Select-Object -Unique)) {
if (-not (Test-Path -LiteralPath $pyLauncher)) {
continue
}
try {
$resolved = (& $pyLauncher -3 -c "import sys; print(sys.executable)" 2>$null | Select-Object -Last 1).Trim()
if ($resolved -and (Test-Path -LiteralPath $resolved)) {
return $resolved
}
} catch {
continue
}
}
$candidatePythonPaths = @()
if ($scriptDir -match "^[A-Za-z]:\\Users\\[^\\]+") {
$repoUserHome = $Matches[0]
$pythonBase = Join-Path $repoUserHome "AppData\Local\Programs\Python"
if (Test-Path -LiteralPath $pythonBase) {
$candidatePythonPaths += (Get-ChildItem -LiteralPath $pythonBase -Directory -ErrorAction SilentlyContinue |
Sort-Object Name -Descending |
ForEach-Object { Join-Path $_.FullName "python.exe" })
}
}
$candidatePythonPaths += @(
"C:\Python314\python.exe",
"C:\Python313\python.exe",
"C:\Python312\python.exe",
"C:\Python311\python.exe",
"C:\Program Files\Python314\python.exe",
"C:\Program Files\Python313\python.exe",
"C:\Program Files\Python312\python.exe",
"C:\Program Files\Python311\python.exe"
)
foreach ($candidate in ($candidatePythonPaths | Select-Object -Unique)) {
if (Test-Path -LiteralPath $candidate) {
return $candidate
}
}
throw "Python was not found. Install Python 3.11+ system-wide, or create .venv in the repo root."
}
$pythonExe = Resolve-PythonExecutable
$envFile = Join-Path $scriptDir ".env"
if (-not (Test-Path -LiteralPath $envFile)) {
Write-Warning ".env was not found at $envFile. Server startup may fail if required vars are missing."
@@ -31,5 +97,5 @@ if (-not (Test-Path -LiteralPath $envFile)) {
}
}
-Write-Host "Starting ScreenJob backend on configured host/port..." -ForegroundColor Cyan
-python main.py server
Write-Host "Starting ScreenJob backend with Python: $pythonExe" -ForegroundColor Cyan
& $pythonExe main.py server


@@ -0,0 +1,11 @@
Option Explicit
Dim shell, fso, scriptDir, psScript, command
Set shell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")
scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)
psScript = """" & fso.BuildPath(scriptDir, "screenjob_tray.ps1") & """"
command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -WindowStyle Hidden -STA -File " & psScript
shell.Run command, 0, False


@@ -91,6 +91,41 @@ def test_click_supports_directional_offsets(tmp_path: Path, monkeypatch) -> None
assert click_result["clicked"] == {"x": 110, "y": 102}
def test_enhance_defaults_to_small_ui_preset(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_enhance({"coordinate": {"x": 100, "y": 120}})
assert result["ok"] is True
meta = result["meta"]
assert meta["region"] == "small"
assert meta["mode"] == "ui"
assert meta["scale"] == 4
assert Path(meta["path"]).exists()
assert meta["target_pixel"]["x"] >= 0
assert meta["target_pixel"]["y"] >= 0
def test_enhance_supports_text_mode_and_scale_clamp(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_enhance(
{
"coordinate": {"x": -99, "y": 9999},
"region": "medium",
"mode": "text",
"scale": 99,
}
)
assert result["ok"] is True
meta = result["meta"]
assert meta["region"] == "medium"
assert meta["mode"] == "text"
assert meta["scale"] == 6
assert meta["requested_coord"] == {"x": -99, "y": 9999}
assert meta["source_coord"] == {"x": 0, "y": 719}
assert Path(meta["path"]).exists()
def test_press_key_supports_hotkey_combo(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
result = agent._tool_press_key({"key": "meta+r"})
@@ -98,3 +133,21 @@ def test_press_key_supports_hotkey_combo(tmp_path: Path, monkeypatch) -> None:
assert result["key"] == "win+r"
assert result["message"] == "Key combo executed."
assert agent_module.pyautogui.last_hotkey == ("win", "r")
def test_context_compaction_trigger_and_payload(tmp_path: Path, monkeypatch) -> None:
agent = _build_agent(tmp_path, monkeypatch)
agent.objective = "Open settings app"
agent.previous_response_id = "resp_123"
agent.step = 4
agent.last_context_compact_step = 0
agent.options.screen_context_decay_steps = 4
agent.recent_tool_summaries = ["step=1 tool=see_screen status=ok"]
agent.last_screen_data_url = "data:image/png;base64,abc"
agent.last_screen_meta = {"width": 1280, "height": 720, "path": "C:/tmp/frame.png"}
assert agent._should_compact_context() is True
compacted = agent._build_compacted_pending_input()
assert len(compacted) == 2
assert "Context compaction activated" in compacted[0]["content"][0]["text"]
assert "Open settings app" in compacted[0]["content"][0]["text"]
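
The second enhance test above pins down coordinate clamping: a request at (-99, 9999) is expected to come back with `source_coord` equal to (0, 719). A minimal sketch of that clamping follows. The 1280x720 capture size is an assumption inferred from the expected 719 (the test does not state it), and `clamp_to_screen` is a hypothetical helper name, not a function from the repo.

```python
def clamp_to_screen(x: int, y: int, width: int, height: int) -> dict[str, int]:
    """Clamp a requested point to valid pixel indices (hypothetical helper)."""
    # Valid pixel indices run from 0 to size - 1 on each axis.
    return {
        "x": min(max(x, 0), width - 1),
        "y": min(max(y, 0), height - 1),
    }


# Assumed screen size of 1280x720, matching the y == 719 expectation in the test.
print(clamp_to_screen(-99, 9999, 1280, 720))
```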


@@ -29,7 +29,10 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
def fake_assess_task_safety(*_args, **_kwargs):
return True, "safe", {"safe": True}
captured_kwargs: dict[str, Any] = {}
def fake_run_job(*_args, **_kwargs):
captured_kwargs.update(_kwargs)
result = AgentResult(
completed=True,
result="Done",
@@ -66,3 +69,5 @@ def test_cli_emits_structured_return_and_data(monkeypatch: Any, capsys, tmp_path
assert payload["response"]["data"] == "file1.txt\nfile2.txt"
assert payload["return"] == "Task completed successfully"
assert payload["data"] == "file1.txt\nfile2.txt"
assert captured_kwargs["options"].reasoning_effort == "medium"
assert captured_kwargs["options"].screen_context_decay_steps == 4


@@ -9,6 +9,24 @@ import src.server as server_module
from src.config import AppConfig
_TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
def _objective_category(objective: str) -> str:
text = objective.lower()
if any(keyword in text for keyword in ("browser", "website", "amazon", "google", "login", "shopping", "checkout", "orders")):
return "Browser / web"
if any(keyword in text for keyword in ("file", "folder", "directory", "terminal", "shell", "command", "cli", "script", "git", "repo", "install", "pip", "npm")):
return "Files / terminal"
if any(keyword in text for keyword in ("write", "summary", "document", "docs", "report", "email", "message", "readme", "markdown")):
return "Writing / docs"
if any(keyword in text for keyword in ("data", "analysis", "csv", "spreadsheet", "sheet", "table", "chart", "dashboard", "metric", "sql")):
return "Data / analysis"
if any(keyword in text for keyword in ("code", "bug", "fix", "test", "debug", "api", "backend", "frontend", "database", "deploy", "docker", "service", "build")):
return "Development / ops"
return "Other"
class FakeJobManager:
def __init__(self, *, config: AppConfig, db: Any, broadcast: Any = None) -> None:
self.config = config
@@ -26,6 +44,8 @@ class FakeJobManager:
command_timeout: int = 45,
type_interval: float = 0.02,
click_pause: float = 0.10,
reasoning_effort: str = "medium",
screen_context_decay_steps: int = 4,
disabled_tools: list[str] | None = None,
safety_override: bool = False,
no_failsafe: bool = False,
@@ -33,6 +53,11 @@ class FakeJobManager:
self._counter += 1
job_id = f"job_fake_{self._counter:03d}"
selected_model = (model or self.config.default_model).strip()
artifacts_dir = (self.config.runs_dir / f"run_{job_id}").resolve()
artifacts_dir.mkdir(parents=True, exist_ok=True)
screenshot_path = artifacts_dir / "screen_step_001.png"
screenshot_path.write_bytes(b"not-a-real-png")
created_at = f"2026-05-27T00:00:{self._counter:02d}Z"
self.last_submit_payload = {
"objective": objective,
"model": selected_model,
@@ -42,6 +67,8 @@ class FakeJobManager:
"command_timeout": command_timeout,
"type_interval": type_interval,
"click_pause": click_pause,
"reasoning_effort": reasoning_effort,
"screen_context_decay_steps": screen_context_decay_steps,
"no_failsafe": no_failsafe,
}
self._jobs[job_id] = {
@@ -49,6 +76,10 @@ class FakeJobManager:
"objective": objective,
"model": selected_model,
"status": "running",
"created_at": created_at,
"started_at": created_at,
"ended_at": None,
"steps": 1,
"result": "Running",
"response": {"return": "Running", "data": None},
"return": "Running",
@@ -61,7 +92,7 @@ class FakeJobManager:
"total_tokens": 14,
"estimated_cost_usd": 0.0001,
},
-"artifacts_dir": str(self.config.runs_dir.resolve()),
"artifacts_dir": str(artifacts_dir),
}
self._events[job_id] = [
{
@@ -70,7 +101,47 @@ class FakeJobManager:
"ts": "2026-05-27T00:00:00Z",
"step": 1,
"event_type": "tool_called",
-"payload": {"tool": "execute_command"},
"payload": {"tool": "click", "args": {"coordinate": {"x": 320, "y": 180}}},
},
{
"id": 2,
"job_id": job_id,
"ts": "2026-05-27T00:00:01Z",
"step": 1,
"event_type": "tool_result",
"payload": {"tool": "click", "result": {"ok": True, "clicked": {"x": 322, "y": 182}}},
},
{
"id": 3,
"job_id": job_id,
"ts": "2026-05-27T00:00:02Z",
"step": 1,
"event_type": "tool_called",
"payload": {"tool": "type", "args": {"text": "hello world"}},
},
{
"id": 4,
"job_id": job_id,
"ts": "2026-05-27T00:00:03Z",
"step": 1,
"event_type": "tool_result",
"payload": {"tool": "type", "result": {"ok": True, "typed_length": 11}},
},
{
"id": 5,
"job_id": job_id,
"ts": "2026-05-27T00:00:04Z",
"step": 1,
"event_type": "visual_update",
"payload": {
"kind": "see_screen",
"image_meta": {
"path": str(screenshot_path),
"width": 1920,
"height": 1080,
"grid": True,
},
},
}
]
return job_id
@@ -101,6 +172,114 @@ class FakeJobManager:
"live_running_threads": 0,
}
def analytics(self) -> dict[str, Any]:
by_category: dict[str, dict[str, Any]] = {}
by_day: dict[str, dict[str, Any]] = {}
def bucket(target: dict[str, dict[str, Any]], key: str) -> dict[str, Any]:
return target.setdefault(
key,
{
"label": key,
"total_jobs": 0,
"finished_jobs": 0,
"completed_jobs": 0,
"failed_jobs": 0,
"cancelled_jobs": 0,
"steps_sum": 0,
"steps_count": 0,
"cost_sum": 0.0,
"cost_count": 0,
},
)
total_jobs = 0
finished_jobs = 0
completed_jobs = 0
failed_jobs = 0
cancelled_jobs = 0
steps_sum = 0
steps_count = 0
cost_sum = 0.0
cost_count = 0
for job in self._jobs.values():
total_jobs += 1
status = str(job.get("status") or "")
finished = status in _TERMINAL_STATUSES
category = _objective_category(str(job.get("objective") or ""))
day = str(job.get("created_at") or "")[:10] or "unknown"
category_bucket = bucket(by_category, category)
day_bucket = bucket(by_day, day)
for item in (category_bucket, day_bucket):
item["total_jobs"] += 1
if not finished:
continue
finished_jobs += 1
if status == "completed":
completed_jobs += 1
elif status == "failed":
failed_jobs += 1
elif status == "cancelled":
cancelled_jobs += 1
steps_raw = job.get("steps")
if steps_raw is not None:
steps = int(steps_raw)
steps_sum += steps
steps_count += 1
for item in (category_bucket, day_bucket):
item["steps_sum"] += steps
item["steps_count"] += 1
estimated_cost_raw = (job.get("usage") or {}).get("estimated_cost_usd")
if estimated_cost_raw is not None:
estimated_cost = float(estimated_cost_raw)
cost_sum += estimated_cost
cost_count += 1
for item in (category_bucket, day_bucket):
item["cost_sum"] += estimated_cost
item["cost_count"] += 1
for item in (category_bucket, day_bucket):
item["finished_jobs"] += 1
if status == "completed":
item["completed_jobs"] += 1
elif status == "failed":
item["failed_jobs"] += 1
elif status == "cancelled":
item["cancelled_jobs"] += 1
def finalize(item: dict[str, Any]) -> dict[str, Any]:
finished = item["finished_jobs"]
return {
"label": item["label"],
"total_jobs": item["total_jobs"],
"finished_jobs": finished,
"completed_jobs": item["completed_jobs"],
"failed_jobs": item["failed_jobs"],
"cancelled_jobs": item["cancelled_jobs"],
"success_rate": round((item["completed_jobs"] / finished) * 100, 2) if finished else 0.0,
"avg_steps": round(item["steps_sum"] / item["steps_count"], 2) if item["steps_count"] else None,
"avg_cost_usd": round(item["cost_sum"] / item["cost_count"], 6) if item["cost_count"] else None,
}
return {
"total_jobs": total_jobs,
"finished_jobs": finished_jobs,
"completed_jobs": completed_jobs,
"failed_jobs": failed_jobs,
"cancelled_jobs": cancelled_jobs,
"success_rate": round((completed_jobs / finished_jobs) * 100, 2) if finished_jobs else 0.0,
"avg_steps": round(steps_sum / steps_count, 2) if steps_count else None,
"avg_cost_usd": round(cost_sum / cost_count, 6) if cost_count else None,
"by_category": sorted((finalize(item) for item in by_category.values()), key=lambda item: (-item["success_rate"], item["label"])),
"timeline": sorted((finalize(item) for item in by_day.values()), key=lambda item: item["label"]),
}
def _build_app(tmp_path: Path, monkeypatch: Any, disable_ui: bool = False):
monkeypatch.setattr(server_module, "JobManager", FakeJobManager)
@@ -145,6 +324,8 @@ def test_create_job_returns_only_job_id_and_defaults_model(tmp_path: Path, monke
manager = app.state.manager
assert manager.last_submit_payload["model"] == "gpt-5.4-mini"
assert manager.last_submit_payload["disabled_tools"] == ["click"]
assert manager.last_submit_payload["reasoning_effort"] == "medium"
assert manager.last_submit_payload["screen_context_decay_steps"] == 4
status_res = client.get(f"/api/jobs/{job_id}/status", headers=headers)
assert status_res.status_code == 200
@@ -174,12 +355,122 @@ def test_cancel_endpoint_and_events(tmp_path: Path, monkeypatch: Any) -> None:
assert status_after["data"] is None
def test_replay_endpoint_builds_frames_and_overlays(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
create = client.post("/api/jobs", headers=headers, json={"job": "Replay test"})
job_id = create.json()["job_id"]
replay = client.get(f"/api/jobs/{job_id}/replay?limit=200", headers=headers)
assert replay.status_code == 200
payload = replay.json()
assert payload["job_id"] == job_id
assert payload["total_frames"] == 1
frame = payload["frames"][0]
assert frame["kind"] == "see_screen"
assert frame["is_fullscreen"] is True
labels = [item.get("label", "") for item in frame["overlays"]]
assert any("click" in text.lower() for text in labels)
assert any("typed" in text.lower() for text in labels)
def test_replay_endpoint_skips_visual_paths_outside_artifacts(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
manager = app.state.manager
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
create = client.post("/api/jobs", headers=headers, json={"job": "Replay path check"})
job_id = create.json()["job_id"]
manager._events[job_id].append(
{
"id": 999,
"job_id": job_id,
"ts": "2026-05-27T00:01:00Z",
"step": 2,
"event_type": "visual_update",
"payload": {
"kind": "see_screen",
"image_meta": {
"path": str((tmp_path / "outside.png").resolve()),
"width": 100,
"height": 100,
"grid": True,
},
},
}
)
replay = client.get(f"/api/jobs/{job_id}/replay?limit=500", headers=headers)
assert replay.status_code == 200
payload = replay.json()
assert payload["total_frames"] == 1
def test_analytics_endpoint_groups_by_category_and_time(tmp_path: Path, monkeypatch: Any) -> None:
app, _ = _build_app(tmp_path, monkeypatch, disable_ui=False)
manager = app.state.manager
client = TestClient(app)
headers = {"Authorization": "Bearer test_token"}
browser_completed = client.post("/api/jobs", headers=headers, json={"job": "Open amazon.de and checkout"}).json()["job_id"]
browser_failed = client.post("/api/jobs", headers=headers, json={"job": "Open website and login"}).json()["job_id"]
terminal_completed = client.post("/api/jobs", headers=headers, json={"job": "Run a shell command to inspect files"}).json()["job_id"]
manager._jobs[browser_completed].update(
status="completed",
ended_at="2026-05-27T00:10:00Z",
steps=4,
created_at="2026-05-27T00:00:01Z",
usage={**manager._jobs[browser_completed]["usage"], "estimated_cost_usd": 0.12},
)
manager._jobs[browser_failed].update(
status="failed",
ended_at="2026-05-28T00:10:00Z",
steps=6,
created_at="2026-05-28T00:00:01Z",
usage={**manager._jobs[browser_failed]["usage"], "estimated_cost_usd": 0.24},
)
manager._jobs[terminal_completed].update(
status="completed",
ended_at="2026-05-28T00:15:00Z",
steps=10,
created_at="2026-05-28T00:00:02Z",
usage={**manager._jobs[terminal_completed]["usage"], "estimated_cost_usd": 0.05},
)
analytics = client.get("/api/analytics", headers=headers)
assert analytics.status_code == 200
payload = analytics.json()
assert payload["total_jobs"] == 3
assert payload["finished_jobs"] == 3
assert payload["completed_jobs"] == 2
assert payload["failed_jobs"] == 1
assert payload["success_rate"] == 66.67
assert payload["avg_steps"] == 6.67
assert payload["avg_cost_usd"] == 0.136667
browser = next(row for row in payload["by_category"] if row["label"] == "Browser / web")
terminal = next(row for row in payload["by_category"] if row["label"] == "Files / terminal")
assert browser["finished_jobs"] == 2
assert browser["success_rate"] == 50.0
assert browser["avg_steps"] == 5.0
assert terminal["success_rate"] == 100.0
assert [row["label"] for row in payload["timeline"]] == ["2026-05-27", "2026-05-28"]
def test_ui_toggle(tmp_path: Path, monkeypatch: Any) -> None:
app_enabled, _ = _build_app(tmp_path / "enabled", monkeypatch, disable_ui=False)
client_enabled = TestClient(app_enabled)
root_enabled = client_enabled.get("/")
assert root_enabled.status_code == 200
assert "ScreenJob Monitor" in root_enabled.text
assert "Success by Objective Category" in root_enabled.text
js_enabled = client_enabled.get("/ui/monitoring.js")
assert js_enabled.status_code == 200
assert "const tokenInput" in js_enabled.text
app_disabled, _ = _build_app(tmp_path / "disabled", monkeypatch, disable_ui=True)
client_disabled = TestClient(app_disabled)
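
`test_replay_endpoint_skips_visual_paths_outside_artifacts` above expects a frame to be dropped when its image path lies outside the job's artifacts directory. The server-side check is not part of this diff, so the following is only one way such a containment test could look, using the standard `pathlib` module. `is_inside` is a hypothetical name chosen for illustration.

```python
from pathlib import Path


def is_inside(root: Path, candidate: Path) -> bool:
    """Return True when candidate resolves to a location at or under root."""
    # resolve() collapses ".." segments and symlinks before the comparison,
    # so a path like root/../outside.png is rejected.
    return candidate.resolve().is_relative_to(root.resolve())


artifacts = Path("runs") / "run_job_fake_001"
print(is_inside(artifacts, artifacts / "screen_step_001.png"))
print(is_inside(artifacts, artifacts / ".." / "outside.png"))
```

`Path.is_relative_to` requires Python 3.9 or newer, which fits the 3.11+ requirement mentioned in the launcher script.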


@@ -72,3 +72,55 @@ def test_storage_response_fallback_uses_result_when_json_missing(tmp_path: Path)
assert job is not None
assert job["response"]["return"] == "Legacy result string"
assert job["response"]["data"] is None
def test_history_db_analytics_groups_by_category_and_day(tmp_path: Path) -> None:
db = HistoryDB(tmp_path / "screenjob_test_analytics.db")
db.create_job(
job_id="job_browser_ok",
objective="Open amazon.de and checkout",
model="gpt-5.4-mini",
created_at="2026-05-27T00:00:01Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_browser_ok", status="completed", steps=4, estimated_cost_usd=0.12)
db.create_job(
job_id="job_browser_fail",
objective="Open website and login",
model="gpt-5.4-mini",
created_at="2026-05-28T00:00:01Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_browser_fail", status="failed", steps=6, estimated_cost_usd=0.24)
db.create_job(
job_id="job_terminal_ok",
objective="Run a shell command to inspect files",
model="gpt-5.4-mini",
created_at="2026-05-28T00:00:02Z",
safety_override=False,
disabled_tools=[],
)
db.update_job("job_terminal_ok", status="completed", steps=10, estimated_cost_usd=0.05)
analytics = db.analytics()
assert analytics["total_jobs"] == 3
assert analytics["finished_jobs"] == 3
assert analytics["completed_jobs"] == 2
assert analytics["failed_jobs"] == 1
assert analytics["success_rate"] == 66.67
assert analytics["avg_steps"] == 6.67
assert analytics["avg_cost_usd"] == 0.136667
browser = next(row for row in analytics["by_category"] if row["label"] == "Browser / web")
terminal = next(row for row in analytics["by_category"] if row["label"] == "Files / terminal")
assert browser["finished_jobs"] == 2
assert browser["success_rate"] == 50.0
assert browser["avg_steps"] == 5.0
assert terminal["success_rate"] == 100.0
assert [row["label"] for row in analytics["timeline"]] == ["2026-05-27", "2026-05-28"]
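
The expected totals in the analytics assertions above (66.67, 6.67, 0.136667) follow directly from the three fixture jobs. The short sketch below redoes that arithmetic with the same rounding the fake manager in the server tests uses (two decimals for rates and steps, six for cost). The `jobs` list is a hand-copied stand-in for the fixtures, not data read from `HistoryDB`.

```python
# Fixture values copied from the three jobs created in the test above.
jobs = [
    {"status": "completed", "steps": 4, "cost": 0.12},
    {"status": "failed", "steps": 6, "cost": 0.24},
    {"status": "completed", "steps": 10, "cost": 0.05},
]

finished = len(jobs)  # all three jobs are in a terminal status
completed = sum(1 for job in jobs if job["status"] == "completed")

success_rate = round((completed / finished) * 100, 2)             # 2 of 3 jobs
avg_steps = round(sum(job["steps"] for job in jobs) / finished, 2)  # 20 / 3
avg_cost_usd = round(sum(job["cost"] for job in jobs) / finished, 6)  # 0.41 / 3

print(success_rate, avg_steps, avg_cost_usd)
```

The per-category figures work the same way: the two browser jobs give 1 of 2 completed (50.0) and (4 + 6) / 2 = 5.0 average steps.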

todo.md

@@ -4,21 +4,20 @@
- [Bug] Enforce single active desktop-control run (or a strict queue) so concurrent jobs cannot fight over the same mouse/keyboard/screen session. - [Bug] Enforce single active desktop-control run (or a strict queue) so concurrent jobs cannot fight over the same mouse/keyboard/screen session.
- [Bug] Fix run artifact collisions in `setup_artifacts()` (`run_id` is second-granularity, so two jobs in the same second can share/overwrite the same directory). - [Bug] Fix run artifact collisions in `setup_artifacts()` (`run_id` is second-granularity, so two jobs in the same second can share/overwrite the same directory).
- [Bug] Remove global logger handler clobbering in `setup_logger()` (`logging.getLogger("screenjob").handlers.clear()` breaks concurrent runs and can redirect logs to the wrong file). - [Bug] Remove global logger handler clobbering in `setup_logger()` (`logging.getLogger("screenjob").handlers.clear()` breaks concurrent runs and can redirect logs to the wrong file).
- [Bug] More consistent clicks and more uses of enhance images. - [x] More consistent clicks and more uses of enhance images.
## P1 ## P1
- [x] Move ui.py into a seperate html file and js file.
- [x] Think harder using effort "medium" by default.
- [x] Decay old screenshots after 3 to 5 steps to save (1) tokens and (2) brain fuck in the agents.
- [Bug] Validate `disabled_tools` against an allowlist and disallow disabling critical completion flow (`task_complete`) to avoid guaranteed step-limit failures.
- [Bug] Improve `execute_command` cancellation/timeout handling to terminate full process trees, not only the parent shell process.
- [Bug] Reduce API/UI token leakage risk by moving away from query-string token usage for websocket/artifact access where possible.
- [Idea] Add per-token rate limiting and request size limits (objective length + payload bounds) for API hardening.
## P2
- [Bug] Fix UI event style mapping mismatch (`tool_called` events are emitted, but UI color map expects `tool_call`).
- [Idea] Reduce monitoring UI backend load by throttling websocket-triggered refreshes and avoiding full job/event re-fetch on every event.
- [Idea] Add cursor-based pagination for jobs/events instead of large fixed limits.
- [Idea] Support offline/self-hosted UI assets (bundle Tailwind instead of CDN dependency).
- [Idea] Add retention controls/pruning for old runs, screenshots, and DB rows.
## P3
-- [Idea] Add Replay Mode; Ability to replay a session by reconstructing the screen from screenshots and overlaying tool calls and click and type events.
+- [x] Add Replay Mode; Ability to replay a session by reconstructing the screen from screenshots and overlaying tool calls and click and type events.
-- [Idea] Add lightweight analytics dashboards (success rate by objective category, avg steps/cost over time).
+- [x] Add lightweight analytics dashboards (success rate by objective category, avg steps/cost over time).
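
Review note on the open `setup_artifacts()` collision item above: one possible fix, sketched under assumptions, is to append a random suffix to the second-granularity `run_id` and create the directory with `exist_ok=False`, so a clash raises an error instead of being silently shared. The names `make_run_id` and `create_run_dir` and the timestamp format are hypothetical; none of them come from this diff, and the project's actual `setup_artifacts()` may be structured differently.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def make_run_id() -> str:
    """Hypothetical run id: second-granularity timestamp plus a random suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def create_run_dir(base_dir: Path, max_attempts: int = 5) -> Path:
    """Create a fresh directory for one run and never reuse an existing one."""
    for _ in range(max_attempts):
        run_dir = base_dir / make_run_id()
        try:
            # exist_ok=False makes a collision raise instead of sharing the directory.
            run_dir.mkdir(parents=True, exist_ok=False)
        except FileExistsError:
            continue  # extremely unlikely; retry with a new random suffix
        return run_dir
    raise RuntimeError(f"could not create a unique run directory under {base_dir}")
```

Two jobs started within the same second then get distinct directories, and the atomic `mkdir` call settles any remaining race because only one caller can create a given path.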

53
tray_service_control.ps1 Normal file

@@ -0,0 +1,53 @@
[CmdletBinding()]
param(
    [ValidateSet("start", "stop", "restart")]
    [string]$Action,
    [string]$ServiceName = "ScreenJobBackend"
)

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

# Poll the service until it reaches the target status; throw if the timeout expires first.
function Wait-ForStatus {
    param(
        [Parameter(Mandatory = $true)]$Service,
        [Parameter(Mandatory = $true)][System.ServiceProcess.ServiceControllerStatus]$TargetStatus,
        [int]$TimeoutSeconds = 20
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        # Refresh() re-reads the live status; the cached object is otherwise stale.
        $Service.Refresh()
        if ($Service.Status -eq $TargetStatus) {
            return
        }
        Start-Sleep -Milliseconds 350
    }
    throw "Timed out waiting for service '$($Service.ServiceName)' to reach status '$TargetStatus'."
}

$service = Get-Service -Name $ServiceName -ErrorAction Stop

switch ($Action) {
    "start" {
        # No-op when the service is already running.
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Start-Service -Name $ServiceName -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
        }
    }
    "stop" {
        # No-op when the service is already stopped.
        if ($service.Status -ne [System.ServiceProcess.ServiceControllerStatus]::Stopped) {
            Stop-Service -Name $ServiceName -Force -ErrorAction Stop
            Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Stopped)
        }
    }
    "restart" {
        # Restart a running service; otherwise a plain start is enough.
        if ($service.Status -eq [System.ServiceProcess.ServiceControllerStatus]::Running) {
            Restart-Service -Name $ServiceName -Force -ErrorAction Stop
        } else {
            Start-Service -Name $ServiceName -ErrorAction Stop
        }
        Wait-ForStatus -Service $service -TargetStatus ([System.ServiceProcess.ServiceControllerStatus]::Running)
    }
}

View File

@@ -0,0 +1,36 @@
[CmdletBinding(SupportsShouldProcess = $true)]
param(
    [string]$ServiceName = "ScreenJobBackend"
)

Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

# Return $true when the current session runs with administrator rights.
function Test-IsAdministrator {
    $identity = [Security.Principal.WindowsIdentity]::GetCurrent()
    $principal = New-Object Security.Principal.WindowsPrincipal($identity)
    return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}

if (-not (Test-IsAdministrator)) {
    throw "Run this script from an elevated PowerShell session (Run as Administrator)."
}

# A missing service is treated as already uninstalled.
$service = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if ($null -eq $service) {
    Write-Host "Service '$ServiceName' is not installed."
    exit 0
}

if ($PSCmdlet.ShouldProcess($ServiceName, "Uninstall service")) {
    if ($service.Status -ne "Stopped") {
        Stop-Service -Name $ServiceName -Force -ErrorAction Stop
    }
    & sc.exe delete $ServiceName | Out-Null
    if ($LASTEXITCODE -ne 0) {
        throw "Failed to delete service '$ServiceName' (sc.exe exit code $LASTEXITCODE)."
    }
    # Report success only when the deletion actually ran, so -WhatIf stays silent.
    Write-Host "Service '$ServiceName' uninstalled successfully." -ForegroundColor Green
}