Luna 097c6a095c

feat(ocr): add /ocr endpoint for screen, region, and image input

2026-04-06 13:48:33 +02:00

API Reference (v0.1)

Base URL: http://127.0.0.1:8123

If CLICKTHROUGH_TOKEN is set on the server, include this header with each request:

x-clickthrough-token: <token>
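As a minimal client-side sketch of the base URL and token header described above, the helper below builds (but does not send) a request using only the standard library. The function name `build_request` is hypothetical and not part of Clickthrough; sending JSON bodies with a `Content-Type: application/json` header is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8123"


def build_request(method, path, body=None, token=None):
    """Build a urllib Request for a Clickthrough endpoint without sending it.

    `build_request` is a hypothetical helper, not part of the server.
    """
    headers = {}
    if token:
        # Only needed when CLICKTHROUGH_TOKEN is set on the server.
        headers["x-clickthrough-token"] = token
    data = None
    if body is not None:
        # Assumption: the server expects JSON request bodies.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

To send a request built this way, pass it to `urllib.request.urlopen(req, timeout=5)` and parse the response body with `json.load`.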

`GET /health`

Returns status and runtime safety flags, including exec capability config.

`GET /screen`

Query params:

with_grid (bool, default true)
grid_rows (int, default env or 12)
grid_cols (int, default env or 12)
include_labels (bool, default true)
image_format (png|jpeg, default png)
jpeg_quality (1-100, default 85)
asImage (bool, default false): if true, return only the raw image bytes (image/png or image/jpeg)

By default, the response includes a base64-encoded image and metadata (meta.region, optional meta.grid).
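The query parameters above can be assembled into a request URL as in the following sketch. The helper `screen_url` is hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption about how the server parses them. Grid sizes are left out when not given so that the server-side defaults apply.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8123"


def screen_url(with_grid=True, grid_rows=None, grid_cols=None,
               include_labels=True, image_format="png",
               jpeg_quality=85, as_image=False):
    """Build a GET /screen URL (hypothetical helper)."""
    if image_format not in ("png", "jpeg"):
        raise ValueError("image_format must be 'png' or 'jpeg'")
    if not 1 <= jpeg_quality <= 100:
        raise ValueError("jpeg_quality must be in 1..100")

    def flag(value):
        # Assumption: the server accepts lowercase true/false for booleans.
        return "true" if value else "false"

    params = {"with_grid": flag(with_grid)}
    if grid_rows is not None:
        params["grid_rows"] = grid_rows
    if grid_cols is not None:
        params["grid_cols"] = grid_cols
    params["include_labels"] = flag(include_labels)
    params["image_format"] = image_format
    params["jpeg_quality"] = jpeg_quality
    params["asImage"] = flag(as_image)
    return f"{BASE_URL}/screen?{urlencode(params)}"
```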

`POST /zoom`

Body:

{
  "center_x": 1200,
  "center_y": 700,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Query params:

asImage (bool, default false): if true, return only the raw image bytes (image/png or image/jpeg)

By default, the response contains the cropped image plus region metadata in global pixel coordinates.
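Because the region metadata is expressed in global pixel coordinates, a point picked inside the cropped image can be mapped back to the screen. The sketch below assumes the region is a dict with `x`, `y`, `width`, and `height` keys (the shape shown for the `/ocr` response; it is not confirmed for `/zoom`). The helper `local_to_global` is hypothetical.

```python
def local_to_global(region, image_size, px, py):
    """Map a pixel (px, py) in the zoomed image to global screen coordinates.

    Assumes `region` has x/y/width/height keys; `image_size` is the
    (width, height) of the returned image, which may differ from the
    region size if the server rescales the crop.
    """
    image_w, image_h = image_size
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image_size must be positive")
    gx = region["x"] + px * region["width"] / image_w
    gy = region["y"] + py * region["height"] / image_h
    return gx, gy
```

For the request body above (center 1200, 700 and size 500 by 350), the region would start at (950, 525) if the crop is centered as requested, so the middle of the returned image maps back to (1200, 700).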

`POST /action`

Body: one action.

Pointer target modes

Pixel target

{
  "mode": "pixel",
  "x": 100,
  "y": 200,
  "dx": 0,
  "dy": 0
}

Grid target

{
  "mode": "grid",
  "region_x": 0,
  "region_y": 0,
  "region_width": 1920,
  "region_height": 1080,
  "rows": 12,
  "cols": 12,
  "row": 5,
  "col": 9,
  "dx": 0.0,
  "dy": 0.0
}

dx/dy are normalized offsets in [-1, 1] inside the selected cell.
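To make the grid addressing concrete, here is a sketch of how a grid target could resolve to a pixel. It rests on two assumptions that the text above does not state: `row` and `col` are zero-based, and `dx`/`dy` of 0 means the cell center while ±1 reaches the cell edges. The helper `grid_to_pixel` is hypothetical, and the server's own conversion may differ.

```python
def grid_to_pixel(region_x, region_y, region_width, region_height,
                  rows, cols, row, col, dx=0.0, dy=0.0):
    """Resolve a grid target to global pixel coordinates (illustrative only)."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("row/col outside the grid")
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx/dy must be in [-1, 1]")
    cell_w = region_width / cols
    cell_h = region_height / rows
    # Assumption: offsets are measured from the cell center,
    # scaled by half the cell size.
    x = region_x + (col + 0.5) * cell_w + dx * cell_w / 2
    y = region_y + (row + 0.5) * cell_h + dy * cell_h / 2
    return x, y
```

Under these assumptions, the grid target shown above (row 5, col 9 on a 12 by 12 grid over 1920 by 1080) resolves to the point (1520, 495).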

Action examples

Click:

{
  "action": "click",
  "target": {
    "mode": "grid",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "rows": 12,
    "cols": 12,
    "row": 7,
    "col": 3,
    "dx": 0.2,
    "dy": -0.1
  },
  "clicks": 1,
  "button": "left"
}

Scroll:

{
  "action": "scroll",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "scroll_amount": -500
}

Type text:

{
  "action": "type",
  "text": "hello world",
  "interval_ms": 20
}

Hotkey:

{
  "action": "hotkey",
  "keys": ["ctrl", "l"]
}
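The action bodies above are plain JSON objects, so a client can build them with small helpers. The following sketch mirrors the documented fields; the function names are hypothetical, and the defaults simply repeat the values used in the examples rather than confirmed server defaults.

```python
def pixel_target(x, y, dx=0, dy=0):
    """Pointer target in pixel mode."""
    return {"mode": "pixel", "x": x, "y": y, "dx": dx, "dy": dy}


def click(target, clicks=1, button="left"):
    """Body for a click action on the given target."""
    if clicks < 1:
        raise ValueError("clicks must be at least 1")
    return {"action": "click", "target": target,
            "clicks": clicks, "button": button}


def scroll(target, scroll_amount):
    """Body for a scroll action; the examples use a negative amount."""
    return {"action": "scroll", "target": target,
            "scroll_amount": scroll_amount}


def type_text(text, interval_ms=20):
    """Body for typing text with a per-key delay in milliseconds."""
    return {"action": "type", "text": text, "interval_ms": interval_ms}


def hotkey(*keys):
    """Body for a key combination such as hotkey('ctrl', 'l')."""
    if not keys:
        raise ValueError("at least one key is required")
    return {"action": "hotkey", "keys": list(keys)}
```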

`POST /ocr`

Extracts visible text from a full screenshot, a region crop, or caller-provided image bytes.

Body:

{
  "mode": "screen",
  "language_hint": "eng",
  "min_confidence": 0.4
}

Modes:

screen (default): OCR over full captured monitor
region: OCR over explicit region (region_x, region_y, region_width, region_height)
image: OCR over provided image_base64 (supports plain base64 or data URL)

Region mode example:

{
  "mode": "region",
  "region_x": 220,
  "region_y": 160,
  "region_width": 900,
  "region_height": 400,
  "language_hint": "eng",
  "min_confidence": 0.5
}

Image mode example:

{
  "mode": "image",
  "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "language_hint": "eng"
}
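For image mode, the caller has to base64-encode the image bytes first. A minimal sketch using the standard `base64` module follows; the helper `ocr_image_payload` and its `as_data_url` and `mime` parameters are hypothetical conveniences, not server options.

```python
import base64


def ocr_image_payload(image_bytes, language_hint="eng",
                      min_confidence=None, as_data_url=False,
                      mime="image/png"):
    """Build a POST /ocr body for mode "image" from raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if as_data_url:
        # The endpoint is documented to accept plain base64 or a data URL.
        encoded = f"data:{mime};base64,{encoded}"
    payload = {
        "mode": "image",
        "image_base64": encoded,
        "language_hint": language_hint,
    }
    if min_confidence is not None:
        payload["min_confidence"] = min_confidence
    return payload
```

The bytes can come from a file opened in binary mode, for example `open(path, "rb").read()`.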

Response shape:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "result": {
    "mode": "screen",
    "language_hint": "eng",
    "min_confidence": 0.4,
    "region": {"x": 0, "y": 0, "width": 1920, "height": 1080},
    "blocks": [
      {
        "text": "Settings",
        "confidence": 0.9821,
        "bbox": {"x": 144, "y": 92, "width": 96, "height": 21}
      }
    ]
  }
}

Notes:

Output is deterministic JSON: blocks are ordered top-to-bottom, then left-to-right.
bbox coordinates are in global screen space for screen/region, and image-local for image.
Requires the tesseract executable and the pytesseract Python package.
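A common follow-up is to locate a piece of text in the OCR result and compute a point to click. The sketch below works on the response shape shown above; the helpers `find_blocks` and `bbox_center` are hypothetical, and the case-insensitive substring match is a choice made for this example.

```python
def find_blocks(response, needle, min_confidence=0.0):
    """Return OCR blocks whose text contains `needle` (case-insensitive)."""
    blocks = response.get("result", {}).get("blocks", [])
    needle = needle.lower()
    return [
        block for block in blocks
        if needle in block["text"].lower()
        and block["confidence"] >= min_confidence
    ]


def bbox_center(bbox):
    """Center point of a bbox given as x/y/width/height."""
    return (bbox["x"] + bbox["width"] / 2,
            bbox["y"] + bbox["height"] / 2)
```

For screen and region modes the resulting center is in global screen space, so it can be used directly as a pixel target for `/action`. For image mode it is relative to the submitted image.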

`POST /exec`

Execute a shell command on the host running Clickthrough.

Requirements:

CLICKTHROUGH_EXEC_SECRET must be configured on the server
send header x-clickthrough-exec-secret: <secret>

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Notes:

shell supports powershell, bash, cmd
if shell is omitted, server uses CLICKTHROUGH_EXEC_DEFAULT_SHELL
output is truncated based on CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS
endpoint can be disabled with CLICKTHROUGH_EXEC_ENABLED=false
if CLICKTHROUGH_EXEC_SECRET is missing, /exec is blocked (403)

Response includes stdout, stderr, exit_code, timeout state, and execution metadata.
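The header and body requirements above can be combined on the client as in the sketch below. The helper names are hypothetical. Omitting `shell` and `cwd` when they are not given is a client-side choice, made so that the server falls back to CLICKTHROUGH_EXEC_DEFAULT_SHELL and its own working directory.

```python
ALLOWED_SHELLS = ("powershell", "bash", "cmd")


def exec_headers(exec_secret, token=None):
    """Headers for POST /exec; the exec secret is always required."""
    if not exec_secret:
        raise ValueError("exec secret is required for /exec")
    headers = {"x-clickthrough-exec-secret": exec_secret}
    if token:
        headers["x-clickthrough-token"] = token
    return headers


def exec_payload(command, shell=None, timeout_s=20, cwd=None, dry_run=False):
    """Body for POST /exec, leaving out fields the caller did not set."""
    if shell is not None and shell not in ALLOWED_SHELLS:
        raise ValueError(f"shell must be one of {ALLOWED_SHELLS}")
    payload = {"command": command, "timeout_s": timeout_s, "dry_run": dry_run}
    if shell is not None:
        payload["shell"] = shell
    if cwd is not None:
        payload["cwd"] = cwd
    return payload
```

Setting `dry_run` to true is a reasonable first step when trying a new command, although what the server reports for a dry run is not specified here.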

`POST /batch`

Runs multiple action payloads sequentially.

{
  "actions": [
    {"action": "move", "target": {"mode": "pixel", "x": 100, "y": 100}},
    {"action": "click", "target": {"mode": "pixel", "x": 100, "y": 100}}
  ],
  "stop_on_error": true
}
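A batch body like the one above can be generated from a list of points. The sketch below emits a move followed by a click for each point; the helper `click_sequence` is hypothetical.

```python
def click_sequence(points, stop_on_error=True):
    """Build a POST /batch body that moves to and clicks each (x, y) point."""
    actions = []
    for x, y in points:
        # A fresh target dict per action avoids sharing one mutable object.
        actions.append({"action": "move",
                        "target": {"mode": "pixel", "x": x, "y": y}})
        actions.append({"action": "click",
                        "target": {"mode": "pixel", "x": x, "y": y}})
    return {"actions": actions, "stop_on_error": stop_on_error}
```

Calling `click_sequence([(100, 100)])` reproduces the example body shown above.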
