Add structured text-targeted control lookup/click helpers #19

luna · 2026-05-01T17:11:55+02:00

Problem

Current OCR results are useful for inspection, but agents still have to manually translate text hits into click coordinates. This is brittle and slows down safe automation.

Desired capability

A first-class way to target controls by visible text, for example:

- click button with text `Buy Now`
- find visible control matching `Stop Recording`
- return best candidate(s) for `Place Order`

Expected behavior

- Search within current screen or supplied region
- Return matched candidate(s) with confidence and bounds
- Optionally execute a click on the best match
- Work especially well for buttons, menu items, and dialog actions
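
To make the expected behavior concrete, here is a minimal Python sketch of what such a helper could look like. Everything in it is an assumption for illustration, not the project's actual API: the names `OcrHit`, `Candidate`, `find_controls`, and `click_text` are hypothetical, bounds are assumed to be `(left, top, width, height)` in screen pixels, fuzzy matching uses the standard library's `difflib.SequenceMatcher`, and the thresholds are placeholder values. The click itself is passed in as a callback so the lookup stays independent of any input backend.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

# Assumed layout: (left, top, width, height) in screen pixels.
Bounds = Tuple[int, int, int, int]


@dataclass
class OcrHit:
    """One OCR text hit (hypothetical shape of the existing OCR output)."""
    text: str
    confidence: float  # OCR confidence in [0, 1]
    bounds: Bounds


@dataclass
class Candidate:
    """A matched control: the underlying hit plus a combined score."""
    hit: OcrHit
    score: float  # text similarity multiplied by OCR confidence

    @property
    def center(self) -> Tuple[int, int]:
        left, top, width, height = self.hit.bounds
        return (left + width // 2, top + height // 2)


def _inside(bounds: Bounds, region: Bounds) -> bool:
    """True if bounds lies fully within region."""
    left, top, width, height = bounds
    r_left, r_top, r_width, r_height = region
    return (left >= r_left and top >= r_top
            and left + width <= r_left + r_width
            and top + height <= r_top + r_height)


def find_controls(hits: List[OcrHit], query: str,
                  region: Optional[Bounds] = None,
                  min_score: float = 0.6) -> List[Candidate]:
    """Return candidates for query, best first, optionally limited to region."""
    wanted = " ".join(query.lower().split())
    candidates = []
    for hit in hits:
        if region is not None and not _inside(hit.bounds, region):
            continue
        seen = " ".join(hit.text.lower().split())
        similarity = SequenceMatcher(None, wanted, seen).ratio()
        score = similarity * hit.confidence
        if score >= min_score:
            candidates.append(Candidate(hit, score))
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def click_text(hits: List[OcrHit], query: str,
               click: Callable[[int, int], None],
               region: Optional[Bounds] = None,
               min_score: float = 0.85) -> Optional[Candidate]:
    """Click the center of the best match, or do nothing if none is good enough."""
    matches = find_controls(hits, query, region, min_score)
    if not matches:
        return None
    x, y = matches[0].center
    click(x, y)
    return matches[0]
```

In this sketch the click path uses a stricter default threshold than the lookup path, so an agent can inspect weaker candidates without ever acting on them. Whether the score should also weigh control type (button, menu item, dialog action) is left open here.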

Why this matters

This would remove much of the fragile OCR-to-coordinate glue in agents and make high-confidence actions much safer.

luna closed this issue

2026-05-01 17:14:49 +02:00

Reference: space/clickthrough#19