Add structured text-targeted control lookup/click helpers #19

Closed
opened 2026-05-01 17:11:55 +02:00 by luna · 0 comments
Collaborator

Problem

Current OCR results are useful for inspection, but agents still have to manually translate text hits into click coordinates. This is brittle and slows down safe automation.

Desired capability

A first-class way to target controls by visible text, for example:

  • click button with text `Buy Now`
  • find visible control matching `Stop Recording`
  • return best candidate(s) for `Place Order`

Expected behavior

  • Search within current screen or supplied region
  • Return matched candidate(s) with confidence and bounds
  • Optionally execute a click on the best match
  • Work especially well for buttons, menu items, and dialog actions
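
A minimal sketch of how the behavior above could fit together, not an actual API of this project. Every name here (`Bounds`, `OcrHit`, `Candidate`, `find_controls`, `click_text`) is hypothetical, and so is the scoring rule: string similarity from the standard library's `difflib.SequenceMatcher` multiplied by the OCR engine's own confidence. The `click` callable stands in for whatever input backend the agent already uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Bounds:
    """Axis-aligned rectangle in screen pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)

    def inside(self, region: "Bounds") -> bool:
        # True only if this rectangle lies fully within `region`.
        return (self.x >= region.x and self.y >= region.y
                and self.x + self.width <= region.x + region.width
                and self.y + self.height <= region.y + region.height)


@dataclass(frozen=True)
class OcrHit:
    """One piece of recognized text, as an OCR pass might report it (assumed shape)."""
    text: str
    bounds: Bounds
    ocr_confidence: float  # assumed to be in the range 0..1


@dataclass(frozen=True)
class Candidate:
    hit: OcrHit
    confidence: float  # text similarity * OCR confidence (assumed scoring rule)


def find_controls(hits: List[OcrHit], query: str,
                  region: Optional[Bounds] = None,
                  min_confidence: float = 0.6) -> List[Candidate]:
    """Return candidates matching `query`, best first, optionally limited to `region`."""
    wanted = query.casefold().strip()
    candidates = []
    for hit in hits:
        if region is not None and not hit.bounds.inside(region):
            continue
        similarity = SequenceMatcher(None, wanted, hit.text.casefold().strip()).ratio()
        confidence = similarity * hit.ocr_confidence
        if confidence >= min_confidence:
            candidates.append(Candidate(hit, confidence))
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def click_text(hits: List[OcrHit], query: str,
               click: Callable[[int, int], None],
               region: Optional[Bounds] = None,
               min_confidence: float = 0.85) -> Optional[Candidate]:
    """Click the center of the best match, or do nothing if no match is confident enough."""
    candidates = find_controls(hits, query, region, min_confidence)
    if not candidates:
        return None
    best = candidates[0]
    click(*best.hit.bounds.center())
    return best
```

In this sketch the click path uses a stricter default threshold than the lookup path, so a weak match can still be returned for inspection but never triggers an action on its own.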

Why this matters

This would remove much of the fragile OCR-to-coordinate glue code in agents and let them act only on high-confidence matches, which makes automated clicks much safer.

luna closed this issue 2026-05-01 17:14:49 +02:00

Reference: space/clickthrough#19