This repository has been archived on 2026-05-20.
# TODO
## Project: Clickthrough v1
## Current Status
- [x] Draft implementation plan approved
- [x] Build FastAPI server with screenshot + grid + zoom + actions
- [x] Add auth + safety guardrails (token, dry-run, bounds)
- [x] Add AgentSkill docs for operating the API reliably
- [x] Add Gitea CI workflow for Python syntax safety
- [x] Add usage docs + quickstart
- [x] Run local syntax validation
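The checklist names the zoom view and the dry-run and bounds guardrails but does not say how they work. The sketch below is a minimal illustration under stated assumptions, not the server's actual code. It maps a point picked inside a zoomed crop back to screen pixels, rejects targets outside the screen, and only dispatches the click when dry-run is off. `ZoomView`, `guard_click`, and the `click` callback are hypothetical names, and `click` stands in for whatever input backend the real server uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ZoomView:
    """Hypothetical description of a zoomed crop of the screen."""
    left: int      # screen x of the crop's top-left corner
    top: int       # screen y of the crop's top-left corner
    scale: float   # zoomed pixels per screen pixel (2.0 = 2x zoom)

    def to_screen(self, zx: int, zy: int) -> Tuple[int, int]:
        # Each zoomed pixel maps back to the screen pixel it was magnified from.
        return self.left + int(zx // self.scale), self.top + int(zy // self.scale)


def guard_click(
    x: int,
    y: int,
    screen_w: int,
    screen_h: int,
    dry_run: bool,
    click: Callable[[int, int], None],
) -> Dict[str, object]:
    """Reject out-of-bounds targets; call `click` only when dry_run is False."""
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        raise ValueError(f"target ({x}, {y}) is outside the {screen_w}x{screen_h} screen")
    if not dry_run:
        click(x, y)  # hypothetical stand-in for the server's real input backend
    return {"x": x, "y": y, "dry_run": dry_run, "performed": not dry_run}


# Usage: a point at (10, 21) inside a 2x zoom of the region starting at (100, 200).
view = ZoomView(left=100, top=200, scale=2.0)
x, y = view.to_screen(10, 21)  # (105, 210)
preview = guard_click(x, y, 1920, 1080, dry_run=True, click=print)  # nothing is clicked
```

Keeping the bounds check ahead of the dry-run branch means a dry run also reports targets that a real click would reject.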
## Notes
- API responses now include request IDs, timestamps, and coordinate metadata
- Local syntax checks passed (`py_compile`, `compileall`)
- CI workflow runs syntax checks on push + PR
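The first note says responses carry request IDs, timestamps, and coordinate metadata, but it does not give the field names. A minimal sketch of such a response envelope, assuming hypothetical keys (`request_id`, `timestamp`, `coordinates`) and a top-left screen-pixel coordinate space, might look like this:

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def envelope(data: Any, screen_size: Optional[Tuple[int, int]] = None) -> Dict[str, Any]:
    """Wrap a handler result with hypothetical request metadata fields."""
    meta: Dict[str, Any] = {
        "request_id": uuid.uuid4().hex,                       # unique per response
        "timestamp": datetime.now(timezone.utc).isoformat(),  # timezone-aware UTC
    }
    if screen_size is not None:
        width, height = screen_size
        # Tell the caller which coordinate space any x/y values refer to.
        meta["coordinates"] = {
            "space": "screen_pixels",
            "origin": "top-left",
            "width": width,
            "height": height,
        }
    return {"ok": True, "data": data, "meta": meta}
```

A fresh `request_id` per response lets an agent correlate a failed action with the server log line for that request.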
## Next
- [x] Add `POST /exec` endpoint (PowerShell/Bash/CMD) with timeout + stdout/stderr
- [x] Add exec configuration via env (`CLICKTHROUGH_EXEC_*`)
- [x] Document exec API + config
- [x] Create backlog issues for OCR/find/window/input/session-state improvements
- [x] Open PR for exec feature branch and review/merge
- [x] Require configured exec secret + per-request exec secret header
- [x] Upgrade skill with verify-before-click rules, confidence thresholds, two-phase risky actions, and Spotify playbook
- [x] Add top-level skill section for instance setup + mini API docs
- [x] Clarify user-owned setup responsibilities vs agent responsibilities in skill docs
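The exec items describe a command runner with a timeout, captured stdout/stderr, a configured secret, and a per-request secret header. The sketch below shows one plausible shape for that logic without the FastAPI wiring. `CLICKTHROUGH_EXEC_SECRET` and `CLICKTHROUGH_EXEC_TIMEOUT` are hypothetical names (the TODO only gives the `CLICKTHROUGH_EXEC_*` prefix), and `run_exec` receives the header value however the framework extracts it. The shell flags (`bash -c`, `powershell -NoProfile -Command`, `cmd /c`) are the standard ones for each shell.

```python
import hmac
import os
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class ExecConfig:
    secret: str
    timeout_s: float

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ExecConfig":
        env = os.environ if env is None else env
        # Hypothetical variable names following the CLICKTHROUGH_EXEC_* prefix.
        secret = env.get("CLICKTHROUGH_EXEC_SECRET", "")
        if not secret:
            raise RuntimeError("exec is disabled: CLICKTHROUGH_EXEC_SECRET is not set")
        return cls(secret=secret, timeout_s=float(env.get("CLICKTHROUGH_EXEC_TIMEOUT", "30")))


def build_argv(shell: str, command: str) -> List[str]:
    """Map a shell name to an argv list for subprocess.run."""
    if shell == "bash":
        return ["bash", "-c", command]
    if shell == "powershell":
        return ["powershell", "-NoProfile", "-Command", command]
    if shell == "cmd":
        return ["cmd", "/c", command]
    raise ValueError(f"unsupported shell: {shell!r}")


def run_exec(cfg: ExecConfig, header_secret: Optional[str], argv: List[str]) -> Dict[str, object]:
    # Constant-time comparison of the per-request secret against the configured one.
    if not header_secret or not hmac.compare_digest(header_secret.encode(), cfg.secret.encode()):
        return {"ok": False, "error": "invalid or missing exec secret"}
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=cfg.timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return {"ok": False, "error": f"timed out after {cfg.timeout_s}s"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Refusing to build a config at all when no secret is set keeps exec off by default, which matches the "require configured exec secret" item.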
## Deferred Backlog
- [ ] Higher-level task macros composed from `see` + `interact` + `interact/verify` primitives
- [ ] Additional verify primitives beyond `ocr_text_near_point` (image-diff region, window title/process state, color/pixel checks)
- [ ] Broader API simplification pass to reduce payload overlap and consolidate shared OCR options
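Two of the deferred verify primitives, color/pixel checks and an image-diff region check, can be sketched in plain Python over frames stored as rows of RGB tuples. This is only an illustration of the idea under assumed names (`pixel_matches`, `verify_region_changed`) and an assumed per-channel tolerance; a real implementation would likely operate on screenshot buffers from an imaging library instead of nested lists.

```python
from typing import Dict, Sequence, Tuple

RGB = Tuple[int, int, int]
Frame = Sequence[Sequence[RGB]]  # frame[y][x] -> (r, g, b)


def pixel_matches(actual: RGB, expected: RGB, tolerance: int = 10) -> bool:
    """Color check: True if every channel of `actual` is within `tolerance` of `expected`."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


def verify_region_changed(
    before: Frame,
    after: Frame,
    box: Tuple[int, int, int, int],
    min_fraction: float = 0.05,
    tolerance: int = 10,
) -> Dict[str, object]:
    """Image-diff check: did at least `min_fraction` of the box's pixels change?

    `box` is (x0, y0, x1, y1), half-open, in screen pixels.
    """
    x0, y0, x1, y1 = box
    if x1 <= x0 or y1 <= y0:
        raise ValueError("empty verification region")
    changed = sum(
        not pixel_matches(after[y][x], before[y][x], tolerance)
        for y in range(y0, y1)
        for x in range(x0, x1)
    )
    fraction = changed / ((x1 - x0) * (y1 - y0))
    return {"verified": fraction >= min_fraction, "changed_fraction": fraction}
```

Returning the measured fraction alongside the boolean lets a caller apply its own confidence threshold, in the same spirit as the verify-before-click rules above.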