First MVP

Space-Banane
2026-05-22 19:25:57 +02:00
parent 673f70b32a
commit 860ccb731d
40 changed files with 2336 additions and 0 deletions

Idea.md Normal file

@@ -0,0 +1,245 @@
Architecture:
```text
Gitea
└─ webhook: pull_request_comment / issue_comment
   └─ gitea-codex-bot API
      ├─ verifies X-Gitea-Signature
      ├─ checks body starts with @codex review
      ├─ queues review job
      └─ worker:
         ├─ clones repo / fetches PR branches
         ├─ builds git diff + context
         ├─ runs codex headless
         ├─ parses JSON findings
         └─ posts review comment as codex-bot
```
Use a real Gitea user, e.g. `codex-bot`. Give it a token with minimum access: read repo, read PRs/issues, write comments. Do not use your personal admin token. Gitea exposes Swagger/OpenAPI per instance at `/api/swagger` and `/swagger.v1.json`, so you can wire against your actual server version instead of guessing endpoints. ([Gitea Documentation][3])
MVP behavior:
```text
User comments:
  @codex review
Bot replies:
  👀 Codex review queued for commit abc123...
Later edits/posts:
  ## Codex Review
  Verdict: patch mostly correct
  Confidence: 0.78
  Findings:
  1. src/auth.ts:42-55
     Token validation accepts expired tokens in one path.
  2. api/users.ts:88
     Missing permission check before update.
  No blocking issues found in tests.
```
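A minimal sketch of turning parsed findings into that comment body. The type and function names (`ReviewResult`, `renderReview`) are hypothetical, and the fields are a subset of the JSON shape from the prompt section, so treat it as one possible layout rather than the bot's actual renderer.

```typescript
// Hypothetical types: a subset of the JSON shape requested from Codex.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  severity: Severity;
  file: string;
  line_start: number;
  line_end: number;
  title: string;
}

interface ReviewResult {
  verdict: "correct" | "has_issues";
  confidence: number;
  summary: string;
  findings: Finding[];
}

// "file:42-55" for a range, "file:88" for a single line.
function formatLocation(f: Finding): string {
  return f.line_start === f.line_end
    ? `${f.file}:${f.line_start}`
    : `${f.file}:${f.line_start}-${f.line_end}`;
}

// Build the markdown body of the summary comment.
function renderReview(result: ReviewResult): string {
  const lines: string[] = [
    "## Codex Review",
    `Verdict: ${result.verdict}`,
    `Confidence: ${result.confidence.toFixed(2)}`,
  ];
  if (result.findings.length === 0) {
    lines.push("No findings.");
  } else {
    lines.push("Findings:");
    result.findings.forEach((f, i) => {
      lines.push(`${i + 1}. ${formatLocation(f)} (${f.severity})`);
      lines.push(`   ${f.title}`);
    });
  }
  if (result.summary) lines.push("", result.summary);
  return lines.join("\n");
}
```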
For v1, post one normal PR timeline comment and skip inline comments. Gitea has webhook events for PR reviews, but the line-level review API can be awkward and varies between versions, and there are recent reports that API-token support for diff-level review comments is unclear. ([Gitea Documentation][1]) Summary comments are reliable and still useful.
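A sketch of what posting that timeline comment could look like. It assumes the issue comments endpoint (`POST /api/v1/repos/{owner}/{repo}/issues/{index}/comments`) and `token` authorization; check both against your instance's Swagger before relying on them. `buildCommentRequest` is a hypothetical helper that only builds the request, so it can be tested without a server.

```typescript
interface CommentRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) the request for a PR timeline comment.
// Assumption: PR timeline comments go through the issues comment endpoint.
function buildCommentRequest(
  baseUrl: string,
  token: string,
  repoFullName: string, // "owner/repo"
  prNumber: number,
  body: string,
): CommentRequest {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/api/v1/repos/${repoFullName}/issues/${prNumber}/comments`,
    init: {
      method: "POST",
      headers: {
        Authorization: `token ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ body }),
    },
  };
}
```

Sending it is then `await fetch(req.url, req.init)` on a runtime with a global `fetch`, with a check on the response status.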
Core trigger logic:
```ts
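// `event` is the webhook event name; `payload` is the parsed JSON body.
if (event !== "pull_request_comment" && event !== "issue_comment") return;
if (!payload.is_pull && !payload.pull_request) return; // comment on a plain issue
if (payload.sender.username === "codex-bot") return; // ignore the bot's own comments
if (!payload.comment.body.trim().startsWith("@codex review")) return;
// pull_request may be missing on issue_comment payloads, so fall back to issue.number.
const prNumber = payload.pull_request?.number ?? payload.issue?.number;
if (prNumber === undefined) return;
enqueueReview(payload.repository.full_name, prNumber);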
```
Job flow:
```text
1. Verify webhook HMAC.
2. Dedupe by delivery ID/comment ID.
3. Parse command:
   @codex review
   @codex review security
   @codex review tests
   @codex review --full
4. Create “queued” comment.
5. Clone/fetch repo into isolated temp dir.
6. Checkout PR head.
7. Generate:
   git diff base...head
   changed file list
   optional full changed-file content
   optional test output
8. Run Codex headless with JSON schema.
9. Validate JSON.
10. Post/update review comment.
```
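Step 3 can be sketched as a small parser. `parseReviewCommand` and the `ReviewCommand` shape are hypothetical; the accepted arguments are only the four variants listed above, and rejecting unknown arguments is an assumption, not something the notes decide.

```typescript
interface ReviewCommand {
  focus: string | null; // e.g. "security" or "tests"
  full: boolean; // true when --full is present
}

// Focus words taken from the command list in the job flow.
const KNOWN_FOCUS = new Set(["security", "tests"]);

// Returns null when the comment is not a valid "@codex review" command.
function parseReviewCommand(commentBody: string): ReviewCommand | null {
  // Only the first line of the comment is treated as the command.
  const firstLine = commentBody.trim().split(/\r?\n/)[0] ?? "";
  const tokens = firstLine.split(/\s+/);
  if (tokens[0] !== "@codex" || tokens[1] !== "review") return null;

  let focus: string | null = null;
  let full = false;
  for (const tok of tokens.slice(2)) {
    if (tok === "--full") full = true;
    else if (KNOWN_FOCUS.has(tok) && focus === null) focus = tok;
    else return null; // unknown or repeated argument: reject (assumption)
  }
  return { focus, full };
}
```

Matching whole tokens is slightly stricter than a `startsWith("@codex review")` check: a comment such as `@codex reviewer` does not trigger a review.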
Use SQLite first:
```sql
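-- Column types and constraints are assumptions; adjust to taste.
CREATE TABLE IF NOT EXISTS reviews (
  id                 INTEGER PRIMARY KEY AUTOINCREMENT,
  repo               TEXT NOT NULL,       -- "owner/name"
  pr_number          INTEGER NOT NULL,
  head_sha           TEXT NOT NULL,
  trigger_comment_id INTEGER NOT NULL,
  status             TEXT NOT NULL DEFAULT 'queued',
  requested_by       TEXT NOT NULL,
  created_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  result_json        TEXT,                -- NULL until the review finishes
  UNIQUE (repo, trigger_comment_id)       -- dedupe repeated webhook deliveries
);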
```
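To show the dedupe idea without pulling in a SQLite driver, here is an in-memory stand-in keyed by repo and trigger comment ID. `ReviewStore` and the status values are hypothetical; in the real service the same check would be a lookup or uniqueness rule on the `reviews` table.

```typescript
// Hypothetical status values; the notes only name a "status" column.
type ReviewStatus = "queued" | "running" | "done" | "failed";

interface ReviewRow {
  repo: string;
  prNumber: number;
  headSha: string;
  triggerCommentId: number;
  status: ReviewStatus;
}

// In-memory stand-in for the reviews table, for illustration only.
class ReviewStore {
  private rows = new Map<string, ReviewRow>();

  private key(repo: string, triggerCommentId: number): string {
    return `${repo}#${triggerCommentId}`;
  }

  // Returns true if the job was newly queued, false for a duplicate delivery.
  enqueue(job: Omit<ReviewRow, "status">): boolean {
    const k = this.key(job.repo, job.triggerCommentId);
    if (this.rows.has(k)) return false;
    this.rows.set(k, { ...job, status: "queued" });
    return true;
  }

  setStatus(repo: string, triggerCommentId: number, status: ReviewStatus): void {
    const row = this.rows.get(this.key(repo, triggerCommentId));
    if (row) row.status = status;
  }

  get(repo: string, triggerCommentId: number): ReviewRow | undefined {
    return this.rows.get(this.key(repo, triggerCommentId));
  }
}
```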
Suggested service stack:
```text
Backend: Python FastAPI or Node/TS Fastify
Queue: SQLite jobs first, Redis later
Runner: Docker worker container
Storage: /var/lib/gitea-codex-bot
Auth: bot PAT + webhook secret
Deployment: docker compose
```
Config:
```env
GITEA_BASE_URL=https://git.example.com
GITEA_TOKEN=...
GITEA_BOT_USERNAME=codex-bot
GITEA_WEBHOOK_SECRET=...
OPENAI_API_KEY=...
WORKDIR=/var/lib/gitea-codex-bot/worktrees
MAX_DIFF_BYTES=200000
MAX_REVIEW_MINUTES=10
CONCURRENCY=1
```
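A sketch of loading and validating these variables at startup, so a missing secret fails fast instead of at the first webhook. `loadConfig` and `BotConfig` are hypothetical names; the numeric defaults mirror the values above, and `OPENAI_API_KEY` is left out on the assumption that the Codex process reads it directly.

```typescript
interface BotConfig {
  giteaBaseUrl: string;
  giteaToken: string;
  botUsername: string;
  webhookSecret: string;
  workdir: string;
  maxDiffBytes: number;
  maxReviewMinutes: number;
  concurrency: number;
}

type Env = Record<string, string | undefined>;

// Read a variable that must be set and non-empty.
function required(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required env var ${key}`);
  return value;
}

// Read an optional positive integer, falling back to a default.
function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): BotConfig {
  return {
    giteaBaseUrl: required(env, "GITEA_BASE_URL").replace(/\/+$/, ""),
    giteaToken: required(env, "GITEA_TOKEN"),
    botUsername: env["GITEA_BOT_USERNAME"] || "codex-bot",
    webhookSecret: required(env, "GITEA_WEBHOOK_SECRET"),
    workdir: required(env, "WORKDIR"),
    maxDiffBytes: positiveInt(env, "MAX_DIFF_BYTES", 200000),
    maxReviewMinutes: positiveInt(env, "MAX_REVIEW_MINUTES", 10),
    concurrency: positiveInt(env, "CONCURRENCY", 1),
  };
}
```

In Node this would be called once as `loadConfig(process.env)`.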
Good commands to support later:
```text
@codex review
@codex review security
@codex review performance
@codex review tests
@codex review --full
@codex explain
@codex fix
@codex fix --branch
@codex ignore
@codex rerun
```
Best v2 feature: persistent review comment. Instead of spamming new comments, the bot finds its previous comment on that PR and edits it:
```text
<!-- codex-review:head_sha=abc123 -->
## Codex Review
...
```
Then reruns replace the same block.
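One way to find the previous comment is to scan the PR's comments for the marker. The sketch below assumes comment objects with `id`, `body`, and `user.login` as returned by the Gitea API; the helper names are hypothetical.

```typescript
// Assumed minimal shape of a comment returned by the Gitea API.
interface GiteaComment {
  id: number;
  body: string;
  user: { login: string };
}

// Hidden HTML marker that tags the bot's review comment with the reviewed commit.
const MARKER_RE = /<!-- codex-review:head_sha=([0-9a-f]+) -->/;

// Prefix rendered markdown with the marker.
function withMarker(headSha: string, markdown: string): string {
  return `<!-- codex-review:head_sha=${headSha} -->\n${markdown}`;
}

// Find the bot's earlier review comment, if any. Other bot comments
// (such as the "queued" notice) carry no marker and are skipped.
function findPreviousReview(
  comments: GiteaComment[],
  botUsername: string,
): GiteaComment | undefined {
  return comments.find(
    (c) => c.user.login === botUsername && MARKER_RE.test(c.body),
  );
}

// Extract the commit SHA the earlier review was produced for.
function reviewedSha(comment: GiteaComment): string | null {
  const match = MARKER_RE.exec(comment.body);
  return match?.[1] ?? null;
}
```

If a previous comment exists, the bot edits it instead of posting a new one; comparing `reviewedSha` with the current head also lets a rerun on an unchanged commit be skipped.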
Best v3 feature: fixes. User comments:
```text
@codex fix finding 2
```
Bot creates a branch:
```text
codex/pr-42-fix-permission-check
```
Then opens a PR or pushes to the existing PR branch only if allowed. Keep this disabled by default. Review-only is safer.
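The branch name can be derived from the PR number and the finding title. A small sketch; `fixBranchName` and the 40-character slug limit are assumptions, not part of the plan above.

```typescript
// Build a branch name like "codex/pr-42-fix-permission-check".
function fixBranchName(
  prNumber: number,
  findingTitle: string,
  maxSlugLength = 40, // arbitrary limit to keep branch names short
): string {
  const slug = findingTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything unsafe into hyphens
    .replace(/^-+|-+$/g, "") // trim leading and trailing hyphens
    .slice(0, maxSlugLength)
    .replace(/-+$/, ""); // slicing may leave a trailing hyphen
  return `codex/pr-${prNumber}-${slug || "fix"}`;
}
```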
Security rules that matter:
```text
- Verify X-Gitea-Signature.
- Ignore the bot's own comments.
- Allowlist repos/orgs.
- Never run on untrusted fork PRs unless sandboxed hard.
- No Docker socket mount.
- No host filesystem mount except temp workdir.
- Timeout every job.
- Limit diff size.
- Redact .env, secrets, keys.
- Use bot token, not admin token.
- Log prompt + result, but not secrets.
```
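The first rule is the one most often botched, so here is a sketch. It assumes `X-Gitea-Signature` carries the hex-encoded HMAC-SHA256 of the raw request body keyed with the webhook secret, and that the server gives you the raw bytes before JSON parsing. `verifyGiteaSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a webhook delivery against the shared secret.
// rawBody must be the exact bytes received, not re-serialized JSON.
function verifyGiteaSignature(
  rawBody: string | Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check that first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.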
Prompt shape for Codex:
```text
You are reviewing a Gitea pull request.
Focus only on issues introduced by this PR.
Prioritize correctness, security, data loss, broken behavior, bad migrations, and missing tests.
Avoid style nitpicks.
Return JSON:
{
  "verdict": "correct" | "has_issues",
  "confidence": 0.0-1.0,
  "summary": "...",
  "findings": [
    {
      "severity": "low|medium|high|critical",
      "file": "...",
      "line_start": 1,
      "line_end": 1,
      "title": "...",
      "body": "...",
      "suggestion": "..."
    }
  ]
}
```
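Model output should be validated before it is posted. A hand-rolled sketch of that check for the shape above; `parseReview` is a hypothetical name, and treating every field (including `suggestion`) as required is an assumption.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  severity: Severity;
  file: string;
  line_start: number;
  line_end: number;
  title: string;
  body: string;
  suggestion: string;
}

interface ReviewResult {
  verdict: "correct" | "has_issues";
  confidence: number;
  summary: string;
  findings: Finding[];
}

const SEVERITIES: readonly string[] = ["low", "medium", "high", "critical"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isFinding(v: unknown): v is Finding {
  if (!isRecord(v)) return false;
  const { severity, file, line_start, line_end, title, body, suggestion } = v;
  return (
    typeof severity === "string" && SEVERITIES.includes(severity) &&
    typeof file === "string" &&
    typeof line_start === "number" && Number.isInteger(line_start) &&
    typeof line_end === "number" && Number.isInteger(line_end) &&
    line_start >= 1 && line_end >= line_start &&
    typeof title === "string" &&
    typeof body === "string" &&
    typeof suggestion === "string"
  );
}

// Parse and validate raw model output; throws on anything unexpected.
function parseReview(raw: string): ReviewResult {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (!isRecord(data)) throw new Error("review must be a JSON object");
  const { verdict, confidence, summary, findings } = data;
  if (verdict !== "correct" && verdict !== "has_issues") {
    throw new Error("invalid verdict");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number between 0 and 1");
  }
  if (typeof summary !== "string") throw new Error("summary must be a string");
  if (!Array.isArray(findings) || !findings.every(isFinding)) {
    throw new Error("findings must be an array of valid findings");
  }
  return { verdict, confidence, summary, findings };
}
```

On a validation failure the job can be marked failed rather than posting a half-parsed comment.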
Practical build order:
```text
1. Make bot account + token.
2. Add webhook receiver.
3. Verify signature + parse @codex review.
4. Post “queued” comment.
5. Clone repo and generate diff.
6. Run Codex headless.
7. Post one summary comment.
8. Add dedupe + SQLite.
9. Add per-repo config file.
10. Add optional inline comments/fix branches later.
```
Per-repo config idea:
```yaml
# .codex-review.yml
enabled: true
review:
  default_mode: summary
  max_diff_bytes: 200000
  include_tests: true
focus:
  - correctness
  - security
  - maintainability
ignore:
  - "dist/**"
  - "pnpm-lock.yaml"
  - "*.min.js"
commands:
  allow_fix: false
```
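The `ignore` list implies some glob matching over changed file paths. A dependency-free sketch; the matching rules here are assumptions (gitignore-like: `**` crosses directories, `*` stays within one path segment, and patterns without a `/` match the file name at any depth), and a real implementation might prefer an established glob library.

```typescript
// Convert a simple glob to an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        re += ".*"; // "**" matches across directory separators
        i++;
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// True if a changed file should be excluded from the review diff.
function isIgnored(path: string, patterns: string[]): boolean {
  const baseName = path.split("/").pop() ?? path;
  return patterns.some((pattern) => {
    const re = globToRegExp(pattern);
    // Patterns without a slash are matched against the file name only.
    return pattern.includes("/") ? re.test(path) : re.test(baseName);
  });
}
```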
Final recommendation: external webhook bot, summary comments first, bot account + token, Codex headless JSON, SQLite queue. Inline review comments and auto-fix branches come later (v2/v3). Trying to make the first version a “full GitHub Copilot Reviews clone” is how this becomes annoying trash.
[1]: https://docs.gitea.com/usage/repository/webhooks "Webhooks | Gitea Documentation"
[2]: https://developers.openai.com/cookbook/examples/codex/build_code_review_with_codex_sdk "Build Code Review with the Codex SDK"
[3]: https://docs.gitea.com/development/api-usage "API Usage"