improvements: Big Reviewer Update #4

Merged
space merged 7 commits from improv/big-reviewer-update into main 2026-05-23 14:22:33 +02:00
Owner
No description provided.
space added 1 commit 2026-05-23 13:37:24 +02:00
feat: Enhance review prompt with detailed instructions and placeholders for empty sections
All checks were successful
ci / test (pull_request) Successful in 34s
ci / publish (pull_request) Has been skipped
d6a9397914
Author
Owner

@codex review

@codex review
First-time contributor

👀 Codex review queued for commit d6a9397.

👀 Codex review queued for commit `d6a9397`.
space added 1 commit 2026-05-23 13:53:30 +02:00
Resolve Runner issue, runner refused to launch
All checks were successful
ci / test (pull_request) Successful in 35s
ci / publish (pull_request) Has been skipped
c3925f37e1
First-time contributor

Review Result

This PR introduces a checkout robustness regression that can cause valid PR reviews to fail.

Findings

1) PR checkout assumes both refs exist on one remote

In src/gitea_codex_bot/workers/container_runner.py (around the new fetch step), the runner does:

  • clone one clone_url as origin
  • git fetch --no-tags origin <base_ref> <head_ref>

For fork and other cross-repo topologies, base_ref and head_ref are not guaranteed to both exist on that same remote. If either ref is missing, fetch fails and review aborts before checkout.

This is a correctness issue because it can reject reviewable PRs purely due to ref layout.

Suggested Fix

Use SHA-based fetch (or independent tolerant fetches) rather than requiring both branch names from one remote. The runner already has base_sha and head_sha, which are stable identifiers for this purpose.

Test Gap

I did not see a test covering fork/cross-remote PR checkout where one of base_ref/head_ref is absent on origin. Adding that case would prevent regressions here.

Note: model gpt-5.3-codex, input 26991, output 2212 tokens used.

<!-- codex-review:head_sha=d6a9397914696fcb0ded6e081b1e44fd71b0c755 --> ## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail. ## Findings ### 1) PR checkout assumes both refs exist on one remote In `src/gitea_codex_bot/workers/container_runner.py` (around the new fetch step), the runner does: - clone one `clone_url` as `origin` - `git fetch --no-tags origin <base_ref> <head_ref>` For fork and other cross-repo topologies, `base_ref` and `head_ref` are not guaranteed to both exist on that same remote. If either ref is missing, fetch fails and review aborts before checkout. This is a correctness issue because it can reject reviewable PRs purely due to ref layout. ## Suggested Fix Use SHA-based fetch (or independent tolerant fetches) rather than requiring both branch names from one remote. The runner already has `base_sha` and `head_sha`, which are stable identifiers for this purpose. ## Test Gap I did not see a test covering fork/cross-remote PR checkout where one of `base_ref`/`head_ref` is absent on `origin`. Adding that case would prevent regressions here. _Note: model `gpt-5.3-codex`, input `26991`, output `2212` tokens used._
space added 1 commit 2026-05-23 13:57:58 +02:00
[fix]. Harden fork PR fetch + config marker
All checks were successful
ci / test (pull_request) Successful in 33s
ci / publish (pull_request) Has been skipped
01e10abc71
space added 1 commit 2026-05-23 14:03:25 +02:00
[refactor]. Remove legacy review prompt path
All checks were successful
ci / test (pull_request) Successful in 34s
ci / publish (pull_request) Has been skipped
30aa737516
space changed title from feat: Enhance review prompt with detailed instructions and placeholders for empty sections to improvements: Big Reviewer Update 2026-05-23 14:04:16 +02:00
space added 1 commit 2026-05-23 14:07:54 +02:00
[feat]. Add @codex -h help summary command
All checks were successful
ci / test (pull_request) Successful in 38s
ci / publish (pull_request) Has been skipped
9392591429
Author
Owner

@codex -h

@codex -h
First-time contributor

Codex Help

Supported commands:

  • @codex review [security|performance|tests] [--full]
  • @codex rerun
  • @codex explain
  • @codex fix [--branch ...]
  • @codex ignore
  • @codex -h / @codex --help / @codex help

Status note:

  • Pending jobs on this PR: 1
  • Fix command enabled: false
  • Latest review command: review status succeeded. Summary: The PR introduces a likely functional regression in PR checkout for some pull request topologies (especially forks) by fetching both base and head refs from a single origin remot

Discussion summary (4 comments, human 2, bot 2):

  • @space: @codex review
  • @gitea-codex: 👀 Codex review queued for commit d6a9397.
  • @gitea-codex: ## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail...
  • @space: @codex -h
## Codex Help Supported commands: - `@codex review [security|performance|tests] [--full]` - `@codex rerun` - `@codex explain` - `@codex fix [--branch ...]` - `@codex ignore` - `@codex -h` / `@codex --help` / `@codex help` Status note: - Pending jobs on this PR: `1` - Fix command enabled: `false` - Latest review command: `review` status `succeeded`. Summary: The PR introduces a likely functional regression in PR checkout for some pull request topologies (especially forks) by fetching both base and head refs from a single `origin` remot Discussion summary (4 comments, human `2`, bot `2`): - @space: @codex review - @gitea-codex: 👀 Codex review queued for commit `d6a9397`. - @gitea-codex: <!-- codex-review:head_sha=d6a9397914696fcb0ded6e081b1e44fd71b0c755 --> ## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail... - @space: @codex -h
Author
Owner

@codex review

@codex review
First-time contributor

👀 Codex review queued for commit 9392591.

👀 Codex review queued for commit `9392591`.
First-time contributor

Review Result

I found functional regressions introduced by this PR that affect review correctness and behavior.

Findings

1) PR-scoped context removed from model input (high)

The new ephemeral runner path no longer passes explicit PR review context (changed files, unified diff, base/head scope, etc.) and now sends only a short free-form prompt (review: ...).

This can cause reviews to drift from "issues introduced by this PR" and produce incorrect findings.

2) .codex-review.yml behavior regression (medium)

The code still loads repo config, but key config-driven behaviors previously applied in prompt construction are no longer enforced in the new path (notably default_mode fallback for @codex review without explicit mode, and other scope-shaping behavior).

Repos depending on these settings will observe changed review behavior.

3) Output marker parsing can fail on collisions (medium)

Result extraction uses global first-occurrence marker matching. If marker text appears inside generated JSON content, extraction can truncate and fail.

Use end-marker search anchored after the located start marker, and ideally randomized per-run markers.

Suggested Follow-up

  1. Restore explicit PR scope/context in runner prompting or use a native base/head review target.

  2. Reintroduce config semantics (default_mode and other scope controls) in the new container path.

  3. Harden marker parsing and add tests for marker-collision scenarios.

Note: model gpt-5.3-codex, input 68257, output 3309 tokens used.

<!-- codex-review:head_sha=93925914290f32b8ff25b598a4969de7b4d2c811 --> ## Review Result I found functional regressions introduced by this PR that affect review correctness and behavior. ## Findings ### 1) PR-scoped context removed from model input (high) The new ephemeral runner path no longer passes explicit PR review context (changed files, unified diff, base/head scope, etc.) and now sends only a short free-form prompt (`review: ...`). This can cause reviews to drift from "issues introduced by this PR" and produce incorrect findings. ### 2) `.codex-review.yml` behavior regression (medium) The code still loads repo config, but key config-driven behaviors previously applied in prompt construction are no longer enforced in the new path (notably `default_mode` fallback for `@codex review` without explicit mode, and other scope-shaping behavior). Repos depending on these settings will observe changed review behavior. ### 3) Output marker parsing can fail on collisions (medium) Result extraction uses global first-occurrence marker matching. If marker text appears inside generated JSON content, extraction can truncate and fail. Use end-marker search anchored after the located start marker, and ideally randomized per-run markers. ## Suggested Follow-up 1. Restore explicit PR scope/context in runner prompting or use a native base/head review target. 2. Reintroduce config semantics (`default_mode` and other scope controls) in the new container path. 3. Harden marker parsing and add tests for marker-collision scenarios. _Note: model `gpt-5.3-codex`, input `68257`, output `3309` tokens used._
space added 1 commit 2026-05-23 14:15:07 +02:00
[fix]. Restore PR-scoped review + remove fix cmd
All checks were successful
ci / test (pull_request) Successful in 32s
ci / publish (pull_request) Has been skipped
08075cb3c4
Author
Owner

@codex review "give me a merge score"

@codex review "give me a merge score"
First-time contributor

👀 Codex review queued for commit 08075cb.

👀 Codex review queued for commit `08075cb`.
First-time contributor

Review Result

I found one behavior regression introduced by this PR.

Findings

1) @codex fix now gets silently ignored

src/gitea_codex_bot/services/commands.py removes fix from SUPPORTED_COMMANDS, so parse_command() returns None for @codex fix ....

Because of that, no job is queued for new fix comments, and the dispatcher's new deprecation path ("@codex fix is no longer supported") is not reached for normal webhook traffic.

This changes behavior from explicit handling to silent no-op for users.

Suggested Fix

Either:

  • Keep fix parseable as a deprecated command and let dispatcher post the deprecation message, or
  • Add explicit webhook fallback handling for @codex fix parse failures that posts the same deprecation notice.

Merge Score

Not ready to merge yet due to the user-visible command regression above.

Note: model gpt-5.3-codex, input 40143, output 2475 tokens used.

<!-- codex-review:head_sha=08075cb3c49ec6c2a000f633e2407033beadb7e3 --> ## Review Result I found one behavior regression introduced by this PR. ## Findings ### 1) `@codex fix` now gets silently ignored `src/gitea_codex_bot/services/commands.py` removes `fix` from `SUPPORTED_COMMANDS`, so `parse_command()` returns `None` for `@codex fix ...`. Because of that, no job is queued for new fix comments, and the dispatcher's new deprecation path (`"@codex fix is no longer supported"`) is not reached for normal webhook traffic. This changes behavior from explicit handling to silent no-op for users. ## Suggested Fix Either: - Keep `fix` parseable as a deprecated command and let dispatcher post the deprecation message, or - Add explicit webhook fallback handling for `@codex fix` parse failures that posts the same deprecation notice. ## Merge Score Not ready to merge yet due to the user-visible command regression above. _Note: model `gpt-5.3-codex`, input `40143`, output `2475` tokens used._
space added 1 commit 2026-05-23 14:20:43 +02:00
[fix]. Reply on unsupported @codex commands
All checks were successful
ci / test (pull_request) Successful in 36s
ci / publish (pull_request) Has been skipped
d1ca1052f4
Author
Owner

@codex penis

@codex penis
First-time contributor

⚠️ Command @codex penis is not supported. Try @codex -h.

⚠️ Command `@codex penis` is not supported. Try `@codex -h`.
space merged commit cf95343e06 into main 2026-05-23 14:22:33 +02:00
space deleted branch improv/big-reviewer-update 2026-05-23 14:22:36 +02:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: space/gitea-codex#4