improvements: Big Reviewer Update #4

space · 2026-05-23T13:37:23+02:00

space commented

2026-05-23 13:37:23 +02:00

No description provided.

space added 1 commit 2026-05-23 13:37:24 +02:00

feat: Enhance review prompt with detailed instructions and placeholders for empty sections

ci / test (pull_request) Successful in 34s

Details

ci / publish (pull_request) Has been skipped

Details

d6a9397914

space commented

2026-05-23 13:53:06 +02:00

@codex review

gitea-codex commented

2026-05-23 13:53:08 +02:00

First-time contributor

👀 Codex review queued for commit d6a9397.

👀 Codex review queued for commit `d6a9397`.

space added 1 commit 2026-05-23 13:53:30 +02:00

Resolve Runner issue, runner refused to launch

ci / test (pull_request) Successful in 35s

Details

ci / publish (pull_request) Has been skipped

Details

c3925f37e1

gitea-codex commented

2026-05-23 13:54:18 +02:00

First-time contributor

Review Result

This PR introduces a checkout robustness regression that can cause valid PR reviews to fail.

Findings

1) PR checkout assumes both refs exist on one remote

In src/gitea_codex_bot/workers/container_runner.py (around the new fetch step), the runner does:

clone one clone_url as origin
git fetch --no-tags origin <base_ref> <head_ref>

For fork and other cross-repo topologies, base_ref and head_ref are not guaranteed to both exist on that same remote. If either ref is missing, fetch fails and review aborts before checkout.

This is a correctness issue because it can reject reviewable PRs purely due to ref layout.

Suggested Fix

Use SHA-based fetch (or independent tolerant fetches) rather than requiring both branch names from one remote. The runner already has base_sha and head_sha, which are stable identifiers for this purpose.

Test Gap

I did not see a test covering fork/cross-remote PR checkout where one of base_ref/head_ref is absent on origin. Adding that case would prevent regressions here.

Note: model gpt-5.3-codex, input 26991, output 2212 tokens used.

## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail. ## Findings ### 1) PR checkout assumes both refs exist on one remote In `src/gitea_codex_bot/workers/container_runner.py` (around the new fetch step), the runner does: - clone one `clone_url` as `origin` - `git fetch --no-tags origin <base_ref> <head_ref>` For fork and other cross-repo topologies, `base_ref` and `head_ref` are not guaranteed to both exist on that same remote. If either ref is missing, fetch fails and review aborts before checkout. This is a correctness issue because it can reject reviewable PRs purely due to ref layout. ## Suggested Fix Use SHA-based fetch (or independent tolerant fetches) rather than requiring both branch names from one remote. The runner already has `base_sha` and `head_sha`, which are stable identifiers for this purpose. ## Test Gap I did not see a test covering fork/cross-remote PR checkout where one of `base_ref`/`head_ref` is absent on `origin`. Adding that case would prevent regressions here. _Note: model `gpt-5.3-codex`, input `26991`, output `2212` tokens used._

space added 1 commit 2026-05-23 13:57:58 +02:00

[fix]. Harden fork PR fetch + config marker

ci / test (pull_request) Successful in 33s

Details

ci / publish (pull_request) Has been skipped

Details

01e10abc71

space added 1 commit 2026-05-23 14:03:25 +02:00

[refactor]. Remove legacy review prompt path

ci / test (pull_request) Successful in 34s

Details

ci / publish (pull_request) Has been skipped

Details

30aa737516

space changed title from ~~feat: Enhance review prompt with detailed instructions and placeholders for empty sections~~ to improvements: Big Reviewer Update

2026-05-23 14:04:16 +02:00

space added 1 commit 2026-05-23 14:07:54 +02:00

[feat]. Add @codex -h help summary command

ci / test (pull_request) Successful in 38s

Details

ci / publish (pull_request) Has been skipped

Details

9392591429

space commented

2026-05-23 14:08:53 +02:00

@codex -h

gitea-codex commented

2026-05-23 14:08:53 +02:00

First-time contributor

Codex Help

Supported commands:

@codex review [security|performance|tests] [--full]
@codex rerun
@codex explain
@codex fix [--branch ...]
@codex ignore
@codex -h / @codex --help / @codex help

Status note:

Pending jobs on this PR: 1
Fix command enabled: false
Latest review command: review status succeeded. Summary: The PR introduces a likely functional regression in PR checkout for some pull request topologies (especially forks) by fetching both base and head refs from a single origin remot

Discussion summary (4 comments, human 2, bot 2):

@space: @codex review
@gitea-codex: 👀 Codex review queued for commit d6a9397.
@gitea-codex: ## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail...
@space: @codex -h

## Codex Help Supported commands: - `@codex review [security|performance|tests] [--full]` - `@codex rerun` - `@codex explain` - `@codex fix [--branch ...]` - `@codex ignore` - `@codex -h` / `@codex --help` / `@codex help` Status note: - Pending jobs on this PR: `1` - Fix command enabled: `false` - Latest review command: `review` status `succeeded`. Summary: The PR introduces a likely functional regression in PR checkout for some pull request topologies (especially forks) by fetching both base and head refs from a single `origin` remot Discussion summary (4 comments, human `2`, bot `2`): - @space: @codex review - @gitea-codex: 👀 Codex review queued for commit `d6a9397`. - @gitea-codex:  ## Review Result This PR introduces a checkout robustness regression that can cause valid PR reviews to fail... - @space: @codex -h

space commented

2026-05-23 14:10:16 +02:00

@codex review

gitea-codex commented

2026-05-23 14:10:18 +02:00

First-time contributor

👀 Codex review queued for commit 9392591.

👀 Codex review queued for commit `9392591`.

gitea-codex commented

2026-05-23 14:11:43 +02:00

First-time contributor

Review Result

I found functional regressions introduced by this PR that affect review correctness and behavior.

Findings

1) PR-scoped context removed from model input (high)

The new ephemeral runner path no longer passes explicit PR review context (changed files, unified diff, base/head scope, etc.) and now sends only a short free-form prompt (review: ...).

This can cause reviews to drift from "issues introduced by this PR" and produce incorrect findings.

2) `.codex-review.yml` behavior regression (medium)

The code still loads repo config, but key config-driven behaviors previously applied in prompt construction are no longer enforced in the new path (notably default_mode fallback for @codex review without explicit mode, and other scope-shaping behavior).

Repos depending on these settings will observe changed review behavior.

3) Output marker parsing can fail on collisions (medium)

Result extraction uses global first-occurrence marker matching. If marker text appears inside generated JSON content, extraction can truncate and fail.

Use end-marker search anchored after the located start marker, and ideally randomized per-run markers.

Suggested Follow-up

Restore explicit PR scope/context in runner prompting or use a native base/head review target.
Reintroduce config semantics (default_mode and other scope controls) in the new container path.
Harden marker parsing and add tests for marker-collision scenarios.

Note: model gpt-5.3-codex, input 68257, output 3309 tokens used.

## Review Result I found functional regressions introduced by this PR that affect review correctness and behavior. ## Findings ### 1) PR-scoped context removed from model input (high) The new ephemeral runner path no longer passes explicit PR review context (changed files, unified diff, base/head scope, etc.) and now sends only a short free-form prompt (`review: ...`). This can cause reviews to drift from "issues introduced by this PR" and produce incorrect findings. ### 2) `.codex-review.yml` behavior regression (medium) The code still loads repo config, but key config-driven behaviors previously applied in prompt construction are no longer enforced in the new path (notably `default_mode` fallback for `@codex review` without explicit mode, and other scope-shaping behavior). Repos depending on these settings will observe changed review behavior. ### 3) Output marker parsing can fail on collisions (medium) Result extraction uses global first-occurrence marker matching. If marker text appears inside generated JSON content, extraction can truncate and fail. Use end-marker search anchored after the located start marker, and ideally randomized per-run markers. ## Suggested Follow-up 1. Restore explicit PR scope/context in runner prompting or use a native base/head review target. 2. Reintroduce config semantics (`default_mode` and other scope controls) in the new container path. 3. Harden marker parsing and add tests for marker-collision scenarios. _Note: model `gpt-5.3-codex`, input `68257`, output `3309` tokens used._

space added 1 commit 2026-05-23 14:15:07 +02:00

[fix]. Restore PR-scoped review + remove fix cmd

ci / test (pull_request) Successful in 32s

Details

ci / publish (pull_request) Has been skipped

Details

08075cb3c4

space commented

2026-05-23 14:16:12 +02:00

@codex review "give me a merge score"

gitea-codex commented

2026-05-23 14:16:14 +02:00

First-time contributor

👀 Codex review queued for commit 08075cb.

👀 Codex review queued for commit `08075cb`.

gitea-codex commented

2026-05-23 14:17:34 +02:00

First-time contributor

Review Result

I found one behavior regression introduced by this PR.

Findings

1) `@codex fix` now gets silently ignored

src/gitea_codex_bot/services/commands.py removes fix from SUPPORTED_COMMANDS, so parse_command() returns None for @codex fix ....

Because of that, no job is queued for new fix comments, and the dispatcher's new deprecation path ("@codex fix is no longer supported") is not reached for normal webhook traffic.

This changes behavior from explicit handling to silent no-op for users.

Suggested Fix

Either:

Keep fix parseable as a deprecated command and let dispatcher post the deprecation message, or
Add explicit webhook fallback handling for @codex fix parse failures that posts the same deprecation notice.

Merge Score

Not ready to merge yet due to the user-visible command regression above.

Note: model gpt-5.3-codex, input 40143, output 2475 tokens used.

## Review Result I found one behavior regression introduced by this PR. ## Findings ### 1) `@codex fix` now gets silently ignored `src/gitea_codex_bot/services/commands.py` removes `fix` from `SUPPORTED_COMMANDS`, so `parse_command()` returns `None` for `@codex fix ...`. Because of that, no job is queued for new fix comments, and the dispatcher's new deprecation path (`"@codex fix is no longer supported"`) is not reached for normal webhook traffic. This changes behavior from explicit handling to silent no-op for users. ## Suggested Fix Either: - Keep `fix` parseable as a deprecated command and let dispatcher post the deprecation message, or - Add explicit webhook fallback handling for `@codex fix` parse failures that posts the same deprecation notice. ## Merge Score Not ready to merge yet due to the user-visible command regression above. _Note: model `gpt-5.3-codex`, input `40143`, output `2475` tokens used._

space added 1 commit 2026-05-23 14:20:43 +02:00

[fix]. Reply on unsupported @codex commands

ci / test (pull_request) Successful in 36s

Details

ci / publish (pull_request) Has been skipped

Details

d1ca1052f4

space commented

2026-05-23 14:20:55 +02:00

@codex penis

gitea-codex commented

2026-05-23 14:20:56 +02:00

First-time contributor

⚠️ Command @codex penis is not supported. Try @codex -h.

⚠️ Command `@codex penis` is not supported. Try `@codex -h`.

space merged commit cf95343e06 into main

2026-05-23 14:22:33 +02:00

space referenced this issue from a commit

2026-05-23 14:22:35 +02:00

Merge pull request 'improvements: Big Reviewer Update' (#4) from improv/big-reviewer-update into main

space deleted branch improv/big-reviewer-update

2026-05-23 14:22:36 +02:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: space/gitea-codex#4

improvements: Big Reviewer Update #4

Review Result

Findings

1) PR checkout assumes both refs exist on one remote

Suggested Fix

Test Gap

Codex Help

Review Result

Findings

1) PR-scoped context removed from model input (high)

2) .codex-review.yml behavior regression (medium)

3) Output marker parsing can fail on collisions (medium)

Suggested Follow-up

Review Result

Findings

1) @codex fix now gets silently ignored

Suggested Fix

Merge Score

2) `.codex-review.yml` behavior regression (medium)

1) `@codex fix` now gets silently ignored