Skip to content

#evaluation

Pull Request Lifecycles as Agentic Coding Eval Data

Agent-authored pull requests already contain much of the signal needed to improve agentic coding workflows: the prompt, opened diff, quality gates, review comments, revisions, merged state, model configuration, and token spend. The useful loop is to mine that lifecycle after merge, propose minimal guidance-surface or quality-gate improvements, retry the clarified task, and measure whether the extra token spend improves post-convergence one-shot success and value-per-token.