Summarize model evaluation results and log them to Confluence
Read eval results from Google Docs, summarize key metrics, and create a structured Confluence page
Overview
Eval results pile up as draft notes that never make it into your internal docs. This template reads your results, formats them into a structured summary, and creates a Confluence page the team can actually find—flagging regressions so nothing slips through.
How it works
- Reads the eval results from your specified Google Doc
- Summarizes benchmark results: model name, dataset, key metrics, and regressions
- Creates a new Confluence page in your specified space with today's date
- Populates the page with the structured summary
- Adds a bold warning at the top if any metric regressed more than 5%
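The regression check in the last step can be sketched in a few lines. This is a minimal illustration, not part of the template itself: the function name, the metric dictionaries, and the assumption that higher values are better for every metric are all hypothetical.

```python
# Illustrative sketch of the >5% regression flagging step.
# Assumes metrics are dicts keyed by metric name, and that higher is
# better for every metric (invert the comparison for latency-style
# metrics where lower is better).

def flag_regressions(previous, current, threshold=0.05):
    """Return metrics that dropped more than `threshold` vs. the prior run."""
    flagged = {}
    for name, prev_value in previous.items():
        curr_value = current.get(name)
        if curr_value is None or prev_value == 0:
            continue  # skip new or unmeasurable metrics
        drop = (prev_value - curr_value) / prev_value
        if drop > threshold:
            flagged[name] = round(drop * 100, 1)  # percent regression
    return flagged

prev_run = {"accuracy": 0.91, "f1": 0.88}
this_run = {"accuracy": 0.84, "f1": 0.87}
print(flag_regressions(prev_run, this_run))  # accuracy fell ~7.7%, f1 only ~1.1%
```

If anything comes back flagged, the template prepends the bold warning to the page before publishing.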
Who this is for
AI engineers and ML researchers who run regular evaluations and want results documented where the team can find them without writing up summaries by hand.
Suggested prompt
Read the contents of my Google Doc called [1. Eval results doc name] and summarize the benchmark results in a structured format: model name, dataset, key metrics (accuracy, F1, latency), and any notable regressions compared to the previous run. Then create a new Confluence page in the [2. Space name] space titled "[3. Model name] Eval Summary—[today's date]" and populate it with that summary. If any metric shows a regression of more than 5% from the prior run, add a bold warning note at the top of the page flagging it for review.
Frequently asked questions
Can I change the regression threshold?
Yes, adjust the 5% figure in the prompt to any threshold that makes sense for your team.
What if my results are in a different format?
Modify the prompt to match your eval output structure and the metrics you track.
Can I log results to GitHub instead?
Yes, swap Confluence for GitHub in the prompt to create a markdown file in your repo wiki.
Can I add more sections to the summary?
Yes, modify the prompt to include additional sections like recommendations or next steps.