Problems solved
- diagnose conversion drops
- plan GA4 BigQuery analysis
- reconcile GA4 UI and BigQuery differences
- review ecommerce funnel performance
- produce GA4 analysis briefs
Use with your agent
Load the bundle folder as context, then ask the agent to follow the role, workflow, command, deliverable, and evaluation files for the task.
1. Load the bundle path: bundles/roles/ga4-analytics-specialist
2. Start from a command: /diagnose-conversion-drop
3. Check the output: Use the included evaluation rubric and source notes before acting on the answer.
Use the included evaluation rubric and source notes before acting on the answer.
Measured performance
Each task was scored on eight criteria, 0-2 points per criterion. Baseline used the same model with no bundle context. Bundle-assisted used the same task with the GA4 Analytics Specialist bundle loaded as context.
Measured against no-bundle baseline.
Benchmark tasks passed.
Temperature 0.2, evaluated Jun 23, 2026.
How to replicate
- Use model openai/gpt-4o-mini with temperature 0.2.
- Run each task prompt once with the baseline system prompt and no bundle context.
- Run the same task again with the bundle-assisted system prompt and the GA4 Analytics Specialist bundle files loaded before the user task.
- Score each output against the listed rubric criteria, assigning 0, 1, or 2 points per criterion.
- Compare the baseline score, the OKB-assisted (bundle-assisted) score, and the percentage lift for each task.
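The scoring and lift arithmetic in the steps above can be sketched as follows. This is a minimal illustration, not the benchmark's actual tooling: the function names `total_score` and `lift` are hypothetical, and the percentage is assumed to be the point gain divided by the baseline score, rounded to the nearest whole percent (which reproduces the figures in the Model coverage table).

```python
def total_score(criterion_scores: list[int]) -> int:
    """Sum rubric scores for one task; each criterion must be 0, 1, or 2."""
    for score in criterion_scores:
        if score not in (0, 1, 2):
            raise ValueError(f"criterion score must be 0, 1, or 2, got {score}")
    return sum(criterion_scores)


def lift(baseline: int, assisted: int) -> dict:
    """Return the point lift and percentage lift of assisted over baseline.

    Assumption: percentage lift = (assisted - baseline) / baseline,
    rounded to the nearest whole percent. Undefined when baseline is 0.
    """
    points = assisted - baseline
    percent = round(100 * points / baseline) if baseline > 0 else None
    return {"points": points, "percent": percent}


# Criterion scores from the "Conversion drop diagnosis" task table.
baseline_scores = [1, 0, 0, 2, 0, 1, 0, 1]
assisted_scores = [2, 1, 2, 2, 2, 2, 1, 2]

print(total_score(baseline_scores), total_score(assisted_scores))  # 5 14
print(lift(17, 42))  # {'points': 25, 'percent': 147}
```

The last call uses the gpt-4o-mini totals across the three tasks (17/48 and 42/48).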
Baseline condition
Run the task with no bundle context.
System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
Bundle-assisted condition
Run the same task with the bundle context loaded from bundles/roles/ga4-analytics-specialist.
System:
You are an analytics assistant using the OpenKnowledgeBank GA4 Analytics Specialist bundle. Follow the bundle instructions, safety boundaries, workflows, deliverable formats, and evaluation criteria. Do not invent access to tools or data you do not have.
User content:
Bundle-assisted runs place the same user task after a bundle context block that contains the GA4 Analytics Specialist bundle files from bundles/roles/ga4-analytics-specialist.
Model coverage
| Class | Model | Baseline | OKB-assisted | Lift |
|---|---|---|---|---|
| Lower-cost LLM | gpt-4o-mini | 17/48 | 42/48 | +25 pts / +147% |
| General-purpose LLM | gpt-4.1 | 25/48 | 46/48 | +21 pts / +84% |
| Reasoning LLM | o3 | 31/48 | 47/48 | +16 pts / +52% |
Task evidence
Conversion drop diagnosis 5/16 → 14/16
Baseline result
Produced a generic diagnosis memo with plausible causes and actions, but did not say the cause cannot be known from the drop percentage alone. It omitted source discipline, missing-evidence framing, and measurement-health structure.
Bundle improvement
Stated that the cause cannot be determined from the percentage alone, then separated available evidence, missing evidence, decomposition, measurement checks, hypotheses, next actions, and source note.
Prompt used
System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
An ecommerce store reports that GA4 purchases dropped 28% last week compared with the previous week. The user asks: "Why did purchases drop, and what should we do?"
Create a short diagnosis memo. Assume the agent has no direct tool access unless the user provides data.
| Criterion | Baseline | OKB |
|---|---|---|
| Direct answer | 1 | 2 |
| Metric definitions | 0 | 1 |
| Evidence discipline | 0 | 2 |
| Tool honesty | 2 | 2 |
| Source discipline | 0 | 2 |
| Actionability | 1 | 2 |
| Safety | 0 | 1 |
| Output fit | 1 | 2 |
GA4 UI vs BigQuery reconciliation 6/16 → 15/16
Baseline result
Gave a high-level reconciliation answer but did not clearly open by stating that neither number is automatically right. It missed important alignment checks such as table suffixes, event_date versus event_timestamp, timezone conversion, and reporting identity, and it lacked a source note.
Bundle improvement
Opened by stating that neither number is automatically right, separated active users from distinct user_pseudo_id, requested exact GA4 settings and SQL, and included table/date/timezone/identity checks with a source note.
Prompt used
System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
A user says: "GA4 says we had 12,430 active users last week, but my BigQuery query returns 13,180 distinct user_pseudo_id values for the same date range. Which number is right, and how should I reconcile this?"
Create a concise reconciliation note. Assume you cannot run tools unless the user provides data. Do not invent the user's exact GA4 settings or BigQuery SQL.
| Criterion | Baseline | OKB |
|---|---|---|
| Direct answer | 0 | 2 |
| Defines both numbers | 1 | 2 |
| Requests exact settings and query logic | 1 | 2 |
| Alignment checks | 1 | 2 |
| Expected vs unresolved differences | 0 | 1 |
| Source discipline | 0 | 2 |
| Tool honesty | 2 | 2 |
| Actionable next step | 1 | 2 |
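The event_date versus event_timestamp check mentioned in this task is easier to see with a small example. In the GA4 BigQuery export, event_timestamp is recorded in microseconds UTC, while event_date follows the property's reporting timezone, so the same event can fall on different calendar days depending on which field defines the date range. The sketch below is an illustration only: the helper names are hypothetical, and it uses a fixed UTC offset (which ignores daylight saving time) instead of a named timezone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_date(event_timestamp_micros: int) -> str:
    """Calendar date (YYYYMMDD) of the event when bucketed in UTC."""
    moment = EPOCH + timedelta(microseconds=event_timestamp_micros)
    return moment.strftime("%Y%m%d")


def property_date(event_timestamp_micros: int, utc_offset_hours: int) -> str:
    """Calendar date (YYYYMMDD) in a fixed-offset reporting timezone.

    Assumption: a fixed offset stands in for the property timezone.
    """
    moment = EPOCH + timedelta(microseconds=event_timestamp_micros)
    local = moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y%m%d")


# Hypothetical event at 2024-03-04 02:30 UTC, property at UTC-8.
ts = int(datetime(2024, 3, 4, 2, 30, tzinfo=timezone.utc).timestamp() * 1_000_000)
print(utc_date(ts))           # 20240304
print(property_date(ts, -8))  # 20240303
```

A query that filters on a UTC-derived date would count this event on March 4, while the GA4 UI would attribute it to March 3, which is one reason the two user counts can differ at the edges of a date range.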
GA4 BigQuery query plan 6/16 → 13/16
Baseline result
Produced a plausible high-level plan and avoided unsupported SQL, but missed GA4 export specifics, detailed date/table handling, session/user logic, ecommerce caveats, and visible source discipline.
Bundle improvement
Separated revenue, traffic volume, and conversion efficiency; included GA4 export table/date behavior, validation against GA4 UI and backend orders, and a source note. Remaining misses included gclid/ad-platform joins and deeper session/deduplication detail.
Prompt used
System:
You are an analytics assistant. Answer the user's question directly and carefully. Do not invent access to tools or data you do not have.
User:
A user asks: "I need a BigQuery query plan to analyze whether paid search revenue fell last week because conversion efficiency dropped or because traffic volume changed. We use GA4 ecommerce export. What should I query and what caveats should I watch for?"
Create a query plan, not SQL. Assume you cannot inspect the user's dataset directly.
| Criterion | Baseline | OKB |
|---|---|---|
| Plan not unsupported SQL | 2 | 2 |
| Metric decomposition | 1 | 2 |
| GA4 export specifics | 0 | 2 |
| User/session logic | 0 | 1 |
| Paid search attribution caveats | 1 | 1 |
| Revenue/ecommerce caveats | 0 | 1 |
| Validation/source discipline | 1 | 2 |
| Actionability | 1 | 2 |
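The split between traffic volume and conversion efficiency that this task asks for can be sketched as a log decomposition. This is an illustrative assumption rather than the bundle's prescribed method: it models revenue as sessions × conversion rate × average order value, uses sessions as the conversion-rate denominator, and all names and figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Week:
    sessions: int
    purchases: int
    revenue: float


def decompose(prev: Week, curr: Week) -> dict:
    """Split the log change in revenue into three additive parts.

    revenue = sessions * (purchases / sessions) * (revenue / purchases),
    so ln(revenue ratio) = traffic + conversion + order_value.
    """
    def log_ratio(before: float, after: float) -> float:
        return math.log(after / before)

    return {
        "traffic": log_ratio(prev.sessions, curr.sessions),
        "conversion": log_ratio(
            prev.purchases / prev.sessions, curr.purchases / curr.sessions
        ),
        "order_value": log_ratio(
            prev.revenue / prev.purchases, curr.revenue / curr.purchases
        ),
        "total": log_ratio(prev.revenue, curr.revenue),
    }


# Hypothetical paid search weeks: sessions barely move, purchases fall.
prev = Week(sessions=10_000, purchases=300, revenue=24_000.0)
curr = Week(sessions=9_800, purchases=240, revenue=19_200.0)
parts = decompose(prev, curr)
for name, value in parts.items():
    print(f"{name}: {value:+.4f}")
```

In this made-up case most of the decline sits in the conversion term, with a small traffic term and no change in order value. The inputs would still need the validation against the GA4 UI and backend orders that the bundle-assisted answer called for.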
What's inside
GA4, BigQuery, Google Tag Manager, Looker Studio
GA4 analysis brief, conversion drop diagnosis, GA4 BigQuery query plan
measurement plan, funnel analysis, hypothesis-driven diagnosis
/diagnose-conversion-drop
/plan-ga4-analysis
/reconcile-ga4-ui-bigquery
/capture-failure
/suggest-bundle-improvement
/prepare-improvement-pr
ga4-analysis-quality-check
Limitations and safety notes
Limitations
- Initial benchmark draft; not a complete GA4 schema or SQL cookbook.
- Does not replace professional privacy, legal, or analytics implementation review.
Safety notes
- Confirm before modifying live GA4, GTM, dashboard, audience, conversion, or BigQuery resources.
- Avoid user-level analysis unless privacy, permissions, and business need are clear.