Add benchmark results for meta-llama/llama-3.2-1b-instruct by github-actions[bot] · Pull Request #575 · tinybirdco/llm-benchmark

github-actions · 2026-06-10T10:41:28Z

This PR adds benchmark results for the meta-llama/llama-3.2-1b-instruct model.

Results have been pushed to Tinybird with validated=0 (pending review).
Merging this PR will validate the results, making them visible on the production dashboard.

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the failed list.

/cc pei-tinybird,gnzjgo

vercel · 2026-06-10T10:41:30Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
llm-benchmark	Ready	Preview, Comment	Jun 10, 2026 10:41am

github-actions · 2026-06-10T10:41:34Z

Automated Benchmark Review

Model: meta-llama/llama-3.2-1b-instruct
Review model: openai/gpt-5.4-nano
Success rate: 52.0%

Quality summary: The model achieved a 52% success rate (26/50), with 50% first-attempt success, indicating moderate reliability. Latency is reasonable (avg 1.56), and execution time is low (avg 0.2748), but error frequency is high.
Concerns / anomalies: The main concern is the high error count (24/50): nearly half the runs failed. The success rate (52%) and the first-attempt rate (50%) are close, so retries don’t appear to materially improve outcomes, which suggests systematic failure modes rather than occasional flakiness.
Recommendation: REVIEW. The error rate is substantial enough to warrant human investigation (e.g., error categories, prompt sensitivity, or integration issues) before treating results as reliable.
Recommendation: REVIEW

This review was automatically generated. Set AUTO_MERGE=false in repo variables to disable auto-merge.
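
The arithmetic behind the review figures can be checked with a short sketch. It assumes the 50% first-attempt rate is measured over the same 50 runs as the success rate, which the review does not state explicitly. The function name `summarize` and its field names are hypothetical and are not part of the benchmark workflow.

```python
def summarize(total_runs: int, successes: int, first_attempt_rate: float) -> dict:
    """Derive error and retry figures from the counts quoted in the review.

    Assumption: first_attempt_rate is a fraction of total_runs.
    """
    errors = total_runs - successes
    first_attempt_successes = round(first_attempt_rate * total_runs)
    return {
        "success_rate": successes / total_runs,
        "errors": errors,
        "error_rate": errors / total_runs,
        "first_attempt_successes": first_attempt_successes,
        # Runs that failed on the first attempt but succeeded after a retry.
        "recovered_by_retry": successes - first_attempt_successes,
    }


# Figures quoted in the automated review: 26/50 successes, 50% first-attempt success.
summary = summarize(total_runs=50, successes=26, first_attempt_rate=0.50)
print(summary)
```

Under that assumption, retries recovered only 1 of the 25 runs that failed on the first attempt, which is consistent with the review's reading that the failures are systematic rather than flaky.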

feat: add benchmark results for meta-llama/llama-3.2-1b-instruct

98fa74b

github-actions Bot assigned pei-tinybird and gnzjgo Jun 10, 2026

vercel Bot deployed to Preview June 10, 2026 10:41

pei-tinybird merged commit 6955334 into main Jun 10, 2026
3 checks passed

pei-tinybird merged 1 commit into main from benchmark/meta-llama-llama-3-2-1b-in-truct-27267114624
