Add benchmark results for deepseek/deepseek-r1-distill-llama-70b by github-actions[bot] · Pull Request #571 · tinybirdco/llm-benchmark

github-actions · 2026-06-08T10:19:49Z

This PR adds benchmark results for the deepseek/deepseek-r1-distill-llama-70b model.

Results have been pushed to Tinybird with validated=0 (pending review).
Merging this PR will validate the results, making them visible on the production dashboard.

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the failed list.

/cc pei-tinybird,gnzjgo

vercel · 2026-06-08T10:19:52Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
llm-benchmark	Ready	Preview, Comment	Jun 8, 2026 10:20am

github-actions · 2026-06-08T10:19:55Z

Automated Benchmark Review

Model: deepseek/deepseek-r1-distill-llama-70b
Review model: openai/gpt-5.4-nano
Success rate: 66.0%

Quality summary: Out of 50 runs, 33 succeeded for a 66% success rate, with 48% first-attempt success, indicating a moderate level of reliability. Latency is reasonable (avg 23.64), and successful executions complete quickly (avg execution time 0.4920), suggesting performance is efficient when it works.
Concerns/anomalies: The main concern is the relatively high failure count (17 of 50 runs, or 34%). The gap between firstAttemptRate (48%) and successRate (66%) suggests that a sizable share of the successes came only after retries. Without error breakdowns, it’s unclear whether failures are due to model correctness, prompt/tooling issues, or runtime constraints.
Recommendation: REVIEW

This review was automatically generated. Set AUTO_MERGE=false in repo variables to disable auto-merge.
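For reviewers who want to sanity-check the figures in the quality summary, the arithmetic can be reproduced with a short sketch like the one below. The `RunSummary` class and its field names are hypothetical and not part of the benchmark workflow. It also assumes that firstAttemptRate is measured over all 50 runs rather than over successful runs only, which the review does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical container for the headline numbers quoted in the review."""
    total_runs: int
    successes: int
    first_attempt_rate: float  # assumed to be a fraction of ALL runs

    @property
    def success_rate(self) -> float:
        return self.successes / self.total_runs

    @property
    def failures(self) -> int:
        return self.total_runs - self.successes

    @property
    def first_attempt_successes(self) -> int:
        # Convert the rate back into a run count under the stated assumption.
        return round(self.first_attempt_rate * self.total_runs)

    @property
    def recovered_after_retry(self) -> int:
        # Successes that did not happen on the first attempt.
        return self.successes - self.first_attempt_successes


# Figures taken from the automated review comment.
summary = RunSummary(total_runs=50, successes=33, first_attempt_rate=0.48)

print(f"success rate: {summary.success_rate:.0%}")
print(f"failures: {summary.failures} of {summary.total_runs}")
print(f"first-attempt successes: {summary.first_attempt_successes}")
print(f"recovered after retry: {summary.recovered_after_retry}")
```

Under that assumption, 24 runs succeeded on the first attempt and 9 more succeeded only after a retry, which matches the gap the review flags between the two rates.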

feat: add benchmark results for deepseek/deepseek-r1-distill-llama-70b

5cb3a4d

github-actions Bot assigned pei-tinybird and gnzjgo Jun 8, 2026

vercel Bot deployed to Preview June 8, 2026 10:20

pei-tinybird merged commit 54ef9a7 into main Jun 8, 2026
3 checks passed

pei-tinybird merged 1 commit into main from benchmark/deep-eek-deep-eek-r1-di-till-llama-70b-27129781371
