Eval costs blog by evijit · Pull Request #24 · evaleval/evaleval.github.io

evijit · 2026-04-25T14:29:59Z

Added a PR for a blog post detailing how evals have gotten extremely expensive, with a direct callout to Every Eval Ever.

Field-guide essay by Avijit Ghosh on evaluation costs across static, agentic, and training-in-the-loop benchmarks. Preserves the responsive CSS bar charts from the source HTML and rethemes them with the evaleval design tokens (Inter/IBM Plex Mono, --fg/--accent/--border) so dark mode and the prose layout work without conflicts.

- Remove the inline roadmap nav (sidebar TOC already covers it).
- Retheme the highlight color from brown to var(--accent) so the bars match the evaleval blue identity in both light and dark modes.
- Add tex2jax_ignore on the article wrapper so MathJax stops parsing dollar-sign pairs in prose (the summary was rendering as italic math).

- Cap figures, table, and table-note at 760px so they share the same centered column as the body paragraphs (previously they spanned the full prose container, sticking out to the left of the text margin).
- Add mathjax_ignore alongside tex2jax_ignore on the wrapper. MathJax 3 ignores 'mathjax_ignore' by default; tex2jax_ignore was the v2 convention, so without this some prose paragraphs (e.g. the Claude Opus pricing line) had their dollar pairs rendered as math.
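For readers who would rather pin this down in configuration than rely on a version-specific default class name, here is a minimal sketch. It assumes the site loads MathJax 3 and that its `options.ignoreHtmlClass` setting (which MathJax treats as a regular-expression pattern) is available. The site's real MathJax setup is not shown in this PR, and the names `ignoreClasses` and `mathJaxConfig` are hypothetical.

```typescript
// Sketch only: the real site config is not part of this PR.
// Both class names come from the commit message above.
const ignoreClasses = ["tex2jax_ignore", "mathjax_ignore"];

const mathJaxConfig = {
  options: {
    // MathJax 3 interprets this string as a regular-expression pattern,
    // so "a|b" skips elements carrying either class.
    ignoreHtmlClass: ignoreClasses.join("|"),
  },
};

// Assumption: this runs before the MathJax script is loaded,
// because MathJax reads the global once at startup.
(globalThis as any).MathJax = mathJaxConfig;
```

With both names listed explicitly, the wrapper can carry either class and the dollar-sign pairs in prose are left alone regardless of which default the loaded MathJax build uses.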

The mobile breakpoint was painting the entire track with var(--bg-subtle), creating a visible grey strip behind every bar. Match the desktop treatment instead: a transparent background with a subtle vertical tick gradient (every third tick, for a coarser mobile rhythm).

Give the BLUF a subtle bg-subtle fill, accent-colored left rule, and soft border so it reads as a pull-quote at the top of the article instead of a plain bordered span.

evijit · 2026-04-25T14:53:29Z

Suggested coauthors: Leshem, Yifan, Georgia <more?>

Added authors Yifan Mai, Georgia Channing, and Leshem Choshen to the article. Revised summary and various paragraphs for clarity and detail.

- restructure sources block as numbered bibliography (full titles, year, authors)
- correct RE-Bench attribution (Wijk et al., not METR)
- integrate reliability-multiplier callout into surrounding prose
- remove filler intros and 'X not Y' staccato patterns
- rephrase 'floors not ceilings' line

Cross-checked HELM cost claims against Section 6 model table (p. 43): replaced loose "$10K or 4,000+ GPU-hours per model" with actual range, corrected aggregate from "high six figures" to ~$100K, and updated the cost-summary table entry. Fixed Pythia "16 model sizes" → "16 models spanning 8 sizes". Relabeled ResearchGym row to "full pass (3 seeds)" so the dollars match the GPU-hours.

Chart fixes: axis labels now align with bar positions (flex space-between instead of grid with centered labels). Figure 2 axis converted to uniform decades ($100/$1k/$10k/$100k); all bars recomputed and small ~1% errors corrected. Figure 3 caption clarifies that bars show maximum compression, not ranges. Vertical gridlines are now continuous across all rows (chart-body wrapper with absolute-positioned ::before instead of per-track backgrounds). Each figure sets its own --grid-interval. Mobile keeps the per-track gradient.

Removed three stray image-markdown references accidentally pasted into "consequences" in the closing section.
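As a rough illustration of what "uniform decades" implies for the recomputed bars, the sketch below maps a dollar value to a bar width on a log10 axis running from $100 to $100k. The helper name `barPercent` and its default axis bounds are assumptions for illustration; the post's actual bar widths are hand-set CSS values that are not reproduced here.

```typescript
// Hypothetical helper: width of a bar, in percent of the track,
// on a logarithmic axis whose ends are axisMin and axisMax dollars.
function barPercent(value: number, axisMin = 100, axisMax = 100000): number {
  if (value <= 0 || axisMin <= 0 || axisMax <= axisMin) {
    throw new RangeError("log axis needs positive values and axisMax > axisMin");
  }
  // Number of decades covered by the axis (3 for $100 to $100k).
  const span = Math.log10(axisMax) - Math.log10(axisMin);
  // Fraction of the axis covered by this value, in log space.
  const raw = (Math.log10(value) - Math.log10(axisMin)) / span;
  // Clamp so values outside the axis range pin to the track edges.
  const clamped = Math.min(1, Math.max(0, raw));
  return clamped * 100;
}

// On a $100 to $100k axis each decade takes one third of the track,
// so the $1k and $10k ticks fall at about 33% and 67%.
const tickPercents = [100, 1000, 10000, 100000].map((v) => barPercent(v));
```

Because every decade gets the same share of the track, evenly spaced tick labels (as with flex space-between) line up with the bar positions.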

evijit added 9 commits April 25, 2026 09:52

style summary as highlighted callout box

17b77cc

add hero banner image to eval-costs post

13208cd

swap eval-costs banner: cropped white-bg version

7f8d1a2

add 'stop paying twice' section pointing to every eval ever

39d4262

polish every-eval-ever callout wording

767d4a4

evijit self-assigned this Apr 25, 2026

evijit and others added 10 commits April 25, 2026 11:04

cite Perlitz et al. (2024) v5 properly for the eval-vs-pretraining quote

7ebc744

add bibtex citation block at end of post

15d74fc

move bibtex below sources and rename label

a4fa82c

Update authors and revise article content

f80206c

complete author list in bibtex citation

2f708b7

smooth link phrasing in summary callout

f8c7eb5

fill in missing citations in sources block

9cb682c

fold Exgentic numbers into summary, add Bandel et al. citation

6c3d2c2

evijit force-pushed the post/eval-costs-blog branch from e4206bf to 6c3d2c2 Compare April 28, 2026 17:39

push publication date to Apr 29 2026

cdd8df2

evijit merged commit 4b722ca into master Apr 29, 2026
1 check passed

evijit merged 20 commits into master from post/eval-costs-blog

2 participants
