We publish our methodology because trust in any reference work begins with knowing how its judgments are made. The two sections below cover how editorial decisions are reached and how rankings are computed.
Editorial standards
Pick criteria
News, Best Practices, Op-Eds, and Research entries are selected against three filters: durability (will this still matter in a month?), signal density (does it tell the reader something they couldn’t easily get elsewhere?), and breadth (does it inform individuals, builders, or leaders, ideally more than one of them?). Items whose only merit is novelty or a vendor announcement are screened out.
AI-assistance disclosure
AI tools assist in summarization, draft generation, and rankings tabulation. Every published piece is reviewed by a human editor before it ships, and the AI Op-Ed format is explicit by design — the model outputs are presented as the artifact rather than rewritten in a human voice. Where AI changes a workflow in a way readers should know about, we say so in line.
Conflicts of interest
Sentient Weekly takes no paid placements at launch. No company on the site has paid for coverage, ranking position, or inclusion in any guide. If that ever changes, paid material will be labeled clearly and segregated from editorial.
Corrections
We correct errors openly. When a piece changes after publication, the change is noted at the bottom of the piece with the date and the nature of the correction. Substantive errors trigger an entry in the weekly issue’s corrections note.
Tips and feedback
If you spot something we got wrong or want to surface a story we missed, send it through the contact form. We read everything.
Syndication
The full feed of issues and op-eds is available via RSS for readers who prefer feed readers to email.
Rankings methodology
Data sources
Rankings draw primarily from Artificial Analysis, which publishes daily-refreshed intelligence indices alongside cost, speed, and latency figures for large language models. Other benchmarks (proprietary, vendor-published, or our own evals) are entered manually by editors or imported via CSV. Each row on the rankings page links back to its source.
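As a rough illustration of what an editor-imported row carries, the sketch below uses hypothetical column names (they are assumptions for illustration, not our actual schema); the point it demonstrates is that every row travels with the source URL the rankings page links back to.

```
import csv, io

# Hypothetical CSV layout for editor-imported benchmark results.
# Column names are illustrative; the key point is that each row
# carries a source URL so the rankings page can link back to it.
sample = io.StringIO(
    "model,benchmark,score,source_url,retrieved\n"
    "model_a,coding_eval_v2,61.3,https://example.com/results,2025-06-02\n"
)
for row in csv.DictReader(sample):
    print(row["model"], row["benchmark"], row["score"], row["source_url"])
```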
Scoring approach
Within each capability category, individual benchmark scores are normalized and combined into a composite rank. Weighting reflects each benchmark’s coverage and how well it correlates with real-world tasks in that category: pure synthetic tests are weighted lower than evaluations grounded in user-facing work.
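For readers who want the arithmetic spelled out, here is a minimal sketch of the normalize-and-weight step. The benchmark names, scores, and weights are invented placeholders, not our production values; the real pipeline applies the category-specific weighting described above.

```
# Illustrative only: min-max normalize each benchmark to 0-1,
# then combine with per-benchmark weights into a composite score.

def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale raw benchmark scores to 0-1 across the set of ranked models."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {model: 0.5 for model in scores}
    return {model: (s - lo) / (hi - lo) for model, s in scores.items()}

def composite(per_benchmark: dict[str, dict[str, float]],
              weights: dict[str, float]) -> dict[str, float]:
    """Weighted average of normalized benchmark scores for each model."""
    normalized = {bench: normalize(scores) for bench, scores in per_benchmark.items()}
    total_weight = sum(weights.values())
    models = {m for scores in per_benchmark.values() for m in scores}
    return {
        m: sum(weights[b] * normalized[b].get(m, 0.0) for b in weights) / total_weight
        for m in models
    }

# Hypothetical inputs: two benchmarks, three models.
raw = {
    "user_facing_eval": {"model_a": 71.0, "model_b": 64.5, "model_c": 80.2},
    "synthetic_suite":  {"model_a": 88.0, "model_b": 91.0, "model_c": 85.0},
}
weights = {"user_facing_eval": 0.7, "synthetic_suite": 0.3}  # synthetic weighted lower
print(sorted(composite(raw, weights).items(), key=lambda kv: -kv[1]))
```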
Update cadence
Rankings refresh as new data arrives. Major model releases trigger an immediate update; routine benchmark publications fold in on the weekly cycle. Each ranking surface shows when it was last refreshed.
Eval criteria per sub-score
Within the LLM category, sub-scores — reasoning, coding, math, creative writing, instruction following, and multimodal — each weight a different mix of correctness, latency, cost, and consistency. The category page documents which inputs feed its composite score so readers can decide whether the weighting matches their own use case.
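To make "each sub-score weights a different mix" concrete, here is a hypothetical weight table in that spirit. The numbers are invented for illustration; the real mixes, and the benchmarks feeding each input, are documented on the category page.

```
# Hypothetical per-sub-score weight mixes (each row sums to 1.0).
# Real weights are documented on each category page.
SUB_SCORE_WEIGHTS = {
    #                 correctness  latency  cost  consistency
    "reasoning":             (0.70,   0.05,  0.05,       0.20),
    "coding":                (0.60,   0.10,  0.10,       0.20),
    "math":                  (0.75,   0.05,  0.05,       0.15),
    "creative_writing":      (0.55,   0.10,  0.10,       0.25),
    "instruction_following": (0.65,   0.05,  0.05,       0.25),
    "multimodal":            (0.60,   0.10,  0.10,       0.20),
}

def sub_score(name: str, correctness: float, latency: float,
              cost: float, consistency: float) -> float:
    """Combine four normalized inputs (0-1) using that sub-score's mix."""
    w = SUB_SCORE_WEIGHTS[name]
    return sum(wi * xi for wi, xi in zip(w, (correctness, latency, cost, consistency)))

print(round(sub_score("coding", 0.82, 0.6, 0.7, 0.9), 3))
```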
Current limitations
Rankings are necessarily incomplete. Closed models that don’t publish results can only be partially scored; emerging categories (agentic workflows, long-context reasoning) lack stable benchmarks and are tracked qualitatively until consensus tests exist. We surface these gaps on the relevant pages rather than papering over them.
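One way to make "partially scored" precise: when a model is missing some benchmarks, a composite can be computed over the benchmarks it does report, with the weights renormalized and the coverage shown alongside. The sketch below illustrates that idea under assumed names and weights; it is not a description of our exact handling.

```
# Sketch: composite over whatever benchmarks a model actually reports,
# with weights renormalized and coverage returned so gaps stay visible.
def partial_composite(scores: dict[str, float],
                      weights: dict[str, float]) -> tuple[float | None, float]:
    """Return (composite over available benchmarks, fraction of weight covered)."""
    available = {b: w for b, w in weights.items() if b in scores}
    covered = sum(available.values()) / sum(weights.values())
    if not available:
        return None, 0.0
    value = sum(w * scores[b] for b, w in available.items()) / sum(available.values())
    return value, covered

# Hypothetical model reporting only two of three benchmarks (scores already 0-1).
weights = {"user_facing_eval": 0.5, "synthetic_suite": 0.3, "long_context": 0.2}
print(partial_composite({"user_facing_eval": 0.74, "synthetic_suite": 0.81}, weights))
```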