We publish our methodology because trust in any reference work begins with knowing how its judgments are made. The two sections below cover how editorial decisions are reached and how rankings are computed.
Editorial standards
Pick criteria
News, Best Practices, Op-Eds, and Research entries are selected against three filters: durability (will this still matter in a month?), signal density (does it tell the reader something they couldn’t easily get elsewhere?), and breadth (does it inform individuals, builders, or leaders, ideally more than one of them?). Items whose only merit is novelty or a vendor announcement are screened out.
AI-assistance disclosure
AI tools assist in summarization, draft generation, and rankings tabulation. Every published piece is reviewed by a human editor before it ships, and the AI Op-Ed format is explicit by design — the model outputs are presented as the artifact rather than rewritten in a human voice. Where AI changes a workflow in a way readers should know about, we say so in line.
Conflicts of interest
Sentient Weekly takes no paid placements at launch. No company on the site has paid for coverage, ranking position, or inclusion in any guide. If that ever changes, paid material will be labeled clearly and segregated from editorial.
Corrections
We correct errors openly. When a piece changes after publication, the change is noted at the bottom of the piece with the date and the nature of the correction. Substantive errors trigger an entry in the weekly issue’s corrections note.
Tips and feedback
If you spot something we got wrong or want to surface a story we missed, send it through the contact form. We read everything.
Syndication
The full feed of issues and op-eds is available via RSS for readers who prefer feed readers to email.
Rankings methodology
Data sources
Rankings draw primarily from Artificial Analysis, which publishes daily-refreshed intelligence indices alongside cost, speed, and latency figures for large language models. Other benchmarks (proprietary, vendor-published, or our own evals) are entered manually by editors or imported via CSV. Each row on the rankings page links back to its source.
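As a rough illustration of what an editor-imported row carries, the sketch below uses hypothetical column names (they are assumptions for illustration, not our actual schema); the point it demonstrates is that every row travels with the source URL the rankings page links back to.

```
import csv, io

# Hypothetical CSV layout for editor-imported benchmark results.
# Column names are illustrative; the key point is that each row
# carries a source URL so the rankings page can link back to it.
sample = io.StringIO(
    "model,benchmark,score,source_url,retrieved\n"
    "model_a,coding_eval_v2,61.3,https://example.com/results,2025-06-02\n"
)
for row in csv.DictReader(sample):
    print(row["model"], row["benchmark"], row["score"], row["source_url"])
```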
Scoring approach
Within each capability category, individual benchmark scores are normalized and combined into a composite rank. Weighting reflects each benchmark’s coverage and how well it correlates with real-world tasks in that category: pure synthetic tests are weighted lower than evaluations grounded in user-facing work.
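For readers who want the arithmetic spelled out, here is a minimal sketch of the normalize-and-weight step. The benchmark names, scores, and weights are invented placeholders, not our production values; the real pipeline applies the category-specific weighting described above.

```
# Illustrative only: min-max normalize each benchmark to 0-1,
# then combine with per-benchmark weights into a composite score.

def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale raw benchmark scores to 0-1 across the set of ranked models."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {model: 0.5 for model in scores}
    return {model: (s - lo) / (hi - lo) for model, s in scores.items()}

def composite(per_benchmark: dict[str, dict[str, float]],
              weights: dict[str, float]) -> dict[str, float]:
    """Weighted average of normalized benchmark scores for each model."""
    normalized = {bench: normalize(scores) for bench, scores in per_benchmark.items()}
    total_weight = sum(weights.values())
    models = {m for scores in per_benchmark.values() for m in scores}
    return {
        m: sum(weights[b] * normalized[b].get(m, 0.0) for b in weights) / total_weight
        for m in models
    }

# Hypothetical inputs: two benchmarks, three models.
raw = {
    "user_facing_eval": {"model_a": 71.0, "model_b": 64.5, "model_c": 80.2},
    "synthetic_suite":  {"model_a": 88.0, "model_b": 91.0, "model_c": 85.0},
}
weights = {"user_facing_eval": 0.7, "synthetic_suite": 0.3}  # synthetic weighted lower
print(sorted(composite(raw, weights).items(), key=lambda kv: -kv[1]))
```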
Update cadence
Rankings refresh as new data arrives. Major model releases trigger an immediate update; routine benchmark publications fold in on the weekly cycle. Each ranking surface shows when it was last refreshed.
Eval criteria per sub-score
Within the LLM category, sub-scores — reasoning, coding, math, creative writing, instruction following, and multimodal — each weight a different mix of correctness, latency, cost, and consistency. The category page documents which inputs feed its composite score so readers can decide whether the weighting matches their own use case.
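To make "each sub-score weights a different mix" concrete, here is a hypothetical weight table in that spirit. The numbers are invented for illustration; the real mixes, and the benchmarks feeding each input, are documented on the category page.

```
# Hypothetical per-sub-score weight mixes (each row sums to 1.0).
# Real weights are documented on each category page.
SUB_SCORE_WEIGHTS = {
    #                 correctness  latency  cost  consistency
    "reasoning":             (0.70,   0.05,  0.05,       0.20),
    "coding":                (0.60,   0.10,  0.10,       0.20),
    "math":                  (0.75,   0.05,  0.05,       0.15),
    "creative_writing":      (0.55,   0.10,  0.10,       0.25),
    "instruction_following": (0.65,   0.05,  0.05,       0.25),
    "multimodal":            (0.60,   0.10,  0.10,       0.20),
}

def sub_score(name: str, correctness: float, latency: float,
              cost: float, consistency: float) -> float:
    """Combine four normalized inputs (0-1) using that sub-score's mix."""
    w = SUB_SCORE_WEIGHTS[name]
    return sum(wi * xi for wi, xi in zip(w, (correctness, latency, cost, consistency)))

print(round(sub_score("coding", 0.82, 0.6, 0.7, 0.9), 3))
```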
Current limitations
Rankings are necessarily incomplete. Closed models that don’t publish results can only be partially scored; emerging categories (agentic workflows, long-context reasoning) lack stable benchmarks and are tracked qualitatively until consensus tests exist. We surface these gaps on the relevant pages rather than papering over them.
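One way to make "partially scored" precise: when a model is missing some benchmarks, a composite can be computed over the benchmarks it does report, with the weights renormalized and the coverage shown alongside. The sketch below illustrates that idea under assumed names and weights; it is not a description of our exact handling.

```
# Sketch: composite over whatever benchmarks a model actually reports,
# with weights renormalized and coverage returned so gaps stay visible.
def partial_composite(scores: dict[str, float],
                      weights: dict[str, float]) -> tuple[float | None, float]:
    """Return (composite over available benchmarks, fraction of weight covered)."""
    available = {b: w for b, w in weights.items() if b in scores}
    covered = sum(available.values()) / sum(weights.values())
    if not available:
        return None, 0.0
    value = sum(w * scores[b] for b, w in available.items()) / sum(available.values())
    return value, covered

# Hypothetical model reporting only two of three benchmarks (scores already 0-1).
weights = {"user_facing_eval": 0.5, "synthetic_suite": 0.3, "long_context": 0.2}
print(partial_composite({"user_facing_eval": 0.74, "synthetic_suite": 0.81}, weights))
```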