AI BEAVERS
AI-Native Talent Screening

Evidence-based hiring checklist

10 min read

[Image: transparent filing tray with code, campaign brief, and project note assembled as hiring evidence]


Most teams say they want the best person. Then they shortlist on CV polish, reward interview charisma, and make the final call on gut feel. The best way to approach evidence-based hiring is to treat it as a decision system, not a better conversation. Define the proof you need for the role before interviews start, score candidates against the same evidence, and use work samples, portfolios, and artefacts to reduce impression-led decisions.

In our work, the same pattern shows up again and again: teams think they are hiring for capability, but they are really hiring for confidence signals. The shift is already visible. BCG and Lightcast’s 2023 analysis of 22 million job ads across five countries found a clear move toward skills-based hiring, while job seekers said they want employers to assess skills and experience rather than credentials alone (BCG). In this checklist, you’ll get a practical way to set evidence standards, review artefacts, run structured interviews, and use AI-assisted portfolio screening without turning the process into theatre - useful whether you’re hiring in Hamburg, London, or New York.

TL;DR

  • Define the proof a role requires before interviews start, and score every candidate against the same evidence.
  • Match the method to the role: work samples for tangible output, artifact review when proof already exists, structured interviews for judgment under ambiguity.
  • Let artefacts set the interview agenda: interrogate the work, verify authorship, and never let self-reported claims count as proof.

What is evidence-based hiring?

Evidence-based hiring is not “using a structured interview” or “adding a test.” It is deciding, before you meet candidates, what proof will count and how it will be scored. Deloitte’s work on evidence-based HR makes the core point clearly: unaided judgment is unreliable, and better decisions come from explicit evidence, experimentation, and predictive methods rather than intuition alone (Deloitte Insights on evidence-based HR).

What counts as evidence? Anything that shows the candidate can do the work you actually need: a structured interview scored against defined criteria, a work sample, a portfolio, a GitHub repository, a case [write-up](/how-to-write-an-ai-use-case-brief-that-gets-budget/), a sales plan, a prompt chain, or a decision memo. The distinction is not interview versus no interview; it is whether the interview is tied to constructs that can be assessed consistently. A 2025 meta-analysis in the International Journal of Selection and Assessment reviewed criterion-related validity across different interview constructs, which is a useful reminder that interviews are only as good as what they are designed to measure.

The practical move is to write the role as outcomes, not traits: what must this person ship, improve, diagnose, or decide in the first 90 days? Then define acceptable proof for each outcome before screening starts. Separate observed evidence, verified evidence, and self-reported claims in the rubric. Decide upfront which artefacts are strong enough to replace another round and which only justify a deeper conversation. That is what makes evidence-based hiring a decision discipline rather than a collection of hiring tools (A Research-Backed [Training](/ai-workshop-for-real-work/) Method That Improves Hiring Outcomes).
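
To make that concrete, here is a minimal sketch of how a team might encode outcomes, acceptable proof, and evidence provenance before screening starts. The role, outcomes, and field names below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    """How trustworthy a piece of evidence is, per the rubric."""
    OBSERVED = "observed"            # the panel watched the candidate do it
    VERIFIED = "verified"            # an artefact or third party confirms it
    SELF_REPORTED = "self_reported"  # only the candidate's claim

@dataclass
class ProofRequirement:
    outcome: str                 # what the hire must ship/improve/decide in 90 days
    acceptable_proof: str        # artefact or exercise that counts as evidence
    min_provenance: Provenance   # weakest provenance the panel will accept
    replaces_round: bool         # strong enough to skip an interview round?

# Hypothetical rubric for a senior marketing hire
rubric = [
    ProofRequirement(
        outcome="Ship a multi-channel campaign plan in the first 90 days",
        acceptable_proof="Past campaign deck plus post-mortem the candidate owned",
        min_provenance=Provenance.VERIFIED,
        replaces_round=True,
    ),
    ProofRequirement(
        outcome="Diagnose why paid acquisition underperforms",
        acceptable_proof="Timed work sample on a messy analytics export",
        min_provenance=Provenance.OBSERVED,
        replaces_round=False,
    ),
]
```

Writing the rubric down in this form forces the upfront decisions the section describes: what counts, how strong it must be, and which artefacts replace a round.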

Which hiring method should you use for your role?

Use the hiring method that matches the kind of evidence the role actually produces. For roles with tangible output, ask candidates to produce work you can inspect; for roles where the real signal is judgment under constraints, use a structured interview. If the role depends on AI-assisted execution, inspect how the person works with tools, not whether they can talk about them.

For roles with tangible output, work samples should usually carry the most weight. That fits the broader shift away from credential proxies: BCG’s analysis of 22 million job ads across five countries found a sustained move toward skills-based hiring rather than degree filters, and Deloitte argues that teams increasingly need to think in terms of skills and work, not static job titles (BCG, 2023, Deloitte on the skills-based organization).

Artifact review is the lighter-weight version when candidates already have relevant proof: GitHub repos, campaign decks, customer emails, dashboards, process docs, product specs. Don’t just score polish; score decision quality, constraints, trade-offs, and whether the candidate can explain what they owned versus what the team supplied.

Use structured interviews for what artefacts miss: prioritisation, stakeholder handling, conflict, ambiguity, and judgment over time. For AI-native roles, add an AI portfolio: prompts, evaluation notes, workflow screenshots, model-choice reasoning, or before/after outputs. Then use a structured interview to probe the choices behind the artefacts, not to replace them.

Comparison matrix

The practical rule is simple: choose the evidence type that most closely matches the work the person will actually do.

  • Work sample AI screening - strongest for roles with visible output; it gives the cleanest read on task performance because you can watch how a candidate scopes, uses tools, and judges the result under realistic constraints.
  • Artifact-based review - strong when candidates already have code, decks, briefs, dashboards, or process docs you can inspect; it verifies shipped quality, but it breaks down for earlier-career candidates with little public work.
  • CV review - belongs at the top of the funnel as an eligibility filter, not as evidence of future performance; relying on pedigree and proxies is exactly the hiring mistake Harvard Business Review argues against in “Your Approach to Hiring Is All Wrong”.

How does artifact-based candidate review work?

Artifact-based review works by having the hiring team inspect real output, not just hear a candidate explain how they work. It is faster than conversation-led hiring at exposing weak judgment, because the gaps show up in the artefact itself: missing assumptions, poor trade-offs, and risks the candidate did not catch (IT Portfolio: Build Yours To Get Hired - ITU Online IT Training).

  1. Pick one artefact that matches the role’s real output. For a PM, that might be a product brief or prioritisation memo. For marketing, a campaign plan or post-mortem deck. For an AI engineer, a repo, eval write-up, prompt chain, or incident note.

  2. Score four things, in order: problem framing, execution quality, judgment, and trade-off explanation. Problem framing asks whether they defined the task correctly. Execution checks whether the work is technically or functionally sound. Judgment asks what they chose not to do, what risks they noticed, and whether the output would survive scrutiny from the team that must use it. Trade-off explanation is crucial for senior hires. A minimal scoring sketch follows this list.

  3. Interrogate the artefact, not the person’s narrative about it. Ask what changed between draft one and final, where stakeholder conflict showed up, and what they would cut if time were halved. Use role-specific evidence checklists tied to actual deliverables, so scorecards converge and gut-feel drift drops because reviewers are debating the work, not the candidate’s charisma.

  4. Use the interview only to verify authorship and reasoning gaps. A portfolio or work sample should create the interview agenda, not sit beside it as decoration. If the artefact is strong and the explanation is weak, probe authorship. If the explanation is strong and the artefact is thin, believe the artefact.
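
As referenced in step 2, here is a minimal sketch of a weighted scorer for the four dimensions. The weights and the 1-5 scale are assumptions to adapt per role, not fixed values.

```python
# Illustrative weights: this sketch weights judgment and trade-off
# explanation more heavily, reflecting the point above about senior
# hires. All numbers are assumptions, not recommendations.
WEIGHTS = {
    "problem_framing": 0.25,
    "execution_quality": 0.25,
    "judgment": 0.30,
    "tradeoff_explanation": 0.20,
}

def artifact_score(scores: dict[str, int]) -> float:
    """Combine 1-5 scores on the four dimensions into one weighted score.

    Raises if a reviewer skipped a dimension, so incomplete
    scorecards never silently pass.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be integers from 1 to 5")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# Example: strong execution, weak trade-off explanation
print(artifact_score({
    "problem_framing": 4,
    "execution_quality": 5,
    "judgment": 3,
    "tradeoff_explanation": 2,
}))  # 3.55
```

A junior-role rubric might invert the weighting toward execution quality; the point is that the weights are decided before review, not after.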

How do you use AI candidate portfolios and work sample AI screening?

Use AI candidate portfolios and work-sample screening to see how someone actually uses AI, not just how well they talk about it. A strong portfolio shows the prompts, iterations, checks, and trade-offs behind the output, while a work sample shows whether they can do that under your constraints.

  1. Ask for a workflow trace, not a highlight reel.
    For any portfolio item, require the candidate to show the brief, key prompts, rejected outputs, edits, tool choices, and final decision note. For technical roles, that can include repo history, debugging notes, evals, or architecture trade-offs; a minimal commit-history check appears after this list. For non-technical roles, ask for an AI-assisted brief, research summary, campaign draft, or process redesign with comments showing what the model got wrong and what the candidate fixed. GitHub’s own documentation shows how commit history and pull-request discussion expose authorship and reasoning better than a finished file alone (GitHub Docs on comparing commits; GitHub Docs on pull requests).

  2. Make the work sample mirror the actual job.
    If the role is AI-assisted market research, give a messy source pack and ask for a decision memo. If it is engineering, use a debugging or extension task in the real stack, not a toy prompt contest. Work samples are strongest when they reflect job tasks, and interviews remain better as a supplement than a substitute, according to the U.S. Office of Personnel Management guidance on work sample assessments.

  3. Score judgment explicitly.
    Your rubric should reward problem framing, source checking, error detection, escalation decisions, and whether the candidate knew when not to trust the model. NIST’s AI Risk Management Framework is useful here because it treats validity, reliability, and human oversight as operational requirements, not abstract principles.

  4. Use the interview only to test ownership of the work.
    Ask where the model helped, where it misled, what the candidate changed manually, and what they would do differently with more time. If they cannot reconstruct those decisions, you are probably looking at outsourced thinking, not AI-native capability.
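
One way to operationalise the workflow-trace check from step 1 for technical candidates: pull the commit history of a submitted repo and look at who authored what, and when. Below is a minimal sketch using plain `git log`; the repo path and the 200-commit limit are arbitrary assumptions.

```python
import subprocess
from collections import Counter

def commit_authors(repo_path: str, limit: int = 200) -> Counter:
    """Count commit authors in a candidate-submitted repo.

    A repo where the candidate authored almost no commits, or where
    everything lands in one burst the night before the deadline,
    warrants an authorship probe in the interview.
    """
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"-{limit}",
         "--format=%ae|%aI"],  # author email | ISO author date
        capture_output=True, text=True, check=True,
    ).stdout
    authors = Counter()
    dates = []
    for line in out.splitlines():
        email, date = line.split("|", 1)
        authors[email] += 1
        dates.append(date)
    if dates:
        # git log lists newest first, so the span runs last -> first
        print(f"history spans {dates[-1][:10]} to {dates[0][:10]}")
    return authors

# Hypothetical usage against a cloned submission
# print(commit_authors("/tmp/candidate-repo"))
```

A thin or borrowed history does not disqualify on its own; it sets the authorship questions for step 4.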

Bottom line

Evidence-based hiring only works when you define the proof before interviews start and score every candidate against the same evidence, not when you add a nicer-looking interview script. Replace CV-led shortlists with work samples, portfolios, GitHub repos, case write-ups, and structured artefacts, then separate observed, verified, and self-reported claims so polished talk never counts as proof.

If your evidence-based hiring checklist still relies on interviews, CVs, and self-reported confidence, the hard part is separating people who can talk about AI from people who have actually built with it. The same problem shows up in roles that mention OpenAI API, LangChain, or Azure OpenAI but never prove hands-on work: a candidate who says they “used LangChain” but cannot explain retrieval, chunking, or evals. You see the same pattern in teams using tools like ChatGPT Enterprise or Microsoft Copilot: access is there, but workflow change is shallow, and the only way to see it is to measure what people do, not what they say. We use the same interview-led approach to surface real capability, identify champions, and map the next intervention, instead of guessing from a CV or a self-rating form.

Your team has AI tools but [adoption](/quarterly-ai-adoption-board-update-executive-questions/) is shallow? We measure it and fix it. Book a diagnostic call -> calendar.app.google or email [email protected]

FAQ

How do you score evidence-based hiring candidates fairly?

Use a fixed rubric with 4-6 criteria and define what a 1, a 3, and a 5 look like before interviews begin. Calibrate the panel on one sample candidate first, then compare scores and discuss only the evidence behind any gaps. If you want the process to hold up, require written notes for each score so you can audit why someone was advanced or rejected (6 talent assessment methods to use for recruiting in your company).
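
A minimal sketch of that calibration step: compare the panel's scores on the sample candidate and surface only the criteria where reviewers diverge. The gap threshold of 2 points and the criteria names are assumptions.

```python
def calibration_gaps(panel: dict[str, dict[str, int]],
                     threshold: int = 2) -> list[str]:
    """Flag criteria where any two reviewers disagree by >= threshold.

    Discussion then focuses on the evidence behind those gaps,
    not on overall impressions.
    """
    reviewers = list(panel)
    criteria = panel[reviewers[0]].keys()
    flagged = []
    for crit in criteria:
        scores = [panel[r][crit] for r in reviewers]
        if max(scores) - min(scores) >= threshold:
            flagged.append(f"{crit}: {dict(zip(reviewers, scores))}")
    return flagged

# Hypothetical sample-candidate calibration round
panel = {
    "reviewer_a": {"problem_framing": 4, "execution": 5, "judgment": 2},
    "reviewer_b": {"problem_framing": 4, "execution": 3, "judgment": 4},
}
for gap in calibration_gaps(panel):
    print("discuss:", gap)
# discuss: execution: {'reviewer_a': 5, 'reviewer_b': 3}
# discuss: judgment: {'reviewer_a': 2, 'reviewer_b': 4}
```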

What should be in an evidence based hiring scorecard?

A useful scorecard separates job-relevant evidence from confidence signals such as polish, speed, or charisma. Include the source of evidence, the confidence level, and a short note on whether it was observed directly, verified by an artefact, or only self-reported. That distinction matters because a candidate can sound credible in interview while still failing to show usable output.

How do you validate a portfolio in hiring?

Ask for the original file, repo, or document history, not just screenshots or a PDF summary. Check timestamps, version history, commit activity, and whether the candidate can explain the trade-offs behind specific decisions without reading from the artefact. For AI-related work, ask which parts were generated, edited, or checked by the candidate so you can separate tool use from actual judgment.

Related:

  • 7 [mistakes](/real-work-hackathon-challenges-mistakes/) to avoid in hackathon follow-through
  • How to build AI enabled workflows without adding busywork