In this article, we explore Aon's use of AI to enhance behavioural interviewing, and explain how to turn interviews into valid assessment data using AI.

To turn interviews into valid assessment data using AI, organisations need to structure questions around clearly defined job-relevant constructs, capture responses consistently, use AI to support transcription and evidence coding, and apply transparent scoring rules that improve reliability, comparability, and auditability. AI can improve interview quality, but only when it is grounded in psychometric design rather than generic automation.

Need a more defensible interview process?

Interviews remain one of the most widely used tools in recruitment and selection. They are also one of the most inconsistently designed. In many organisations, the interview still operates as a semi-structured conversation shaped by interviewer preference, memory, confidence, and intuition. That may feel practical, but from a psychometric standpoint it creates a serious weakness. Decisions are often being made on the basis of evidence that is incomplete, inconsistently captured, weakly standardised, and hard to compare fairly across candidates.

This is where AI has started to attract serious interest. Used properly, AI can help transform interviews from loosely managed conversations into more structured sources of assessment data. That matters because interviews are not just conversations. In hiring, they are measurement events. They are intended to generate evidence about capability, judgement, motivation, role fit, communication, problem solving, and behavioural tendencies relevant to job performance.

The opportunity is substantial. A well-designed AI-supported interview process can improve evidence capture, reduce interviewer inconsistency, strengthen scoring discipline, and create far more defensible hiring decisions. The risk, however, is equally significant. If AI is layered onto a poorly designed interview process, it simply automates weakness. It may give the illusion of rigour without the reality of validity.

This article explains how to use AI to turn interviews into valid assessment data, what usually goes wrong, and what good design looks like when psychometric principles are applied properly.

What does it mean to turn interviews into valid assessment data?

At a practical level, turning interviews into valid assessment data means converting candidate responses into structured evidence that can be evaluated consistently against defined constructs relevant to job success.

  • Questions are mapped to specific capabilities or behavioural indicators
  • Responses are captured fully rather than relying on selective notes
  • Evidence is coded against pre-defined criteria
  • Judgements are made using transparent scoring rules
  • Candidates are compared on like-for-like evidence as far as possible

That is a significant shift from how many interviews still operate. In an unstructured model, interviewers often ask inconsistent questions, probe different issues with different candidates, remember some responses more vividly than others, and then convert all of that into a global impression. That is not strong measurement. It is impressionistic judgement presented as assessment.

By contrast, a more defensible interview process treats the interview as a designed evidence-generation system. AI can support that shift, but it cannot substitute for design.

Why most interviews still fail as assessment tools

Most organisations already know that structured interviews tend to predict job performance better than unstructured interviews, and the research base on this is longstanding. Yet in practice, many interviews remain only partially structured. A few competency questions may exist, but scoring is loose, evidence capture is thin, and interviewer judgements still vary substantially.

There are four core problems.

1. Weak standardisation

Candidates are not always asked the same questions in the same way or given equivalent opportunities to demonstrate evidence. This makes comparisons less fair and less meaningful.

2. Low reliability

Different interviewers may interpret the same answer differently. Some are stricter, some are more lenient, some are easily impressed by style, and some overweight one strong or weak moment. This reduces consistency.

3. Poor evidence capture

Traditional note-taking is selective and incomplete. Interviewers do not record everything. They record what stood out to them. That means later scoring may be based as much on what was noticed as on what was actually said.

4. Construct drift

Even when interviews are supposedly competency-based, the actual conversation may drift into areas that are only loosely connected to job performance. Interviewers may reward confidence, charisma, fluency, or similarity rather than the intended construct.

These problems matter because interview scores are often treated as serious decision inputs. If the underlying evidence is weak, then the downstream decision is weak too.

Why AI is attracting attention in interview design

AI is attractive in this area because it appears to solve several practical problems at once. It can capture more information than manual note-taking. It can identify patterns across responses. It can prompt more consistent evaluation. It can reduce administrative burden. It can create audit trails. It can also support interviewer discipline by keeping the process anchored to defined evidence categories.

But there is an important distinction here. AI can improve the processing of interview evidence. It does not automatically improve the quality of the interview itself. If the constructs are vague, the questions are poor, and the scoring model is weak, AI will simply process poor-quality input more efficiently.

That is why the right question is not, “Can AI analyse interviews?” The right question is, “How should interviews be designed so that AI helps produce more valid evidence?”

The psychometric principle most vendors skip

The strongest interview systems begin with construct definition, not technology. Before building prompts, selecting AI features, or designing dashboards, you need clarity on what the interview is meant to measure.

That means asking:

  • Which role-relevant capabilities should the interview assess?
  • What does strong evidence for each capability actually look like?
  • Which behavioural indicators distinguish stronger from weaker responses?
  • Which constructs should not be inferred from interview responses alone?

This is basic psychometric discipline. It is also where many AI hiring tools become weakest. They often jump too quickly from conversation to classification without doing enough construct work. The result is a system that sounds sophisticated but lacks measurement integrity.
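To make the construct-first idea concrete, here is a minimal sketch of how construct definitions might be held as data before any AI is involved. The `Construct` class, the example constructs, the `EXCLUDED_INFERENCES` set and `check_coding_scheme` are all hypothetical illustrations, not any vendor's schema. The check simply refuses scoring tags that are either undefined or fall into qualities the interview should not be used to infer on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Construct:
    """A job-relevant construct the interview is meant to measure."""
    name: str
    definition: str
    indicators: Tuple[str, ...]  # observable behaviours that count as evidence

# Hypothetical constructs for illustration; real ones come from job analysis.
CONSTRUCTS: Dict[str, Construct] = {
    c.name: c
    for c in (
        Construct(
            "stakeholder_management",
            "Builds support for a course of action among people with different interests.",
            ("identifies stakeholder concerns", "adapts approach to the audience", "secures agreement"),
        ),
        Construct(
            "judgement",
            "Weighs options and risks before acting on incomplete information.",
            ("considers alternatives", "states trade-offs", "explains the basis for a decision"),
        ),
    )
}

# Qualities the interview should NOT be used to infer on its own (assumed list).
EXCLUDED_INFERENCES: Set[str] = {"confidence", "charisma", "fluency", "similarity_to_interviewer"}

def check_coding_scheme(tags: Set[str]) -> List[str]:
    """List problems with a proposed set of scoring tags (an empty list means OK)."""
    problems = []
    for tag in sorted(tags):
        if tag in EXCLUDED_INFERENCES:
            problems.append(f"'{tag}' is an excluded inference")
        elif tag not in CONSTRUCTS:
            problems.append(f"'{tag}' is not a defined construct")
    return problems
```

For example, `check_coding_scheme({"judgement", "charisma"})` returns `["'charisma' is an excluded inference"]`, so a coding scheme that quietly rewards charisma is caught before it reaches candidates.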

For organisations thinking more broadly about AI-related capability models, this construct-first logic also connects with the wider work on AI skills frameworks and capability mapping. The principle is the same: define what matters before attempting to measure it.

How to structure an AI-supported interview properly

A robust AI-supported interview process normally has five layers.

Layer 1: Job analysis and construct definition

The interview must be grounded in the actual role. That means clarifying the most relevant constructs, such as judgement, stakeholder management, analytical reasoning, problem solving, communication under pressure, learning agility, ethical decision making, or leadership behaviour.

If this step is weak, everything else is weaker. Generic interviews tend to produce generic evidence.

Layer 2: Prompt design

Questions should be designed to elicit evidence relevant to the construct. That may include behavioural questions, situational questions, critical incident prompts, or applied judgement scenarios. The question design should make it easier to observe differences in response quality.
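One way to keep prompt design disciplined is to store every question against the construct it targets and check coverage before the interview runs. The sketch below assumes a hypothetical `QUESTION_BANK` format and an illustrative rule of at least two prompts per construct; neither comes from the article. It also returns one fixed question order, so every candidate gets equivalent opportunities to provide evidence.

```python
from collections import Counter

# Hypothetical question bank: each prompt targets exactly one construct
# and is labelled with its question type.
QUESTION_BANK = [
    {"id": "Q1", "construct": "stakeholder_management", "type": "behavioural",
     "text": "Tell me about a time you had to influence a resistant stakeholder."},
    {"id": "Q2", "construct": "stakeholder_management", "type": "situational",
     "text": "A senior sponsor rejects your proposal the day before launch. What do you do?"},
    {"id": "Q3", "construct": "judgement", "type": "behavioural",
     "text": "Describe a decision you made with incomplete information."},
]

def under_covered(constructs, bank, min_prompts=2):
    """Return constructs with fewer than `min_prompts` prompts, in input order."""
    counts = Counter(q["construct"] for q in bank)
    return [c for c in constructs if counts[c] < min_prompts]

def interview_plan(bank):
    """Return the same ordered list of question ids for every candidate."""
    return [q["id"] for q in bank]
```

Here `under_covered(["stakeholder_management", "judgement"], QUESTION_BANK)` returns `["judgement"]`, signalling that judgement rests on a single prompt and may need a second one.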

Layer 3: Evidence capture

This is where AI can provide immediate value. Rather than relying on handwritten notes or memory, interviews can be transcribed fully. Responses can be segmented, tagged, and stored systematically.
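Once a transcript exists, segmenting it means attaching each piece of candidate speech to the planned question it answers. The sketch below assumes a transcript that has already been split into speaker-labelled turns, with planned questions tagged by id; the `Turn` class and the labels are hypothetical. Follow-up probes are deliberately kept under the current question, so the full answer, including responses to probes, is stored together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Turn:
    speaker: str                        # "interviewer" or "candidate" (assumed labels)
    text: str
    question_id: Optional[str] = None   # set only on turns that ask a planned question

def segment(turns):
    """Group candidate speech under the most recent planned question."""
    segments = defaultdict(list)
    current = None
    for turn in turns:
        if turn.speaker == "interviewer" and turn.question_id:
            current = turn.question_id          # a new planned question starts a segment
        elif turn.speaker == "candidate" and current is not None:
            segments[current].append(turn.text)  # probes do not reset the segment
    return {qid: " ".join(parts) for qid, parts in segments.items()}
```

Stored this way, each response can later be coded and scored against the construct its question was designed to elicit.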

Layer 4: Evidence coding

AI can support the extraction of behavioural indicators, reasoning patterns, decision steps, examples of stakeholder handling, signs of reflection, evidence of structure, and other coded features. However, those features must be mapped to a human-designed rubric.
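The key constraint here is that every extracted feature lands on an indicator that people defined in advance. The sketch below uses simple cue-phrase matching as a deliberately crude stand-in for an AI tagger; the `RUBRIC` structure, indicators and cue phrases are hypothetical. It returns each tagged sentence with its construct and indicator, so a reviewer can check the excerpt behind every tag.

```python
import re

# Hypothetical rubric: construct -> indicator -> cue phrases.
# Cue matching stands in for an AI model; the rubric itself is human-designed.
RUBRIC = {
    "stakeholder_management": {
        "identifies concerns": ["their concern", "worried about", "objection"],
        "secures agreement": ["agreed to", "signed off", "got buy-in"],
    },
    "judgement": {
        "considers alternatives": ["option", "alternative"],
        "states trade-offs": ["trade-off", "risk of"],
    },
}

def code_response(text, rubric=RUBRIC):
    """Return (construct, indicator, sentence) for each sentence that matches a cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    evidence = []
    for sentence in sentences:
        lowered = sentence.lower()
        for construct, indicators in rubric.items():
            for indicator, cues in indicators.items():
                if any(cue in lowered for cue in cues):
                    evidence.append((construct, indicator, sentence))
    return evidence
```

Plain substring matching is exactly why a real system needs more than this: the cue "option" would also match "adoption". Whatever model replaces the matcher, the output format (indicator plus supporting excerpt) is what keeps the coding auditable.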

Layer 5: Scoring and decision support

The final layer is the scoring model. This should specify how evidence maps onto rating bands, what counts as strong or weak evidence, when confidence in a score is low, and how interview evidence should combine with other data such as tests, work samples, or application review.
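A transparent scoring rule can be written down explicitly. The sketch below assumes each coded evidence item has been rated 0 (contrary), 1 (weak), 2 (moderate) or 3 (strong), and maps the mean strength onto 1-5 bands. The `BANDS` thresholds and the `MIN_ITEMS` cut-off for low confidence are illustrative assumptions, not validated cut-offs. The result records the rule inputs alongside the band, so anyone reviewing the decision can see why it was given.

```python
from statistics import mean

# Hypothetical anchored bands: (minimum mean strength, band, descriptor),
# ordered from the highest threshold down. Thresholds are illustrative only.
BANDS = [
    (2.5, 5, "consistently strong evidence"),
    (2.0, 4, "mostly strong evidence"),
    (1.5, 3, "mixed evidence"),
    (1.0, 2, "mostly weak evidence"),
    (0.0, 1, "mostly contrary evidence"),
]
MIN_ITEMS = 2  # fewer coded items than this and the rating is flagged as low confidence

def rate_construct(strengths):
    """Map coded evidence strengths (0-3) to a 1-5 band with a transparent rule."""
    if any(s not in (0, 1, 2, 3) for s in strengths):
        raise ValueError("strengths must be coded 0-3")
    if not strengths:
        # No evidence: report that honestly instead of forcing a number.
        return {"band": None, "descriptor": "insufficient evidence",
                "mean_strength": None, "n_items": 0, "low_confidence": True}
    avg = mean(strengths)
    # The first band whose threshold the mean reaches is the rating.
    band, descriptor = next((b, d) for t, b, d in BANDS if avg >= t)
    return {"band": band, "descriptor": descriptor, "mean_strength": round(avg, 2),
            "n_items": len(strengths), "low_confidence": len(strengths) < MIN_ITEMS}
```

With this rule, `rate_construct([3, 2, 3])` gives band 5 from three items, while `rate_construct([2])` gives band 4 but flags it as low confidence because it rests on a single piece of evidence.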

For organisations already developing broader AI-readiness or workforce capability approaches, this kind of evidence architecture sits naturally alongside work such as an AI defensibility audit and psychometric design review.

How AI improves reliability in interviews

Reliability concerns consistency. In interviews, one major challenge is that different interviewers notice different things, record different amounts, and apply rating scales differently. AI can help reduce this variability in several ways.

  • It captures complete rather than selective response content
  • It supports more standardised note generation
  • It encourages more consistent evidence tagging
  • It reduces dependence on memory after the interview
  • It can provide structured summaries aligned to scoring dimensions

That does not eliminate human rating differences, but it improves the consistency of the evidence base. This matters especially when panels need to review candidate evidence after the fact or when hiring managers want a clearer audit trail.
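Improvements in consistency should be measured rather than assumed. One standard index of agreement between two raters is Cohen's kappa, which corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two interviewers who rated the same candidates; the function name is illustrative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each gave one rating per candidate.

    p_o is the observed proportion of agreement; p_e is the agreement expected
    by chance from each rater's own distribution of ratings.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined: both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)
```

For example, two interviewers rating six candidates as `[3, 4, 4, 2, 5, 3]` and `[3, 4, 3, 2, 5, 4]` agree on four of six, giving kappa = 7/13, roughly 0.54. Plain kappa treats a 3-versus-4 disagreement the same as 1-versus-5, so for ordinal 1-5 scales a weighted kappa or an intraclass correlation is often the better choice.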

How AI can strengthen validity if used properly

Validity concerns whether the interview is actually measuring what it is intended to measure. AI can support validity in several important ways.

First, it can reduce the loss of evidence that occurs with traditional note-taking. Second, it can keep evaluations more closely tied to the intended rubric. Third, it can make it easier to compare multiple responses against consistent criteria. Fourth, it can expose when interviewers are relying on global impressions unsupported by actual evidence.

However, validity is only strengthened if the constructs and rubrics are sound. Generic AI systems often overclaim here. They may infer broad traits or capabilities from thin evidence. That is risky. Strong interview design remains anchored in role relevance, observable indicators, and carefully bounded inference.
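One practical way to expose ratings driven by global impression is to compare each interviewer rating with the coded evidence behind it. The sketch below flags constructs that received a high rating while no evidence was linked to them. The input formats and the `high=4` threshold are assumptions for illustration. Low ratings without evidence are not flagged, because an absence of evidence can legitimately justify a low score.

```python
def unsupported_ratings(ratings, evidence, high=4):
    """Flag constructs rated `high` or above with no linked evidence.

    `ratings` maps construct -> interviewer rating (1-5);
    `evidence` maps construct -> list of coded evidence excerpts.
    """
    return sorted(
        construct
        for construct, rating in ratings.items()
        if rating >= high and not evidence.get(construct)
    )
```

For instance, if an interviewer gives judgement a 5 but the transcript coding found nothing for judgement, the construct is returned for review before the rating feeds into a decision.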

Common mistakes in AI interview intelligence

Several mistakes appear repeatedly in this space.

Using generic AI summaries as if they were assessment judgements

A summary is not a score. A fluent paragraph can look persuasive while being psychometrically weak.

Over-inferring from language style

Communication polish is not the same as judgement quality, leadership potential, or problem-solving skill. Style can contaminate construct measurement.
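Style contamination can be monitored as well as warned about. A simple check, sketched below under stated assumptions, correlates construct ratings with answer length across candidates, using word count as a crude proxy for fluency. The `style_check` name and the 0.5 threshold are hypothetical; a high correlation is a prompt to review the rubric and ratings, not proof of bias.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def style_check(ratings, word_counts, threshold=0.5):
    """Return (r, flagged): flagged when ratings track answer length closely."""
    r = pearson(ratings, word_counts)
    return r, r >= threshold
```

Word counts can come from something as simple as `len(answer.split())`. If longer answers consistently earn higher judgement ratings, the rubric may be rewarding volume or polish rather than the construct.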

Assuming transcription equals rigour

Full transcripts are useful, but they are not enough. Evidence still needs to be interpreted through a structured rubric.

Ignoring uncertainty

Some responses genuinely do not provide enough evidence for a confident rating. Good systems should flag uncertainty rather than forcing artificial precision.

Failing to integrate with other selection data

Interview evidence is only one part of the selection picture. In many roles, stronger decisions come from combining interviews with tests, work samples, or scenario-based exercises. This is particularly relevant if organisations are also exploring broader assessment design principles through related RWA work on AI-enhanced work sample design.
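Combining sources requires putting them on a common scale first, because an interview band and a work-sample percentage are not directly comparable. The sketch below standardises each source within the candidate pool and takes a weighted average. The function names and the example weights are assumptions; in practice, weights should come from job analysis or validation evidence rather than from a sketch like this.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise scores within the candidate pool (mean 0, SD 1)."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # no spread: this source cannot separate candidates
    return [(v - m) / sd for v in values]

def composite(sources, weights):
    """Weighted average of standardised scores from several evidence sources.

    `sources` maps source name -> scores in the same candidate order;
    `weights` maps source name -> weight (illustrative, not validated).
    """
    lengths = {len(scores) for scores in sources.values()}
    if len(lengths) != 1:
        raise ValueError("every source needs one score per candidate")
    (n,) = lengths
    standardised = {name: zscores(scores) for name, scores in sources.items()}
    total_weight = sum(weights[name] for name in sources)
    return [
        sum(weights[name] * standardised[name][i] for name in sources) / total_weight
        for i in range(n)
    ]
```

For three candidates, `composite({"interview": [3, 4, 5], "work_sample": [60, 70, 80]}, {"interview": 0.4, "work_sample": 0.6})` places the middle candidate at 0.0 and the others symmetrically either side, so neither source dominates simply because of its raw scale.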

What good looks like in practice

A strong AI-supported interview system is not fully automated. It is well designed, well bounded, and transparent.

  • Constructs are clearly defined
  • Questions are deliberately designed to elicit job-relevant evidence
  • Interviews are transcribed consistently
  • AI supports coding and summarisation against a rubric
  • Human reviewers remain accountable for final judgement
  • Scoring logic is transparent and reviewable
  • Decisions are auditable

That combination is far more powerful than either unstructured human interviewing or black-box automation. It allows AI to improve consistency without pretending that selection decisions can be handed over wholesale to a machine.
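Auditability mostly comes down to recording, at decision time, what was scored, against which rubric, with which AI support, and who was accountable. The sketch below shows one possible record; the `DecisionRecord` fields are hypothetical rather than a standard. Hashing the transcript means a later reviewer can confirm the scores relate to exactly the text that was evaluated.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """Hypothetical audit record for one candidate's interview assessment."""
    candidate_id: str
    rubric_version: str
    transcript_sha256: str   # fingerprint of the exact transcript that was scored
    ratings: dict            # construct -> band (None means insufficient evidence)
    evidence: dict           # construct -> supporting excerpts
    ai_tools: list           # AI steps used, e.g. transcription, evidence coding
    reviewer: str            # the accountable human decision-maker
    decision: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def fingerprint(transcript: str) -> str:
    return hashlib.sha256(transcript.encode("utf-8")).hexdigest()

def to_json(record: DecisionRecord) -> str:
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

The design choice is to keep the human reviewer and the AI steps as explicit fields, so the record shows where automation supported the process and where a person made the call.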

Why this matters commercially

For many organisations, the immediate attraction of interview intelligence tools is efficiency. Faster note capture, easier summaries, less administrative burden. Those benefits are real. But the more important commercial issue is decision quality.

Poor interviews create hidden costs:

  • weaker hires
  • greater inconsistency
  • reputational risk
  • fairness concerns
  • difficulty defending decisions
  • wasted interviewer time

Well-designed AI-supported interviews can reduce those risks while strengthening confidence in the decision process. That makes them strategically more important than simple workflow tools.


Why AI-Enabled Behavioural Assessment Is More Powerful Than Traditional Interview Scoring

Most organisations still rely on managers taking notes during interviews and making broad judgements such as “good communicator”, “strong stakeholder skills” or “not quite strategic enough”. The problem is that these impressions are often inconsistent, subjective and difficult to compare across candidates.

AI-enabled behavioural assessment changes that.

One of the strongest current examples is Aon’s Certified Behavioral Event Interviewing Expert approach. Rather than treating the interview as an informal conversation, it trains assessors to use structured behavioural-event interviewing and then apply more consistent evaluation criteria to the evidence collected.

In practice, this means interviewers are trained to:

  • probe for specific past behaviour rather than general claims
  • collect evidence against clearly defined competencies
  • distinguish between high-quality and weak examples
  • evaluate leadership, judgement and interpersonal behaviour more consistently
  • turn interview responses into usable assessment data

This is exactly the direction that interview assessment needs to move in.

Why Behavioural Event Interviewing Produces Better Data

Traditional interviews often rely on questions that generate vague information:

  • “Tell me about yourself.”
  • “What are your strengths?”
  • “How would you deal with conflict?”

Behavioural Event Interviewing is different. It focuses on detailed examples of what the candidate actually did in real situations.

For example, instead of asking:

“Are you good at influencing stakeholders?”

The interviewer might ask:

“Tell me about a time when you had to influence a resistant stakeholder. What was the situation? What did you do? What happened?”

This produces much richer evidence because it reveals:

  • how the candidate thinks
  • how they respond under pressure
  • whether they show judgement, influence and adaptability
  • whether they can describe a credible behavioural example

Aon’s approach emphasises designing, conducting and evaluating interviews in a more structured way so that managers make better talent decisions across the employee lifecycle. 

Where AI Adds Further Value

The most interesting development is what happens when this behavioural interview approach is combined with AI.

AI can help interviewers:

  • transcribe and structure responses automatically
  • identify recurring behavioural themes
  • highlight missing evidence or weak probing
  • compare candidates more consistently against the same framework
  • reduce interviewer bias and over-reliance on “gut feel”

Used properly, AI does not replace the interviewer. It strengthens the quality and consistency of the data.
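Highlighting weak probing can be as simple as checking whether an answer contains the parts of a behavioural example: the situation, what the candidate personally did, and what happened. The sketch below uses crude, hypothetical regular-expression cues and probe wording purely to illustrate the check; the verb list is far from complete, and a real system would need a validated model. It returns the follow-up probes the interviewer may still need to ask.

```python
import re

# Crude, hypothetical cue patterns for the parts of a behavioural example.
COMPONENT_CUES = {
    "situation": r"\b(when|at the time|the project|the team|we were)\b",
    "own_action": r"\bI (asked|decided|called|wrote|met|proposed|changed|escalated|set up|explained)\b",
    "result": r"\b(as a result|in the end|outcome|which meant|resulted in|ended up)\b",
}

# Follow-up probes for each missing component (illustrative wording).
PROBES = {
    "situation": "Can you set the scene? What was going on at the time?",
    "own_action": "What did you personally do?",
    "result": "What happened in the end?",
}

def missing_components(answer):
    """Return suggested follow-up probes for components the answer lacks."""
    return [
        PROBES[part]
        for part, pattern in COMPONENT_CUES.items()
        if not re.search(pattern, answer, flags=re.IGNORECASE)
    ]
```

An answer such as "When the project slipped, we decided to cut scope. In the end we launched on time." describes a situation and a result but only collective action, so the check suggests asking "What did you personally do?", which is the kind of probing behavioural-event interviewing depends on.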

For example, an AI-enabled behavioural interview process could identify that one candidate repeatedly demonstrates strategic judgement and stakeholder influence, while another gives more generic answers with little evidence of real behavioural depth.

That is far more useful than a simple 1–5 interview rating.

Why This Matters in an AI-Enabled Workplace

As work becomes more AI-enabled, organisations increasingly need people who can:

  • exercise sound judgement
  • evaluate ambiguous information
  • work effectively with AI-generated output
  • balance speed with good decision-making

Traditional competency interviews are often too weak to assess those qualities properly.

Structured behavioural-event interviewing, especially when supported by AI, is much better suited to identifying how people really think and behave in complex situations.

Aon increasingly emphasises that leadership and talent assessment must move beyond simple experience or credentials and towards measuring judgement, adaptability and behaviour in real-world situations. Scenario-based and AI-augmented assessments are becoming a much larger part of leadership evaluation. 

Why This Fits the Future of Assessment

The strongest assessment processes no longer rely on interviews alone.

Instead, they combine:

  • structured behavioural interviews
  • AI-enabled analysis of responses
  • psychometric data
  • judgement or simulation exercises

This combination is much more defensible and predictive than a traditional interview alone.

For organisations that want to make better hiring or leadership decisions, Aon’s behavioural-event interviewing approach is a useful example of how interviews can become a genuine source of valid assessment data rather than just a subjective conversation.

Can AI make interviews more valid?

AI can make interviews more valid when it supports better evidence capture, structured coding, and more consistent scoring against defined job-relevant constructs. It does not create validity automatically. Good interview design still depends on sound construct definition and role relevance.

What is the main psychometric weakness of most interviews?

The main weakness is inconsistency. Many interviews suffer from poor standardisation, variable scoring, selective note-taking, and weak comparability across candidates. This reduces both reliability and defensibility.

Should AI replace interviewer judgement?

No. AI should support evidence capture and structured evaluation, not replace accountable human decision-making. Final hiring judgement should remain human-led, particularly in high-stakes contexts.

What type of AI is most useful in interviews?

The most useful forms of AI in interviews are tools that support transcription, evidence tagging, structured summarisation, rubric-linked coding, and audit trails. Generic summarisation tools on their own are not enough.

How do you score interview responses more fairly?

Fairer scoring comes from using clearly defined constructs, structured prompts, explicit behavioural indicators, anchored scoring rubrics, and consistent evidence capture. AI can support these steps but cannot substitute for them.

Can AI-supported interviews reduce bias?

They can reduce some forms of inconsistency and evidence loss, but they can also introduce new risks if the scoring logic or inference model is weak. Bias reduction depends on careful design, review, and governance.

How should interviews combine with tests or work samples?

Interviews usually work best as one component within a broader assessment strategy. In many roles, interviews should be integrated with tests, work samples, or structured scenarios so decisions are based on multiple relevant evidence sources.