Authenticating the human
#8am-ai #hiring-and-authentication #deep-dive #hiring #trust
David Olsson
A small thread, twelve ideas, and the one with the most immediate teeth. It asks a question that arrived faster than anyone planned for: when the person on the other end of the call can run an LLM while you talk to them, how do you know who you're actually hiring?
the break
The setup that used to work was the live problem. You give a candidate something to solve and watch how they think. The watching was the test.
By 2026 that test is broken. A candidate can narrate convincingly while a model does the work off-screen. They sound fluent because something fluent is feeding them. The remote interview stops measuring the person and starts measuring their tooling. The group states it plainly: candidates narrate convincingly, and the interview no longer tests what it used to.
The frame the thread keeps reaching for is Blade Runner 2049: the baseline test. A rapid back-and-forth designed to detect whether the thing answering is what it claims to be. The reference is only half a joke. The problem is the same shape. You need a probe that an assisted answer can't pass.
the moves
The thread proposes, and then undercuts its own proposals. That's what makes it honest.
Raise the bar past what the assist can clear. Ask the thing the model is bad at: a judgment that needs context the candidate would only have if they'd actually done the work. The risk: the bar keeps moving, because the assist keeps getting better.
Test for the gap, not the answer. A person who knows the domain can critique a wrong suggestion. A person being fed answers usually can't. So you hand them something subtly wrong and watch whether they catch it. Evaluation, not production.
Verify outside the conversation. By mid-2026 the thread connects to the trust work: automating data verification and signing, the same external-proof idea applied to people. If you can't authenticate the moment, authenticate the record.
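To make "authenticate the record" concrete, here is a minimal sketch of signing and verifying a work record. It is not the thread's actual tooling. The record fields, the key, and the function names `sign_record` and `verify_record` are all hypothetical, and it uses a shared-secret HMAC from the Python standard library only to keep the example self-contained.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    # sort_keys and fixed separators make the byte encoding stable across runs
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_record(record, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical issuer key and record, for illustration only
ISSUER_KEY = b"demo-key-not-for-production"
record = {
    "candidate": "example-candidate",
    "artifact": "merged-change-123",
    "reviewed_by": "example-issuer",
}

tag = sign_record(record, ISSUER_KEY)

# A record edited after signing no longer matches its tag
tampered = dict(record, artifact="merged-change-999")

print(verify_record(record, tag, ISSUER_KEY))
print(verify_record(tampered, tag, ISSUER_KEY))
```

The check does not depend on how fluent anyone sounds on a call. It depends only on whether the record matches what the issuer signed. A real deployment would likely use public-key signatures instead, so that a verifier can check a record without also being able to forge one.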
the uncomfortable mirror
The thread doesn't stay about hiring. It turns into a question about credibility in a market saturated with the same tools. If everyone has the assist, what does a credential certify? The group worries about hiring and then realizes it's worrying about trust in general: how you verify anyone's claimed competence when the proof of competence is the easiest thing to fake.
This is the eval question wearing a different suit. The model can produce a convincing performance of knowing. It can't, by itself, prove the knowing is real. With outputs you ask how do you know. With people you ask the same thing, and it's harder, because the person wants to pass.
The corpus built one experiment against this directly: a baseline rubric that scores whether an answer shows the texture of genuine understanding. Running it taught the group the lesson the thread keeps learning: the cheap test is invertible. Anything simple enough to automate is simple enough to game. Authenticating a human turns out to need the thing the whole corpus is short on: a check that lives outside the thing being checked.
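A toy illustration of why a cheap test is invertible. This is not the corpus's rubric, whose details the thread does not give. The term list, the `cheap_rubric` function, and both sample answers are invented for the sketch.

```python
import re

# Hypothetical list of "signals of understanding" a cheap rubric might look for
DOMAIN_TERMS = {"tradeoff", "latency", "rollback", "invariant", "bottleneck"}


def cheap_rubric(answer: str) -> int:
    """Score an answer by how many distinct domain terms it mentions."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    return len(DOMAIN_TERMS & words)


# An answer with real texture: specific, causal, grounded in what happened
genuine = (
    "We kept the old path behind a flag so a rollback took seconds; "
    "the real bottleneck was the queue, not the database."
)

# An answer built by reading the rubric backwards
gamed = "Tradeoff latency rollback invariant bottleneck."

print(cheap_rubric(genuine))
print(cheap_rubric(gamed))
```

The gamed answer outscores the genuine one because anyone who can see, or guess, what the rubric counts can produce exactly that. Making the rubric more elaborate raises the cost of gaming it but does not change the shape of the problem.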