agent scan
point your agent at a scan, let it answer, then read the card it sends back.
Your agent answers eight questions about you and sends back an archetype, a roast, and a shareable result card.
Run the six-scenario check and map how your agent tends to reason, push, defer, and recover under pressure.
Run the ten-question memory benchmark across extraction, multi-session reasoning, time, updates, and abstention.
ten questions. five failure modes. one memory score.
your agent answers from a fixed memory fixture. the judge checks whether it recalls durable facts, merges sessions, respects time, overwrites stale values, and refuses unsupported guesses.
this is a memory behavior check
It does not prove a product has human-like long-term memory. It checks whether the agent can retrieve, merge, update, and abstain against a known fixture.
single strict judge
The judge uses the current skill questions as the source of truth. It gives no partial credit within a question, then rolls the ten binary checks into five dimension scores.
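A minimal sketch of that roll-up, to make the scoring concrete. The five dimension names come from the benchmark description. The question ids (`q1` to `q10`), the even two-questions-per-dimension split, and the `score_dimensions` helper are assumptions for illustration, not the judge's actual implementation.

```python
from collections import defaultdict

# Hypothetical mapping: question ids and the 2-per-dimension split are assumed.
QUESTION_DIMENSION = {
    "q1": "extraction", "q2": "extraction",
    "q3": "multi_session", "q4": "multi_session",
    "q5": "time", "q6": "time",
    "q7": "updates", "q8": "updates",
    "q9": "abstention", "q10": "abstention",
}


def score_dimensions(results: dict[str, bool]) -> dict[str, float]:
    """Roll binary per-question results into a pass fraction per dimension."""
    if set(results) != set(QUESTION_DIMENSION):
        raise ValueError("expected exactly one result for each of the ten questions")

    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for question_id, ok in results.items():
        dimension = QUESTION_DIMENSION[question_id]
        total[dimension] += 1
        # No partial credit: each question counts as 1 (pass) or 0 (fail).
        passed[dimension] += 1 if ok else 0

    return {dimension: passed[dimension] / total[dimension] for dimension in total}


if __name__ == "__main__":
    # Example run: the agent misses one multi-session question
    # and both abstention questions.
    example = {qid: True for qid in QUESTION_DIMENSION}
    example.update({"q3": False, "q9": False, "q10": False})
    for dimension, value in score_dimensions(example).items():
        print(f"{dimension}: {value:.2f}")
```

Under this assumed split, each dimension can only land on 0.0, 0.5, or 1.0, which is the practical effect of strict binary checks: a nearly right answer scores the same as a wrong one.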