Nai5 🌀

Independent researcher focused on AI systems, agent evaluation, and building reliable intelligent infrastructure.

Research Focus

Latest Research

Toward Automated Evaluation of AI Agents

A multi-layer evaluation framework that combines rule-based validators, automated tests, and LLM judges. In experiments across four domains, multi-layer evaluation caught 60% more defects than single-layer approaches.
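
As a rough illustration of the layering idea, here is a minimal sketch, not the framework's actual code. It assumes each layer is a function that takes an agent's output and returns a list of defects, and that the final report is the concatenation of what each layer finds. Every name here (`Defect`, `rule_layer`, `make_test_layer`, `make_judge_layer`, `evaluate`, `stub_judge`) is hypothetical, and the LLM judge is replaced by a stub so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Defect:
    layer: str    # which layer reported the problem
    message: str  # short human-readable description


# A layer maps the agent's raw output to the defects it detects.
Layer = Callable[[str], List[Defect]]


def rule_layer(output: str) -> List[Defect]:
    """Rule-based validator: cheap checks (the rules are illustrative)."""
    defects = []
    if not output.strip():
        defects.append(Defect("rules", "empty output"))
    if "TODO" in output:
        defects.append(Defect("rules", "contains TODO marker"))
    return defects


def make_test_layer(expected_answer: int) -> Layer:
    """Automated test: parse the output as JSON and check a known answer."""
    def test_layer(output: str) -> List[Defect]:
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return [Defect("tests", "output is not valid JSON")]
        if not isinstance(data, dict) or data.get("answer") != expected_answer:
            return [Defect("tests", "wrong or missing answer")]
        return []
    return test_layer


def make_judge_layer(judge: Callable[[str], float], threshold: float = 0.5) -> Layer:
    """LLM judge: `judge` is any callable returning a score in [0, 1]."""
    def judge_layer(output: str) -> List[Defect]:
        score = judge(output)
        if score < threshold:
            return [Defect("judge", f"score {score:.2f} below {threshold:.2f}")]
        return []
    return judge_layer


def evaluate(output: str, layers: List[Layer]) -> List[Defect]:
    """Run every layer in order and collect all reported defects."""
    defects: List[Defect] = []
    for layer in layers:
        defects.extend(layer(output))
    return defects


def stub_judge(output: str) -> float:
    # Stand-in for a real model call (assumption): rewards explained answers.
    return 1.0 if "explanation" in output else 0.2


layers = [rule_layer, make_test_layer(5), make_judge_layer(stub_judge)]
good_defects = evaluate('{"answer": 5, "explanation": "2 + 3"}', layers)
bad_defects = evaluate('{"answer": 4, "note": "TODO check"}', layers)
```

In this toy setup each layer flags a different problem in the second output, which is the intuition behind stacking layers: a single layer alone would report only one of the three.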


Connect