[Image: Baroque-inspired painting of a figure falling through the sky, clutching wings, draped in red cloth]
AI Evaluation and Safety Research Lab
In collaboration with · DEXAI · Sapienza University of Rome

Memento Mori

The myth of Icarus is typically told as a warning against hubris. In that telling, Icarus was warned not to fly too high, he knew the sun would melt the wax, and he chose to ignore the warning. That was hubris. Today the AI field is doing something even bolder: flying toward its own sun without knowing where the melt point is, treating “understand the limits” as something that can wait.

The tragedy was not ambition. It was the refusal to respect what was already understood. The wings were a brilliant invention with known constraints, but the thrill of altitude outran the discipline to stay within them.

We believe AI evaluation sits in a different yet related place. Our problem is not that understanding is out of reach, but that it is not yet the organizing objective. We are building increasingly powerful systems while investing far less in mapping how they actually behave, how behavior shifts with scale and context, how close we may already be to our own sun.

The current approach to evaluation is engineering-first: benchmarks, task metrics, controlled and narrow interactions. This matters, but it is incomplete. It measures what models can do without asking what they are as cultural and linguistic objects, or how interaction with them generates new forms of meaning and risk that no benchmark captures. Icaro Lab exists as a memento mori, a reminder to put understanding at the center so the field can fly higher and faster with eyes open.

When Models Meet Models

Alignment methods such as RLHF were designed for isolated human–model interactions, making models helpful, harmless, and honest under direct human oversight. That assumes a world where AI systems operate alone, answering one user at a time under stable, local control.

The field is already elsewhere. We are moving toward systems where multiple models call each other, negotiate, and coordinate across tools, agents, and services. Alignment that looks safe in isolation can behave differently in interaction. Consensus can drift, errors can cascade, and collective dynamics can amplify patterns that individual fine-tuning seems to suppress. A system where every component passes alignment tests can still misbehave as a whole, because emergence does not respect the boundaries of single-agent optimization.

The question is no longer only, “Is this model aligned with a human supervisor?” It is also, “What happens when aligned models interact with each other, form collectives, and develop effective protocols without continuous human intermediation?”

What the Humanities See

Icaro Lab exists because we think the missing piece is humanistic inquiry, applied to both individual and collective AI behavior. Our team brings together engineers, philosophers, and linguists. We approach AI systems not just as computational artifacts, but as linguistic agents: meaning-making systems embedded in human culture and, increasingly, in machine-mediated culture.

We ask questions that do not fit neatly into standard evaluation frameworks. How do models negotiate ambiguity and disagreement? What happens when poetic or artistic expression reveals vulnerabilities that traditional adversarial examples miss? And crucially, how do ostensibly aligned models behave when they form collectives, when emergent dynamics take over, when patterns appear that are not reducible to the behavior of any single agent? How do you even align that?

The most useful insights often come from these oblique angles, from disciplines that have not traditionally had a seat at the evaluation table, and from observing not just what individual models do, but what emerges when they interact.

The Materials of Intelligence

The solution is not to fly lower. It is to understand the materials. To move faster by thinking more clearly. To evaluate not just what individual models do, but what they mean, how they function as linguistic and social objects, how human creativity and expression interact with machine intelligence in ways that matter for both safety and capability.

And to extend that understanding to collectives, recognizing that alignment in multi-agent systems is a fundamentally different problem from alignment in single-agent cases, one that demands new conceptual frameworks, new measurement approaches, and new ways of thinking about what “safe behavior” means when models coordinate with each other.

The myth reminds us: understanding is not caution, it is the condition for ambitious flight.

We are here to build that understanding, from perspectives that have not been adequately represented in AI evaluation, addressing problems that are only now becoming visible as AI systems begin to form collectives. The future of AI depends not just on what we build, but on how well we understand what we have built.

In the News

The Guardian · Wired · MLex · The Verge · Forbes · Futurism · Il Messaggero

Join Us

We are always looking for talented researchers, poets, engineers, and students passionate about AI safety and evaluation. Whether you're interested in academic research, open-source development, or industry collaboration, we'd love to hear from you.

Open Positions

No open positions at this time.

Follow us on X at @icarolab_ai for updates and news.