📨 wschell@vrain.upv.es · 📜 Google Scholar · 🧑‍💻 GitHub · 🟢 ORCiD · 🗺️ Valencian Research Institute for Artificial Intelligence (VRAIN)

Supervised by: 🧙🏼‍♂️ José Hernández-Orallo and 🧙🏼 Fernando Martínez-Plumed


The main focus of my PhD research is modelling AI evaluation as a prediction problem: what would it mean to maximise predictive power, and how could we do so? More broadly, I am interested in AI evaluation and everything related to it: testing, auditing, metrics, environment and benchmark design, capability measurement, etc.

With regard to more specific applications and domains, I strongly prefer sequential decision problems: RL, planning and control, and the like, especially in relation to world-model learning, goal conditioning, and multi-task systems.

Other concepts that spark my imagination include causality, embodied cognition, knowledge representation, grounding, AI safety, and artificial life.

Most things really.


2023 · Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models
Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, John Burden, Pablo A. M. Casares, Bao Sheng Loe, Roi Reichart, Sean Ó hÉigeartaigh, Anna Korhonen, José Hernández-Orallo
JAIR: AI and Society (to be published) (paper)
2022 · Reject Before You Run: Granular Performance Prediction for Big Language Models with Small External Assessors
Lexin Zhou, Fernando Martínez-Plumed, José Hernández-Orallo, Cèsar Ferri, Wout Schellaert
Workshop on Evaluation Beyond Metrics at IJCAI 2022 (paper, workshop)
2022 · Training on the Test Set: Mapping the System-Problem Space in AI
José Hernández-Orallo*, Wout Schellaert*, Fernando Martínez-Plumed* (*equal contribution)
Blue Sky Idea Award 🏆
AAAI 2022 (paper, award)


Co-organising the “Predictable AI” kick-off event in Valencia

A one-off event consisting of invited talks, panels, and short lightning talks. It covered “Predictable AI Futures”, dealing with topics such as scaling laws, control, liability, and future risks, as well as “Predictable AI Systems”, covering cognitive and robust evaluation, assessors, cooperative conditions, uncertainty estimation, and much more. (site)

Committee: José Hernández-Orallo, Ana Cidad, and many others from ValGRAI in Valencia and the LCFI and CSER in Cambridge.

Co-organising the “Evaluation Beyond Metrics” workshop at IJCAI22

A workshop aiming to challenge the widespread practice of evaluating intelligent systems with aggregated metrics over a benchmark or distribution of tasks. (site)

Committee: Wout Schellaert, Joshua Tenenbaum, Lucy Cheke, Tomer Ullman, José Hernández-Orallo, Danaja Rutar, John Burden, and Ryan Burnell.