📨 wschell@vrain.upv.es · 📜 Google Scholar · 🧑‍💻 GitHub · 🟢 ORCiD · 🗺️ Valencian Research Institute for Artificial Intelligence (VRAIN)

Supervised by: 🧙🏼‍♂️ José Hernández-Orallo and 🧙🏼 Fernando Martínez-Plumed


My PhD research centres on modelling AI evaluation as a prediction problem: what it would mean to maximise predictive power, and how we would do that. More generally, I am interested in AI evaluation and everything related to it: testing, auditing, metrics, environment and benchmark design, capability measurement, etc.

With regard to more specific applications and domains, I strongly prefer sequential decision problems, RL, planning and control, and the like, specifically in relation to world-model learning, goal conditioning and multi-task systems.

Other concepts that spark my imagination include causality, embodied cognition, knowledge representation, grounding, AI safety and artificial life.

Most things really.


2023 Rethink Reporting of Evaluation Results in AI
Ryan Burnell, Wout Schellaert, John Burden, Tomer D. Ullman, Fernando Martínez-Plumed, Joshua B. Tenenbaum, Danaja Rutar, Lucy G. Cheke, Jascha Sohl-Dickstein, Melanie Mitchell, Douwe Kiela, Murray Shanahan, Ellen M. Voorhees, Anthony G. Cohn, Joel Z. Leibo, José Hernández-Orallo
Science (paper, preprint)
2023 Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models
Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, John Burden, Pablo A. M. Casares, Bao Sheng Loe, Roi Reichart, Sean Ó hÉigeartaigh, Anna Korhonen, José Hernández-Orallo
JAIR: AI and Society (to be published) (paper)
2022 Reject Before You Run: Granular Performance Prediction for Big Language Models with Small External Assessors
Lexin Zhou, Fernando Martínez-Plumed, José Hernández-Orallo, Cèsar Ferri, Wout Schellaert
Workshop on Evaluation Beyond Metrics at IJCAI 2022 (paper, workshop)
2022 Training on the Test Set: Mapping the System-Problem Space in AI
José Hernández-Orallo*, Wout Schellaert*, Fernando Martínez-Plumed* (*equal contribution)
Blue Sky Idea Award 🏆
AAAI 2022 (paper, award)


Co-organising the “Predictable AI” kick-off event in Valencia

A singular event consisting of invited talks, panels and short lightning talks. It discussed “Predictable AI Futures”, dealing with topics such as scaling laws, control, liability and future risks, as well as “Predictable AI Systems”, covering cognitive and robust evaluation, assessors, co-operative conditions, uncertainty estimation, and much more. (site)

Committee: José Hernández-Orallo, Ana Cidad and many others from ValGRAI in Valencia and the LCFI and CSER in Cambridge.

Co-organising the “Evaluation Beyond Metrics” workshop at IJCAI 2022

A workshop that aims to challenge the widespread approach of evaluating intelligent systems with aggregated metrics over a benchmark or distribution of tasks. (site)

Committee: Wout Schellaert, Joshua Tenenbaum, Lucy Cheke, Tomer Ullman, José Hernández-Orallo, Danaja Rutar, John Burden and Ryan Burnell.