📨 wschell@vrain.upv.es · 📜 Google Scholar · 🧑‍💻 GitHub · 🟢 ORCiD · 🗺️ Valencian Research Institute for Artificial Intelligence (VRAIN)

Supervised by: 🧙🏼‍♂️ José Hernández-Orallo and 🧙🏼 Fernando Martínez-Plumed


My PhD research centres on modelling AI evaluation as a prediction problem: what it would mean to maximise predictive power, and how we would do that. More generally, I am interested in AI evaluation and everything related to it: testing, auditing, metrics, environment and benchmark design, capability measurement, etc.

With regard to more specific applications and domains, I strongly prefer sequential decision problems: RL, planning, control and the like, specifically in relation to world-model learning, goal conditioning and multi-task systems.

Other concepts that spark my imagination include causality, embodied cognition, knowledge representation, grounding, AI safety and artificial life.

Most things really.


2022 · Reject Before You Run: Granular Performance Prediction for Big Language Models with Small External Assessors
Lexin Zhou, Fernando Martínez-Plumed, José Hernández-Orallo, Cèsar Ferri, Wout Schellaert
Workshop on Evaluation Beyond Metrics at IJCAI 2022 (to be released, workshop)
2022 · Training on the Test Set: Mapping the System-Problem Space in AI
José Hernández-Orallo*, Wout Schellaert*, Fernando Martínez-Plumed* (*equal contribution)
Blue Sky Idea Award 🏆
AAAI 2022 (paper, award)


  • Co-organising the 📐 Evaluation Beyond Metrics workshop at IJCAI 2022, with Joshua Tenenbaum, Lucy Cheke, Tomer Ullman, José Hernández-Orallo, Danaja Rutar, John Burden and Ryan Burnell (page).