publications

Publications by year, in reverse chronological order.

2025

  1. Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
    Maurice Chiodo, Dennis Müller, Paul Siewert, and 3 more authors
    arXiv preprint arXiv:2505.10426, 2025
  2. Framing the Game: How Context Shapes LLM Decision-Making
    Isaac Robinson and John Burden
    arXiv preprint arXiv:2503.04840, 2025
  3. General Scales Unlock AI Evaluation with Explanatory and Predictive Power
    Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, and 23 more authors
    arXiv preprint arXiv:2503.06378, 2025
  4. Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
    John Burden, Marko Tešić, Lorenzo Pacchiardi, and 1 more author
    Jun 2025
    arXiv:2502.15620 [cs]

2024

  1. Evaluating AI Evaluation: Perils and Prospects
    John Burden
    Jun 2024
    arXiv:2407.09221
  2. Conversational Complexity for Assessing Risk in Large Language Models
    John Burden, Manuel Cebrian, and Jose Hernandez-Orallo
    Sep 2024
    arXiv:2409.01247 [cs, math]
  3. The Animal-AI Environment: A Virtual Laboratory For Comparative Cognition and Artificial Intelligence Research
    Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, and 11 more authors
    Oct 2024
    arXiv:2312.11414 [cs]

2023

  1. Animal-AI 3: What’s New & Why You Should Care
    Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, and 9 more authors
    Dec 2023
    arXiv:2312.11414 [cs]
  2. From Turing’s Speculations to an Academic Discipline: A History of AI Existential Safety
    John Burden, Sam Clarke, and Jess Whittlestone
    In The Era of Global Risk, Chapter 9, Aug 2023
  3. Predictable Artificial Intelligence
    Lexin Zhou, Pablo A. Moreno-Casares, Fernando Martínez-Plumed, and 12 more authors
    Oct 2023
    arXiv:2310.06167 [cs]
  4. Inferring Capabilities from Task Performance with Bayesian Triangulation
    John Burden, Konstantinos Voudouris, Ryan Burnell, and 3 more authors
    Sep 2023
    arXiv:2309.11975 [cs]
  5. Rethink reporting of evaluation results in AI
    Ryan Burnell, Wout Schellaert, John Burden, and 13 more authors
    Science, Sep 2023
  6. Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models
    Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, and 6 more authors
    Journal of Artificial Intelligence Research, Sep 2023
  7. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
    Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, and 447 more authors
    Transactions on Machine Learning Research, Sep 2023
  8. An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
    Ross Gruetzemacher, Alan Chan, Kevin Frazier, and 11 more authors
    Nov 2023
    arXiv:2310.14455 [cs]
  9. Harms from Increasingly Agentic Algorithmic Systems
    Alan Chan, Rebecca Salganik, Alva Markelius, and 19 more authors
    In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, Nov 2023

2022

  1. How Sure to Be Safe? Difficulty, Confidence and Negative Side Effects
    John Burden, Jose Hernandez-Orallo, and Sean Heigeartaigh
    In NeurIPS ML Safety Workshop, Nov 2022
  2. Not a Number: Identifying Instance Features for Capability-Oriented Evaluation
    Ryan Burnell, John Burden, Danaja Rutar, and 3 more authors
    In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, Jul 2022
  3. Evaluating object permanence in embodied agents using the Animal-AI environment
    Konstantinos Voudouris, Niall Donnelly, Danaja Rutar, and 4 more authors
    In EBeM’22: Workshop on AI Evaluation Beyond Metrics, Vienna, Austria, Jul 25, 2022
    Publisher: CEUR Workshop Proceedings
  4. How general-purpose is a language model? Usefulness and safety with human prompters in the wild
    Pablo Antonio Moreno Casares, Bao Sheng Loe, John Burden, and 1 more author
    Jul 2022
    Issue: 5

2021

  1. Latent Property State Abstraction For Reinforcement Learning
    John Burden, Sajjad Kamali Siahroudi, and Daniel Kudenko
    In Proceedings of the AAMAS Workshop on Adaptive Learning Agents (ALA), Jul 2021

2020

  1. Automating abstraction for potential-based reward shaping
    John Burden
    Dec 2020
    Publisher: University of York
  2. Uniform State Abstraction for Reinforcement Learning
    John Burden and Daniel Kudenko
    In 24th European Conference on Artificial Intelligence, Dec 2020