Ben Levinstein / Belief and trust in AI
On Friday, February 6, we will be joined by Ben Levinstein, Associate Professor of Philosophy at the University of Illinois Urbana-Champaign, who will present his work on "Belief and trust in AI." The abstract follows.
Contemporary AI systems demonstrate remarkable capabilities, from passing PhD-level exams to sophisticated role-play and common-sense reasoning. As we increasingly delegate consequential decisions to these systems—in hiring, medical diagnosis, and beyond—two fundamental questions arise: Under what conditions can we responsibly trust AI agents? And to what extent can we meaningfully attribute action-guiding beliefs and desires to systems like Large Language Models? This talk develops a two-pronged approach to these challenges. First, I outline general conditions under which delegation to AI systems is rational and ethical, even under uncertainty about their internal states. Second, drawing on insights from radical interpretation (Davidson, Lewis) and contemporary machine learning, I propose four adequacy criteria for identifying belief-like representations in LLMs. I then argue that current interpretability methods fail to satisfy these criteria, for both conceptual and empirical reasons. I conclude by exploring alternative approaches to responsible AI deployment when mental state attribution remains uncertain.