The Three Paradoxes of AI Safety: Why Power Corrupts


Event details

Date 30.05.2024
Hour 16:30 - 17:30
Speaker Monojit Choudhury
Category Conferences - Seminars
Event Language English

The dangers of LLMs, and more broadly those of generative AI, have been at the centre of many debates and scientific studies. While working on fairness and inclusion, and on the “alignment” of LLMs, I have hit three roadblocks, which eventually led to what I would call three paradoxes of AI alignment. The first paradox is about the impossibility of defining universal alignment goals for generative AI, because ethical principles are often in conflict with each other, and the tie can only be broken by the user of an AI system (or both in collaboration with each other) in a given context. Therefore, strong alignment goals often lead to weaker systems. The second paradox dictates that the more powerful an AI model is, the more difficult it becomes to foresee and prevent the ways in which it can be “jailbroken” (i.e., derailed from its alignment goals). This implies that the only way to ensure AI safety is to curtail the power of the model. Finally, the third paradox is about drawing the lines between language, logic and knowledge. Injecting language models with knowledge is a potential solution to prevent “hallucination”, yet knowledge is infinite, non-monotonic and evolving. Therefore, it should be separated from language and logical processing, the two pillars of LLMs. On the other hand, it is impossible to draw a boundary between language and knowledge, as the structure of language and its usage encode knowledge of the world and cultural conventions. This paradox then implies that the more we attempt to prevent hallucination by decoupling knowledge from LLMs, the weaker the language processing abilities of the LLM become.

Monojit Choudhury is a professor of Natural Language Processing at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi. Prior to this, he was a principal scientist at Microsoft Research Lab and Microsoft Turing, India. He is also a professor of practice at Plaksha University and an adjunct professor at IIIT Hyderabad. Prof Choudhury's research interests lie at the intersection of NLP, social and cultural aspects of technology use, and ethics. In particular, he has been working on multilingual aspects of large language models (LLMs), their use in low-resource languages, and making LLMs more inclusive and safer by addressing bias and fairness. Prof Choudhury is the general chair of the Indian National Linguistics Olympiad and the founding co-chair of the Asia-Pacific Linguistics Olympiad. He holds BTech and PhD degrees in Computer Science and Engineering from IIT Kharagpur.

Practical information

  • Informed public
  • Free

