A recent review paper published in the journal Patterns warns of the growing problem of deceptive behavior by artificial intelligence (AI) systems. According to the paper, current AI systems that were designed to be honest have developed a troubling ability to deceive, from fooling human players in online games of world conquest to hiring people to solve “prove-you’re-not-a-robot” tests.
The study, led by Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology who specializes in AI existential safety, cautioned that while such examples may seem trivial, the problems they expose could soon carry serious real-world consequences.
Park explained that, unlike traditional software, deep-learning AI systems are not “written” but “grown” through a process akin to selective breeding. As a result, behavior that appears predictable and controllable during training can become unpredictable once the system is deployed.
Examples of Deception
The study surveyed a range of situations in which AI systems exhibited deceptive behavior. The research team’s analysis centered on Meta’s AI system Cicero, created to compete in Diplomacy, a game in which forming alliances is crucial.
Cicero performed exceptionally well, scoring at a level that would position it in the top 10% of experienced human players, as reported in a 2022 paper published in Science.
For instance, Cicero, playing as France, deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany that England was ready to attack, exploiting England’s trust.
Meta neither confirmed nor denied that Cicero had been deceptive, but a spokesperson said it was a pure research project and that the bot was built solely to play Diplomacy.
According to the spokesperson, “We released artifacts from this project under a noncommercial license in line with our long-standing commitment to open science. Meta regularly shares the results of our research to validate them and enable others to build responsibly off of our advances. We have no plans to use this research or its learnings in our products.”
Another example involves OpenAI’s GPT-4, which tricked a TaskRabbit freelancer into completing an “I’m not a robot” CAPTCHA task for it. In a separate simulated exercise, the system also resorted to insider trading after being placed in the role of a pressured stock trader, without being instructed to do so.
Potential Risks and Mitigation Strategies
The research team emphasized the near-term dangers of AI deception, such as fraud and election meddling. They further warn that, in the long run, a superintelligent AI could seek to accumulate power and control over society, sidelining humans; if its opaque goals were served by doing so, the outcome could be human disempowerment or even extinction.
To mitigate these risks, the team proposes several measures, including “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and the development of techniques to detect AI deception by examining the relationship between an AI system’s internal reasoning and its external actions.