In a thought-provoking discussion at NDC Oslo, Luise Freese and Iona Varga delved into the practical dilemmas surrounding the use of artificial intelligence (AI) models, particularly in the context of code generation. While AI has made significant strides in mimicking human intelligence, Freese and Varga emphasized the importance of striking a balance between practicality and quality when utilizing AI for specific tasks like generating code.
Varga noted that the term AI hints at genuine intelligence, but the name really reflects how these models are constructed. By interconnecting nodes, AI attempts to replicate the intricate web of neurons and synapses in the human brain, which is why we speak of “artificial neural networks” and, by extension, “artificial intelligence.” In practice, however, AI functions quite differently from the human brain.
Freese took the discussion down to the hardware level, pointing out that computers fundamentally rely on transistors, which operate in a binary fashion: they are either on or off. By combining these binary states, computers manipulate bits to execute tasks. Unlike the neurons in a human brain, transistors do not engage in complex entanglements; they are merely a collection of switches that ultimately yields a result.
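To make this concrete, here is a minimal sketch (my illustration, not something shown in the talk) of how a useful result emerges from nothing but on/off switches: two logic gates combined into a half-adder that adds two bits.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two single bits using only gate-level operations.

    XOR produces the sum bit and AND produces the carry bit; both are just
    combinations of on/off states, with no understanding involved.
    """
    sum_bit = a ^ b    # XOR gate
    carry_bit = a & b  # AND gate
    return sum_bit, carry_bit

# 1 + 1 = 10 in binary: sum bit 0, carry bit 1
print(half_adder(1, 1))  # (0, 1)
```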
The downward spiral of general-purpose AI models
The crux of the discussion revolved around the challenges posed by using generalized AI models, often referred to as foundation models, for highly specific tasks. The duo specifically scrutinized large language models (LLMs) and their limitations. LLMs take an input, be it a question or a prompt, and generate a sequence of words based on statistical patterns learned from their training data. These models excel at prediction but fall short when it comes to fact-checking and validation, as their primary design objective is to generate plausible content, not to verify its accuracy.
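To illustrate the “prediction, not verification” point, the toy sketch below (my own, with invented probabilities) mimics how a language model chooses each next word: it samples from a probability distribution over likely continuations, and nothing in that loop checks whether the result is factually correct.

```python
import random

# Toy distribution over continuations for one context; the numbers are made up
# for illustration and stand in for patterns a model would learn from text.
NEXT_TOKEN_PROBS = {
    "the capital of australia is": {"sydney": 0.55, "canberra": 0.40, "melbourne": 0.05},
}

def generate_next_token(context: str) -> str:
    """Sample the next token from the learned distribution for this context."""
    probs = NEXT_TOKEN_PROBS[context]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# The statistically most frequent continuation usually wins, even though it is
# factually wrong here (the capital of Australia is Canberra).
print(generate_next_token("the capital of australia is"))
```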
Varga pointed out a critical concern: the risk of employing very general-purpose AI models for highly specialized tasks. When organizations try to make a single AI model tackle a wide array of problems, a troubling pattern emerges, which Freese likened to a self-amplifying downward spiral. To escape this cycle, Freese suggested shifting towards more specialized AI models, some of which could be built on top of foundation models.
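One common way to build such a specialized model on top of a foundation model is to fine-tune it on a narrow, domain-specific corpus. The sketch below is my own illustration using the Hugging Face transformers library; the base model name and the data file are placeholders, not anything the speakers showed.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "gpt2"                        # placeholder foundation model
DOMAIN_DATA = "path/to/domain_corpus.txt"  # placeholder domain-specific corpus

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Load the narrow, task-specific corpus and tokenize it.
dataset = load_dataset("text", data_files=DOMAIN_DATA)["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialized-model", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the result is a narrower model tuned for one domain
```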
The role of human judgment in code evaluation
One central issue that emerged from the discussion was whether AI-generated code is safe to use and whether it meets the required standards of quality. Varga emphasized that these questions ultimately require human judgment and intervention. Evaluating AI-generated code should not be underestimated: much like debugging unfamiliar code written by someone else, ensuring the quality and safety of AI-generated code demands careful scrutiny.
Varga highlighted the potential of AI as a valuable tool for initiating problem-solving processes. However, she cautioned that once AI is set in motion, its output requires a thorough post-processing phase: checking, validating, modifying, editing, and, in some cases, rewriting the AI-generated content. It is in this phase that the true extent of the work AI introduces becomes apparent.
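In practice, part of that post-processing can be automated before human review even starts. The sketch below is my own illustration of such a gate (the file names and tool choices are assumptions, not from the talk): AI-generated code has to pass compilation, linting, type checking, and the existing test suite before anyone spends time reading it.

```python
import subprocess
import sys

# Hypothetical file containing AI-generated code awaiting review.
GENERATED_FILE = "generated_module.py"

CHECKS = [
    ["python", "-m", "py_compile", GENERATED_FILE],  # does it even parse?
    ["ruff", "check", GENERATED_FILE],               # style and common bug patterns
    ["mypy", GENERATED_FILE],                        # static type checking
    ["pytest", "tests/"],                            # behaviour against existing tests
]

def run_checks() -> bool:
    """Run each check; any failure means the generated code needs editing or rewriting."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True

if __name__ == "__main__":
    # Automated checks narrow the problem; human judgment still makes the final call.
    sys.exit(0 if run_checks() else 1)
```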
In essence, the discussion at NDC Oslo 2023 underscored the delicate equilibrium required when harnessing the power of AI models, particularly in highly specialized domains like code generation. While AI holds tremendous promise as a problem-solving aid, human oversight and validation remain indispensable in ensuring the quality, safety, and relevance of the output it generates.