For most of the last year, both bulls and bears on generative AI have painted a picture of the future of AI that often reads like a dystopian novel (even the AI champions!): AI as all-knowing, uber-large, black-box models, capable of things we can't possibly comprehend, operating in the cloud, running on massive farms of complex GPU hardware, with mysterious quasi-"artificial general intelligence" (AGI) capabilities, led, of course, by a handful of Big Tech companies, the only ones who can be trusted with the awesome power of this technology.
What if this picture is all wrong?
I will make the case that the future of AI may be a lot smaller, more specialized, more modular and a lot simpler and lower cost.
The Foundation
The term LLM (large language model) is relatively new, the fruition of years of research into applying "attention-based causal decoder" models to natural language processing. As recently as 2020-2021, these models were often called, more humbly, "text generation" models, since they were trained to predict the next word in a sentence and "auto-complete" the remainder of the text, often in creative and fun ways.
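To make "predict the next word and auto-complete" concrete, here is a toy sketch. This is not a transformer, just a word-count ("bigram") model over a tiny made-up corpus, but the loop is the same idea: repeatedly predict the most likely next word and append it.

```python
from collections import Counter, defaultdict

# Toy illustration (not a real transformer): learn next-word counts
# from a tiny corpus, then greedily auto-complete a prompt.
corpus = "the cat sat on the mat so the cat sat on the rug".split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    next_word_counts[prev][curr] += 1

def autocomplete(prompt, max_words=4):
    """Repeatedly append the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break  # no known continuation
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(autocomplete("the cat"))  # -> "the cat sat on the cat"
```

Real LLMs do the same next-token loop, but the "counts" are replaced by a neural network conditioned on the entire preceding context, which is what makes the completions coherent over long spans.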
Until five years ago, a model with 100 million parameters was considered state-of-the-art; that quickly grew to 1 billion parameters, and then the top was blown off as OpenAI, Microsoft, Google, Baidu and others pushed models up to 1 trillion parameters and beyond (OpenAI's Davinci, for example, is famously 175 billion parameters). As AI researchers scaled models from 1 billion to 1 trillion parameters over the last three years, two surprising emergent properties were discovered:
Knowledge: Even though the models were not trained with the objective of learning any specific knowledge, they demonstrated, at times, a remarkable capability to bring fact-like details about people, places and events into their text generations.
Instructions: As researchers experimented with passing different types of “prompts” to the models, they realized that the models could quickly learn patterns, answer questions and follow user instructions. This property was a major departure from previous natural language models that were single-purpose and rather inflexible in their ability to adapt to new patterns and be used in different contexts. After discovering this property, the idea of “instruction training” became a formalized part of the way that end-user LLMs are developed and has led to the rapid pace of innovation in model quality over the last 12-18 months, in particular.
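Instruction training works largely by formatting training examples as flat text in a consistent template. The sketch below shows one such layout; the exact template is model-specific, and this particular format is an assumption, loosely modeled on common open-source conventions.

```python
# Sketch of an instruction-tuning text template. The "### ..." section
# markers are an assumed convention (real templates vary by model).
def format_instruction(instruction, context="", response=""):
    parts = [f"### Instruction:\n{instruction}"]
    if context:
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)

example = format_instruction(
    "Summarize the passage in one sentence.",
    context="LLMs trained only to predict the next word "
            "turn out to follow instructions.",
    response="Next-word training yields instruction-following ability.",
)
print(example)
```

During fine-tuning the model sees many such (instruction, response) pairs; at inference time the same template is filled in with the response left blank, and the model "auto-completes" it.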
In late 2022, OpenAI's ChatGPT brilliantly integrated these two capabilities and, in doing so, re-ignited a relatively old debate, conjuring both hopes and fears about a coming AGI revolution. In the process, awareness has also grown of how hard it is for LLMs to encode knowledge reliably, and the term "hallucinations" has become a common way to refer to an LLM's tendency to generate detailed, credible and wholly inaccurate information. It is not yet well understood how LLMs encode knowledge; many research initiatives are trying to figure that out, and many skeptics doubt that LLMs, with their current architecture and training objectives, will ever be able to serve as accurate knowledge bases.
The Future
However, when you unpack and disentangle these capabilities, I believe the most important headline is being buried: the single most important breakthrough is not what LLMs "know," or even the future potential of AGI, but that LLMs today, here and now, are generally capable of reading comprehension and response at human-like levels. And here's the real kicker: if you are looking to exploit the ability of an LLM to read, this capability exists down to 1-billion-parameter models, with strong "reading comprehension" performance from models in the 7-billion-parameter range, and it is improving every month in the open-source community.
The reason this matters so much is cost and the ability to integrate LLMs into enterprise workflows. Models in this size range (3B-7B parameters) are likely two to three orders of magnitude less expensive to develop and manage through their life cycle, and many can run on laptops even today. These smaller "reading comprehension/instruction-following" models put it within reach of just about every business in the world to deploy not just a "single AI" but potentially dozens or even hundreds of specialized models across different parts of the business. Of course, models this size will not have all of the varied capabilities of GPT-4 or Bard, but especially when trained for a specialized purpose, they can achieve comparable, near-comparable and, in some cases, even superior performance at a fraction of the cost and complexity, and they can be delivered and managed entirely within the four walls of a single business.
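A back-of-envelope calculation shows why the smaller models fit on ordinary hardware. Weight memory is roughly parameters times bytes per parameter; 2 bytes assumes 16-bit (fp16/bf16) weights, and 4-bit quantized models need about a quarter of that. The 7B and 175B figures below are the model sizes discussed above.

```python
# Rough memory footprint of model weights: params x bytes per param.
# 2 bytes/param = 16-bit weights; 0.5 bytes/param = 4-bit quantization.
def weight_memory_gb(params, bytes_per_param=2):
    return params * bytes_per_param / 1e9

print(weight_memory_gb(7e9))                        # 7B, fp16:  14.0 GB
print(weight_memory_gb(175e9))                      # 175B, fp16: 350.0 GB
print(weight_memory_gb(7e9, bytes_per_param=0.5))   # 7B, 4-bit:  3.5 GB
```

At roughly 14 GB in fp16, and around 3.5 GB quantized, a 7B model sits within reach of a single workstation GPU or even a laptop, whereas a 175B model requires a multi-GPU server before it can serve a single request.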
The history of technology tells us that monopolies are usually fleeting, that mainframes usually give way to laptops, that centralization yields to the power of the distributed and that big technologies usually scale not up but down into miniaturization.
I believe the future of AI is probably less scary and more prosaic: special-purpose "machines that can read," quietly humming away in the background of most enterprise processes, reading, reviewing, extracting, analyzing and finding patterns in the ocean of a company's unstructured information and documents, tightly integrated into business processes and delivering concrete productivity and quality benefits.