I make it my mission to stay abreast of the AI breakthroughs most relevant to deploying large language models within enterprises. Recently, I came across a fascinating paper titled “QLoRA: Efficient Finetuning of Quantized LLMs”. Allow me to break down this innovative concept and explain why it’s crucial to your large language model strategy.
As we’ve discussed before, using GPT-4 models with hosting and cloud providers like Microsoft can pose a range of challenges — they can be slow, expensive, throttled, and may even present security issues. That’s why it’s important to stay informed about viable alternatives that can tackle these obstacles head-on.
Most people observing this field assume that only corporate giants have access to the necessary data and computational resources to develop AI. It’s a reasonable assumption, given the investment that previous large language models, like GPT-4, required for training to achieve impressive results. Unfortunately, many also believe that subsequent breakthroughs will require the same level of capital. I share the contrary opinion of one Google researcher who stated in a famously leaked internal blog post, “neither Google nor OpenAI has a competitive moat.”
QLoRA is an example that makes his point. The QLoRA research team has showcased an approach to customizing large language models at a fraction of the previous costs. They accomplished this by demonstrating that representing a model’s weights with fewer bits dramatically reduces the GPU memory required for fine-tuning, and GPU memory is the primary constraint when training large language models.
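To see why fewer bits matter so much, here is a back-of-the-envelope sketch of the memory needed just to hold a model’s weights at different precisions. The numbers are illustrative only; real fine-tuning also requires memory for optimizer state, gradients, and activations, which QLoRA addresses with additional techniques such as low-rank adapters.

```python
# Rough memory estimate for storing model weights at different precisions.
# Illustrative arithmetic only: actual training needs additional memory for
# optimizer state, gradients, and activations.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Gigabytes needed to hold num_params weights at the given precision."""
    return num_params * bits_per_weight / 8 / 1e9

params_65b = 65e9  # a 65-billion-parameter model, e.g. the largest LLaMA

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(params_65b, bits):.1f} GB")
# 16-bit weights: 130.0 GB
# 8-bit weights: 65.0 GB
# 4-bit weights: 32.5 GB
```

At 16 bits per weight, a 65B-parameter model’s weights alone exceed the memory of any single GPU; at 4 bits they fit on a single high-end card, which is the practical shift QLoRA demonstrates.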
The team then benchmarked the performance of their models against others, such as Google’s Bard, GPT-3.5, and GPT-4. The results were nothing short of impressive, with only GPT-4 surpassing their best model on most tasks. I conducted a few tests myself, comparing one of their models with GPT-4, and I couldn’t see an obvious difference.
Moreover, the researchers illustrated how a model fine-tuned on a small, high-quality training dataset can outperform one trained on a larger dataset. This suggests that you could soon take a freely available large language model and create your own enterprise version that rivals or outperforms the best commercially available alternatives, simply by tailoring it to your own internal data and tasks.
However, we’re not quite there yet. This research breakthrough is based on the LLaMA large language model that Meta (Facebook) released a few months ago, which is currently not available for commercial use. We’ll have to wait until another organization makes the initial investment to train a large language model with comparable performance and releases it for commercial use within the open-source community. Alternatively, the open-source community might develop techniques that allow them to incrementally train a model that rivals the performance of LLaMA.
Regardless, it’s clear that the future holds many alternatives to proprietary models like GPT-4 and Bard. Your competitive advantage will lie in your ability to customize these innovations and deploy them more swiftly than your competitors.
ABOUT PROLEGO
Prolego is an elite consulting team of AI engineers, strategists, and creative professionals guiding the world’s largest companies through the AI transformation. Founded in 2017 by technology veterans Kevin Dewalt and Russ Rands, Prolego has helped dozens of Fortune 1000 companies develop AI strategies, transform their workforce, and build state-of-the-art AI solutions.