Large Language Models (LLMs) are incredibly powerful, and continue to amaze us every day. Yet beneath their power lies a subtle but significant weakness.
They are frozen in time.
Once deployed, LLMs cannot update their knowledge or adapt to new information. Their understanding of the world remains static. These models are trained on data and then used for inference. They can't learn anything once they're in action.
Which means LLMs can't keep up with our world, where information is constantly being generated.
However, this limitation, while challenging, isn't a deal breaker. The key lies in augmenting these powerful models, enabling them to stay current. By successfully implementing such augmentation, we can extend LLMs' capabilities beyond their training cutoff, keeping them always 'fresh' and relevant.
RAGs
Vector embeddings are a key part of how LLMs work. They're simply a way to turn words into numbers that models can process. In LLMs, embeddings convert text into long lists of numbers (vectors), allowing the models to understand and manipulate language.
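To make that concrete, here's a minimal sketch in Python. It uses the sentence-transformers library as one possible embedding model (any embedding API would behave the same way), and the example sentences are purely illustrative.

```python
# Minimal sketch: turn sentences into vectors and compare them.
# sentence-transformers is one choice; any embedding model or API would do.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, general-purpose embedding model

sentences = [
    "The patient was prescribed antibiotics.",
    "A course of antibiotics was given to the patient.",
    "The weather is sunny today.",
]
vectors = model.encode(sentences)  # one vector (a long list of numbers) per sentence

def cosine(a, b):
    # Similarity between two vectors: closer to 1 means closer in meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors[0], vectors[1]))  # high: same meaning, different wording
print(cosine(vectors[0], vectors[2]))  # low: unrelated topic
```

Sentences with similar meanings end up with similar vectors, which is what lets us search by meaning instead of by exact keywords.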
Now, imagine we took a book published after the LLM's training cutoff date. What if we broke it down into pages and used this same 'vector embedding' technique on each page? We could then store this vectorized book in an external database.
Here's where it gets interesting: when you ask the LLM about something in this book, it doesn't require prior training on its content. Instead, it can just search the vectorized pages, find the most relevant one, and use it as reference material to answer your question.
This method is known as Retrieval-Augmented Generation (RAG). By storing embeddings in an external, updatable database rather than within the LLM itself, RAG allows the model to stay current. When the LLM needs to answer a question, it searches this up-to-date database, retrieves relevant information, and uses it to generate a response.
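Here's a rough sketch of that loop, under the same assumptions as before: sentence-transformers stands in for the embedding model, and call_llm is a placeholder for whichever LLM you actually query.

```python
# Rough sketch of the RAG loop: index pages, retrieve the closest ones,
# and hand them to the LLM as reference material.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Index: embed each page of the new book and keep the vectors next to the text.
pages = ["Page 1 of the book ...", "Page 2 of the book ...", "Page 3 of the book ..."]
page_vectors = model.encode(pages)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 2. Retrieve: find the pages most similar in meaning to the question.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = model.encode([question])[0]
    ranked = sorted(zip(pages, page_vectors), key=lambda pv: cosine(q, pv[1]), reverse=True)
    return [page for page, _ in ranked[:k]]

# 3. Generate: ground the LLM in the retrieved pages instead of its frozen training data.
def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Using only the context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

def call_llm(prompt: str) -> str:
    # Placeholder: send the prompt to whichever LLM you use and return its reply.
    raise NotImplementedError("plug in your LLM API call here")
```

The important part is step 3: the LLM answers from the retrieved pages, not from whatever it memorized during training.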
RAGs augment LLMs with real-time, domain-specific knowledge. This combination quickly became popular for building AI apps across many fields.
RAGs in practice
Creating a RAG is simple: start with any dataset you have. Let's consider a hospital using patient records to create an AI assistant for doctors.
The hospital turns its patient database into a RAG system. When a doctor asks a question, the AI searches these records and uses an LLM to craft a response, combining language skills with specific patient data.
This approach creates a specialized, up-to-date tool and addresses two additional LLM weaknesses:
It reduces hallucination by grounding responses in real data
It gives developers control over the AI's behavior through the data it draws on
As records update, the hospital adds new info to the database, keeping the AI current without retraining.
This turns patient data into a powerful AI tool, helping doctors quickly access and understand patient history while maintaining accuracy and control.
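That last point, keeping the AI current without retraining, is worth making concrete. In a sketch, an 'update' is nothing more than embedding the new record and appending it to the store; the record text below is invented for illustration, and a real deployment would use a proper vector database rather than in-memory lists.

```python
# Keeping the assistant current: embed the new record and append it to the store.
# No gradient updates, no retraining runs; the model weights never change.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

record_texts: list[str] = []           # existing records would already live here
record_vectors: list[np.ndarray] = []  # ...alongside their embeddings

def add_record(text: str) -> None:
    record_texts.append(text)
    record_vectors.append(model.encode([text])[0])

# Illustrative record only; real data would come from the hospital's systems.
add_record("2024-06-12: Patient A reported reduced symptoms after switching medication.")
```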
Heresy
But what if we start with empty data?
This might sound heretical in the context of RAGs. After all, the whole point of a RAG is to ground an LLM in specific, relevant data. But let's entertain this thought for a moment.
Starting with an empty database could actually be a powerful way to build a knowledge base from scratch.
In this approach, users not only query the RAG but also 'write' back to it. This is the key distinction: users become both consumers and producers of knowledge.
Here's how it might work:
Start with an empty database
Enrich through user submissions
Grow via consume-contribute cycles (sketched in code below)
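To make the cycle concrete, here's a sketch under the same assumptions as earlier: the store starts empty, contribute() writes a user's note into it, and ask() answers from whatever has accumulated so far. The function names, prompt format, and example note are illustrative, not a fixed design.

```python
# Sketch of a RAG that starts empty and grows through user contributions.
# call_llm is a placeholder for whatever LLM endpoint you use.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
notes: list[str] = []             # the knowledge base starts empty
note_vectors: list[np.ndarray] = []

def contribute(text: str) -> None:
    """A user writes knowledge back into the store."""
    notes.append(text)
    note_vectors.append(model.encode([text])[0])

def ask(question: str, k: int = 3) -> str:
    """A user consumes: retrieve the closest notes and let the LLM answer from them."""
    if not notes:
        return "The knowledge base is still empty: contribute something first."
    q = model.encode([question])[0]
    sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in note_vectors]
    top = np.argsort(sims)[::-1][:k]
    context = "\n".join(notes[i] for i in top)
    prompt = f"Answer from these community notes:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

def call_llm(prompt: str) -> str:
    # Placeholder: send the prompt to whichever LLM you use and return its reply.
    raise NotImplementedError("plug in your LLM of choice here")

# The cycle: consume, notice a gap, contribute, and the next user benefits.
contribute("Our team found that reranking retrieved chunks noticeably improves answers.")
```

The same two calls serve both roles: every question is a chance to notice a gap, and every contribution improves the next answer.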
This approach enables the system to learn and specialize based on actual usage. It grows organically, capturing nuances and knowledge that fixed datasets simply can't.
Organic growth is crucial, as it means the system evolves naturally in response to real user needs and interests, rather than following a predetermined path. It's a dynamic, living knowledge base that adapts and expands with each interaction.
Of course, this approach raises intriguing questions. How quickly would the system become useful? How would we manage potential misinformation? Could this lead to RAGs with vastly different knowledge bases depending on their user communities? And so on.
Analogy
A good analogy for this RAG-based collaborative knowledge building is the immensely successful Wikipedia. Both models rely on user contributions to grow.
The RAG approach, however, adds a conversational layer powered by LLMs. This creates a more casual interface for knowledge sharing, lowering the barrier for contribution and encouraging spontaneous input.
Users can easily ask questions, offer insights, or make connections, potentially leading to unexpected 'happy accidents' in knowledge creation. This fluid approach to collaborative learning could make knowledge-building more accessible and dynamic than ever before.
Next
Collaborative knowledge building is an exciting direction. As an AI startup founder, I'm eager to explore its practical implementation.
We'll be experimenting with prototypes in the coming weeks. Stay tuned for updates on this mix of AI, knowledge management, and community collaboration.
Thanks for reading! Follow me on Twitter for more AI insights and updates. For company updates, check out Cycls.