decídalo Insights: About LLMs and chatbots

Due to recent advances in Large Language Models (LLMs) such as ChatGPT, it has become much easier to develop intelligent chatbots that can access external data sources. However, several challenges arise. First, the knowledge of LLMs is static: knowledge that emerges after the LLM is trained is unknown to it (for example, ChatGPT has a knowledge cutoff of September 2021). Second, the context size of LLMs is limited, so it is not possible to provide all available data at once. The context size is the number of tokens the LLM can process at a time (for ChatGPT, a maximum of 4,096 tokens); one token corresponds to approximately four characters.
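The four-characters-per-token rule of thumb mentioned above can be used for a quick budget check before sending text to the model. The following is a minimal sketch; the function names and the 4,096-token limit are illustrative (real tokenizers count tokens exactly, this is only an estimate):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

def fits_context(text: str, max_tokens: int = 4096) -> bool:
    """Check whether a text is likely to fit into the model's context window."""
    return estimate_tokens(text) <= max_tokens

# "the quick brown fox jumps over the lazy dog" has 43 characters,
# so the estimate is about 10 tokens.
```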

To address these challenges, we use the Retrieval-Augmented Generation (RAG) pattern for our decídalo chatbot. First, the documents that are semantically similar to the question posed are retrieved; the LLM then uses this information to generate an answer. In this way, the LLM can answer questions about external data sources.

An overview of the pattern is provided by the following illustration: 

Preparation for semantic search 

To take the context of the documents into account, we use so-called embeddings: functions that map text to numeric vectors. For example, the sentence “the quick brown fox jumps over the lazy dog” is transformed into the 1536-dimensional vector (0.0053, 0.0049, …). Semantically similar texts end up close to each other in this embedding space, while semantically dissimilar texts end up far apart. Like LLMs, embeddings are trained on large datasets. Also like LLMs, embeddings have a limited context size, so larger documents must be split into multiple pieces of text.
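The splitting of larger documents can be sketched as a simple character-based splitter with overlapping chunks, so that no sentence context is lost at a chunk boundary. The chunk size and overlap values here are illustrative, not the ones used in decídalo:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping chunks small enough
    for the embedding model's context size (sizes are illustrative)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        # step forward, keeping `overlap` characters of shared context
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded individually, and the resulting vectors all point back to the same source document.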

The resulting vectors are then stored in a vector database.

In the case of our decídalo chatbot, the data from the profiles is stored section by section in the vector database. Longer sections, such as projects, are likewise split across several vectors.

Semantic search 

When the user now makes a search query, it is transformed with the same embedding and compared with the document vectors in the vector database. The similarity between two vectors is measured by cosine similarity, i.e. the cosine of the angle between them. (For normalized vectors, Euclidean distance can be used equivalently.) The k documents with the highest cosine similarity are returned. The number k is chosen depending on the use case and the length of the documents, since the limited context size of the LLM must be respected.
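The retrieval step described above can be sketched with NumPy; the toy 2-dimensional vectors stand in for real 1536-dimensional embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int) -> list[int]:
    """Indices of the k documents most similar to the query vector."""
    sims = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:k]
```

A production vector database performs this same comparison, but with index structures that avoid scoring every document.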

Generation of the answer 

From the user’s question and the documents found by the semantic search, a single prompt is created and passed to the LLM to generate the answer. Additional information can be added to the prompt, such as the output language or the style of the answer. An example of such a prompt is the following:

You are a chatbot that is supposed to respond to customers’ questions. 

Relevant context: {context} 

Question: {question} 

Helpful answer: 

The {context} and {question} placeholders are then replaced with the retrieved documents and the user’s question, respectively.
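This substitution is plain string templating; a minimal sketch using the example prompt from above:

```python
PROMPT_TEMPLATE = """You are a chatbot that is supposed to respond to customers' questions.

Relevant context: {context}

Question: {question}

Helpful answer:"""

def build_prompt(documents: list[str], question: str) -> str:
    """Fill the template with the retrieved documents and the user's question."""
    return PROMPT_TEMPLATE.format(context="\n".join(documents), question=question)
```

The resulting string is what is actually sent to the LLM, so the retrieved documents and the question together must stay within the context size discussed earlier.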

Multitool Chatbot 

Similarly, we can provide the chatbot with various tools and APIs. For each tool or API, a description specifies when it should be used. Based on the user input, the chatbot then decides which tool is appropriate and invokes it with the information contained in the input.
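One way to sketch this is a registry of tools, each with a description, from which a selection prompt for the LLM is built. The tool names, descriptions, and stub implementations below are hypothetical, and the LLM call itself is omitted:

```python
# Hypothetical tool registry: each tool carries a description telling
# the model when it should be used, plus a stub implementation.
TOOLS = {
    "profile_search": {
        "description": "Search consultant profiles for skills and projects.",
        "run": lambda query: f"Searching profiles for: {query}",
    },
    "availability": {
        "description": "Look up a consultant's availability.",
        "run": lambda query: f"Checking availability for: {query}",
    },
}

def tool_selection_prompt(user_input: str) -> str:
    """Build the prompt that asks the LLM to pick a tool by name."""
    tool_list = "\n".join(f"- {name}: {t['description']}" for name, t in TOOLS.items())
    return (f"Available tools:\n{tool_list}\n\n"
            f"User input: {user_input}\n"
            "Respond with the name of the most appropriate tool.")

def dispatch(tool_name: str, query: str) -> str:
    """Run the tool the model selected."""
    return TOOLS[tool_name]["run"](query)
```

In a real system, the LLM's response to the selection prompt is parsed and passed to the dispatcher, possibly with extracted parameters.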

In our application, we use an optimization of semantic search that we call hybrid search. This method combines the advantages of semantic search with those of keyword-based search.
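One common way to combine the two signals, shown here as a sketch rather than our exact method, is a weighted sum of the semantic similarity and a simple keyword-match score; the weight alpha is an illustrative choice:

```python
def keyword_score(query: str, document: str) -> float:
    """Fraction of query words that literally appear in the document."""
    words = query.lower().split()
    doc = document.lower()
    return sum(w in doc for w in words) / len(words)

def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Weighted combination of both signals (alpha is illustrative)."""
    return alpha * semantic + (1 - alpha) * keyword
```

The keyword component rewards exact matches such as skill names, while the semantic component still surfaces documents that use different wording.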

Finally, we present an example that illustrates the power of semantic search. Let’s assume you are searching for a skill such as “Large Language Models”. A conventional search algorithm would not return any results, because this skill is so new that hardly anyone has noted it in their profile yet. Semantic search, on the other hand, uses the context of the query and searches for similar skills in order to still give you a meaningful answer.

User: Who knows about large language models? 

Chatbot: I can’t say exactly who is familiar with Large Language Models, as the information provided mainly mentions Natural Language Processing (NLP) projects and skills. Erika Musterfrau and Max Mustermann have experience in NLP, but it is unclear if they have also worked specifically with Large Language Models. 

In addition, the chatbot easily handles challenging queries such as “I need an experienced project manager who knows data analytics and Scrum” or “I need a balanced team for a project in cyber security in the energy industry.” 

To see the related blog article about our decídalo chatbot including its video, click here.

If you have any further questions, feel free to contact us.