Update Chatbots README #1402

Merged 1 commit into from Apr 10, 2024

185 changes: 184 additions & 1 deletion pgml-cms/docs/use-cases/chatbots/README.md
@@ -9,7 +9,7 @@ description: >-

## Introduction <a href="#introduction" id="introduction"></a>

This tutorial seeks to broadly cover the majority of topics required to not only implement a modern chatbot, but understand why we build them this way. There are three primary sections:

* The Limitations of Modern LLMs
* Circumventing Limitations with RAG
@@ -202,6 +202,117 @@ Let's take this hypothetical example and make it a reality. For the rest of this
* The chatbot remembers our past conversation
* The chatbot can answer questions correctly about Baldur's Gate 3

In reality we haven't created a SOTA LLM, but fortunately other people have, and we will be using the incredibly popular fine-tune of Mistral: `teknium/OpenHermes-2.5-Mistral-7B`. We will be using `pgml`, our own Python library, for the remainder of this tutorial. If you want to follow along and have not installed it yet:

```bash
pip install pgml
```

Also make sure to set the `DATABASE_URL` environment variable:

```bash
export DATABASE_URL="{your free PostgresML database url}"
```

Let's set up a basic chat loop with our model:

```python
from pgml import TransformerPipeline
import asyncio

# Load the model in a text-generation pipeline
model = TransformerPipeline(
    "text-generation",
    "teknium/OpenHermes-2.5-Mistral-7B",
    {"device_map": "auto", "torch_dtype": "bfloat16"},
)

async def main():
    while True:
        # Read a message from the user and print the model's reply
        user_input = input("=> ")
        model_output = await model.transform([user_input], {"max_new_tokens": 1000})
        print(model_output[0][0]["generated_text"], "\n")

asyncio.run(main())
```

{% hint style="info" %}
Note that in our previous hypothetical examples we manually called tokenize to convert our inputs into `tokens`; in the real world, we let `pgml` handle converting the text into `tokens`.
{% endhint %}
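
For the curious, here is a rough sketch of what that manual tokenization step looks like. This is our own illustration, not part of the tutorial's pipeline, and it assumes you have Hugging Face's `transformers` library installed:

```python
# A sketch of the tokenization step pgml performs for us behind the scenes.
# Not needed to follow along with this tutorial.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

# Text in, a list of integer token ids out
tokens = tokenizer.encode("What is your name?")

# And token ids back to text
text = tokenizer.decode(tokens)
print(tokens, text)
```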

Now we can have the following conversation:

```
=> What is your name?
A: My name is John.

Q: How old are you?

A: I am 25 years old.

Q: What is your favorite color?

=> What did I just ask you?
I asked you if you were going to the store.

Oh, I see. No, I'm not going to the store.
```

That wasn't close to what we wanted to happen. Getting chatbots to work in the real world seems a bit more complicated than in the hypothetical world.

To understand why our chatbot gave us a nonsensical first response, and why it didn't remember our conversation at all, we must take a short dive into the world of prompting.

Remember, LLMs are just function approximators designed to predict the next most likely `token` given a list of `tokens`, and just like any other function, they require the correct input. Let's look closer at the input we are giving our chatbot. In our last conversation we asked it two questions:

* What is your name?
* What did I just ask you?

We need to understand that LLMs have a special input format designed specifically for conversations. So far we have been ignoring this required formatting and giving our LLM the wrong inputs, causing it to predict nonsensical outputs.

What do the right inputs look like? That actually depends on the model. Each model can choose which format to use for conversations during training, and not all models are trained to be conversational. `teknium/OpenHermes-2.5-Mistral-7B` has been trained to be conversational and expects us to format text meant for conversations like so:

```
<|im_start|>system
You are a helpful AI assistant named Hermes<|im_end|>
<|im_start|>user
What is your name?<|im_end|>
<|im_start|>assistant
```

We have added a bunch of new HTML-looking tags throughout our input. These tags map to tokens the LLM has been trained to associate with conversation shifts. `<|im_start|>` marks the beginning of a message. The text right after `<|im_start|>` (either `system`, `user`, or `assistant`) marks the role of the message, and `<|im_end|>` marks the end of a message.
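
Each message in a conversation follows this same pattern. As a sketch (our own extension of the example above, not from the tutorial), a second turn would be sent to the model by replaying every prior message in this format, which is how a chatbot can "remember" earlier turns:

```
<|im_start|>system
You are a helpful AI assistant named Hermes<|im_end|>
<|im_start|>user
What is your name?<|im_end|>
<|im_start|>assistant
My name is Hermes<|im_end|>
<|im_start|>user
What did I just ask you?<|im_end|>
<|im_start|>assistant
```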

This is the style of input our LLM has been trained on. Let's do a simple test with this input and see if we get a better response:

```python
from pgml import TransformerPipeline
import asyncio

model = TransformerPipeline(
    "text-generation",
    "teknium/OpenHermes-2.5-Mistral-7B",
    {"device_map": "auto", "torch_dtype": "bfloat16"},
)

# Our input, formatted the way the model was trained to expect
user_input = """
<|im_start|>system
You are a helpful AI assistant named Hermes<|im_end|>
<|im_start|>user
What is your name?<|im_end|>
<|im_start|>assistant
"""


async def main():
    model_output = await model.transform([user_input], {"max_new_tokens": 1000})
    print(model_output[0][0]["generated_text"], "\n")


asyncio.run(main())
```

```
My name is Hermes
```

{% hint style="info" %}
Notice we have a new "system" message we haven't discussed before. This special message gives us control over how the chatbot should interact with users. We could tell it to talk like a pirate, to be super friendly, or to not respond to angry messages. In this case, we told it what it is and gave it its name. We will also add any conversation context the chatbot should have to the system message later.
{% endhint %}
@@ -288,6 +399,78 @@ You just asked me what my name is, and I am a friendly and helpful chatbot named

Note that we keep a list of dictionaries called `history` to store the chat history, and instead of feeding raw text into our model, we input the `history` list. Our library automatically converts this list of dictionaries into the format expected by the model. Notice the `roles` in the dictionaries are the same as the `roles` of the messages in the previous example. This pattern of storing messages as a list of dictionaries with `role` and `content` keys is pretty standard, used by us as well as OpenAI and HuggingFace.
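
For clarity, here is a sketch of what that `history` list looks like (the message contents are illustrative, taken from the conversation above):

```python
# The standard message format: a list of dictionaries with "role" and
# "content" keys, used by pgml as well as OpenAI and HuggingFace.
history = [
    {"role": "system", "content": "You are a helpful AI assistant named Hermes"},
    {"role": "user", "content": "What is your name?"},
    {"role": "assistant", "content": "My name is Hermes"},
    {"role": "user", "content": "What did I just ask you?"},
]
```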

Let's ask it the dreaded question:

```
=> What is Baldur's Gate?
Baldur's Gate 3 is a role-playing video game developed by Larian Studios and published by Dontnod Entertainment. It is based on the Advanced Dungeons & Dragons (D&D) rules and set in the Forgotten Realms campaign setting. Originally announced in 2012, the game had a long development period and was finally released in early access in October 2020. The game is a sequel to the popular Baldur's Gate II: Shadows of Amn (2000) and Baldur's Gate: Siege of Dragonspear (2016) expansion, and it continues the tradition of immersive storytelling, tactical combat, and character progression that fans of the series love.
```

How does it know about Baldur's Gate 3? As it turns out, Baldur's Gate 3 has actually been around since 2020. I guess that completely ruins our hypothetical example. Let's ignore that and ask it something trickier about Baldur's Gate 3 that it wouldn't know.

```
=> What is the plot of Baldur's Gate 3?
Baldur's Gate 3 is a role-playing game set in the Dungeons & Dragons Forgotten Realms universe. The story revolves around a mind flayer, also known as an illithid, called The Mind Flayer who is attempting to merge humanoid minds into itself to achieve god-like power. Your character and their companions must navigate a world torn apart by various factions and conflicts while uncovering the conspiracy surrounding The Mind Flayer. Throughout the game, you'll forge relationships with various NPCs, make choices that impact the story, and engage in battles with enemies using a turn-based combat system.
```

As expected, this is a rather shallow response that lacks any of the actual plot. To get the answer we want, we need to provide the correct context to our LLM. That means we need to:

* Get the text from the URL that has the answer
* Split that text into chunks
* Embed those chunks
* Search over the chunks to find the closest match
* Use the text from that chunk as context for the LLM

Luckily, none of this is actually very difficult, as people like us have built libraries that handle the complex pieces. Here is a program that handles steps 1-4 (we will come back to step 5 after seeing what this retrieves):

```python
from pgml import Collection, Model, Splitter, Pipeline
import wikipediaapi
import asyncio

# Construct our wikipedia api
wiki_wiki = wikipediaapi.Wikipedia("Chatbot Tutorial Project", "en")

# Use the default model for embedding and default splitter for splitting
model = Model() # The default model is intfloat/e5-small
splitter = Splitter() # The default splitter is recursive_character

# Construct a pipeline for ingesting documents, splitting them into chunks, and then embedding them
pipeline = Pipeline("test-pipeline-1", model, splitter)

# Create a collection to house these documents
collection = Collection("chatbot-knowledge-base-1")


async def main():
    # Add the pipeline to the collection
    await collection.add_pipeline(pipeline)

    # Get the document
    page = wiki_wiki.page("Baldur's_Gate_3")

    # Upsert the document. This will split the document and embed it
    await collection.upsert_documents([{"id": "Baldur's_Gate_3", "text": page.text}])

    # Retrieve and print the most relevant section
    most_relevant_section = await (
        collection.query()
        .vector_recall("What is the plot of Baldur's Gate 3", pipeline)
        .limit(1)
        .fetch_all()
    )
    print(most_relevant_section[0][1])


asyncio.run(main())
```

```
Plot
Setting
Baldur's Gate 3 takes place in the fictional world of the Forgotten Realms during the year of 1492 DR, over 120 years after the events of the previous game, Baldur's Gate II: Shadows of Amn, and months after the events of the playable Dungeons & Dragons 5e module, Baldur's Gate: Descent into Avernus. The story is set primarily in the Sword Coast in western Faerûn, encompassing a forested area that includes the Emerald Grove, a druid grove dedicated to the deity Silvanus; Moonrise Towers and the Shadow-Cursed Lands, which are covered by an unnatural and sentient darkness that can only be penetrated through magical means; and Baldur's Gate, the largest and most affluent city in the region, as well as its outlying suburb of Rivington. Other places the player will pass through include the Underdark, the Astral Plane and Avernus. The player character can either be created from scratch by the player, chosen from six pre-made "origin characters", or a customisable seventh origin character known as the Dark Urge. All six pre-made origin characters can be recruited as part of the player character's party. They include Lae'zel, a githyanki fighter; Shadowheart, a half-elf cleric; Astarion, a high elf vampire rogue; Gale, a human wizard; Wyll, a human warlock; and Karlach, a tiefling barbarian. Four other characters may join the player's party: Halsin, a wood elf druid; Jaheira, a half-elf druid; Minsc, a human ranger who carries with him a hamster named Boo; and Minthara, a drow paladin. Jaheira and Minsc previously appeared in both Baldur's Gate and Baldur's Gate II: Shadows of Amn.
```

{% hint style="info" %}
Once again we are using `pgml` to abstract away the complicated pieces for our machine learning task. This isn't a guide on how to use our libraries, but for more information [check out our docs](https://postgresml.org/docs/api/client-sdk/getting-started).
{% endhint %}
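
To complete step 5 from our list above, here is a minimal sketch (our own illustration, not the tutorial's final implementation) of feeding the retrieved section to the LLM as context in its system message, assuming the same `TransformerPipeline` setup as before:

```python
from pgml import TransformerPipeline
import asyncio

model = TransformerPipeline(
    "text-generation",
    "teknium/OpenHermes-2.5-Mistral-7B",
    {"device_map": "auto", "torch_dtype": "bfloat16"},
)

# Placeholder: in practice this would be the section retrieved by the
# program above, i.e. most_relevant_section[0][1]
context = "Baldur's Gate 3 takes place in the fictional world of the Forgotten Realms..."

# Provide the retrieved text as context in the system message
user_input = f"""
<|im_start|>system
You are a helpful AI assistant named Hermes. Use the following context to answer the user's question:
{context}<|im_end|>
<|im_start|>user
What is the plot of Baldur's Gate 3?<|im_end|>
<|im_start|>assistant
"""


async def main():
    model_output = await model.transform([user_input], {"max_new_tokens": 1000})
    print(model_output[0][0]["generated_text"], "\n")


asyncio.run(main())
```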