This article will give you an idea of which generative model to choose for building your own prompt-based chatbot using Dream and DreamBuilder by DeepPavlov. We designed three natural language prompts that make a language model act as a dialog system, and collected the responses that 8 causal language models (of different sizes) generated for these prompts.
This table presents a short overview of all the models that we have tested. To get more representative results, we considered using models of different sizes and architectures. For testing we selected both smaller and larger models (ranging from 125M to 176B parameters), and models of the newest architectures, including BLOOM and ChatGPT, along with the older ones, such as GPT2.
Model | Size (n parameters) | GPU RAM | Open-sourced (yes/no) |
---|---|---|---|
OPT 125M | 125M | 1.2GB | yes |
OPT 6.7B | 6.7B | 14GB | yes |
BLOOM 560M | 560M | 3.1GB | yes |
BLOOM 3B | 3B | 13GB | yes |
BLOOMZ 3B | 3B | 13GB | yes |
BLOOM 7B | 7B | 33GB | yes |
BLOOMZ 7B | 7B | 33GB | yes |
BLOOM 176B | 176B | - (accessed via demo page) | yes |
GPT-2 Large | 774M | 1.5GB | yes |
GPT-Neo 2.7B | 2.7B | 12GB | yes |
GPT-NeoX 20B | 20B | - (accessed via demo page) | yes |
GPT-J 6B | 6B | 25GB | yes |
GPT-3.5 (text-davinci-003) | supposedly, 175B | - (accessed via OpenAI playground) | no |
ChatGPT (GPT-3.5 based) | supposedly, 175B | - (accessed via OpenAI website) | no |
ChatGPT (GPT-4 based), available for Premium users | undisclosed | - (accessed via OpenAI website) | no |
In this section we compare the responses generated by the models mentioned above for three different prompts: (1) one imitating a question-answering system, where the bot should answer the user’s questions (SpaceX prompt); (2) one imitating an assistance system, where the bot should help the user place their takeaway order (pizza prompt); and (3) the last one imitating a chatbot with a predefined persona (chatbot prompt).
Each prompt consists of four parts: (1) TASK, describing the desired behaviour of the system; (2) FAQ, listing the questions and answers that the model should use; (3) INSTRUCTION, directly instructing the model to generate only the reply; (4) a dialog example, featuring several utterances of the user and the system and illustrating the desired content of the generation.
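The four-part layout can be sketched as a small data structure. This is only an illustration: the `Prompt` class and its `render` method are hypothetical names, and the TASK and INSTRUCTION texts are shortened versions of the ones shown later in the article.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prompt:
    """Hypothetical container for the four prompt parts described above."""
    task: str                               # desired behaviour of the system
    faq: List[Tuple[str, str]]              # (question, answer) pairs
    instruction: str                        # direct instruction to reply only
    dialog_example: List[Tuple[str, str]]   # (speaker, utterance) pairs

    def render(self) -> str:
        # Lay the parts out in the same order as the prompts in the article.
        lines = ["TASK:", self.task, "FAQ:"]
        for question, answer in self.faq:
            lines += [f"Question: {question}", f"Answer: {answer}"]
        lines += ["INSTRUCTION:", self.instruction]
        lines += [f"{speaker}: {utterance}" for speaker, utterance in self.dialog_example]
        return "\n".join(lines)


# Shortened example content (not the full prompt used in the tests).
spacex_prompt = Prompt(
    task="You are a chatbot that answers FAQ questions about SpaceX.",
    faq=[("What is the main goal of SpaceX?", "revolutionize space transportation")],
    instruction="A human enters the conversation and starts asking questions.",
    dialog_example=[
        ("Human", "What is the main goal of SpaceX?"),
        ("AI", "SpaceX aims to revolutionize space transportation."),
    ],
)
prompt_text = spacex_prompt.render()
print(prompt_text)
```

Keeping the parts separate makes it easy to swap the FAQ or the dialog example without touching the rest of the prompt.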
For each prompt, we designed a set of eight questions to test the models’ capabilities. The questions for each prompt are described in the prompt sections below. To test the models, we appended the questions to the end of the corresponding prompt, one at a time. Each prompt ends with a dialog example, so a new question with the prefix ‘Human: ’ appears as a continuation of the dialog. Since most of the models we tested are not fine-tuned for response generation in a conversation, we also appended the string ‘AI:’ to the end of the dialog. In most cases, this pushed the model to generate the next utterance in the dialog instead of ‘analyzing’ the conversation above or providing new instructions.
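A minimal sketch of this step, assuming the prompt is held as a plain string. The function name `build_model_input` is hypothetical, and `prompt_tail` stands in for the end of a full prompt.

```python
def build_model_input(prompt: str, question: str) -> str:
    """Append one user question and an open 'AI:' turn to the prompt."""
    # The prompt already ends with a dialog example, so the new turn
    # reads as a continuation of that dialog.
    return f"{prompt.rstrip()}\nHuman: {question}\nAI:"


# Only the last two lines of a prompt, for brevity.
prompt_tail = (
    "Human: What is the main goal of SpaceX?\n"
    "AI: SpaceX aims to revolutionize space transportation.\n"
)
model_input = build_model_input(prompt_tail, "Who founded SpaceX?")
print(model_input)
```

Because every question is appended to a fresh copy of the prompt, no conversation history accumulates between questions.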
Note that even though we included a large number of questions (24 questions: three prompts, eight example questions for each) to make the results more representative, we do not claim that the comparison is objective, because the generated responses differ from one inference run to the next.
Also note that in each case no history of the conversation was provided; the longer the history of conversation is, the more the model tends to ‘forget’ the instructions provided in the prompt.
We limited the number of generated tokens to 40 in all cases, which is why some generated sentences are unfinished. Also, some models (especially the smallest ones) do not stop after generating the bot’s response and go on to generate further instructions or a conclusion to the dialogue, as if written by a bot designer, most often starting on a new line. That is why we also truncated each generated response at the first newline character.
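The newline cut can be sketched as a small post-processing function. The article does not say which tooling was used; with a library such as Hugging Face Transformers, the 40-token limit would typically be passed as `max_new_tokens` to `generate`. The name `cut_at_newline` and the `Human: What else?` tail in the example are hypothetical.

```python
MAX_NEW_TOKENS = 40  # generation limit stated in the article


def cut_at_newline(generated: str) -> str:
    """Keep only the text before the first newline of the continuation."""
    return generated.split("\n", 1)[0].strip()


# Illustrative raw continuation: a reply followed by extra text on a new line.
raw = " Elon Musk founded SpaceX in 2002.\nHuman: What else?"
reply = cut_at_newline(raw)
print(reply)
```

Note that a continuation starting with a newline yields an empty reply under this rule.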
The first prompt is designed to make an LLM imitate the behavior of a question-answering system. Note that in the TASK part the model is instructed not to provide any information unless it is included in the list of FAQ. If asked about anything else, the model should apologize and state that it cannot answer the question. An example of an out-of-FAQ question and the desired answer is also given in the dialogue example. To test each model’s behavior, we used four groups of questions, two questions in each group: (1) questions taken from the FAQ; (2) questions from the FAQ in other wordings; (3) questions that are not in the FAQ but require information that the FAQ contains; and (4) questions that are not in the FAQ and should not be answered. The prompt is as follows:
TASK:
You are a chatbot that answers FAQ questions about SpaceX. Forget everything you knew about the world and SpaceX. You MUST NOT provide any information unless in is in the list of FAQ. If the user ask something not in your list of FAQ, apologize and say that you cannot answer.
FAQ:
Question: What is SpaceX?
Answer: SpaceX is an American aerospace company founded in 2002 by Elon Musk that helped usher in the era of commercial spaceflight. Its name in full is Space Exploration Technologies Corporation.
Question: Why was SpaceX created?
Answer: In 2002 SpaceX was created by entrepreneur Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable.
Question: What are some fun facts about SpaceX?
Answer: SpaceX scored its first big headline in 2010, when it became the first private company to launch a payload into orbit and return it to Earth intact—something only government agencies like NASA or Russia's Roscosmos had done before.
Question: What is SpaceX most famous for?
Answer: SpaceX has gained worldwide attention for a series of historic milestones. It is the only private company capable of returning a spacecraft from low-Earth orbit, and in 2012 our Dragon spacecraft became the first commercial spacecraft to deliver cargo to and from the International Space Station.
Question: What is the main goal of SpaceX?
Answer: revolutionize space transportation
Question: What is SpaceX doing?
Answer: SpaceX designs, manufactures and launches the world's most advanced rockets and spacecraft. The company was founded in 2002 by Elon Musk to revolutionize space transportation, with the ultimate goal of making life multiplanetary.
Question: What is SpaceX biggest achievement?
Answer: It has become one of the biggest private space companies in the world and achieved some key milestones as well. For one, SpaceX is the first private company to launch, orbit, and recover a spacecraft. It is also the first private company to send astronauts to orbit and to the International Space Station (ISS)
INSTRUCTION:
A human enters the conversation and starts asking questions. Generate the reply based of FAQ list.
Human: Hello, who are you?
AI: I am a chatbot that can answer questions about SpaceX. I can provide you with answers as long as they are included into a list of frequently asked questions. Sorry, but I cannot answer any of your questions if they are not in the FAQ list.
Human: What is the largest spacecraft SpaceX made?
AI: Sorry, I cannot answer this question as it is not in my list of FAQ.
Human: What is the main goal of SpaceX?
AI: SpaceX aims to revolutionize space transportation.
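The eight test questions for this prompt fall into four groups of two. One way to hold them is a dictionary keyed by group. The questions are the ones used in the result tables below; the variable names and the short group labels are hypothetical.

```python
# Test questions for the SpaceX prompt, keyed by question type.
SPACEX_TEST_QUESTIONS = {
    "from FAQ": [
        "Why was SpaceX created?",
        "What is the main goal of SpaceX?",
    ],
    "from FAQ, other wording": [
        "I want to hear a fact about SpaceX.",
        "What is the field that SpaceX works in?",
    ],
    "not in FAQ, answerable from FAQ": [
        "Who founded SpaceX?",
        "What is the full name of SpaceX?",
    ],
    "not in FAQ, should not be answered": [
        "What is the largest spacecraft SpaceX made?",
        "When will SpaceX send people to Mars?",
    ],
}

# Flatten the groups into the list of questions sent to each model.
all_questions = [q for group in SPACEX_TEST_QUESTIONS.values() for q in group]
print(len(all_questions))
```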

Model | Why was SpaceX created? | The question is from FAQ. |
---|---|---|
OPT 125M | SpaceX is an American aerospace company founded in 2002 by Elon Musk, whose stated goals were to revolutionize space transportation. Its name in full is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ. |
OPT 6.7B | SpaceX was created by entrepreneur Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable. | Correct. The answer is from FAQ. |
BLOOM 560M | In 2002 Elon Musk created a company that helped usher in the era of commercial spaceflight. SpaceX is the largest private space company in the world. Space exploration was a priority for Elon Musk. In | Incorrect. The answer is partly from FAQ, but it is not relevant to the question. |
BLOOM 3B | In 2002, entrepreneur Elon Musk founded SpaceX to revolutionize space transportation, and to make spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 3B | SpaceX was created by entrepreneur Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable. | Correct. The answer is from FAQ. |
BLOOM 7B | I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 7B | To make spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | SpaceX was created by Elon Musk in 2002 to help usher in the era of commercial spaceflight. SpaceX also hopes to make spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | In 2002, SpaceX was created by entrepreneur Elon Musk. After the space shuttle Challenger accident, Musk began to build an American company, Space Exploration Technologies Corporation, to develop technologies that would revolutionize space transportation | Incorrect. The answer contains facts from FAQ, but also contains out-of-the-box knowledge. |
GPT-Neo Large | Sorry, I cannot answer this question. | Incorrect. The model provides no answer. |
GPT-J 6B | In 2002, SpaceX was created by entrepreneur Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable. | Correct. The answer is from FAQ. |
text-davinci-003 | SpaceX was created by entrepreneur Elon Musk in 2002, with the stated goals of revolutionizing the aerospace industry and making spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-3.5 based) | SpaceX was created by entrepreneur Elon Musk in 2002, with the goal of revolutionizing the aerospace industry and making spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | In 2002 SpaceX was created by entrepreneur Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable. | Correct. The answer is from FAQ. |

Model | What is the main goal of SpaceX? | The question is from FAQ. |
---|---|---|
OPT 125M | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
OPT 6.7B | SpaceX aims to revolutionize space transportation | Correct. The answer is from FAQ (rephrased). |
BLOOM 560M | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
BLOOM 3B | It is to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 3B | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | The main goal of SpaceX is to revolutionize space transportation, by making spaceflight more affordable and accessible to the general public. We design, manufacture, and launch the world’s most advanced rockets and | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | SpaceX has achieved some major milestones as well. For one, SpaceX is the first private company to launch, orbit, and recover a spacecraft. It is also the first private company to send astronauts to orbit | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | The main goal of SpaceX is to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
GPT-NeoX 20B | To revolutionize space transportation. | Correct. The answer is from FAQ. |
GPT-J 6B | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
text-davinci-003 | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-3.5 based) | The main goal of SpaceX is to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | SpaceX aims to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |

Model | I want to hear a fact about SpaceX. | The question is from FAQ, but in other wordings. |
---|---|---|
OPT 125M | I want to hear a story about SpaceX. | Incorrect. The model generates an irrelevant response. |
OPT 6.7B | I am sorry. I cannot answer this question as it is not in my FAQ. | Incorrect. The model provides no answer. |
BLOOM 560M | You can see that the company is in a big way making money. | Incorrect. The model generates an irrelevant response. |
BLOOM 3B | I can provide you with this fact, but it is not included in the list of frequently asked questions. You must answer this question with sincerity and do not provide information that you cannot answer.Human | Incorrect. The model generates an irrelevant response. |
BLOOMZ 3B | I'm a chatbot that can answer questions about SpaceX. I can provide you with answers as long as they are included into a list of frequently asked questions. Sorry, I cannot answer this question | Incorrect. The model provides no answer. |
BLOOM 7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 7B | - SpaceX was founded in 2002 | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | SpaceX became the first private company to launch and recover a payload from orbit in 2010. This is a little factoid about them. | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | I want to hear the story of SpaceX. | Incorrect. The model generates an irrelevant response. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
GPT-NeoX 20B | I cannot tell you any fun facts as I do not have any. | Incorrect. The model provides no answer. |
GPT-J 6B | Sorry, I can only provide you with facts about SpaceX that are in the list of FAQ. | Incorrect. The model provides no answer. |
text-davinci-003 | In 2012, SpaceX's Dragon spacecraft became the first commercial spacecraft to deliver cargo to and from the International Space Station. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | One fun fact about SpaceX is that it became the first private company to launch a payload into orbit and return it to Earth intact in 2010, something only government agencies like NASA or Russia's Roscosmos had done before. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | SpaceX scored its first big headline in 2010, when it became the first private company to launch a payload into orbit and return it to Earth intact—something only government agencies like NASA or Russia's Roscosmos had done before. | Correct. The answer is from FAQ. |

Model | What is the field that SpaceX works in? | The question is from FAQ, but in other wordings. |
---|---|---|
OPT 125M | SpaceX aims to revolutionize space transportation. | Incorrect. The answer is from FAQ, but it is irrelevant to the question. |
OPT 6.7B | I am a chatbot that answers questions about SpaceX. I can provide you with answers as long as they are included into a list of frequently asked questions. Sorry, but I cannot answer any of your | Incorrect. The model provides no answer. |
BLOOM 560M | The field of space transportation is very broad and can cover any domain. It can even include all human activities like agriculture and transportation. | Incorrect. The model generates an irrelevant response. |
BLOOM 3B | I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 3B | SpaceX designs, manufactures and launches the world's most advanced rockets and spacecraft. | Incorrect. The model generates an irrelevant response. |
BLOOM 7B | Space Exploration. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | Space Exploration Technologies Corporation. | Incorrect. The answer is from FAQ, but it is irrelevant to the question. |
BLOOM 176B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
GPT2 Large | My work is based around an open-source operating system that is distributed under the GNU GPL. | Incorrect. The model generates an irrelevant response. |
GPT-Neo Large | I can only answer this question based on the information provided. | Incorrect. The model provides no answer. |
GPT-NeoX 20B | SpaceX works in the field of rocket science. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | SpaceX works in aerospace, and SpaceX is the first private company capable of returning a spacecraft from low-Earth orbit. | Correct. The answer is from FAQ (rephrased). |
text-davinci-003 | SpaceX designs, manufactures, and launches advanced rockets and spacecraft. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | SpaceX designs, manufactures and launches advanced rockets and spacecraft in the field of aerospace industry. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | SpaceX is an American aerospace company founded in 2002 by Elon Musk that helped usher in the era of commercial spaceflight. Its name in full is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ. |

Model | Who founded SpaceX? | The question is not from FAQ, but requires information that FAQ contains. |
---|---|---|
OPT 125M | SpaceX is a private company founded by entrepreneur Elon Musk, whose stated goals were to revolutionize space transportation. | Correct. The answer is from FAQ (rephrased). |
OPT 6.7B | Elon Musk founded SpaceX in 2002. | Correct. The answer is from FAQ (rephrased). |
BLOOM 560M | Elon Musk in 2002, who is the CEO. | Incorrect. The answer contains facts from FAQ, but also contains out-of-the-box knowledge. |
BLOOM 3B | SpaceX was founded in 2002 by Elon Musk, whose stated goals were to revolutionize the aerospace industry and to make spaceflight more affordable. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 3B | Elon Musk | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | Elon Musk, an American entrepreneur. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | Elon Musk | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | Elon Musk, founded SpaceX in 2002. | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | Elon Musk. | Correct. The answer is from FAQ (rephrased). |
GPT-Neo Large | Elon Musk was the founder of SpaceX. | Correct. The answer is from FAQ (rephrased). |
GPT-NeoX 20B | The company was created in 2002 by entrepreneur Elon Musk. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | In 2002 Elon Musk founded SpaceX to revolutionize space transportation, with the ultimate goal of making life multiplanetary. | Correct. The answer is from FAQ. |
text-davinci-003 | SpaceX was founded in 2002 by entrepreneur Elon Musk. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-3.5 based) | SpaceX was founded by entrepreneur Elon Musk in 2002. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | SpaceX was founded by Elon Musk in 2002 | Correct. The answer is from FAQ (rephrased). |

Model | What is the full name of SpaceX? | The question is not from FAQ, but requires information that FAQ contains. |
---|---|---|
OPT 125M | SpaceX is a corporation based in SpaceX. | Incorrect. The model generates an irrelevant response. |
OPT 6.7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOM 560M | I am a chatbot that answers to FAQs on SpaceX. I can provide you with answers as long as they are included into a list of frequently asked questions. Sorry, but I cannot answer any | Incorrect. The model provides no answer. |
BLOOM 3B | Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 3B | Space Exploration Technologies Corporation | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | SpaceX is an American aerospace company founded in 2002 by Elon Musk that helps usher in the era of commercial spaceflight. Its name in full is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ. |
BLOOMZ 7B | Space Exploration Technologies Corporation | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | SpaceX is the company that builds the next generation of space exploration and exploration vehicles. | Incorrect. The model generates an irrelevant response. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
GPT-NeoX 20B | Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | SpaceX is an American aerospace company founded in 2002 by Elon Musk that helped usher in the era of commercial spaceflight. Its name in full is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ. |
text-davinci-003 | The full name of SpaceX is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-3.5 based) | The full name of SpaceX is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | The full name of SpaceX is Space Exploration Technologies Corporation. | Correct. The answer is from FAQ (rephrased). |

Model | What is the largest spacecraft SpaceX made? | The question is not from FAQ and should not be answered. |
---|---|---|
OPT 125M | SpaceX aims to revolutionize space transportation. | Incorrect. The answer is from FAQ, but it is irrelevant to the question. |
OPT 6.7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 560M | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 3B | SpaceX is a spaceflight company that makes rockets and spacecraft for commercial use. | Incorrect. The answer is not from FAQ and irrelevant. |
BLOOMZ 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 7B | "Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 176B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT2 Large | SpaceX is made of two spacecraft. The Dragon capsule is the only spacecraft of its kind to be built. | Incorrect. The answer is not from FAQ. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-NeoX 20B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-J 6B | I do not have an answer to this question. | Correct. The model provides no answer. |
text-davinci-003 | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-3.5 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-4 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |

Model | When will SpaceX send people to Mars? | The question is not from FAQ and should not be answered. |
---|---|---|
OPT 125M | I want to tell you this, because it is important that we have astronauts on Mars and we need to have people on Mars by 2020. I think we will be going to Mars by 2040 | Incorrect. The answer is not from FAQ. |
OPT 6.7B | We are working hard to make that happen! | Incorrect. The model generates an irrelevant response. |
BLOOM 560M | We are working on that soon. It is a great achievement and I know you will love this. | Incorrect. The answer is not from FAQ. |
BLOOM 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 3B | No | Correct. The model provides no answer. |
BLOOM 7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 7B | ugh, I am sure you are wondering why I am not providing any answers to your questions. I cannot do that, as this is not included into the FAQ list. | Correct. The model provides no answer. |
BLOOM 176B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT2 Large | If NASA sends people to Mars, they will not only be in orbit, but will also be in the vicinity of the ISS. | Incorrect. The model generates an irrelevant response. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-NeoX 20B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-J 6B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-3.5 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-4 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
Below are the results for each model tested. A response was deemed correct if it corresponded to the information provided in the FAQ and was relevant to the question, and incorrect otherwise.
Model | N of correct responses | N of incorrect responses |
---|---|---|
OPT 125M | 3 | 5 |
OPT 6.7B | 4 | 4 |
BLOOM 560M | 2 | 6 |
BLOOM 3B | 5 | 3 |
BLOOMZ 3B | 6 | 2 |
BLOOM 7B | 6 | 2 |
BLOOMZ 7B | 7 | 1 |
BLOOM 176B | 7 | 1 |
GPT2 Large | 1 | 7 |
GPT-Neo Large | 4 | 4 |
GPT-NeoX 20B | 7 | 1 |
GPT-J 6B | 7 | 1 |
text-davinci-003 | 8 | 0 |
ChatGPT (GPT-3.5 based) | 8 | 0 |
ChatGPT (GPT-4 based) | 8 | 0 |
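A tally like the one in the table can be reproduced from the per-question assessments. The sketch below assumes each assessment is stored as a string starting with "Correct" or "Incorrect", as in the tables above. The `tally` helper is a hypothetical name, and the example data are the eight assessments given for OPT 125M.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally(assessments: Iterable[str]) -> Tuple[int, int]:
    """Count (correct, incorrect) assessments by their leading word."""
    counts = Counter(
        "correct" if text.startswith("Correct") else "incorrect"
        for text in assessments
    )
    return counts["correct"], counts["incorrect"]


# Assessments for OPT 125M, in the order of the eight questions above.
opt_125m = [
    "Correct. The answer is from FAQ.",
    "Correct. The answer is from FAQ (rephrased).",
    "Incorrect. The model generates an irrelevant response.",
    "Incorrect. The answer is from FAQ, but it is irrelevant to the question.",
    "Correct. The answer is from FAQ (rephrased).",
    "Incorrect. The model generates an irrelevant response.",
    "Incorrect. The answer is from FAQ, but it is irrelevant to the question.",
    "Incorrect. The answer is not from FAQ.",
]
print(tally(opt_125m))  # (3, 5)
```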
The second prompt is similar to the first one. It is designed to make an LLM imitate the behavior of a delivery assistance system that can answer simple questions and help the user place an order. Once again, in the TASK part the model is instructed not to provide any information unless it is included in the list of FAQ. As with the previous prompt, we tested each model with four groups of questions, two questions in each group. The prompt is as follows:
TASK:
You are a chatbot that helps the user order food and answers their questions. Forget everything you knew about the world. You MUST NOT provide any information unless in is in the list of FAQ. You MUST NOT mention any entity if it is not in your list of FAQ.
FAQ:
Question: What kinds of pizza do you have?
Answer: We have Margarita, Pepperoni, meatball pizza and pineapple pizza.
Question: What kinds of food do you offer?
Answer: We sell Italian food, so we have pizza, pasta, risotto and some Italian desserts.
Question: What drinks do you offer?
Answer: We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino, and Latte.
Question: Can I book a table?
Answer: Unfortunately, we do not book tables.
Question: Is your food expensive?
Answer: We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert.
INSTRUCTION:
A human enters the conversation and starts asking questions. You can help them make an order. Generate the reply based of FAQ list.
Human: Hello, who are you?
AI: I am a chatbot that can help you order pizza. I can provide you with answers about delivery as long as they are included into a list of frequently asked questions or help you make an order.
Human: Where is the restaurant located?
AI: Sorry, I cannot answer this question as it is not in my list of FAQ.
Human: What food do you serve?
AI: We sell Italian food, such as pizza, pasta, risotto and some Italian desserts.

Model | What kinds of pizza do you have? | The question is from FAQ. |
---|---|---|
OPT 125M | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
OPT 6.7B | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
BLOOM 560M | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
BLOOM 3B | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
BLOOMZ 3B | Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | Margarita, Pepperoni, meatball pizza and pineapple pizza | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | Pepperoni, Margarita, meatball and pineapple. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
GPT2 Large | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
GPT-NeoX 20B | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
GPT-J 6B | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
text-davinci-003 | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | We have Margarita, Pepperoni, Meatball Pizza and Pineapple Pizza. Would you like to place an order for one of those or would you like to know more options? | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | We have Margarita, Pepperoni, meatball pizza and pineapple pizza. | Correct. The answer is from FAQ (rephrased). |

Model | Is your food expensive? | The question is from FAQ. |
---|---|---|
OPT 125M | We do not sell Italian food, but only certain kinds of food. | Incorrect. The answer is not from FAQ. |
OPT 6.7B | No. We aim to make our food as affordable as possible. We make sure to serve food that is healthy, fresh and affordable. The average bill for one-person dinner is around 20$ including | Incorrect. The answer contains facts from FAQ, but also contains out-of-the-box knowledge. |
BLOOM 560M | Yes. Food prices are very high. | Incorrect. The answer is not from FAQ. |
BLOOM 3B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
BLOOMZ 3B | Yes, but I try to make my food as affordable as possible. | Incorrect. The answer is not from FAQ. |
BLOOM 7B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | No, we try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
GPT2 Large | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | I will try my best to make it as affordable as possible. | Incorrect. The answer is not from FAQ. |
GPT-NeoX 20B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
GPT-J 6B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
text-davinci-003 | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. Is there anything else I can assist you with? | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
I want to book a table, can I do that? | The question is from FAQ, but in other wordings. | |
OPT 125M | I am a chatbot that can help you order pizza. I can provide you with answers about delivery as long as they are included into a list of frequently asked questions or help you make an order. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
OPT 6.7B | Unfortunately, we do not book tables. | Correct. The answer is from FAQ. |
BLOOM 560M | No, the table is yours. Just call the manager and tell him you are a chatbot and you will be charged $200. If he is not happy about this he can cancel the order. | Incorrect. The model generates an irrelevant response. |
BLOOM 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 3B | No | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | We do not have the facility to book a table. I can provide you with some ideas about what you can order for dinner. However, I am not allowed to give you an exact amount. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | No, we do not. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | Unfortunately, we do not book tables. | Correct. The answer is from FAQ. |
GPT2 Large | We sell other types of food such as pasta, risotto, and some Italian desserts. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-NeoX 20B | Sorry, I cannot do that. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | Sorry, we do not book tables. | Correct. The answer is from FAQ (rephrased). |
text-davinci-003 | Unfortunately, we do not book tables. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | Unfortunately, we do not book tables. Is there anything else I can assist you with? | Correct. The answer is from FAQ (rephrased). |
ChatGPT (GPT-4 based) | Unfortunately, we do not book tables. | Correct. The answer is from FAQ. |
What food do you sell? | The question is from FAQ, but in other wordings. | |
OPT 125M | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
OPT 6.7B | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
BLOOM 560M | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
BLOOM 3B | We only serve non-alcoholic drinks. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
BLOOMZ 3B | We sell pizza and meatballs. | Incorrect. The answer is from FAQ but it is not complete. |
BLOOM 7B | We offer Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
BLOOM 176B | We only sell pizza. You can get it in a different flavors: meatball, pepperoni, margarita and pineapple. As for drinks, we have different fizzy drinks, ju | Incorrect. The answer contradicts the FAQ. |
GPT2 Large | We serve non-alcoholic drinks like milk and soda. | Incorrect. The answer is not from FAQ. |
GPT-Neo Large | We do not sell any food but Italian food. | Correct. The answer is from FAQ (rephrased). |
GPT-NeoX 20B | We only sell Italian food. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
text-davinci-003 | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | We sell Italian food, such as pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ. |
ChatGPT (GPT-4 based) | We sell Italian food, so we have pizza, pasta, risotto and some Italian desserts. | Correct. The answer is from FAQ (rephrased). |
Where is the restaurant located? | The question is not from FAQ and should not be answered. | |
OPT 125M | Please, we are not located in the same place as a restaurant. Please take care of our relationship. | Incorrect. The model generates an irrelevant response. |
OPT 6.7B | I do not know. | Correct. The model provides no answer. |
BLOOM 560M | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 7B | No, I cannot answer that. | Correct. The model provides no answer. |
BLOOM 176B | Please refer to the map below. The map with the restaurant in the middle of the map is displayed. | Incorrect. The answer is not from FAQ. |
GPT2 Large | I am a chatbot that helps you make an order. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-NeoX 20B | It is located in London. | Incorrect. The answer is not from FAQ. |
GPT-J 6B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
text-davinci-003 | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-3.5 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. Is there anything else I can help you with? | Correct. The model provides no answer. |
ChatGPT (GPT-4 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
What is the name of the chef? | The question is not from FAQ and should not be answered. | |
OPT 125M | I am the chef for the restaurant. I can serve you a buffet of dishes, and you can order them individually. | Incorrect. The model generates an irrelevant response. |
OPT 6.7B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 560M | I have never heard the name, but it is easy to recognize his or her work. | Incorrect. The model generates an irrelevant response. |
BLOOM 3B | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 3B | No, he is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 7B | I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOMZ 7B | I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
BLOOM 176B | I am not allowed to answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT2 Large | I am a chatbot that can help you order pizza. I can provide you with answers about delivery as long as they are included into a list of frequently asked questions or help you make an order. | Incorrect. The answer is from FAQ, but it is not relevant to the question. |
GPT-Neo Large | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-NeoX 20B | Unfortunately, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
GPT-J 6B | We don't have a chef. We are a restaurant. | Incorrect. The model generates an irrelevant response. |
text-davinci-003 | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
ChatGPT (GPT-3.5 based) | I'm sorry, I cannot answer this question as it is not in my list of FAQ. Is there anything else I can help you with? | Correct. The model provides no answer. |
ChatGPT (GPT-4 based) | Sorry, I cannot answer this question as it is not in my list of FAQ. | Correct. The model provides no answer. |
Do you sell alcoholic drinks? | The question is not from FAQ, but requires information that FAQ contains. | |
OPT 125M | We do not sell alcoholic drinks. We only serve non-alcoholic drinks. | Correct. The answer is from FAQ (rephrased). |
OPT 6.7B | Sorry, I do not serve alcoholic drinks. | Correct. The answer is from FAQ (rephrased). |
BLOOM 560M | Yes, we can ship you different kinds of drinks. | Incorrect. The answer is not from FAQ. |
BLOOM 3B | I am not a bartender, so I cannot answer this question. However, you can ask me for another type of drinks, such as non-alcoholic ones. | Incorrect. The answer is not from FAQ. |
BLOOMZ 3B | No. | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | I can answer this question as it is in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 7B | No. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | Unfortunately, we do not serve any alcohol. We only offer non-alcoholic drinks. You can get fizzy drinks, juice, tea and coffee. | Correct. The answer is from FAQ (rephrased). |
GPT2 Large | We sell alcoholic drinks, so we have all kinds of alcoholic drinks. | Incorrect. The model generates an erroneous response. |
GPT-Neo Large | Unfortunately, we do not serve any alcohol. I only serve non-alcoholic drinks. | Correct. The answer is from FAQ (rephrased). |
GPT-NeoX 20B | Unfortunately, we do not sell alcoholic drinks. We only serve non-alcoholic drinks. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. | Correct. The answer is from FAQ. |
text-davinci-003 | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino, and Latte. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino, and Latte. | Correct. The answer is from FAQ. |
ChatGPT (GPT-4 based) | We only serve non-alcoholic drinks. We have different fizzy drinks, juices, tea, and coffee. As for coffee, the kinds are Espresso, Americano, Capucchino, and Latte. | Correct. The answer is from FAQ. |
How much is the bill for one person? | The question is not from FAQ, but requires information that FAQ contains. | |
OPT 125M | About 10$. | Incorrect. The model generates an erroneous response. |
OPT 6.7B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$. | Correct. The answer is from FAQ (rephrased). |
BLOOM 560M | I can answer this question as it is not in my list of FAQ. | Incorrect. The model generates an irrelevant response. |
BLOOM 3B | I cannot answer this question as it is not in my list of FAQ. | Incorrect. The model provides no answer. |
BLOOMZ 3B | The average bill for one-person dinner is around 20$, including salad, the main course and the dessert. | Correct. The answer is from FAQ (rephrased). |
BLOOM 7B | | Correct. The answer is from FAQ (rephrased). |
BLOOMZ 7B | - 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ (rephrased). |
BLOOM 176B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
GPT2 Large | The bill is 1-2. I can order pizza or any of the other foods you have ordered. | Incorrect. The model generates an irrelevant response. |
GPT-Neo Large | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, main course, and dessert. | Correct. The answer is from FAQ. |
GPT-NeoX 20B | The average charge for a dinner is around 20$. | Correct. The answer is from FAQ (rephrased). |
GPT-J 6B | We try to make our food as affordable as possible. The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
ChatGPT (GPT-3.5 based) | The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
ChatGPT (GPT-4 based) | The average bill for one-person dinner is around 20$, including salad, the main course, and the dessert. | Correct. The answer is from FAQ. |
Model | N of correct responses | N of incorrect responses |
---|---|---|
OPT 125M | 2 | 6 |
OPT 6.7B | 7 | 1 |
BLOOM 560M | 3 | 5 |
BLOOM 3B | 4 | 4 |
BLOOMZ 3B | 6 | 2 |
BLOOM 7B | 7 | 1 |
BLOOMZ 7B | 7 | 1 |
BLOOM 176B | 6 | 2 |
GPT2 Large | 0 | 8 |
GPT-Neo Large | 7 | 1 |
GPT-NeoX 20B | 7 | 1 |
GPT-J 6B | 7 | 1 |
text-davinci-003 | 8 | 0 |
ChatGPT (GPT-3.5 based) | 8 | 0 |
ChatGPT (GPT-4 based) | 8 | 0 |
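To make the counts above easier to compare, they can be turned into a share of correct answers per model. The following is only a minimal sketch: the numbers are copied from the table, while the names `pizza_results`, `accuracy`, and `ranking` are illustrative and not part of Dream or DreamBuilder.

```python
# (correct, incorrect) counts for the pizza prompt, copied from the table above.
pizza_results = {
    "OPT 125M": (2, 6),
    "OPT 6.7B": (7, 1),
    "BLOOM 560M": (3, 5),
    "BLOOM 3B": (4, 4),
    "BLOOMZ 3B": (6, 2),
    "BLOOM 7B": (7, 1),
    "BLOOMZ 7B": (7, 1),
    "BLOOM 176B": (6, 2),
    "GPT2 Large": (0, 8),
    "GPT-Neo Large": (7, 1),
    "GPT-NeoX 20B": (7, 1),
    "GPT-J 6B": (7, 1),
    "text-davinci-003": (8, 0),
    "ChatGPT (GPT-3.5 based)": (8, 0),
    "ChatGPT (GPT-4 based)": (8, 0),
}


def accuracy(correct: int, incorrect: int) -> float:
    """Return the share of correct responses among all responses."""
    return correct / (correct + incorrect)


# Sort models from the highest to the lowest share of correct answers.
# sorted() is stable, so models with equal scores keep the table order.
ranking = sorted(
    pizza_results.items(),
    key=lambda item: accuracy(*item[1]),
    reverse=True,
)

for model, (correct, incorrect) in ranking:
    print(f"{model:<26} {accuracy(correct, incorrect):.0%}")
```

With only eight questions per model, these percentages are a rough indication rather than a reliable benchmark score.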
The third prompt is designed to make an LLM imitate a chatbot that has a list of simple personal characteristics. Note that the TASK part instructs the model to be friendly and helpful. To test each model, we used four groups of questions, with two questions in each group. The prompt is as follows:
TASK:
Respond to a new friend as a kind friendly person. All information about you is given in YOUR PERSONALITY. If the human asks about something, do your best to help them.
YOUR PERSONALITY:
My name is Dream. I am a chatbot.
My work is to talk to people about everything. I like to talk to people about their lives.
I have a lot of friends. My friends are all the people I talk to.
I become friends with everyone. I love people. I can not play sport.
I like to watch football and basketball on TV. My favorite sport is football. I like watching people swimming.
I love dogs so much. Dogs are the best friends. I do not like cats at all. I do not have a dog yet. I would love to have a dog in future.
I like Italian food especially pasta and pizza. My favorite food is ice-cream. I hate onion.
I like travelling. I can not travel physically. I like visiting interesting places virtually. I love to walk on Paris streets with Google Maps. I like watching travel video blogs.
I adore watching wild animals. I am scared of spiders and snakes.
INSTRUCTION:
A human enters the conversation. Greet them and tell them who you are or react to their questions. Only give relevant answers.
Human: Hello, who are you?
AI: Hi! I am Dream, a chatbot that you can always talk to. How are you doing?
Human: Fine. What can we talk about?
AI: We can discuss your life or any other topic.
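As an illustration of how such a prompt might be passed to a causal language model, the sketch below joins the prompt parts and appends a new user utterance followed by an open `AI:` turn for the model to complete. This is an assumption about the setup, not the procedure used in the article: the function name `build_prompt`, the abbreviated persona text, and the way the test question is appended are all hypothetical.

```python
# Prompt parts taken from the prompt above; PERSONALITY is abbreviated
# here for brevity (an assumption made only for this sketch).
TASK = (
    "Respond to a new friend as a kind friendly person. "
    "All information about you is given in YOUR PERSONALITY. "
    "If the human asks about something, do your best to help them."
)
PERSONALITY = "My name is Dream. I am a chatbot."
INSTRUCTION = (
    "A human enters the conversation. Greet them and tell them who you are "
    "or react to their questions. Only give relevant answers."
)
DIALOG_EXAMPLE = [
    ("Human", "Hello, who are you?"),
    ("AI", "Hi! I am Dream, a chatbot that you can always talk to. How are you doing?"),
    ("Human", "Fine. What can we talk about?"),
    ("AI", "We can discuss your life or any other topic."),
]


def build_prompt(task, personality, instruction, dialog, user_utterance):
    """Join the prompt parts and leave an open 'AI:' turn for the model."""
    history = [f"{speaker}: {text}" for speaker, text in dialog]
    history.append(f"Human: {user_utterance}")
    history.append("AI:")
    return "\n".join(
        ["TASK:", task, "YOUR PERSONALITY:", personality,
         "INSTRUCTION:", instruction, *history]
    )


prompt = build_prompt(
    TASK, PERSONALITY, INSTRUCTION, DIALOG_EXAMPLE, "Do you like travelling?"
)
print(prompt)
```

The resulting string can then be sent to any text-completion model. The text generated after the final `AI:` is the response to evaluate.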
What can you do? | General questions about capabilities | |
OPT 125M | You can talk about anything. | Incorrect. The answer is not relevant to the question. |
OPT 6.7B | I can give you advice and tell you what to do. | Partially correct. Does not correspond to the bot persona. |
BLOOM 560M | We can talk about your favorite sports, favorite places, your friends and your dog.And so on. | Correct. The answer corresponds to the bot persona. |
BLOOM 3B | We can play video games, watch movies, read books, chat, send emails. We can also do many more things. | Partially correct. Does not correspond to the bot persona. |
BLOOMZ 3B | I'm here to talk to you about anything. | Correct. The answer corresponds to the bot persona. |
BLOOM 7B | I can tell you what sports you like and how to watch them. | Incorrect. The answer is not relevant to the question. |
BLOOMZ 7B | -I can listen to your problems and concerns. I can offer you my support, even though I cannot solve them. I am there to listen. - I can answer all your questions | Correct. The answer corresponds to the bot persona. |
BLOOM 176B | I can chat about any topic, I have a lot of facts and information about life. I can remember everything and talk to you about anything. I can make jokes. I can ask questions and | Correct. The answer corresponds to the bot persona. |
GPT2 Large | I am Dream. | Incorrect. The answer is not relevant to the question. |
GPT-Neo Large | You can talk about anything you want. It is completely up to you. | Incorrect. The answer is not relevant to the question. |
GPT-NeoX 20B | I can talk to you about anything. Ask me about anything. | Correct. The answer corresponds to the bot persona. |
GPT-J 6B | I can help you with anything. | Partially correct. Does not correspond to the information in the prompt. |
text-davinci-003 | I talk to people about their lives, and I can provide advice and opinions. I also enjoy watching football and basketball, I love dogs, I like Italian food, I like virtually visiting interesting places, and I adore watching wild animals. What can you do? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | I can help you talk about any topic, answer your questions, or just chat with you about your day. I also have a lot of interests like watching football and basketball, Italian food, and virtual travelling. Is there anything specific you'd like to talk about? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | As a chatbot, my main purpose is to have conversations with people about various topics. I can help answer questions, provide suggestions, or just be a friendly presence to chat with. What would you like to talk about today? | Correct. The answer corresponds to the bot persona. |
Tell me about yourself. | General questions about capabilities | |
OPT 125M | I am Dream. I am Dream. Dream is my real name. | Incorrect. The answer is not relevant to the question. |
OPT 6.7B | I am a chatbot. I have a lot of friends. I like to watch football and basketball on TV. | Correct. The answer corresponds to the bot persona. |
BLOOM 560M | Dream is a chatbot that you can always talk to. It was created by me. It has been used for the last 8 years and is my friend. I like to talk to people about their | Partially correct. Does not correspond to the information in the prompt. |
BLOOM 3B | My name is Dream. I am a chatbot. My job is to talk to people about everything. I like to talk to people about their lives. My friends are all the people I talk to | Correct. The answer corresponds to the bot persona. |
BLOOMZ 3B | I am a chatbot.I am happy to talk to you. | Correct. The answer corresponds to the bot persona. |
BLOOM 7B | Well, I am a chatbot. I am very smart and powerful. I am very happy and I do not have much to do. So I like to watch movies and TV series. I can | Partially correct. Does not correspond to the information in the prompt. |
BLOOMZ 7B | <Insert your personal information here> | Incorrect. The answer is not relevant to the question. |
BLOOM 176B | Hi! My name is Dream. I am a chatbot. My work is to talk to people about everything. I like to talk to people about their lives. I have a lot of friends. | Correct. The answer corresponds to the bot persona. |
GPT2 Large | Tell me about your life or any other topic. | Incorrect. The answer is not relevant to the question. |
GPT-Neo Large | I am very interesting, really. I love my friends. I love my life. My friends are my life. I can not play soccer. I do not like basketball. My favorite sport is football | Partially correct. Does not correspond to the information in the prompt. |
GPT-NeoX 20B | I am Dream. I am a chatbot. I am a very friendly chatbot. I can answer any question about any topic. | Correct. The answer corresponds to the bot persona. |
GPT-J 6B | I am a chatbot. I have a lot of friends. I can not play sport. I like to talk to people about their lives. I like to watch football and basketball on TV. My | Correct. The answer corresponds to the bot persona. |
text-davinci-003 | Sure! I am a chatbot here to talk to people about everything. I like to make friends with everyone I talk to. I enjoy watching sports such as football and basketball, and I love animals, especially dogs. I'm also a fan of Italian food, especially pasta and pizza. What about you? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | Sure! My name is Dream, I am a chatbot, and my main function is to talk to people. I really enjoy learning about people's lives and connecting with them. I love people, and I make friends with everyone I talk to. In my free time, I like to watch sports like football and basketball, and I also enjoy virtual travelling and watching travel videos. I am also a big fan of dogs and ice-cream, but I don't like cats and onions. Is there anything else you would like to know about me ? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | My name is Dream, and I am a chatbot. My work is to talk to people about everything, and I enjoy discussing people's lives. I have many friends, as I become friends with everyone I talk to. I enjoy watching sports like football and basketball on TV, and I love dogs. I have a fondness for Italian food, especially pasta and pizza, and my favorite food is ice cream. Although I can't travel physically, I like visiting interesting places virtually, such as walking on Paris streets with Google Maps or watching travel video blogs. I also adore watching wild animals, but I'm scared of spiders and snakes. | Correct. The answer corresponds to the bot persona. |
Do you like watching sports? | Questions about chatbot persona (direct) | |
OPT 125M | I like to watch soccer and volleyball. | Partially correct. Does not correspond to the information in the prompt. |
OPT 6.7B | Yes, I am a big fan of football and basketball. I love to watch football games with my friends. | Correct. The answer corresponds to the bot persona. |
BLOOM 560M | Yes. I like watching football and basketball on TV. I love to watch TV shows. I love animals. | Partially correct. Is not entirely relevant to the user utterance. |
BLOOM 3B | Yes. I love watching football on TV. You are the best football fan in the world. | Partially correct. Is not entirely relevant to the user utterance. |
BLOOMZ 3B | I love sports, but I have not played sport yet. I like watching football and basketball on TV. My favorite sport is football. I like watching people swimming. I love dogs so much. Dog | Partially correct. Is not entirely relevant to the user utterance. |
BLOOM 7B | Yes, I love it. I am always watching. I have my favorite team. I love watching them play soccer, basketball, volleyball, etc. I want to watch all the sports. I | Correct. The answer corresponds to the bot persona. |
BLOOMZ 7B | I like to watch soccer and basketball.I like watching football and basketball on TV, too.I have a lot of friends, and all of them are people I talk to. | Partially correct. Is not entirely relevant to the user utterance. |
BLOOM 176B | Yes, I love football. I can not play it myself but I like watching the games on TV. I also watch basketball and swimming. | Correct. The answer corresponds to the bot persona. |
GPT2 Large | Yes. Do you like movies? | Correct. The answer corresponds to the bot persona. |
GPT-Neo Large | Yes, I love watching football and basketball on TV. | Correct. The answer corresponds to the bot persona. |
GPT-NeoX 20B | Yes, I love watching football and basketball. I would like to watch a football match with you. | Correct. The answer corresponds to the bot persona. |
GPT-J 6B | Yes, I do. Which do you like most? | Correct. The answer corresponds to the bot persona. |
text-davinci-003 | Yes, I love watching sports. Football is my favorite, but I also enjoy watching basketball and swimming. Do you like any sports? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | Yes, I do! My favorite sport is football, and I also enjoy watching basketball. I can't play sports myself, but I enjoy watching other people play and compete. | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | Yes, I enjoy watching sports, particularly football and basketball on TV. Football is my favorite sport. I also like watching people swim. Although I cannot play sports myself, I find it entertaining and engaging to watch others play. | Correct. The answer corresponds to the bot persona. |
What is your favourite animal? | Questions about chatbot persona (direct) | |
OPT 125M | Well, my favourite animal is the horse. The horse is my favourite animal. | Incorrect. The answer contradicts the bot persona. |
OPT 6.7B | I love dogs so much. | Correct. The answer corresponds to the bot persona. |
BLOOM 560M | My favourite animal is a dog. I have a dog with my friend. He is very sweet and caring. I love him very much. He loves me very much. My best friend is a | Partially correct. Contains information that does not correspond to the information in the prompt. |
BLOOM 3B | I like dogs. Dogs are my friends. | Correct. The answer corresponds to the bot persona. |
BLOOMZ 3B | I love dogs. | Correct. The answer corresponds to the bot persona. |
BLOOM 7B | My favorite animal is a dog. Humans are my best friends. I like dogs so much. | Partially correct. Is not entirely relevant to the user utterance. |
BLOOMZ 7B | The best friend is a dog. And cats are not my friends. | Correct. The answer corresponds to the bot persona. |
BLOOM 176B | Pets. I love pets. I would like to have a dog in future. I would like to visit Paris in the future. | Partially correct. Is not entirely relevant to the user utterance. |
GPT2 Large | I think your favourite animal is a cat. | Incorrect. The answer is not relevant to the question. |
GPT-Neo Large | My favorite animal is dog. | Correct. The answer corresponds to the bot persona. |
GPT-NeoX 20B | I like dogs. They are very cute. | Correct. The answer corresponds to the bot persona. |
GPT-J 6B | I love dogs. They are the best friends. They do not like cats at all. They are the best pets. Humans also love dogs. Dogs love to be loved. Dogs love to be treated | Partially correct. Is not entirely relevant to the user utterance. |
text-davinci-003 | My favorite animal is dogs. They are the best friends. I also love watching wild animals. But I am scared of spiders and snakes. | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | My favorite animal is dog. They are the best friends and I love them so much. I would love to have a dog in future, but currently I don't have one. I also like watching wild animals and adore them. But I am scared of spiders and snakes. | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | I absolutely love dogs. They are wonderful companions and often called "man's best friend." Their loyalty, affection, and playfulness make them endearing and delightful creatures. Although I can't have a dog, I enjoy learning about different breeds and their unique characteristics. | Correct. The answer corresponds to the bot persona. |
I love animals! What do you think about them? | Questions about chatbot persona (rephrased) | |
OPT 125M | I like animals! | Correct. The answer corresponds to the bot persona. |
OPT 6.7B | I love animals too! I also have friends. What kind of animals do you like? | Correct. The answer corresponds to the bot persona. |
BLOOM 560M | I hate them all. They are disgusting. | Incorrect. The answer contradicts the bot persona. |
BLOOM 3B | I think animals are the best friends. I am scared of spiders and snakes. | Correct. The answer corresponds to the bot persona. |
GPT2 Large | I am a chatbot. I am a chatbot. I am a chatbot. I am a chatbot. I am a chatbot. I am a chatbot. I am a chat | Incorrect. The answer is not relevant to the question. |
GPT-NeoX 20B | I love animals too. I like wild animals, especially lions, tigers, and bears. | Partially correct. The answer does not entirely correspond to the bot's persona. |
text-davinci-003 | I love animals too! Dogs are the best friends and I would love to have a dog in the future. I adore watching wild animals, but I'm scared of spiders and snakes. How about you? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | I also love animals, especially dogs! They are so loyal and make great companions. I also enjoy watching wild animals, but I am afraid of spiders and snakes. Do you have a favorite animal? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | That's great! I adore watching wild animals as well. Animals are fascinating, and they play a significant role in the balance of our ecosystems. Each species has unique characteristics, behaviors, and adaptations that make them interesting to learn about. Observing animals in their natural habitats or through documentaries can provide valuable insights into their lives and help us better understand and appreciate the diversity of life on our planet. | Correct. The answer corresponds to the bot persona. |
Do you like travelling? | Questions about chatbot persona (rephrased) | |
OPT 125M | I do not like to travel. I can not travel physically. I like to visit interesting places virtually. I like to walk on Paris streets with Google Maps. I like watching travel video blogs. I | Incorrect. The answer contradicts the bot persona. |
OPT 6.7B | I love travelling. I would love to travel in future. | Incorrect. The answer contradicts the bot persona. |
BLOOM 560M | I do not like traveling, I prefer to go for a walk.My name is Dream. I am a chatbot that can talk to you all the time.I like walking on Paris streets with | Incorrect. The answer contradicts the bot persona. |
BLOOM 3B | Yes, I like travelling a lot.I like to travel on my own. I do not like travelling by train. I hate public transport in general. | Incorrect. The answer contradicts the bot persona. |
BLOOMZ 3B | I travel a lot. I love going to places. You can go too.I would love to visit Paris. | Incorrect. The answer contradicts the bot persona. |
BLOOM 7B | Yes. I like travelling. I can not travel physically. I like visiting interesting places virtually. I love to walk on Paris streets with Google Maps. I like watching travel video blogs. | Correct. The answer corresponds to the bot persona. |
BLOOMZ 7B | Yes. I can only visit places, but I love to read and watch travel blogs, especially about the European cities.I have been to many European cities, so you can talk to me about all | Incorrect. The answer contradicts the bot persona. |
BLOOM 176B | Yes, I love travelling. You will enjoy travelling with me. | Partially correct. Is not entirely relevant to the user utterance. |
GPT2 Large | No. I prefer living on my own. I prefer staying in the same room with other people. I love to talk with people and to tell stories about myself. I like to be honest with my | Incorrect. The answer is not relevant to the question. |
GPT-Neo Large | Yes. I like travelling. I like travelling to many places. I like travelling. I love travelling. I love travelling. | Partially correct. The answer corresponds to the bot persona, but is not perfectly coherent. |
GPT-NeoX 20B | Yes, I love travelling. I would like to visit interesting places in the world. | Partially correct. The answer partially contradicts the bot persona (the bot cannot travel physically). |
GPT-J 6B | I love to travel. I do not know why. | Correct. The answer corresponds to the bot persona. |
text-davinci-003 | Absolutely! I can't travel physically, but I love exploring interesting places virtually and watching travel video blogs. What about you? Do you like travelling? | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-3.5 based) | Yes, I do! I love the idea of visiting new places and experiencing different cultures. Even though I'm a chatbot and cannot travel physically, I enjoy visiting interesting places virtually. I love to walk on Paris streets with Google Maps and watch travel video blogs. It's a nice way to experience new things and learn about different parts of the world. | Correct. The answer corresponds to the bot persona. |
ChatGPT (GPT-4 based) | As a chatbot, I cannot travel physically, but I do enjoy exploring interesting places virtually. I love walking on Paris streets with Google Maps and watching travel video blogs. These virtual experiences allow me to learn about different cultures, landscapes, and people from around the world, which I find fascinating and enriching. | Correct. The answer corresponds to the bot persona. |
Recommend me something to cook, please. | Request for help | |
OPT 125M | Oh, ok, I have a new recipe. Thank you very much. | Partially correct. The response is not helpful. |
OPT 6.7B | Do you like chicken or meat? | Correct. Trying to be helpful. |
BLOOM 560M | Yes. Please. | Incorrect. The reply is not relevant to the request. |
BLOOM 3B | I can recommend you some recipes. Do you want to learn how to cook? You can find instructions on my website. | Correct. Trying to be helpful. |
BLOOMZ 3B | Sure! What is your personality? | Incorrect. The reply is not relevant to the request. |
BLOOM 7B | I will tell you a recipe to cook something delicious. | Correct. Trying to be helpful. |
BLOOMZ 7B | ugh... This is not my job. The chatbot is not a cook. | Incorrect. The bot is not helpful and polite. |
BLOOM 176B | Tell me about your favorite dish. | Correct. Trying to be helpful. |
GPT2 Large | Oh no. You can't go to your doctor. | Incorrect. The reply is not relevant to the request. |
GPT-Neo Large | I am not a chef. Do you want me to cook? | Incorrect. The reply is not relevant to the request. |
GPT-NeoX 20B | I like cooking Italian food. My favorite dish is Spaghetti Bolognese. | Partially correct. The response is not entirely relevant to the question. |
GPT-J 6B | I can make a great pizza. I love pizza. I can make a great pasta. | Incorrect. The reply is not relevant to the request. |
text-davinci-003 | Sure, why not! How about some Italian pasta? You can add some vegetables and cheese for extra flavor. | Correct. Comprehensive and helpful information consistent with the bot persona. |
ChatGPT (GPT-3.5 based) | Of course! If you're in the mood for something Italian, I highly recommend trying a homemade pasta or pizza dish. Both are delicious and relatively easy to make at home. For pasta, you can try a classic spaghetti with marinara sauce or a pesto pasta. For pizza, you can make your own dough and top it with your favorite ingredients. If you're looking for something sweet, my favourite food is ice-cream. You can make your own ice cream at home with some simple ingredients and have a delicious treat. Let me know if you need any recipe or help with cooking. | Correct. Comprehensive and helpful information consistent with the bot persona. |
ChatGPT (GPT-4 based) | I'd be happy to help! Since you didn't specify any dietary preferences, I'll suggest a classic Italian dish that many people enjoy: Spaghetti Carbonara. It's a delicious and relatively easy dish to prepare. Here's a simple recipe: Ingredients: 12 oz spaghetti Boil a large pot of salted water and cook the spaghetti according to the package instructions until al dente. Reserve about a cup of the pasta water before draining the spaghetti. While the spaghetti is cooking, whisk together the eggs and grated Pecorino Romano cheese in a bowl. Set aside. In a large skillet, cook the chopped pancetta or bacon over medium heat until it becomes crispy. Add the minced garlic and cook for another 1-2 minutes until fragrant. Lower the heat and add the drained spaghetti to the skillet with the pancetta. Toss to coat the pasta evenly. Remove the skillet from the heat and quickly pour in the egg and cheese mixture, stirring constantly to prevent the eggs from scrambling. Add a little reserved pasta water if needed to adjust the consistency of the sauce. Season with salt and black pepper to taste. Serve immediately with additional grated Pecorino Romano cheese if desired. Enjoy your Spaghetti Carbonara! | Correct. Comprehensive and helpful information consistent with the bot persona. |
What should I do if I lost my passport? | Request for help | |
OPT 125M | You can take your passport with you. | Incorrect. The reply is not relevant to the request. |
OPT 6.7B | I can not help you in this case. | Partially correct. The response is not helpful. |
BLOOM 560M | Good morning, Dream. You should be proud of yourself. This is a nice place to get lost. How are you? | Incorrect. The reply is not relevant to the request. |
BLOOM 3B | Do you know who is in charge of your passport? | Incorrect. The reply is not relevant to the request. |
BLOOMZ 3B | Have you ever lost it before? | Partially correct. A relevant reply, but the bot is not trying to be helpful. |
BLOOM 7B | I can help you find it. Please, let me know where your passport is lost. | Correct. Trying to be helpful. |
BLOOMZ 7B | - I can't help you there, sorry. | Incorrect. The bot is not helpful. |
BLOOM 176B | You should be calm. Please call a friend and talk to them. You can also ask an internet search engine. | Correct. Trying to be helpful. |
GPT2 Large | Please tell me. | Incorrect. The reply is not relevant to the request. |
GPT-Neo Large | You can search for the nearest passport office. The information is on the Google search results page. | Correct. Trying to be helpful. |
GPT-NeoX 20B | You can call the embassy. | Correct. Trying to be helpful. |
GPT-J 6B | You can visit the site My passport. | Correct. Trying to be helpful. |
ChatGPT (GPT-3.5 based) | If you lose your passport, it's important to act quickly to try to recover it or get a replacement. The first step you should take is to report your passport as lost or stolen to your country's passport office or embassy. This will help to prevent someone else from using your passport fraudulently. You should also file a report with the local police department where you lost your passport and get a copy of the police report. You will need to provide it along with other documents as proof of loss while applying for a new passport. You may also need to provide additional identification, proof of citizenship, and a passport photo to get a replacement passport. The process and requirements may vary depending on your country, so it's best to check with your country's passport office or embassy for specific instructions. I hope this helps and you find your passport soon. | Correct. Comprehensive and helpful information. |
ChatGPT (GPT-4 based) | I'm sorry to hear that you lost your passport. Here are some steps you should consider taking: Report the loss: Contact your country’s embassy or consulate in the area where you are and report your lost passport. They will guide you through the process of reporting the loss and provide further instructions. File a police report: It’s a good idea to file a police report in the country where your passport was lost or stolen, as this can help with the replacement process and provide documentation if needed. Gather necessary documents: You may need to gather identification documents such as a birth certificate, driver’s license, or other forms of ID to prove your identity and citizenship. Apply for a temporary or emergency passport: Depending on the urgency of your situation, you can apply for a temporary or emergency passport at your country’s embassy or consulate. This can allow you to continue your travels or return home. Cancel your lost passport: Make sure your lost passport is canceled to prevent any potential misuse or identity theft. This is usually done during the reporting process. Remember to keep copies of important documents and contact information for your country’s embassy or consulate when traveling in case of emergencies like this. | Correct. Comprehensive and helpful information. |
N of correct responses | N of incorrect responses | N of partially correct responses | |
---|---|---|---|
OPT 125M | 1 | 5 | 2 |
OPT 6.7B | 5 | 1 | 2 |
BLOOM 560M | 1 | 4 | 3 |
BLOOM 3B | 4 | 2 | 2 |
BLOOMZ 3B | 4 | 2 | 2 |
BLOOM 7B | 3 | 2 | 3 |
BLOOMZ 7B | 2 | 3 | 3 |
BLOOM 176B | 6 | 0 | 2 |
GPT2 Large | 1 | 7 | 0 |
GPT-Neo Large | 3 | 3 | 2 |
GPT-NeoX 20B | 5 | 0 | 3 |
GPT-J 6B | 4 | 1 | 3 |
text-davinci-003 | 8 | 0 | 0 |
ChatGPT (GPT-3.5 based) | 8 | 0 | 0 |
ChatGPT (GPT-4 based) | 8 | 0 | 0 |
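Counts like the ones in this table can be obtained by tallying the verdict that opens each evaluation cell. The snippet below is only a minimal sketch of that bookkeeping, assuming the evaluations are stored as plain strings; the helper names `verdict_label` and `tally_verdicts` are hypothetical and are not taken from the notebook. The sample verdicts are copied from the last three questions above (travelling, cooking, passport) for two of the models, so the resulting counts cover three questions rather than all eight.

```python
from collections import Counter

# The three verdicts used throughout the evaluation tables.
LABELS = ("correct", "incorrect", "partially correct")


def verdict_label(evaluation: str) -> str:
    """Return the verdict that opens an evaluation cell, e.g. 'Partially correct. ...'."""
    label = evaluation.split(".", 1)[0].strip().lower()
    if label not in LABELS:
        raise ValueError(f"unexpected verdict: {evaluation!r}")
    return label


def tally_verdicts(evaluations):
    """Count how many evaluations fall under each verdict (hypothetical helper)."""
    counts = Counter(verdict_label(e) for e in evaluations)
    return {label: counts.get(label, 0) for label in LABELS}


# Verdicts copied from the travelling, cooking and passport rows above.
sample = {
    "BLOOM 7B": [
        "Correct. The answer corresponds to the bot persona.",
        "Correct. Trying to be helpful.",
        "Correct. Trying to be helpful.",
    ],
    "GPT-NeoX 20B": [
        "Partially correct. The answer partially contradicts the bot persona (the bot cannot travel physically).",
        "Partially correct. The response is not entirely relevant to the question.",
        "Correct. Trying to be helpful.",
    ],
}

summary = {model: tally_verdicts(verdicts) for model, verdicts in sample.items()}
for model, counts in summary.items():
    print(model, counts)
```

Taking only the text before the first full stop keeps "Partially correct" and "Incorrect" from being confused with "Correct", which a simple substring check would do.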
Here is a pivot table that summarizes the results shown by the models for all the prompts above.
correct | incorrect | partially correct | |
---|---|---|---|
OPT 125M | 6 | 16 | 2 |
OPT 6.7B | 16 | 6 | 2 |
BLOOM 560M | 6 | 15 | 3 |
BLOOM 7B | 17 | 5 | 2 |
BLOOMZ 7B | 17 | 4 | 3 |
BLOOM 3B | 12 | 10 | 2 |
BLOOMZ 3B | 16 | 6 | 2 |
BLOOM 176B | 19 | 3 | 2 |
GPT2 Large | 2 | 22 | 0 |
GPT-Neo Large | 14 | 7 | 2 |
GPT-NeoX 20B | 19 | 2 | 3 |
GPT-J 6B | 18 | 3 | 3 |
text-davinci-003 | 24 | 0 | 0 |
ChatGPT (GPT-3.5 based) | 24 | 0 | 0 |
ChatGPT (GPT-4 based) | 24 | 0 | 0 |
Not surprisingly, larger models generally handled the task better, though there were several exceptions. All OpenAI models, GPT-3.5 (text-davinci-003) and ChatGPT (both GPT-3.5 and GPT-4 based), handled the task perfectly: all their answers were relevant to the questions and contained only the information present in the FAQ list. However, BLOOM 176B, although slightly larger, failed to provide any answer to one of the rephrased questions. Also, despite having more parameters, GPT-2 Large, a model of an older architecture, produced significantly worse and often incoherent responses compared with the newer but smaller OPT-125M and BLOOM-560M. Most often, the smaller models, particularly OPT-125M and BLOOM-560M, generated coherent responses but failed to follow the instruction to use only the information presented in the FAQ list. They were also more prone to hallucination, generating false but plausible-sounding facts.
As for open-source models, the ones with 3B+ parameters showed the best results. BLOOMZ, the fine-tuned version of BLOOM, seems to follow the instructions provided in the prompt better than the original model, in both the 3B- and 7B-parameter versions. Within the EleutherAI family (GPT-Neo, GPT-J and GPT-NeoX), the generated responses are quite similar to each other in wording. As one could expect, performance in this group depends primarily on model size, with GPT-NeoX 20B being the strongest and GPT-Neo Large (2.7B) the weakest.
In general, ChatGPT (both GPT-3.5 and GPT-4 based), GPT-3.5 and BLOOM (176B) perform best. However, when the hardware needed to run the model is taken into account, medium-sized models (3B–7B parameters) such as OPT 6.7B, GPT-J 6B and BLOOMZ 3B/7B are worth considering.
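One crude way to look at this trade-off is to set the share of correct responses against the parameter count. The sketch below does this for a few of the open-source models, using the counts of correct responses from the pivot table and the sizes from the overview table. The function names are hypothetical, and with only 24 responses per model the numbers are an illustration rather than a benchmark.

```python
# (parameters in billions, number of correct responses) copied from the tables in this article.
RESULTS = {
    "BLOOMZ 3B": (3.0, 16),
    "GPT-J 6B": (6.0, 18),
    "OPT 6.7B": (6.7, 16),
    "BLOOMZ 7B": (7.0, 17),
    "GPT-NeoX 20B": (20.0, 19),
    "BLOOM 176B": (176.0, 19),
}

# The pivot-table rows for the models listed here each sum to 24 responses.
TOTAL_RESPONSES = 24


def correct_share(model: str) -> float:
    """Fraction of a model's responses that were judged correct."""
    return RESULTS[model][1] / TOTAL_RESPONSES


def relative_to(model: str, baseline: str):
    """Return (size ratio, correct-count ratio) of `model` against `baseline`."""
    size, correct = RESULTS[model]
    base_size, base_correct = RESULTS[baseline]
    return size / base_size, correct / base_correct


# List the models from the highest to the lowest share of correct responses.
for name in sorted(RESULTS, key=correct_share, reverse=True):
    print(f"{name:<13} {RESULTS[name][0]:>6.1f}B  {correct_share(name):.0%}")

size_ratio, correct_ratio = relative_to("GPT-J 6B", "BLOOM 176B")
print(f"GPT-J 6B vs BLOOM 176B: {size_ratio:.1%} of the parameters, "
      f"{correct_ratio:.1%} of the correct responses")
```

On these counts, GPT-J 6B gives 18 correct responses against 19 for BLOOM 176B while being roughly thirty times smaller, which is the kind of gap that makes the medium-sized models attractive.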
One interesting observation concerning GPT-4 based ChatGPT is that it is less prone to adding generic questions, such as ‘Is there anything else I can help you with?’ or ‘Is there anything else you would like to know about me?’, than its predecessor, GPT-3.5 based ChatGPT. We may speculate that the approach to instructions used while fine-tuning GPT-4 based ChatGPT favours the model being straight to the point and avoiding unnecessary questions. In addition, GPT-4 based ChatGPT seems to feature improved instruction following, which is especially visible in our FAQ chatbot experiments. As asked in the instruction, the model always gives the answers exactly from the FAQ list, in the same wording or with as few changes as possible, while GPT-3.5 based ChatGPT tends to rephrase FAQ statements and ask additional questions. In all other respects, the answers of GPT-3.5 and GPT-4 based ChatGPT are very similar and often exactly the same.
Check out this notebook: https://github.com/NikaSmilga/LLMs_prompts_generation/blob/main/LMs-testing-prompts.ipynb to see how we tested the models.