The storage pipeline comprises the following stages:

- Knowledge Graph: the main database that contains all saved data. It uses TerminusDB as its data store. You can perform CRUD operations on the KG using the DeepPavlov-KG library's `graph` and `ontology` modules.
- Index: an SQLite database with a single table storing `user_id`, `entity_id`, `entity_name`, and `entity_kind`. Because it is stored on the user's machine, it provides fast checks of whether a relationship exists between the user and some entity (see the sketch after this list). You can perform CRUD operations on the Index using the DeepPavlov-KG `index` module.
- TerminusDB Server: a service for connecting to a TerminusDB server, which runs locally.
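Since the Index is a single SQLite table, the existence check it supports amounts to one indexed lookup. Below is a minimal sketch of such a check; the table name `entities_index` and the exact query are assumptions for illustration only, as in practice the DeepPavlov-KG `index` module encapsulates this:

```python
import sqlite3

# Minimal sketch of an Index existence check. The column names come from
# the description above; the table name "entities_index" is an assumption --
# real code should go through the DeepPavlov-KG `index` module instead.
def user_knows_entity(db_path: str, user_id: str, entity_name: str) -> bool:
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT entity_id FROM entities_index "
            "WHERE user_id = ? AND entity_name = ? LIMIT 1",
            (user_id, entity_name),
        ).fetchone()
    return row is not None
```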
To effectively utilize the Custom KG in your skill, it is recommended to follow these steps.

To run a TerminusDB server locally, first execute the following command, which starts the `terminusdb-server` Docker container defined in the `docker-compose.override.yml` file:

```bash
docker-compose -f docker-compose.yml -f assistant_dists/dream_kg/docker-compose.override.yml -f assistant_dists/dream_kg/dev.yml up --build terminusdb-server
```
Then, in `docker-compose.override.yml`, ensure you provide the following environment variables for every container that will use the TerminusDB Server:

```
TERMINUSDB_SERVER_URL=http://terminusdb-server:6363
TERMINUSDB_SERVER_TEAM=admin
TERMINUSDB_SERVER_DB=<NAME_YOUR_DB>
TERMINUSDB_SERVER_PASSWORD=root
INDEX_LOAD_PATH=/root/.deeppavlov/downloads/entity_linking_eng/custom_el_eng_dream
```
Please update the DB name, as well as the team and password values, if you have changed the default TerminusDB settings.
An example of connecting to the database after running the server is demonstrated in `terminusdb/test.py`.
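For orientation, a connection could look like the following minimal sketch. It assumes the `terminusdb-client` Python package; the exact parameter names of `connect` vary between client versions, so treat this as an illustration rather than a copy of `terminusdb/test.py`.

```python
import os
from terminusdb_client import WOQLClient

# Read the same variables that docker-compose.override.yml provides.
server_url = os.getenv("TERMINUSDB_SERVER_URL", "http://localhost:6363")
team = os.getenv("TERMINUSDB_SERVER_TEAM", "admin")
db = os.getenv("TERMINUSDB_SERVER_DB", "my_db")
password = os.getenv("TERMINUSDB_SERVER_PASSWORD", "root")

client = WOQLClient(server_url)
# In recent terminusdb-client versions the password is passed as `key`;
# older versions use different argument names -- check your version's docs.
client.connect(team=team, db=db, key=password)
```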
Custom Entity Linking allows developers to index their own knowledge graphs and link entities to them. The annotator verifies entities connected to the user in the Index, disregarding those that lack a connection to the current user. If the input entity exists in the database and is connected to the user through any relationship, Custom Entity Linking will pass it to the output regardless of the specific relationship name. The service prioritizes the existence of the entity and its connection to the user over an exact triplet match. For example, if the input is `['user', 'like', 'banana']` and the database holds the relationship `['user', 'hate', 'banana']`, the entity `'banana'` will still be linked to the entity extracted from the utterance.
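In other words, the lookup ignores the relationship name entirely. A toy illustration of this rule (not the service's actual code):

```python
# Toy illustration only: an extracted entity is linked whenever the user
# has ANY relationship with an entity of that name in the Index,
# whatever the relationship is called.
def link_entity(index_rows, user_id, entity_substr):
    return [
        row["entity_id"]
        for row in index_rows
        if row["user_id"] == user_id and row["entity_name"] == entity_substr
    ]

# The stored triplet is ['user', 'hate', 'banana']; a query entity from
# ['user', 'like', 'banana'] still links, because only existence matters.
index_rows = [{"user_id": "u1", "entity_name": "banana",
               "entity_id": "Food/123", "entity_kind": "Food"}]
print(link_entity(index_rows, "u1", "banana"))  # -> ['Food/123']
```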
The input of this service is the list of dictionaries produced by the Property Extraction service, and its output is information about the entities that exist in the DB, specifically in the Index (SQLite).
Example of Custom Entity Linking usage:
```python
>>> import requests
>>> url_custom_el = 'http://0.0.0.0:8075/model'
>>> data = {"entity_substr": [["pizza"]], "entity_tags": [["misc"]], "context": [["Maybe I will order pizza for lunch."]]}
>>> requests.post(url_custom_el, json=data).json()
[{"entity_substr": "pizza", "entity_ids": ["Food/68e82b41-b8bb-40d3-b4a0-73f8ff6bced5"], "confidences": [1.0]}]
```
The Property Extraction annotator within the DREAM platform extracts user attributes for a specific individual. This enables a dialog assistant to acquire information about the user's preferred film, dish, location, etc., and utilize this knowledge to generate personalized responses.
The annotator is capable of extracting multiple user attributes from utterances in the form of (subject, relation, object) triplets. The subject is designated as "user," the relation represents the attribute name, and the object denotes the attribute value. For instance, given the utterance "I love going for a walk with my two dogs every day," the service will extract the following triplets: `[<user, like_activity, walking>, <user, have_pet, two dogs>]`. The annotator currently supports extraction of 61 distinct user attributes.
The Property Extraction annotator consists of the following components:
The models were trained utilizing the DialogueNLI dataset, which comprises 59.2K samples in the training set, 6.2K in the validation set, and 6.1K in the testing set. The DialogueNLI triplets contain 61 distinct relation types, with the top 10 most frequently occurring relations enumerated in the table below.
| Relation | Number of samples |
|---|---|
| have_pet | 5184 |
| like_activity | 4620 |
| has_profession | 2920 |
| has_hobby | 2864 |
| have | 2824 |
| have_children | 2713 |
| like_general | 2559 |
| other | 2159 |
| like_food | 1997 |
| misc_attribute | 1722 |
Comparison with other solutions:

| Model | DialogueNLI, F1 |
|---|---|
| DREAM property extraction | 0.44 |
| GenRe | 0.44 |
| Two-stage Attribute Extractor | 0.28 |
Examples of Property Extraction usage:

```python
>>> import requests
>>> property_extraction_url = "http://0.0.0.0:8136/respond"
>>> requests.post(property_extraction_url, json={"utterances": [["i live in moscow"]]}).json()
[[{"triplets": [{"object": "moscow", "relation": "live in citystatecountry", "subject": "user"}]}]]
>>> requests.post(property_extraction_url, json={"utterances": [["My favorite city in Italy is Venice. And what's yours?"]]}).json()
[[{"triplets": [{"object": "venice", "relation": "favorite place", "subject": "user"}]}]]
```
The User Knowledge Memorizer is an annotator that stores user data, such as properties and preferences, in a knowledge graph, using the `DeepPavlov-KG` Python library as an API interface. Additionally, it saves data to the Index, an SQLite database that provides quick access to data and is used by the Custom Entity Linking service for linking entities. The annotator saves data to the Index simultaneously with saving it to the KG, ensuring that the Index remains up to date and accurate. The service takes the triplets that Property Extraction produced from the user's utterance, compares them with the output of the Custom Entity Linking service, and stores only the triplets that are not already present in the database.
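Conceptually, the comparison boils down to checking whether Custom Entity Linking already resolved a triplet's object for this user. A rough sketch of this idea, assuming objects are matched by name as in the examples below (not the annotator's actual code):

```python
# Rough sketch of the deduplication rule described above: triplets whose
# object was already linked by Custom Entity Linking are treated as
# present in the database; the rest are new.
def split_triplets(property_triplets, linked_entities):
    linked_names = {e["entity_substr"] for e in linked_entities}
    new, existing = [], []
    for triplet in property_triplets:
        if triplet["object"] in linked_names:
            existing.append(triplet)
        else:
            new.append(triplet)
    return new, existing  # `new` is written to both the KG and the Index
```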
Here is a diagram of how the annotator algorithm works.
Example
```python
import requests

# For first run
user_km_url = "http://0.0.0.0:8027/respond"
data = {
    'utterances': [
        {
            'text': 'i have a dog and a cat',
            'user': {'id': 'b75d2700259bdc44sdsdf85e7f530ed'},
            'annotations': {
                'property_extraction': [{
                    'triplets': [
                        {'subject': 'user', 'relation': 'HAVE PET', 'object': 'dog'},
                        {'subject': 'user', 'relation': 'LIKE GOTO', 'object': 'park'}
                    ]
                }],
                'custom_entity_linking': []
            }
        }
    ]
}
requests.post(user_km_url, json=data).json()
"""
Output:
[{'added_to_graph': [['User/b75d2700259bdc44sdsdf85e7f530ed', 'HAVE PET', 'Animal/bd7ca1ce-9fbf-46d4-8fab-7611a688df00'], ['User/b75d2700259bdc44sdsdf85e7f530ed', 'LIKE GOTO', 'Place/e0544fc0-8e4d-4f37-8817-9cc05e03642b']], 'triplets_already_in_graph': []}]
"""
```
```python
# For further runs
user_km_url = "http://0.0.0.0:8027/respond"
data = {
    'utterances': [
        {
            'text': 'i have a dog and a cat',
            'user': {'id': 'b75d2700259bdc44sdsdf85e7f530ed'},
            'annotations': {
                'property_extraction': [
                    {'triplets': [
                        {'subject': 'user', 'relation': 'HAVE PET', 'object': 'dog'},
                        {'subject': 'user', 'relation': 'LIKE GOTO', 'object': 'park'}
                    ]}
                ],
                'custom_entity_linking': [
                    {
                        'entity_substr': 'dog',
                        'entity_ids': ['Animal/18e1401b-e085-4300-821b-3593f7402dbd'],
                        'confidences': [1.0],
                        'tokens_match_conf': [1.0],
                        'entity_id_tags': ['Animal']
                    }, {
                        'entity_substr': 'park',
                        'entity_ids': ['Place/915bd244-2ce5-4011-b569-ed2822a73f4c'],
                        'confidences': [1.0],
                        'tokens_match_conf': [1.0],
                        'entity_id_tags': ['Place']
                    }
                ]
            }
        }
    ]
}
requests.post(user_km_url, json=data).json()
"""
Output:
[{'added_to_graph': [], 'triplets_already_in_graph': [['User/b75d2700259bdc44sdsdf85e7f530ed', 'HAVE PET', 'Animal/18e1401b-e085-4300-821b-3593f7402dbd'], ['User/b75d2700259bdc44sdsdf85e7f530ed', 'LIKE GOTO', 'Place/915bd244-2ce5-4011-b569-ed2822a73f4c']]}]
"""
```