Building an Effective Knowledge Hub with Answerly
Proper dataset construction is vital for chatbot technologies like ChatGPT to work optimally.
To achieve reliable performance, it is necessary to format and structure data correctly.
Feeding a dataset without formatting can result in inconsistent responses or instances where the chatbot generates incorrect or nonsensical information.
This issue can arise in the following scenarios:
- Too much information: a large volume of content, such as PDFs or many URLs fetched by web crawling, is added without proper formatting.
- Not enough information: when we do not provide enough information, ChatGPT fills in the missing pieces with false information. This is referred to as hallucination: ChatGPT invents content to compensate for what is missing.
- Duplicated or overlapping content: the same content is duplicated or repeated within the knowledge hub, which we should avoid.
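The third scenario, duplicated or overlapping content, can be checked before anything is uploaded. The sketch below is only an illustration and is not an Answerly feature: the function names, the sample file names, and the 0.8 threshold are all assumptions. It compares entries by word-level Jaccard similarity and flags pairs that look like near-duplicates.

```python
import re
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def word_set(text: str) -> set[str]:
    """Return the set of distinct words in a passage."""
    return set(re.findall(r"[a-z0-9']+", normalize(text)))


def find_overlaps(entries: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return pairs of entries whose word-level Jaccard similarity meets the threshold."""
    words = {name: word_set(text) for name, text in entries.items()}
    flagged = []
    for a, b in combinations(entries, 2):
        union = words[a] | words[b]
        if not union:
            continue  # both entries are empty, nothing to compare
        score = len(words[a] & words[b]) / len(union)
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


# Hypothetical knowledge hub entries, keyed by an illustrative file name.
entries = {
    "about.txt": "We are located in Toronto, Canada.",
    "contact.txt": "We are located in  Toronto, Canada.",
    "shipping.txt": "Orders ship within two business days.",
}
for a, b, score in find_overlaps(entries):
    print(f"{a} and {b} overlap (similarity {score})")
```

A word-set comparison is deliberately crude: it catches copied passages but not paraphrases, so flagged pairs are best treated as candidates for manual review rather than as definite duplicates.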
Understanding Chatbot Limitations and Data Interpretation
Chatbots are designed to efficiently extract and utilize structured data.
Some early adopters of chatbot technology mistakenly believed that providing a URL or feeding unformatted data would be enough for the chatbot to understand and accurately provide all the content.
However, this approach works well only when the content is well-written and the website does not contain complex documentation or a large number of products.
When dealing with complex data, it is almost always necessary to create properly formatted content.
Creating a Dataset Using the Crawling Feature: Best Practices
It is extremely convenient to create our knowledge base in just a few clicks.
However, at times, we may want to verify that the retrieved data is properly formatted.
How can we ensure that the collected data is the one we truly need?
Fortunately, there are tools that make it easier to build the dataset:
I have created a video using Voila and Harpa to demonstrate how you can create a dataset by utilizing the crawling functionality.
Best Practices + Examples
To minimize errors and improve the accuracy of the chatbot, please follow these guidelines when building your knowledge hub:
Make sure to enter a brief description of your business and Personality.
Create a plain document, using one of the available document dataset types, that explains contact information, address, general information about the company, what it sells and where it sells, shipping information, etc.
Tip: Create a section at the bottom of the document dedicated to Q&A, formatted in the following way:
###
Frequently Asked Questions
Question: “Is the product imported from Mexico?”
Answer: “All our products are imported from Mexico.”
Question: “How many beverages do you have available?”
Answer: “We have 16 beverages, all from Mexico.”
Question: “Where is the company/business located?”
Answer: “We are located in Toronto, Canada.”
###
Please note that I have used three hash signs (###) to delimit the start and end of the Q&A section.
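If the questions and answers already live in a list or spreadsheet, the delimited section can be generated instead of typed by hand. This is a minimal sketch under stated assumptions: `build_faq_section` is a hypothetical helper, not part of Answerly, and it uses straight quotes around each question and answer for simplicity.

```python
def build_faq_section(pairs: list[tuple[str, str]], delimiter: str = "###") -> str:
    """Format (question, answer) pairs as a Q&A section wrapped in delimiter lines."""
    lines = [delimiter, "Frequently Asked Questions"]
    for question, answer in pairs:
        lines.append(f'Question: "{question.strip()}"')
        lines.append(f'Answer: "{answer.strip()}"')
    lines.append(delimiter)
    return "\n".join(lines)


# Example pairs taken from the Q&A sample in this article.
faq = build_faq_section([
    ("Is the product imported from Mexico?", "All our products are imported from Mexico."),
    ("Where is the company/business located?", "We are located in Toronto, Canada."),
])
print(faq)
```

The output can be pasted at the bottom of the plain document so that the delimiters clearly mark where the Q&A section starts and ends.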
Product Listing (E-commerce or Real Estate)
When listing products or services, consider using a structured data format such as Google Sheets.
This approach facilitates easier data interpretation by the chatbot.
Formatting Your Dataset
Ensure your dataset includes the following columns with the corresponding data:
- Item ID: A unique identifier for each product.
- Category: The classification or type of product.
- Description: A clear and concise description of the product. Incorporate keywords and phrases that users might input during a search to trigger accurate responses.
- Product Link: A direct URL to the product page.
Example of a Formatted Google Sheet for Product Entries
Item ID | Category | Description | Product Link |
---|---|---|---|
12312 | Electronics | Kodak compact camera with 10x zoom (professional camera) | http://www.example.com/12312 |
23423 | Kitchenware | Stainless steel chef knife (professional knife) | http://www.example.com/23423 |
34534 | Outdoor Equipment | Waterproof and durable 4-person tent (IPX6 water protection) | http://www.example.com/34534 |
Please note: The description should accurately describe the item while also using terms that users are likely to search for. For example, include phrases such as “professional camera” and “best for traveling.”
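Before connecting a sheet, it can help to check that every row follows the column layout described above. The following sketch assumes the sheet has been exported as CSV with the headers Item ID, Category, Description, and Product Link; `validate_products` and the sample rows are hypothetical and only illustrate the kind of checks that are useful (required columns, unique IDs, non-empty fields, well-formed links).

```python
import csv
import io
from urllib.parse import urlparse

# Column names taken from the list in this article; adjust them to match your sheet.
REQUIRED_COLUMNS = ["Item ID", "Category", "Description", "Product Link"]


def validate_products(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in the exported sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        item_id = (row["Item ID"] or "").strip()
        if not item_id:
            problems.append(f"row {row_number}: empty Item ID")
        elif item_id in seen_ids:
            problems.append(f"row {row_number}: duplicate Item ID {item_id}")
        seen_ids.add(item_id)
        for col in ("Category", "Description"):
            if not (row[col] or "").strip():
                problems.append(f"row {row_number}: empty {col}")
        link = urlparse((row["Product Link"] or "").strip())
        if link.scheme not in ("http", "https") or not link.netloc:
            problems.append(f"row {row_number}: Product Link is not an http(s) URL")
    return problems


# Illustrative export with a duplicated ID, a missing description, and a bad link.
sample = """Item ID,Category,Description,Product Link
12312,Electronics,Kodak compact camera with 10x zoom (professional camera),http://www.example.com/12312
12312,Kitchenware,,www.example.com/23423
"""
for problem in validate_products(sample):
    print(problem)
```

These checks only cover structure. Whether a description contains the terms users actually search for still needs a human review.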
If you are building a real estate listing, here is a link to a real estate Google Sheets example that you can use to create your own listing.