Building an Effective Knowledge Hub with Answerly

Proper dataset construction is vital for chatbot technologies like ChatGPT to work optimally.
To achieve reliable performance, it is necessary to format and structure data correctly.
Feeding a dataset without formatting can result in inconsistent responses or instances where the chatbot generates incorrect or nonsensical information.

This issue can arise in the following scenarios:

  1. Too much information: when a large volume of content is loaded without proper formatting, for example big PDFs or many URLs fetched through web crawling.

  2. Not enough information: when we do not provide enough information, ChatGPT will fill in the missing pieces with false information. These are referred to as hallucinations, where ChatGPT invents content to compensate for what is missing.

  3. Duplicated or overlapping content: we should avoid duplicating the same content or repeating information within the knowledge hub.
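
One way to catch the third scenario before uploading is to scan your documents for repeated paragraphs. The following is a minimal Python sketch, not an Answerly feature; the function names and the `documents` dictionary are hypothetical, and it only flags paragraphs that are identical after lowercasing and whitespace cleanup.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())


def find_duplicate_paragraphs(documents):
    """Return {normalized paragraph: [document names]} for paragraphs seen more than once.

    `documents` is a hypothetical mapping of document name -> plain-text body.
    """
    sources = defaultdict(list)
    for name, body in documents.items():
        # Paragraphs are assumed to be separated by blank lines.
        for paragraph in re.split(r"\n\s*\n", body):
            cleaned = normalize(paragraph)
            if cleaned:
                sources[cleaned].append(name)
    return {text: names for text, names in sources.items() if len(names) > 1}


if __name__ == "__main__":
    docs = {
        "faq": "We ship to Canada.\n\nOpen 9 to 5.",
        "about": "we ship  to canada.\n\nFounded in 2020.",
    }
    print(find_duplicate_paragraphs(docs))
```

Anything this reports is a candidate for keeping in one place only. Overlapping content that is worded differently will not be detected by an exact match like this.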

Understanding Chatbot Limitations and Data Interpretation

Chatbots are designed to efficiently extract and utilize structured data.
Some early adopters of chatbot technology mistakenly believed that providing a URL or feeding unformatted data would be enough for the chatbot to understand and accurately provide all the content.

However, this approach works well only when the content is well-written and the website does not contain complex documentation or a large number of products.
When dealing with complex data, it is almost always necessary to create properly formatted content.

Best Practices for Creating a Dataset with the Crawling Features

It is extremely convenient to create our knowledge base in just a few clicks.
However, at times, we may want to verify that the retrieved data is properly formatted.

How can we ensure that the collected data is the one we truly need?

Fortunately, there are tools that make building the dataset easier.

I have created a video using Voila and Harpa to demonstrate how you can create a dataset with the crawling functionality.

Best Practices + Examples

To minimize errors and improve the accuracy of the chatbot, please follow these guidelines when building your knowledge hub:

Make sure to enter a brief description of your business and Personality.

Create a plain document, using one of the available document datasets, that explains contact information, address, general information about the company, what it sells and where it sells, shipping information, etc.

Tip: Create a section at the bottom of the document dedicated to Q&A, formatted in the following way:

### Frequently Asked Questions

Question: “Is the product imported from Mexico?”

Answer: “All our products are imported from Mexico.”

Question: “How many beverages do you have available?”

Answer: “We have 16 beverages, all from Mexico.”

Question: “Where is the company/business located?”

Answer: “We are located in Toronto, Canada.”

###

Please note: I have used three hash signs (###) to delimit the start and end of the Q&A section.
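
If you keep your questions and answers in a list, a small script can produce this section for pasting into the document. This is only a sketch: `format_faq` is a hypothetical helper, and the exact placement of the three hash signs (one line opening the section with its title, one line closing it) is my assumption about the layout described above.

```python
def format_faq(pairs, delimiter="###"):
    """Build a delimited Q&A section from (question, answer) tuples.

    Assumed layout: the delimiter opens the section next to its title
    and appears again on its own line to close it.
    """
    lines = [f"{delimiter} Frequently Asked Questions", ""]
    for question, answer in pairs:
        lines.append(f'Question: "{question}"')
        lines.append(f'Answer: "{answer}"')
        lines.append("")  # blank line between pairs
    lines.append(delimiter)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_faq([
        ("Is the product imported from Mexico?",
         "All our products are imported from Mexico."),
        ("Where is the company/business located?",
         "We are located in Toronto, Canada."),
    ]))
```

Keeping every pair in the same Question/Answer shape is the point here; the script just makes it harder to drift from that shape as the list grows.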

Product Listing (E-commerce or Real Estate)

When it comes to listing products or services, consider using structured data formats such as Google Sheets.
This approach facilitates easier data interpretation by the chatbot.

Formatting Your Dataset

Ensure your dataset includes the following columns with the corresponding data:

  • Item ID: A unique identifier for each product.
  • Category: The classification or type of product.
  • Description: A clear and concise description of the product. Incorporate keywords and phrases that users might input during a search to trigger accurate responses.
  • Product Link: A direct URL to the product page.
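
Before connecting the sheet, it can help to check an exported copy for missing columns, empty cells, and repeated IDs. The sketch below assumes you have downloaded the sheet as CSV and that the header row uses exactly the four column names listed above; `validate_product_rows` is a hypothetical helper, not part of Answerly or Google Sheets.

```python
import csv
import io

REQUIRED_COLUMNS = ["Item ID", "Category", "Description", "Product Link"]


def validate_product_rows(csv_text):
    """Return a list of human-readable problems found in a CSV export of the sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    seen_ids = set()
    # Row 1 is the header, so data rows start at 2 (matching the sheet's row numbers).
    for row_number, row in enumerate(reader, start=2):
        item_id = (row["Item ID"] or "").strip()
        if not item_id:
            problems.append(f"row {row_number}: empty Item ID")
        elif item_id in seen_ids:
            problems.append(f"row {row_number}: duplicate Item ID {item_id}")
        seen_ids.add(item_id)
        for column in ("Category", "Description", "Product Link"):
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
    return problems


if __name__ == "__main__":
    sample = (
        "Item ID,Category,Description,Product Link\n"
        "1,A,First item,https://example.com/1\n"
        "1,B,,https://example.com/2\n"
    )
    for problem in validate_product_rows(sample):
        print(problem)
```

The `example.com` links are placeholders. An empty result means the basic structure is in place; it says nothing about whether the descriptions are good.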

Example of a formatted Google Sheet for product entries

Item ID | Category          | Description                                                  | Product Link
12312   | Electronics       | Kodak compact camera with 10x zoom (professional camera)     |
23423   | Kitchenware       | Stainless steel chef knife (professional knife)              |
34534   | Outdoor Equipment | Waterproof and durable 4-person tent (IPX6 water protection) |

Please note: The description should accurately depict the article, while also using terms that are likely to be searched by users. For example, include phrases such as “professional camera” and “best for traveling.”

If you are building a listing for real estate, here is a link to a real estate Google Sheets example that you can use to create your own listing.


Thanks Simone, this will be helpful.

This is great Simone, maybe good to link to this (or a version of it) in the Training section of the app

Amazing tips, very helpful. Thanks a lot @Simone

I originally started with URLs. Then, I noticed that the agent was not getting certain details correct, even though they were on the website. So, I created Google Docs that outlined everything in a manner like you suggested above. Should I go back and delete the URLs so I don’t have overlapping data?

Yes, delete them; don’t keep duplicated URLs. I have also responded to your original post.

Is there a way I can delete some URLs from the sitemap?

You can exclude page URLs by modifying your sitemap. However, I don’t think this is the correct approach.

You can include individual webpages using the one-time training dataset “webpage,” but if your website is constantly updating, you will lose the real-time synchronization that you get by using the sitemap Website dataset.

Ok, I deleted my URL and the several Google Docs I had created and uploaded one master Google Doc. Things have improved. However, my agent is not always giving prices or hours. These items are clearly defined in the Google Doc. Would it be better to organize numerical items such as costs and hours of operation in a Google Sheet instead? Or, what would you recommend?

Hi Duncan,

I have sent you a private message.

Of course, you can also use Google Sheets to set up the opening hours or times for activities. I was playing around with Google Sheets and came up with an idea to tell the agent the current day. So if somebody asks, “Are you open today?” the agent will be aware of the current date. It will also work if somebody asks if you are open tomorrow or 10 days from now, or if they insert an exact date. I have made a short video to explain how it’s done. Also, I will leave here the formulas that I’m using on Google Sheets.

Basic current date formula: =TODAY()
The one I’m using inside the sheets: ="Today date is " & TEXT(TODAY(), "dddd/mmmm/dd")

The only problem that I can think of is if somebody asks if we are open today one minute past midnight and the Google Sheets real-time sync hasn’t kicked in yet. But this should not be a big problem.
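
To preview the text this cell will contain for any given day, here is a rough Python equivalent. It is a sketch outside of Google Sheets: `today_cell_text` is a hypothetical name, and I am assuming that "dddd/mmmm/dd" renders as full weekday name, full month name, and two-digit day, which corresponds to `%A/%B/%d` in Python under the default English locale.

```python
from datetime import date, timedelta


def today_cell_text(today):
    """Mimic ="Today date is " & TEXT(TODAY(), "dddd/mmmm/dd") for a given date."""
    return "Today date is " + today.strftime("%A/%B/%d")


if __name__ == "__main__":
    current = date.today()
    print(today_cell_text(current))
    # What the cell would read one day later, for checking "open tomorrow" questions.
    print(today_cell_text(current + timedelta(days=1)))
```

For example, `today_cell_text(date(2024, 3, 1))` gives `Today date is Friday/March/01`. Seeing the weekday spelled out in the sentence is what lets the agent match it against the rows of an opening-hours sheet.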

Oh my goodness, this looks amazing! I’m getting ready to go on vacation for a week, so I’ll build this when I return. Thank you.

Since I am travelling for a week, I am trying to make sure the agent is at least able to provide a schedule when someone asks. Currently, it’s the one thing it’s getting very wrong. I’ve created an iframe that directs to the url when someone asks if we’re open, are we open tomorrow or what our hours are. Since it’s not acknowledging the iframe when that query is made, I uploaded the url to the hours of operation page to see if that would solve it and it has not. Can you assist with what is going wrong here?

Simone, I can’t seem to get this to work properly. My agent still doesn’t know the date and time and is providing incorrect times. I’m trying to get this right with just one program before adding all the other program hours. Here’s what I’ve got so far. Please tell me where the error is: Hours of Operation Chatbot - Google Sheets

Hi Duncan, you need to tell the chatbot that today’s date is: [insert date]. Otherwise, it will just be a date thrown into the Google Sheet.

If you need further assistance with this, please make sure to open a new topic in the community.


I made the correction you recommended. I’m still not getting correct responses back. In fact, I’m getting a hallucinated response when I ask when we close. Agent says 8 pm. I can’t find 8 pm in any of our documentation. However, when I asked what today was, the agent got it right. That was a first, so I feel like I’m getting closer.

I also have iFrames set up that point to our schedule when someone asks “If we’re open.” Perhaps that is creating conflicting information. Here’s the latest hours of operation spreadsheet: Hours of Operation Chatbot - Google Sheets

Hi Duncan, I’m sending you a private message so we can continue this conversation from there.

Question: Can you structure If/Then responses in the Google doc knowledge base?

For instance:
If: A customer asks for replacement parts of a specific product.
Then: Ask them to send us an email with an image of their product and we’ll see if we can find it for them.

If: A customer asks for a specific product.
Then: If the brand they’re looking for is part of our range of brands, send them a link to that site; otherwise tell them that it’s not part of our ordinary stock, and that they can browse the website to look for something similar or be contacted by an expert to help them further.

Is this possible in some way, and how would I structure it so it’s most effective?

I feel like it is sort of like Q/A but a slight variation.


For the second part, you will need to create a list in Google Sheets that includes the product ID, name, and URL to check if it is part of the stock. You must provide clear instructions on how to search for a product. Using an ID or serial number would be the best option for searching.
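
To illustrate why an ID or serial number is the safer search key, here is a small Python sketch. It does not show how the agent searches internally; the product rows, the `example.com` URLs, and both helper functions are hypothetical. It only shows that a name can match several rows while a unique ID matches at most one.

```python
def find_by_id(products, product_id):
    """Return the single product whose ID matches exactly, or None."""
    wanted = product_id.strip().lower()
    for product in products:
        if product["id"].strip().lower() == wanted:
            return product
    return None


def find_by_name(products, text):
    """Return every product whose name contains the text (may be ambiguous)."""
    wanted = text.strip().lower()
    return [p for p in products if wanted in p["name"].lower()]


if __name__ == "__main__":
    # Hypothetical rows, shaped like the ID / name / URL sheet described above.
    stock = [
        {"id": "TENT-001", "name": "4-person tent", "url": "https://example.com/tent-001"},
        {"id": "TENT-002", "name": "2-person tent", "url": "https://example.com/tent-002"},
    ]
    print(find_by_id(stock, "tent-002"))
    print(len(find_by_name(stock, "tent")))
```

In this sample, searching for "tent" by name returns two rows, while "TENT-002" by ID returns exactly one.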

Is the product ID just something random but unique that I make up for each product? (right now the product range sheet looks like this)

But do you think I can add these “If/Then” instructions within the Google Doc? Perhaps in an FAQ style.

I already have a bunch of FAQs:
Question: XXXX
Answer: XXXX

Question: XXX
Answer: XXX

Now, could I add another ### section below that:

If: a customer blablabla…
Then: blablbalbal…

If: A customer blablabla…
Then: blablbabla…

I’m not sure if this will work. I think you can test this on a small scale and see the results.