Chatbot, voicebot, voice applications, IVR, GPT-3 … let’s be clear!

The world of conversational systems now offers a wide range of solutions, and they are updated very often.
With this post, I will try to provide some clarity and give you some ideas on how to evaluate them and choose according to your needs.

Voice application

If you want to create a voice application in the Amazon environment (that is, a skill for Alexa), you will use the Alexa Skills Kit.

If you want to implement a Google Action, there are at least two solutions: Dialogflow or Actions Builder (available in the Actions on Google console).

The second is the newer tool, and it was created specifically for Google Actions, so if your project is limited to a Google Action, it is the obvious choice. If, on the other hand, the Google Action is only one of the touchpoints through which a conversational agent is made available, then Dialogflow could be the most suitable solution.

In the following video, you will find a comparison between the two tools.

From Dialogflow to Actions Builder

In the following post, you will find some of my own considerations:

Chatbot and Voicebot

When it comes to creating a chatbot that is available on a web page, as well as on other channels such as Facebook Messenger, Telegram, or Slack, the range of possibilities is considerably wider.

The first choice to make is between an “on-prem” and a “cloud-based” NLU (Natural Language Understanding) system.

In the first case, I recommend using Rasa, a very flexible open source system that offers several deployment options (on-prem, private cloud, third-party cloud), can integrate virtually anywhere, and has a very active community. Clearly, a solution like this requires greater technical knowledge.

In the second case, there are several solutions,

and many others.

The quality of all these services is very high, so, in addition to price, the choice between them should take into account

  • the supporting tools they offer (for creating the agent, for training it, for analytics, etc.) and
  • the integrations available for the target channels.

All of these services, however, can be used via API, which makes integration possible with essentially any ecosystem.
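To make the "via API" point concrete, here is a minimal sketch of sending a user message to an NLU engine over HTTP. It assumes a Rasa server with its HTTP API enabled and uses the `/model/parse` endpoint; the base URL, port, and helper names (`build_parse_request`, `parse_message`) are my own illustrative choices, not something taken from a specific project.

```python
import json
import urllib.request


def build_parse_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the NLU parse endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/model/parse",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_message(base_url: str, text: str) -> dict:
    """Send the text to the NLU server and return the decoded JSON response."""
    request = build_parse_request(base_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (assumes a server is listening on this hypothetical address):
# result = parse_message("http://localhost:5005", "I need help with my order")
# print(result.get("intent"))
```

Cloud services follow the same pattern with their own endpoints and authentication, so the channel-facing code can usually stay the same when the engine behind it changes.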

And how does voice interaction work? That depends on the touchpoint through which the user interacts with the conversational agent. You can connect separate TTS (Text To Speech) and STT (Speech To Text) systems to the agent, for example when you need customized voices, or use the ones that are usually integrated into the NLU engine.
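The first option, with external STT and TTS systems wrapped around the agent, can be sketched as a three-stage pipeline. This is only an illustration under my own assumptions: the `VoicePipeline` class and the `fake_*` stand-in stages are hypothetical, and in a real project each stage would wrap a vendor SDK or API call.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable, so any STT, NLU, or TTS service can sit behind it.
SpeechToText = Callable[[bytes], str]   # audio in, transcript out
Agent = Callable[[str], str]            # user text in, reply text out
TextToSpeech = Callable[[str], bytes]   # reply text in, audio out


@dataclass
class VoicePipeline:
    stt: SpeechToText
    agent: Agent
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio -> text -> reply -> audio."""
        transcript = self.stt(audio_in)
        reply = self.agent(transcript)
        return self.tts(reply)


# Stand-in stages for illustration only; they just pass text through as bytes.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, agent=fake_agent, tts=fake_tts)
```

Keeping the three stages separate is what makes it possible to swap in a custom voice without touching the agent itself.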

Dialogflow APIs, for example, if properly configured, also accept audio files instead of text input and return the audio corresponding to the agent’s response.
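As a rough sketch of what "properly configured" can mean, the snippet below builds a request body for an audio `detectIntent` call and decodes the audio in the reply. The field names (`queryInput.audioConfig`, `inputAudio`, `outputAudioConfig`, `outputAudio`) and enum values are written as I recall them from the Dialogflow ES v2 REST API, so treat them as assumptions and check them against the current reference. The helper names, default language, and sample rate are hypothetical.

```python
import base64


def build_detect_intent_body(
    audio: bytes,
    language_code: str = "en-US",
    sample_rate_hz: int = 16000,
) -> dict:
    """Build a JSON-serializable body that sends audio and asks for audio back."""
    return {
        "queryInput": {
            "audioConfig": {
                # Assumed enum value for 16-bit linear PCM input
                "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
                "sampleRateHertz": sample_rate_hz,
                "languageCode": language_code,
            }
        },
        # Raw audio bytes travel as a base64 string in JSON
        "inputAudio": base64.b64encode(audio).decode("ascii"),
        # Requesting synthesized speech for the agent's answer
        "outputAudioConfig": {"audioEncoding": "OUTPUT_AUDIO_ENCODING_LINEAR_16"},
    }


def extract_output_audio(response: dict) -> bytes:
    """Decode the synthesized reply, or return empty bytes if none was sent."""
    return base64.b64decode(response.get("outputAudio", ""))
```

The body would be posted to the session's `detectIntent` endpoint with valid credentials; that part is omitted here because it depends on the project setup.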

Telephone systems (IVR)

An automated telephone system is simply one more touchpoint through which the conversational agent can be exposed.

NLU engines usually offer native integrations with cloud telephony services. This makes it possible, for example, to set up a customer service number that is answered directly by an agent developed with Dialogflow, Rasa, or another platform.

Even when the integration is not native, communication between these systems via API allows very flexible management of phone calls and of the conversation flow.

GPT-3 VS Specialized Conversational Agents

If we had a challenge between a chatbot developed with Rasa and GPT-3, who would win?

Mark Ryan ran a fantastic test, which he describes in a post and in the following video.

GPT-3 vs RASA chatbot

In practice, a conversational agent that answers generic movie questions was built with Rasa (developed over 4 months and trained on a large database), and the same 7 questions were put to both the chatbot and GPT-3.

Rasa answered 6 questions correctly and GPT-3 answered 5, but when given a few hints, GPT-3 achieved the same result.

So can a system trained on generic data replace specialized agents in specific domains? Absolutely not! Imagine, for example, that instead of films, the agent had been an expert in a much less generic domain, such as assistance with CNC machines. How would the challenge have gone then?

It’s amazing, though, how GPT-3, with only a few hints, matches an agent that took months to develop.

In my opinion, and it is a position I have always supported, combining the two systems can offer a complete service: an agent specialized in the main topic, plus support for generic requests or a second chance to give an answer when the agent’s training is not enough.
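One way to picture this combination is a confidence-based fallback: the specialized agent answers when its NLU is confident about the intent, and a generic model gets the request otherwise. The sketch below is only an illustration under my own assumptions. It expects a Rasa-style parse result (`text`, `intent.name`, `intent.confidence`); the function name, the two reply callables, and the 0.7 threshold are hypothetical.

```python
from typing import Callable


def answer_with_fallback(
    parse_result: dict,
    specialized_reply: Callable[[str], str],
    generic_reply: Callable[[str], str],
    threshold: float = 0.7,
) -> str:
    """Route to the specialized agent when the intent is confident, else fall back."""
    intent = parse_result.get("intent") or {}
    name = intent.get("name")
    confidence = float(intent.get("confidence", 0.0))

    if name and confidence >= threshold:
        # The specialized agent knows this topic: answer from its own domain.
        return specialized_reply(name)

    # Low confidence or no intent: hand the raw user text to the generic model.
    return generic_reply(parse_result.get("text", ""))
```

The threshold is the main tuning knob: set it too low and the specialized agent answers questions it does not really understand, set it too high and the generic model takes over requests that belong to the main topic.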


As you can guess, the solution that fits your needs must be identified through careful analysis, and it may combine several of the elements discussed in this post. This is the approach we use at Voice Branding to create customized conversational systems.

Voice Branding: voice solutions for brands

Head of SEO, Head of Voice Technology, AI Conversation Designer @ site By site // Author of Voice Technology, Dario Flaccovio Editore