The chatbot blends a diverse set of conversational skills — including empathy, knowledge, and personality — together in one system.
"This is the first time a chatbot has learned to blend several conversational skills — including the ability to assume a persona, discuss nearly any topic, and show empathy — in natural, 14-turn conversation flows," according to Facebook.
Facebook's team achieved this through a new chatbot recipe that includes improved decoding techniques, novel blending of skills, and a model with 9.4 billion parameters, which is 3.6x more than the largest existing system.
Facebook's team trained the model on roughly 1.5 billion conversation examples extracted from publicly available discussions. Because neural networks of this size are too large to fit on a single device, Facebook used techniques such as column-wise model parallelism, which splits the neural network into smaller, more manageable pieces while maintaining maximum efficiency.
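The core idea behind column-wise model parallelism can be illustrated with a toy example. The sketch below is an assumption-laden illustration in NumPy, not Facebook's actual implementation: a linear layer's weight matrix is split by columns, each shard would live on a separate device in practice, and concatenating the partial results reproduces the full layer's output.

```python
import numpy as np

# Toy illustration of column-wise model parallelism (not Facebook's code).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))   # a small batch of input activations
W = rng.standard_normal((8, 6))   # the full weight matrix of one layer

# Split W into two column shards; in a real system, one shard per device.
W_shards = np.split(W, 2, axis=1)

# Each "device" computes its slice of the output independently.
partial_outputs = [x @ shard for shard in W_shards]

# Concatenating the partial outputs recovers the unsharded layer's output.
y_parallel = np.concatenate(partial_outputs, axis=1)
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

Because each shard's matrix multiply is independent, the shards can run in parallel with no communication until the final concatenation, which is what makes the technique efficient for very large layers.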
The company has released the complete model, code, and evaluation set-up, so that other AI researchers will be able to reproduce this work and continue to advance conversational AI research.
To evaluate their model, Facebook's team benchmarked Blender's performance against Google's latest Meena chatbot through pairwise human evaluations. Since Meena itself has not been released, Facebook used the roughly 100 publicly released and randomized logs for this evaluation. Using the ACUTE-Eval method, human evaluators were shown a series of dialogues between humans paired with each respective chatbot. They were asked:
“Who would you prefer to talk to for a long conversation?” (showing engagingness)
“Which speaker sounds more human?” (showing humanness)
When presented with chats showing Meena in action alongside chats showing Blender in action, 67 percent of the evaluators said that Facebook's model sounds more human, and 75 percent said that they would rather have a long conversation with Blender than with Meena.
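The percentages above come from tallying pairwise preferences. A minimal sketch of that tally, using made-up illustrative votes rather than the actual study logs, might look like this:

```python
from collections import Counter

# Hypothetical pairwise ACUTE-Eval judgments (illustrative data only):
# each entry records which chatbot the evaluator preferred for one
# side-by-side dialogue comparison.
votes = ["Blender"] * 75 + ["Meena"] * 25

counts = Counter(votes)
total = sum(counts.values())

# Convert raw counts into preference percentages per model.
win_rate = {model: 100 * n / total for model, n in counts.items()}
print(win_rate)  # {'Blender': 75.0, 'Meena': 25.0}
```

In the real ACUTE-Eval setup, each judgment compares full human-bot dialogues side by side, and statistical significance is assessed over many such votes; this sketch only shows the final aggregation step.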
"We’re excited about the progress we’ve made in improving open-domain chatbots. However, we are still far from achieving human-level intelligence in dialogue systems. Though it’s rare, our best models still make mistakes, like contradiction or repetition, and can “hallucinate” knowledge, as is seen in other generative systems. Human evaluations are also generally conducted using relatively brief conversations, and we’d most likely find that sufficiently long conversations would make these issues more apparent, "Facwbook said.
The company is currently exploring ways to further improve the conversational quality of its models in longer conversations with new architectures and different loss functions. The team is also focused on building stronger classifiers to filter out harmful language in dialogues.