A Developer's Guide to Building an AI Application


Introduction

Artificial Intelligence is rapidly becoming a mainstream technology that is helping transform and empower us in unexpected ways. Let us take a trip to remote Nepal to see a fascinating example. Like the vast majority of Nepalese, Melisha Ghimere came from a remote village, from a family of subsistence farmers who raised cows, goats, and water buffalos. Seven years ago, she watched her relatively wealthy uncle and aunt lose a lot of their herd to an outbreak of anthrax; they were never to recover their economic footing. Melisha went on to college thinking about the plight of her family. In college, she worked to develop a predictive early warning solution to help farmers. With a team of four students, she researched livestock farming and veterinary practices, and spoke to farmers. The team built a prototype for a monitoring device that tracks temperature, sleep patterns, stress levels, motion, and the activity of farm animals. Melisha's AI system predicts the likely health of each animal based on often subtle changes in these observations. Farmers are able to track their animals, receive alerts, and get actionable recommendations. Although her project is still in its infancy, field tests have shown the solution to be about 95% accurate in predicting risks to an animal's health. Melisha and her team were able to help a family prevent a deadly outbreak of an anthrax infection by identifying a diseased cow before symptoms were evident to the farmer. Melisha's team was a regional finalist in Microsoft's Imagine Cup competition in 2016.1
Let me give you another example much closer to home that shows the power of AI in transforming the retail experience. Lowe's Innovation Labs has created a unique prototype shopping experience for home remodeling. A customer can now walk in and share her dream kitchen photos with a design specialist. Using an AI-powered application, the design specialist gains deep insight into the customer's style and preferences. The application generates a match from the Lowe's dream kitchen collection, and the design of the kitchen is then shown in very realistic holographic mixed reality through a HoloLens.2 The customer can visualize, explore, and change the design to her taste in the mixed-reality environment in real time. Applications like these are vanguards of the revolution in retail experiences that AI will bring for consumers.
Healthcare is another field that is on the cusp of a revolution. With the power of AI and a variety of data sources, from genomics, electronic medical records, and medical literature to population data, scientists are now able to predict health emergencies, diagnose better, and optimize care. A unique example in this area comes from Cochrane, a highly reputed nonprofit organization dedicated to gathering and summarizing the best evidence from research to help doctors make informed choices about treatment. Cochrane conducts systematic reviews, which digest and analyze the explosively growing medical literature and reduce it to fairly short and manageable pieces of work to give doctors the best possible guidance on the effects of healthcare interventions. For example, a recent systematic review of medical studies looked at whether steroids can help with the maturation of premature babies' lungs. The review showed conclusively that steroids can help save the babies' lives. This intervention has helped hundreds of thousands of premature babies. However, such reviews are very labor intensive and can take two to three years to complete. Cochrane's Project Transform was born out of the need to make systematic reviews more efficient, give more timely and relevant guidance to doctors, and therefore help save more lives. Project Transform uses AI to manipulate and analyze the literature and data very efficiently, allowing researchers to understand the data and interpret the findings. It creates a perfect partnership between human and machine, in which a significant amount of the heavy overhead of systematic reviews is reduced and human analysis skills can be directed where they are most needed for timeliness and quality.
There’s no field that will be left untouched by the transformational power of AI. I can point you to fields as diverse as astronomy where AI has accelerated the pace of new discoveries, and the area of conservation where ecologists and conservationists are working with AI-powered tools to help track, study, and protect elusive and endangered animals.
A lot of times we become bogged down in the discussions of the appropriate algorithm or tools, but the real power of AI resides in the ideas and questions that precede it.
It's the conservationist pondering how to create sustainable habitats, the doctor wondering how to better serve their patient, the astronomer's and citizen scientist's curiosity that expands our collective consciousness to the outer limits of the universe. AI has the potential to empower the noblest of human causes, and we are just at the beginning. The field is still nascent, and yet these breakthroughs highlight the explosive power of AI in reshaping our daily experiences, how we do business, and how we live our lives.
Five decades ago, the early inventors in AI could only dream of what most consumers take for granted today. From voice-powered assistants like Cortana, Siri, or Alexa, to smartphones and self-driving cars, we seem to be living in the pages of science fiction. What will the next two decades hold for us? The next five? At Microsoft, we have made it our mission to advance AI innovations by democratizing AI tools in the same way that we democratized the power of computing in the mainframe era by envisioning a personal computer in every home, school, and workplace.
As educator and computing pioneer Alan Kay said, “The best way to predict the future is to create it.” In the same spirit, we are writing this book to give developers a start on creating the future with AI. In this book, we will show you how to create your first AI application in the cloud, and in the process learn about the wealth of AI resources and capabilities that are now rapidly becoming available to programmers. The application we create will be an AI-infused Bot, a “Conference Buddy,” that helps create a novel Question and Answer experience for the attendees and speakers participating in a conference. As we build this Bot, you will get a glimpse into how AI can help understand conversations, perceive vast amounts of information, and respond intelligently. In the process, you will also get a glimpse into the landscape of AI tools and emerging developments in the field.
We selected a chatbot as our example because it is a relatively easy entry point into AI, and in the process, we highlight resources and links to help you dig deeper. Chatbots are also ubiquitous, have many interesting implementations, and are transforming the way in which we interact with computers. We also give you a wider lens on the landscape of AI tools and a glimpse into exciting new developments in the field.
“The Intersection of Cloud, Data, and AI”
In the rest of this section, we will introduce AI and the powerful intersection of data, cloud, and AI tools that is creating a paradigm shift, helping enable systems of intelligence.
“The Microsoft AI Platform”
Here, we explore the Microsoft AI platform and point out the tools, infrastructure, and services that are available for developing AI applications.
“Developing an Intelligent Chatbot”
This section presents a discussion of chatbots and conversational AI, and highlights some chatbot implementations. How do you create an intelligent chatbot for the enterprise? We provide a high-level architecture using the Conference Buddy bot example, including code samples; discuss design considerations and technologies involved; and take a deep dive into the abstraction layer of the bot, which we call the Bot Brain.
“Adding “Plug and Play” Intelligence to Your Bot”
This section explores how you can easily give the bot new skills and capabilities such as vision, translation, speech, and other custom AI abilities, as well as how you develop the Bot Brain's intelligence.
“Building an Enterprise App to Gain Bot Insights: The Conference Buddy Dashboard”
This section highlights the Conference Buddy dashboard, which allows the conference speaker and attendees to see the attendees' questions and answer them in real time. We also discuss how to instrument the Bot to get metrics and application insights.
“Paving the Road Ahead”
In the final section, we consider an exciting development in the AI world with the release of Open Neural Network Exchange (ONNX) and also Microsoft’s commitment to the six ethical principles—fairness, reliability and safety, privacy and security, inclusivity, transparency, and accountability—to guide the cross-disciplinary development and use of AI.


The Intersection of Cloud, Data, and AI

We define AI as a set of technologies that enable computers to assist and solve problems in ways that are similar to humans by perceiving, learning, and reasoning. We are enabling computers to learn from vast amounts of data, and interact more naturally and responsively with the world, rather than following pre-programmed routines.3 Technologies are being developed to teach computers to “see,” “hear,” “understand,” and “reason.”4 The key groups of capabilities include:
Computer vision
This is the ability of computers to “see” by recognizing objects and their relationships in a picture or video.
Speech recognition and synthesis
This is the ability of computers to “listen” by understanding the words that people say and to transcribe them into text, and also to read text aloud in a natural voice.
Language understanding
The ability of computers to “comprehend” the meaning of words and respond, considering the many nuances and complexities of language (such as slang and idiomatic expressions). When computers can effectively participate in a dialog with humans, we call it “conversational AI.”
Knowledge
The ability of a computer to “reason” by representing and understanding the relationship between people, things, places, and events.
What do these capabilities mean in the context of enterprise applications? AI is powering applications that reason over all of the data collected over time, across repositories and massive datasets, through machine learning. These AI-powered systems understand and create meaning in unstructured data such as email, chats, and handwritten notes, all of which we previously could not process. And, more important, these systems are interacting with customers and engaging them across different channels in ways that are hyper-personalized.
In the same vein, businesses are using AI-powered applications to digitally transform every aspect of their organizations including transforming their products through insights from customer data, optimizing business operations by predicting anomalies and improving efficiencies, empowering their employees through intelligent tools, and engaging their customers through conversational agents that deliver more customized experiences.
The following are examples of the questions that power the engines running AI applications:
Classification
Which category does it belong to?
Regression
How much? How many?
Anomaly
Is it weird?
Clustering
How is it organized?
So how do you begin to design AI-powered solutions that take advantage of all the aforementioned capabilities?
We design AI solutions to complement and unlock human potential and creative pursuits. There are significant implications of what it means to design technology for humans, and this includes considering ethical implications; understanding the context of how people work, play, and live; and creating tailored solutions that adapt over time.
One of the most fascinating areas of research is bridging emotional and cognitive intelligence to create conversational AI systems that model human language and have insight into the logical and unpredictable ways humans interact.
According to Lili Cheng, corporate vice president of Microsoft AI and Research, “This likely means AI needs to recognize when people are more effective on their own—when to get out of the way, when not to help, when not to record, when not to interrupt or distract.”5
The time for AI is now, given the proliferation of data, the limitless availability of computing power in the cloud, and the rise of powerful algorithms that are powering the future.

Modern AI: Intersection of Data, Cloud Computing, and AI

Although AI research has been ongoing for decades, the past few years have seen a leap in practical innovations, catalyzed by vast amounts of digital data, online services, and enormous computing power. As a result, technologies such as natural-language understanding, sentiment analysis, speech recognition, image understanding, and machine learning have become accurate enough to power applications across a broad range of industries.
Let’s examine the three important developments that are helping create modern AI: data and the digital transformation, cloud computing, and AI algorithms and tools.
Data and the digital transformation
At the center of AI is data, and the increasing digitization of our age is resulting in the proliferation of what is known as big data. Out of approximately 7.4 billion people on Earth, more than 4 billion own mobile devices and 3.8 billion are connected to the internet, and these numbers are projected to keep growing. The vast majority of new information in the world is now generated and consumed online, and an increasingly large fraction of the economy is migrating to online services, from shopping to banking, entertainment, media, and communications. As our lives have become increasingly digitized and sensors (microphones, cameras, location, and other sensors) have become cheap and ubiquitous, more data than ever before is available from which computers can learn and reason. At the same time, as we engage in online interactions and transactions digitally, new response and feedback data are generated that allows AI algorithms to adapt and optimize interactions.
The staggering amount and growth rate of data has led to significant innovation in how we efficiently store, manage, and mine the data for flexible, real-time analysis. Most such data flows to public or private clouds over the internet. “Big Data” systems help to handle the heterogeneous nature of such data and support many analysis methods, such as statistical analysis, machine learning, data mining, and deep learning.
Such systems are at the heart of what makes it possible for computers to “see,” “hear,” and “reason,” and to discern patterns often undetectable to human eyes.
Cloud computing
The internet and the digital transformation of the world, in turn, helped catalyze cloud computing. Processing the data and delivering large-scale online services requires massive computing power, reliable networks, storage, and data processing. The cloud provides a powerful foundation and platform to handle these challenges. It allows you to stream data from connected devices, offers massive data storage capacity and elastic, scalable computing power to integrate, analyze, and learn from the data.
You can also get the largest servers, the latest GPUs, and cutting-edge hardware like Field Programmable Gate Arrays (FPGAs) to accelerate demanding computations without the exorbitant overhead cost of building and provisioning data centers and server farms. Enormous connectivity allows every type of device—what we know as the Internet of Things (IoT)—to bring massive amounts of data into the cloud on a real-time basis for analysis and AI at scale. Furthermore, the cloud provides the necessary infrastructure and tools to offer enterprise-grade security, availability, compliance, and manageability for the applications and services deployed on the cloud.
AI algorithms and tools
The explosion of use cases for AI, driven by online services and the digital transformation, in turn catalyzed enormous progress in AI algorithms. One of the most profound innovations in recent years has been deep learning. This technique, inspired by neural networks in the brain, allows computers to learn deep concepts, relationships, and representations from vast amounts of data (such as images, video, and text), and perform tasks such as object and speech recognition with accuracy comparable to humans. Today, open source tools such as the Cognitive Toolkit, PyTorch, and TensorFlow make deep learning innovations accessible to a wide audience. And all the major cloud vendors now have services that substantially simplify AI development to empower software engineers.
Modern AI lives at the intersection of these three powerful trends: digital data from which AI systems learn, cloud-hosted services that enable AI-powered interactions, and continuing innovations in algorithms that make the AI capabilities more powerful while enabling novel applications and use cases.

Your First Bot: The Scenario

Now, let's look at how you can build your first bot. Imagine you are attending a technology conference presentation with hundreds of other enthusiastic attendees. As the speaker is presenting, you have a running list of questions. You want to ask your questions, but:
• It is not Q&A time.
• You don’t relish the idea of speaking up in public.
• You didn’t raise your hand high enough or weren’t picked during Q&A.
• You have a language barrier and cannot communicate fluently in the speaker's native language.
The reasons go on and on. Most people don’t have an opportunity to fully engage with the speaker and content during conferences because of the logistics or other barriers.
What if you had a "Conference Buddy" chatbot that you could ask questions as they occur to you and get answers as you go? Those questions would also be routed to a dashboard where the speaker can see and answer questions from the audience in real time.
The Conference Buddy chat client that we are going to build will have three functions:
• Answer your greetings and introduce itself
• Answer some of your questions intelligently and automatically, when possible
• Route your question to a dashboard so the speaker can see all the questions from the audience, pick the questions to answer, and engage

Conversation Flow: An Example of the Conference Buddy Bot in Action

To get an idea of how the Conference Buddy bot works in action, let’s examine a typical conversation flow:
1. The user invokes the Conference Bot by sending the first message.
2. The Conference Bot responds with a greeting and introduction of what it can do.
3. The user asks a question; for example, “Who is Lili Cheng?”
4. The Conference Bot routes the message to LUIS (Language Understanding Intelligent Service) to determine the intent of the message: LUIS parses the message and, for our example, returns "This is an Ask Who Task."
5. The Conference Bot then selects the appropriate bot task within the Bot Brain to call via HTTP Post. In our example, the “Ask Who” task will do the following:
a. Send the string to Bing Web Search and grab the results.
b. Send the string to Bing Image Search in parallel.
c. Combine the image and text into a response object/data contract that the Conference Bot understands.
6. The Conference Bot sends a graphical card as results to the user.
7. The Conference Bot sends results to Azure Search to be archived so that the dashboard can use it.
8. The user can click the link on the card to get more information from the source of the article.
Let’s demonstrate the “Learn More” task to illustrate this entire process:
1. Suppose that the user asks, “I want to learn more about Azure Machine Learning.”
2. The Conference Bot routes the message to LUIS to determine the intent of the message: LUIS parses the message and, for our example, returns “This is a Learn More task.”
3. The Conference Bot then selects the appropriate bot task to call via HTTP Post to process the message: in our example, the "Learn More" task will call Text Analytics to extract key phrases and send parallel requests to the following:
a. Video Indexer: Video Indexer is a Cognitive Service that will get the transcript of the video, break it into keywords, annotate the video, analyze sentiment, and moderate content. You can upload specific videos related to the session, and it will play the video right at the moment at which the speaker is discussing the keyword entered.
b. Bing Custom Search: Enables the power of Bing Search on a restricted number of websites to increase the relevancy and speed of results. In the case of the Conference Buddy bot, we included only websites that dealt with the conference themes (a minimal call is sketched after this list).
c. Bing Web Search: Bing Web Search is activated in case the prior Video Indexer and Bing Custom Search don’t yield any results.
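To make step b concrete, the following is a minimal sketch of a Bing Custom Search call (the custom configuration ID, subscription key, and helper class are placeholders, not part of the Conference Buddy code, which wraps such calls in its own service utilities):
// Illustrative sketch: query Bing Custom Search over the conference-specific site list.
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class CustomSearchSketch
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> SearchAsync(string query, string customConfigId, string subscriptionKey)
    {
        string url = "https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search"
            + "?q=" + WebUtility.UrlEncode(query)
            + "&customconfig=" + customConfigId;

        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            HttpResponseMessage response = await Client.SendAsync(request);
            response.EnsureSuccessStatusCode();

            // Return the name of the first matching page, if any.
            JObject json = JObject.Parse(await response.Content.ReadAsStringAsync());
            return (string)json.SelectToken("webPages.value[0].name");
        }
    }
}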
Now let’s look at some of the design considerations and take a deeper dive into the bot’s architecture.
 

Conference Buddy Bot Architecture Details

Let’s take a deeper dive into the Conference Buddy bot architecture details and explore the code samples that power the chatbot.
Root Dialog

Whereas a traditional application starts with a main screen and users can navigate back to start over, with bots you have the Root Dialog. The Root Dialog guides the conversation flow. From a UI perspective, each dialog acts like a new screen. This way, dialogs help the developer to logically separate out the various areas of bot functionality.
For the Conference Buddy bot, each dialog invokes the next, depending on what the user types and the intent. This is called a waterfall dialog. A waterfall dialog is a type of dialog that allows the bot to easily walk a user through a series of tasks or collect information. The tasks are implemented as an array of functions where the results of the first function are passed as input into the next function, and so on. Each function typically represents one step in the overall process. At each step, the bot prompts the user for input, waits for a response, and then passes the result to the next step.
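To make the waterfall idea concrete, here is a small, SDK-agnostic sketch; the WaterfallStep delegate and RunWaterfallAsync helper are illustrative names rather than Bot Framework APIs. Each step receives the previous step's result and passes its own result forward:
using System.Threading.Tasks;

// Illustrative only: each waterfall step consumes the previous step's result.
public delegate Task<object> WaterfallStep(object previousResult);

public static class WaterfallExample
{
    public static async Task<object> RunWaterfallAsync(object input, params WaterfallStep[] steps)
    {
        object result = input;
        foreach (WaterfallStep step in steps)
        {
            // The output of one step becomes the input of the next.
            result = await step(result);
        }
        return result;
    }

    public static Task<object> ExampleAsync()
    {
        return RunWaterfallAsync(
            "Hello there, buddy!",
            text => Task.FromResult<object>("Greeting"),           // step 1: determine the intent
            intent => Task.FromResult<object>($"Handled {intent}") // step 2: run the matching bot task
        );
    }
}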
So, let’s consider our Conference Buddy bot. If the user types:
“Hello there, buddy!”
The Root Dialog will send the string to LUIS and wait for a response. LUIS will evaluate the string and send back a JSON object with the results. For each intent, LUIS returns a confidence score; it also highlights the topScoringIntent and identifies the entities in the query. The following code shows an example response:
{
  "query": "Hello there, buddy",
  "topScoringIntent": {
    "intent": "Greeting",
    "score": 0.9887482
  },
  "intents": [
    {
      "intent": "Greeting",
      "score": 0.9887482
    },
    {
      "intent": "who",
      "score": 0.04272597
    },
    {
      "intent": "learnmore",
      "score": 0.0125702191
    }
  ],
  "entities": [
    {
      "entity": "buddy",
      "type": "Person",
      "startIndex": 20,
      "endIndex": 24,
      "score": 0.95678144
    }
  ]
}
When LUIS returns the intent as "Greeting", the Root Dialog calls the ProcessGreetingIntent function. This function displays the Greeting Dialog, which in our example does not need to invoke a bot task. Control remains with the Greeting Dialog until the user types something else. When the user responds, the Greeting Dialog closes and the Root Dialog resumes control.
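To sketch that hand-off in the Bot Builder style used by the rest of the samples (the bodies of ProcessGreetingIntent, ResumeAfterGreeting, and GreetingDialog below are illustrative, not the book's exact implementation), the Root Dialog can call a child dialog and resume when it completes:
// Illustrative sketch: hand control to a child Greeting Dialog, then resume.
private Task ProcessGreetingIntent(IDialogContext context, string message)
{
    context.Call(new GreetingDialog(), ResumeAfterGreeting);
    return Task.CompletedTask;
}

private async Task ResumeAfterGreeting(IDialogContext context, IAwaitable<object> result)
{
    await result;
    // The Greeting Dialog has closed; the Root Dialog waits for the next message.
    context.Wait(MessageReceivedAsync);
}

[Serializable]
public class GreetingDialog : IDialog<object>
{
    public async Task StartAsync(IDialogContext context)
    {
        await context.PostAsync("Hi! I'm the Conference Buddy. Ask me anything about this session.");
        context.Wait(OnMessageReceivedAsync);
    }

    private async Task OnMessageReceivedAsync(IDialogContext context, IAwaitable<object> result)
    {
        await result;
        // Close this dialog and return control to the Root Dialog.
        context.Done<object>(null);
    }
}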
Now let’s explore the following Root Dialog sample code to see how the rest of the intents are processed:
public Task StartAsync(IDialogContext context)
{
    context.Wait(MessageReceivedAsync);
    return Task.CompletedTask;
}

private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<object> result)
{
    try
    {
        var activity = await result as Activity;
        string message = WebUtility.HtmlDecode(activity.Text);
        if (string.IsNullOrEmpty(message) == true)
        {
            return;
        }

        // Handle the explicit invocation case in Skype
        string channelId = GetChannelId(activity);
        if (channelId == "skype" && message.StartsWith(activity.Recipient.Name) == true)
        {
            message = message.Substring(activity.Recipient.Name.Length).Trim();
        }
        else if (channelId == "skype" && message.StartsWith("@" + activity.Recipient.Name) == true)
        {
            message = message.Substring(activity.Recipient.Name.Length + 1).Trim();
        }

        // Handle intents
        LUISResult luisResult = await GetEntityFromLUIS(message);
        string intent = luisResult.intents?.FirstOrDefault()?.intent ?? string.Empty;
        string[] entities = luisResult.entities?.Select(e => e.entity)?.ToArray() ?? new string[0];

        if (intent == "greeting")
        {
            await ProcessGreetingIntent(context, message);
        }
        else if (intent == "who")
        {
            await ProcessQueryIntent(context, activity, BotTask.AskWho, message, entities);
        }
        else if (intent == "learnmore")
        {
            await ProcessQueryIntent(context, activity, BotTask.AskLearnMore, message, entities);
        }
        else
        {
            await ProcessQueryIntent(context, activity, BotTask.AskQuestion, message, entities);
        }
    }
    catch (Exception)
    {
        // Error handling not shown in this excerpt.
        throw;
    }
}
The Root Dialog does not get invoked unless a user types a message. When the Conference Buddy bot receives the first message, we do special handling in the code for messages coming from the Skype channel.
We discussed what happens when LUIS returns the Greeting Intent. In our example chatbot, we anticipate three other possible intents from LUIS:
• If the intent is “Who,” the Root Dialog posts the question to the bot task “Ask Who.”
• If the intent is “Learn More,” the Root Dialog posts the question to the bot task “Learn More.”
• For all other intents, the Root Dialog sends the text to the "Ask Question" bot task.
At this point, the Root Dialog hands control to the appropriate bot task.
The Bot Brain abstraction layer
The abstraction layer handles the Post call to a bot task within the Bot Brain. This is where the benefit of the microservices implementation becomes clear. The Root Dialog has handled the message and LUIS has processed the intent; at this level, the bot executes the relevant bot task.
Let’s explore the code:
private static async Task<string> ProcessQueryIntent(IDialogContext context,
    Activity activity, BotTask task, string query, string[] topics)
{
    // Prepare the request to invoke a bot task within the bot brain
    AskQuestionRequest request = new AskQuestionRequest()
    {
        ConversationId = activity.Conversation.Id,
        Question = query,
        SessionId = SessionId,
        Topics = topics != null ? topics.ToArray() : new string[0],
        UserId = string.IsNullOrEmpty(activity.From.Name) == false
            ? activity.From.Name : activity.From.Id
    };

    // Invoke the bot task to process the request
    AskQuestionResponse askQuestionResponse = await
        HttpClientUtility.PostAsJsonAsync<AskQuestionResponse>(
            new Uri(BotBrainUrl + task.ToString()), RequestHeaders, request);

    // Handle the response returned from the bot task to be shown as cards depending on channel
    if (askQuestionResponse.Results?.Count() > 0 == true)
    {
        IMessageActivity foundMsg = context.MakeMessage();
        AskQuestionResult result = askQuestionResponse.Results[0];
        if (string.IsNullOrEmpty(result.Source) == false)
        {
            foundMsg.Text = string.Format("Got it. Meanwhile, from {0}:", result.Source);
        }
        else
        {
            foundMsg.Text = "Got it. Meanwhile, here's what I found:";
        }
        await context.PostAsync(foundMsg);

        IMessageActivity cardMessage;
        string channelId = GetChannelId(activity);
        if (channelId == "kaizala")
        {
            cardMessage = await GetKaizalaCardMessage(context, request, result);
        }
        else if (channelId == "directline" || channelId == "emulator")
        {
            cardMessage = GetAdaptiveCardMessage(context, request, result);
        }
        else
        {
            cardMessage = GetHeroCardMessage(context, request, result);
        }
        await context.PostAsync(cardMessage);
    }
    else if (task != BotTask.AskQuestion)
    {
        IMessageActivity notFoundMsg = context.MakeMessage();
        notFoundMsg.Text = "I can't seem to find it. Can you rephrase the question and try again?";
        await context.PostAsync(notFoundMsg);
    }

    return "success";
}
What's important in this layer is that, no matter which bot task is called, the request, invocation, and response are handled the same way. The Data Contract called AskQuestionRequest combines the ConversationId, Question, SessionId, Topics, and UserId to pass to the bot task through an HTTP Post.
The HTTP Post is the call into a bot task within the Bot Brain. When the appropriate bot task executes the query, it prepares the response as an AskQuestionResponse, which is handled generically no matter which bot task produced it.
Because the Conference Buddy bot is omnichannel, the response card is displayed differently according to the channel; the last part of the code shows how the bot implements adaptive cards.
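As an illustration of what a method like GetAdaptiveCardMessage can do (a minimal sketch assuming the same SDK types as the surrounding samples; the card body is built as an anonymous object rather than with the AdaptiveCards NuGet package), the bot attaches a card payload with the Adaptive Card content type to a new message:
// Illustrative sketch: build an Adaptive Card attachment for channels that support it.
private static IMessageActivity GetAdaptiveCardMessage(
    IDialogContext context, AskQuestionRequest request, AskQuestionResult result)
{
    IMessageActivity cardMessage = context.MakeMessage();

    // A minimal Adaptive Card payload; a real implementation might use the
    // AdaptiveCards NuGet package to build this object instead.
    var card = new
    {
        type = "AdaptiveCard",
        version = "1.0",
        body = new object[]
        {
            new { type = "TextBlock", text = result.Title, weight = "Bolder", wrap = true },
            new { type = "TextBlock", text = result.Answer, wrap = true },
            new { type = "Image", url = result.ImageUrl }
        },
        actions = new object[]
        {
            new { type = "Action.OpenUrl", title = result.UrlDisplayName, url = result.Url }
        }
    };

    cardMessage.Attachments = new System.Collections.Generic.List<Attachment>
    {
        new Attachment
        {
            ContentType = "application/vnd.microsoft.card.adaptive",
            Content = card
        }
    };

    return cardMessage;
}

Channels that don't render Adaptive Cards fall back to the hero card path shown in the code above.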
The Data Contract
Without the Data Contract, there would be no abstraction layer at all. The Data Contract code that follows acts as the formal agreement between the bot and the Bot Brain and abstractly describes the data to be exchanged.
Let's explore the code to see the details behind the AskQuestionRequest, which specifies the details to be sent with each query, and the AskQuestionResponse, which specifies the details of each response, no matter what the bot task does:
namespace ConferenceBuddy.Common.Models
{
    [DataContract]
    public class AskQuestionRequest
    {
        /// <summary>
        /// The session identifier
        /// </summary>
        [DataMember(Name = "sessionId")]
        public string SessionId { get; set; }

        /// <summary>
        /// The conversation identifier
        /// </summary>
        [DataMember(Name = "conversationId")]
        public string ConversationId { get; set; }

        /// <summary>
        /// The user identifier
        /// </summary>
        [DataMember(Name = "userId")]
        public string UserId { get; set; }

        /// <summary>
        /// The text of the question
        /// </summary>
        [DataMember(Name = "question")]
        public string Question { get; set; }

        /// <summary>
        /// The topics of the question
        /// </summary>
        [DataMember(Name = "topics")]
        public string[] Topics { get; set; }
    }

    [DataContract]
    public class AskQuestionResponse
    {
        /// <summary>
        /// The unique id of the response
        /// </summary>
        [DataMember(Name = "id")]
        public string Id { get; set; }

        /// <summary>
        /// The results of the response
        /// </summary>
        [DataMember(Name = "results")]
        public AskQuestionResult[] Results { get; set; }
    }

    [DataContract]
    public class AskQuestionResult
    {
        /// <summary>
        /// The title of the result
        /// </summary>
        [DataMember(Name = "title")]
        public string Title { get; set; }

        /// <summary>
        /// The answer of the result
        /// </summary>
        [DataMember(Name = "answer")]
        public string Answer { get; set; }

        /// <summary>
        /// The image url of the result
        /// </summary>
        [DataMember(Name = "imageUrl")]
        public string ImageUrl { get; set; }

        /// <summary>
        /// The source of the result
        /// </summary>
        [DataMember(Name = "source")]
        public string Source { get; set; }

        /// <summary>
        /// The url of the result
        /// </summary>
        [DataMember(Name = "url")]
        public string Url { get; set; }

        /// <summary>
        /// The url display name of the result
        /// </summary>
        [DataMember(Name = "urlDisplayName")]
        public string UrlDisplayName { get; set; }
    }
}
The Data Contract allows the separation of functions between how a query is processed and how the response is generated. Think of the Data Contract as the postal carrier. From the postman’s perspective, the specific details of the contents in the letter/package are irrelevant. What matters is the format of the “To” and “From” address to allow delivery to the right location.
If we had to make different HTTP calls to each bot task, the Conference Buddy bot would be unwieldy and difficult to build, test, deploy, and scale. In the next section, we see how the microservices implementation makes it simpler to develop the Bot Brain's intelligence and teach our Conference Buddy bot new skills.

Adding “Plug and Play” Intelligence to Your Bot

We can teach our Conference Buddy bot new skills by developing the Bot Brain's intelligence. So far, we have built a Conference Buddy bot that has three main bot tasks:
• Ask Who task
• Learn More task
• Answer Question task
We built the Conference Buddy architecture in a flexible way, so a developer can easily add more bot tasks. So, let’s expand on our Conference Buddy bot scenario. Suppose that the conference is broadcast globally and the audience members hail from different countries and speak different languages, whereas the speaker understands only English. You might want to add a new task to allow your bot to handle questions in different languages and translate the question to English for the speaker to address.
For our bot, we will make an additional call to Cognitive Services: Microsoft Translator. This is a machine translation service that supports more than 60 languages. The developer sends source text to the service with a parameter indicating the target language, and the server sends back the translated text for the client or web app to use.
The translated text can now be used with the other Cognitive Services that we have used so far, such as text analytics and Bing web search.
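As a rough sketch of such a call (assuming the public Translator Text API v3 REST endpoint; the subscription key and class name here are placeholders, and the Conference Buddy bot wraps this in its ServicesUtility helper), the client posts the source text and reads back the translation:
// Illustrative sketch: translate arbitrary text to English with the Translator Text API v3.
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class TranslatorSketch
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> TranslateToEnglishAsync(string text, string subscriptionKey)
    {
        const string url =
            "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=en";

        using (var request = new HttpRequestMessage(HttpMethod.Post, url))
        {
            // The API expects a JSON array of objects, each with a "Text" property.
            string body = JsonConvert.SerializeObject(new[] { new { Text = text } });
            request.Content = new StringContent(body, Encoding.UTF8, "application/json");
            // Some resources also require an Ocp-Apim-Subscription-Region header.
            request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

            HttpResponseMessage response = await Client.SendAsync(request);
            response.EnsureSuccessStatusCode();

            // The translated text comes back under translations[0].text.
            dynamic result = JsonConvert.DeserializeObject(await response.Content.ReadAsStringAsync());
            return (string)result[0].translations[0].text;
        }
    }
}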
To make a call to a new Cognitive Service, you need to log in to your Azure portal. This Quick Guide walks you through editing the bot code and using Azure Functions to invoke various APIs. In the sample code that follows, we illustrate how to add the new translation bot task.
Let’s explore the code:
[FunctionName("AskQuestion")]
public static async Task<HttpResponseMessage> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = "AskQuestion")] HttpRequestMessage request,
    [Table("Session", Connection = "AzureWebJobsStorage")] ICollector<SessionTableEntity> sessionTable,
    TraceWriter log)
{
    MediaTypeHeaderValue contentType = request.Content.Headers.ContentType;

    // Check if content type is empty
    if (contentType == null)
    {
        return request.CreateResponse(HttpStatusCode.BadRequest, "Missing content-type from header.");
    }
    else if (contentType.MediaType.Contains("application/json") == false)
    {
        return request.CreateErrorResponse(HttpStatusCode.UnsupportedMediaType,
            string.Format("Request's content type ({0}) is not supported.",
                string.Join(", ", contentType.MediaType)));
    }

    // Read content from request
    AskQuestionRequest requestBody = await request.Content.ReadAsAsync<AskQuestionRequest>();

    // Verify content contains a valid question and session id
    if (string.IsNullOrEmpty(requestBody.Question) == true)
    {
        return request.CreateResponse(HttpStatusCode.BadRequest, "Question is missing from the request content.");
    }
    else if (string.IsNullOrEmpty(requestBody.SessionId) == true)
    {
        return request.CreateResponse(HttpStatusCode.BadRequest, "Session id is missing from the request content.");
    }

    // Translate question
    requestBody.Question = await ServicesUtility.Translator.TranslateTextAsync(requestBody.Question);

    // Answer the question
    AskQuestionResponse response = await AnswerQuestion(requestBody, sessionTable);

    // Return request response with result and 200 OK
    return request.CreateResponse(HttpStatusCode.OK, response);
}

public static async Task<AskQuestionResponse> AnswerQuestion(AskQuestionRequest request,
    ICollector<SessionTableEntity> sessionTable)
{
    // Get unique identifier
    string id = Guid.NewGuid().ToString();
    DateTime timestampUtc = DateTime.UtcNow;

    // Run keyphrases extraction
    request.Topics = await ServicesUtility.GetTopics(request.Question, request.Topics);

    // Run search services
    string queryWithTopics = request.Topics?.Count() > 0 ?
        string.Join(" ", request.Topics).Trim() : request.Question;

    Task<BingWebSearchResult> bingWebSearchTask =
        ServicesUtility.BingSearch.SearchWebAsync(query: request.Question, count: SettingsUtility.MaxResultsCount);
    Task<BingWebImagesResult> bingWebImagesTask =
        ServicesUtility.BingSearch.SearchImagesAsync(query: request.Question, count: SettingsUtility.MaxResultsCount);

    await Task.WhenAll(bingWebSearchTask, bingWebImagesTask);

    BingWebSearchResult bingWebSearchResult = bingWebSearchTask.Result;
    BingWebImagesResult bingWebImagesResult = bingWebImagesTask.Result;

    // Process results
    AskQuestionResponse response = new AskQuestionResponse()
    {
        Id = id,
        Results = new AskQuestionResult[0]
    };

    if (bingWebSearchResult.WebPagesResult?.Values?.Count() > 0)
    {
        response.Results = ServicesUtility.GetAskQuestionResults(bingWebSearchResult);
    }

    if (response.Results.Any(r => string.IsNullOrEmpty(r.ImageUrl) == true) == true
        && bingWebImagesResult?.Values?.Count() > 0 == true)
    {
        response.Results = ServicesUtility.AddImageToResults(response.Results, bingWebImagesResult);
    }

    // Upload search document
    await ServicesUtility.UploadDocumentToSearchService(SettingsUtility.AzureSearchIndexName,
        new SessionSearchDocument(id, timestampUtc, "AskQuestion", request, response));

    // Write to the session table
    sessionTable.Add(new SessionTableEntity(id, timestampUtc, "Question", request, response));

    return response;
}
In the first part of the code, the function AskQuestion reads the content from the request and translates the question into English using the Translator. It then
extracts the Key Phrases using Text Analytics and sends the query to Bing Web Search and Bing Image Search to create a card for the response. The Key Phrases go to Azure Search to power the bot’s analytics as well as the dashboard. In this example, we do not translate the response back into the original language, but that could be an option for other implementations.
Now that we have successfully added a new bot task, we can continue to develop the Bot Brain's intelligence and add more abilities, such as vision and speech, through other Cognitive Services APIs.
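For example, a future vision-oriented bot task could describe an image that an attendee shares by calling the Computer Vision API. The following is a minimal sketch against the public REST endpoint; the class name, endpoint region, and key handling are placeholders rather than part of the Conference Buddy code:
// Illustrative sketch: describe an image with the Computer Vision API so that a new
// bot task could answer questions about pictures shared in the chat.
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class VisionSketch
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> DescribeImageAsync(string imageUrl, string subscriptionKey)
    {
        // Placeholder endpoint; use the region of your own Cognitive Services resource.
        const string endpoint = "https://westus.api.cognitive.microsoft.com/vision/v2.0/describe";

        using (var request = new HttpRequestMessage(HttpMethod.Post, endpoint))
        {
            request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            request.Content = new StringContent(
                JsonConvert.SerializeObject(new { url = imageUrl }),
                Encoding.UTF8, "application/json");

            HttpResponseMessage response = await Client.SendAsync(request);
            response.EnsureSuccessStatusCode();

            // The first caption (for example, "a person standing on a stage") could feed
            // the Answer field of an AskQuestionResult.
            JObject result = JObject.Parse(await response.Content.ReadAsStringAsync());
            return (string)result.SelectToken("description.captions[0].text");
        }
    }
}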
