The Basics of the Alexa Skill IntentSchema

In this post, I will describe the basics of designing and working with the IntentSchema for an Amazon Alexa skill. To provide some context, an Alexa skill can have multiple intents. Each intent is a specific action within the skill.

Imagine we have a calculator skill. The user can ask Alexa to add or subtract two numbers. The skill is the calculator, and its two intents are adding and subtracting. The intent schema comes into play because we allow the user to add or subtract any two numbers. We use IntentSchema.json to prepare Alexa to accept those two arguments. Note: the programming idea of an argument is called a slot in the Alexa world, so I will refer to arguments as slots for the rest of this post.

For our addition and subtraction intents, we need to define two slots each: the first number and the second number in the calculation. Below is the intentSchema.json with only the AddIntent and its slots defined so far.

{
   "intents":[
      {
         "intent":"AddIntent",
         "slots":[
            {
               "name":"firstNumber",
               "type":"AMAZON.NUMBER"
            },
            {
               "name":"secondNumber",
               "type":"AMAZON.NUMBER"
            }
         ]
      }
   ]
}

You can see in the JSON that we have two individual slots (firstNumber and secondNumber). Each has a name and a type attribute; both are required for every slot that you define.
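Since every slot needs both a name and a type, a quick sanity check can catch a missing attribute before the schema is uploaded. The following is a minimal sketch, not part of any Amazon tooling: the function name find_incomplete_slots is hypothetical, and the schema text simply mirrors the AddIntent example above.

```python
import json

# Mirrors the AddIntent example above.
INTENT_SCHEMA = """
{
  "intents": [
    {
      "intent": "AddIntent",
      "slots": [
        {"name": "firstNumber", "type": "AMAZON.NUMBER"},
        {"name": "secondNumber", "type": "AMAZON.NUMBER"}
      ]
    }
  ]
}
"""


def find_incomplete_slots(schema_text):
    """Return (intent name, slot) pairs for slots missing a name or a type.

    Hypothetical helper for illustration only.
    """
    schema = json.loads(schema_text)
    problems = []
    for intent in schema.get("intents", []):
        for slot in intent.get("slots", []):
            if not slot.get("name") or not slot.get("type"):
                problems.append((intent.get("intent"), slot))
    return problems


if __name__ == "__main__":
    # An empty list means every slot has both attributes.
    print(find_incomplete_slots(INTENT_SCHEMA))
```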

{
	"intents": [{
		"intent": "AddIntent",
		"slots": [{
			"name": "firstNumber",
			"type": "AMAZON.NUMBER"
		}, {
			"name": "secondNumber",
			"type": "AMAZON.NUMBER"
		}]
	}, {
		"intent": "SubtractIntent",
		"slots": [{
			"name": "firstNumber",
			"type": "AMAZON.NUMBER"
		}, {
			"name": "secondNumber",
			"type": "AMAZON.NUMBER"
		}]
	}]
}

Above is an example with both of our intents defined inside the intentSchema.json. Notice that slots within the same intent must have unique names, but slots in a different intent live in a new scope, so the same names can be reused there.
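The scoping rule can be checked mechanically. Here is a small sketch, assuming a schema shaped like the one above and already loaded into a Python dict; duplicate_slot_names is a hypothetical helper name, not something Amazon provides.

```python
from collections import Counter


def duplicate_slot_names(schema):
    """Map each intent name to the slot names it declares more than once.

    Names reused across different intents are fine and are not reported.
    """
    duplicates = {}
    for intent in schema["intents"]:
        counts = Counter(slot["name"] for slot in intent.get("slots", []))
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            duplicates[intent["intent"]] = repeated
    return duplicates


# Both intents reuse the same two slot names, which is allowed.
SCHEMA = {
    "intents": [
        {"intent": "AddIntent", "slots": [
            {"name": "firstNumber", "type": "AMAZON.NUMBER"},
            {"name": "secondNumber", "type": "AMAZON.NUMBER"},
        ]},
        {"intent": "SubtractIntent", "slots": [
            {"name": "firstNumber", "type": "AMAZON.NUMBER"},
            {"name": "secondNumber", "type": "AMAZON.NUMBER"},
        ]},
    ]
}

if __name__ == "__main__":
    print(duplicate_slot_names(SCHEMA))
```

An empty dict means no intent repeats a slot name within its own scope.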

All slots must have a defined type. Amazon provides built-in types that you can use in your skills. These are the most common:

 Slot Type                  Description
 AMAZON.NUMBER              Recognizes spoken numbers and converts them to digits. Example: “two” converts to 2.
 AMAZON.TIME                Converts times into programmable values. Example: “Set alarm for seven pm” converts to 19:00.
 AMAZON.DURATION            Converts durations into usable values. Example: “Set alarm for 45 minutes” converts to PT45M.
 AMAZON.FOUR_DIGIT_NUMBER   Recognizes four-digit number sequences such as years and converts them. Example: “Wikipedia war of eighteen twelve” converts to 1812.
 AMAZON.DATE                Converts dates into usable formats. Example: “What is the weather today” converts to today's date, such as 2017-03-02.
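Slot values reach your code as strings, so it can help to turn them into native types. The sketch below covers only the simple forms shown in the table; the function names are hypothetical, and Alexa can also send other forms for dates and times (week-based dates or part-of-day values, for example) that this code does not handle.

```python
import re
from datetime import date, time, timedelta


def parse_number(value):
    """AMAZON.NUMBER: '2' -> 2."""
    return int(value)


def parse_time(value):
    """AMAZON.TIME in the simple HH:MM form: '19:00' -> time(19, 0)."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def parse_date(value):
    """AMAZON.DATE in the full YYYY-MM-DD form only (Python 3.7+)."""
    return date.fromisoformat(value)


def parse_duration(value):
    """AMAZON.DURATION limited to hours, minutes and seconds, e.g. 'PT45M'."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    print(parse_number("2"), parse_time("19:00"),
          parse_date("2017-03-02"), parse_duration("PT45M"))
```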

If you find yourself struggling to understand what the above types do and how they handle user input, make sample apps and look at the data Alexa returns. It is useful to see the real-world conversions that Alexa performs.

The intentSchema makes Alexa powerful because it makes your skills more dynamic: you can accept many types of input from the user. It is possible to build custom slots as well as lists to really build out your skill. I will make a post in the future about these advanced topics.

The Components that Make Up an Alexa Skill

This post gives the big picture of the components that make up an Amazon Alexa skill. It contains next to no code, but it will introduce you to the pieces and terminology used in the Alexa world.

First, let us start with what an Alexa skill is. A skill is essentially an action or function that your Amazon Echo device will perform. A skill is invoked by speaking a specific phrase to Alexa.

Examples of phrases that invoke skills:

  • Alexa, what is the weather today?
  • Alexa, where is my stuff?
  • Alexa, will it rain today?

At a very high level, each of these phrases invokes a skill, and Alexa parses the words. Based on what the phrase was, the words are sent to a predetermined function (a set of code) in the cloud. The function performs a series of actions, and a result is returned to your Echo device. This high-level flow is the same for all skills. Let’s dive into the details of how this process works.

The main components that make up an Alexa skill:

  • Utterances
  • Intent Schema
  • AWS Lambda

Utterances (text)

As I said above, you have to speak a specific phrase to your Echo device in order to invoke the skill. These phrases are called utterances, and they are stored in a simple text file. They are the phrases that Alexa is on the lookout for; if she recognizes a user’s phrase as an utterance, she knows what action to perform.

AddIntent what is 2 plus 2
AddIntent add 2 and 2

SubtractIntent what is 5 minus 2
SubtractIntent subtract 5 and 2

The first words on each line, AddIntent and SubtractIntent, are individual intents within a skill. A skill can have more than one intent, and this one has two: essentially an add and a subtract. Right now the skill is extremely basic and can only recognize the hard-coded phrases above.

That is very basic, and we want our users to be able to add and subtract any whole numbers. For that, we need to tell Alexa to expect any number. We do that by still using our utterances but combining them with an intentSchema file. Here is an example of an utterances file that allows for the addition and subtraction of any two whole numbers.

AddIntent what is {firstNumber} plus {secondNumber}
AddIntent add {firstNumber} and {secondNumber}

SubtractIntent what is {firstNumber} minus {secondNumber}
SubtractIntent subtract {firstNumber} and {secondNumber}

Our utterances text file now contains placeholders instead of hard-coded values, great! But how is Alexa supposed to make sense of {firstNumber} and {secondNumber}? That is where the intentSchema JSON file comes in.
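To make the layout of the utterances file concrete, here is a minimal sketch that splits each line into its intent name, its phrase, and the placeholders it contains. parse_utterances is a hypothetical helper written for this post; Alexa does this parsing for you, so the code is only meant to show the structure.

```python
import re

SAMPLE_UTTERANCES = """\
AddIntent what is {firstNumber} plus {secondNumber}
AddIntent add {firstNumber} and {secondNumber}

SubtractIntent what is {firstNumber} minus {secondNumber}
SubtractIntent subtract {firstNumber} and {secondNumber}
"""


def parse_utterances(text):
    """Return a list of (intent name, phrase, placeholder names) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate groups of utterances
        # The first word is the intent; the rest is the spoken phrase.
        intent_name, phrase = line.split(maxsplit=1)
        slot_names = re.findall(r"\{(\w+)\}", phrase)
        entries.append((intent_name, phrase, slot_names))
    return entries


if __name__ == "__main__":
    for entry in parse_utterances(SAMPLE_UTTERANCES):
        print(entry)
```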

Intent Schema (JSON)

The intentSchema provides meaning to our variables. Amazon calls these variables slots, so I will refer to them as such. Each slot is filled by whatever the user says. If the user asked, “What is 10 plus 5”, Alexa would know that 10 refers to firstNumber and 5 refers to secondNumber.

Here is an example intentSchema.json file.

{
	"intents": [{
		"intent": "AddIntent",
		"slots": [{
			"name": "firstNumber",
			"type": "AMAZON.NUMBER"
		}, {
			"name": "secondNumber",
			"type": "AMAZON.NUMBER"
		}]
	}, {
		"intent": "SubtractIntent",
		"slots": [{
			"name": "firstNumber",
			"type": "AMAZON.NUMBER"
		}, {
			"name": "secondNumber",
			"type": "AMAZON.NUMBER"
		}]
	}]
}

By using utterances.txt and intentSchema.json together, our Echo device can understand whatever number a user says. Alexa knows to expect any number in the firstNumber and secondNumber slots because the slot names match in both files and each slot has the AMAZON.NUMBER type.
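Because the two files only work together when the names match, a small cross-check can flag a placeholder that the schema never declares. This is a sketch under the assumption that the schema is already loaded as a dict; undeclared_slots is a hypothetical helper, and thirdNumber is a deliberately wrong name used to show a mismatch.

```python
import re


def undeclared_slots(utterance_lines, schema):
    """List (intent, slot) pairs used in utterances but missing from the schema."""
    declared = {
        intent["intent"]: {slot["name"] for slot in intent.get("slots", [])}
        for intent in schema["intents"]
    }
    missing = []
    for line in utterance_lines:
        if not line.strip():
            continue
        # First word is the intent name, the remainder is the phrase.
        intent_name, _, phrase = line.strip().partition(" ")
        for slot_name in re.findall(r"\{(\w+)\}", phrase):
            if slot_name not in declared.get(intent_name, set()):
                missing.append((intent_name, slot_name))
    return missing


SCHEMA = {
    "intents": [
        {"intent": "AddIntent", "slots": [
            {"name": "firstNumber", "type": "AMAZON.NUMBER"},
            {"name": "secondNumber", "type": "AMAZON.NUMBER"},
        ]},
    ]
}

UTTERANCES = [
    "AddIntent add {firstNumber} and {secondNumber}",
    "AddIntent add {firstNumber} and {thirdNumber}",  # typo on purpose
]

if __name__ == "__main__":
    print(undeclared_slots(UTTERANCES, SCHEMA))
```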

I do not want to get too detailed right now into the intentSchema files but they are a powerful method for gathering dynamic user input for your skill. Expect more posts in the future getting into the more powerful aspects of the intentSchema file.

Amazon AWS Lambda

Now that Alexa understands what a user says, we need to tie that together with our function in the cloud to actually do some processing based on what the user said. We create an AWS Lambda function for this.

“AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of the Amazon Web Services. It is a compute service that runs code in response to events and automatically manages the compute resources required by that code.”

The Lambda function is connected to our particular skill, so the event that causes it to fire is the user saying an utterance that Alexa was expecting. Alexa creates a JSON object based on the user's phrase, the Lambda function acts on the data inside that JSON object, and finally it creates a JSON object of its own to send back to Alexa as a response. It is your responsibility as the developer to write a Lambda function that does the processing and builds the JSON object to send back to Alexa.
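A Python Lambda function for the calculator could look roughly like the sketch below. It assumes the incoming event carries the intent under request.intent with slot values stored as strings, and that a plain-text reply is enough. The wording of the replies and the helper name build_response are my own choices for illustration, not a prescribed implementation.

```python
def build_response(speech_text):
    """Wrap plain text in the JSON structure Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the JSON object from Alexa."""
    request = event["request"]
    if request["type"] != "IntentRequest":
        # For example, the user opened the skill without asking anything.
        return build_response("Ask me to add or subtract two numbers.")

    intent = request["intent"]
    slots = intent.get("slots", {})
    try:
        first = int(slots["firstNumber"]["value"])
        second = int(slots["secondNumber"]["value"])
    except (KeyError, TypeError, ValueError):
        return build_response("Sorry, I did not catch both numbers.")

    if intent["name"] == "AddIntent":
        return build_response(f"{first} plus {second} is {first + second}")
    if intent["name"] == "SubtractIntent":
        return build_response(f"{first} minus {second} is {first - second}")
    return build_response("Sorry, I do not know how to do that.")


if __name__ == "__main__":
    # A trimmed-down event; real requests carry more fields.
    sample_event = {
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "AddIntent",
                "slots": {
                    "firstNumber": {"name": "firstNumber", "value": "10"},
                    "secondNumber": {"name": "secondNumber", "value": "5"},
                },
            },
        }
    }
    print(lambda_handler(sample_event, None))
```

Running the file locally with a hand-built event like this is a quick way to test the logic before connecting the function to a skill.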

This JSON object sent back to Alexa contains the response for Alexa to speak back to the user. The whole process happens very quickly. As long as your Lambda function is efficient and does not need to do a lot of computing, you should be able to get your response back within a second.

Currently, functions for AWS Lambda can be written in Node.js (JavaScript), Python, Java (Java 8 compatible), and C#. It is also worth noting that, as of right now, Lambda functions for Alexa skills can only be hosted in the US East (N. Virginia) and EU (Ireland) regions. Lambda functions for other purposes can be hosted in many other regions, but one created for an Alexa skill must be hosted in one of those two regions.

To bring it all together, there are three major pieces that make up an Alexa skill: the utterances, the intentSchema, and the AWS Lambda function. Expect a more technical guide very soon on how to create your own Alexa skill. If anything in this article confused you, feel free to leave a comment and I can clarify.
