Amazon Alexa, the popular voice assistant helping with everyday tasks, has become a best friend to many households in the US and elsewhere. The list of available features for Alexa development, released by both Amazon and third parties, keeps growing steadily, but according to reports from January 2018, the total number of skills (apps) working in the US was only 25,784. More vendors are looking to integrate their products with the voice assistant: currently, you can link it to your Gmail account, your calendar, phone, or even home lighting.
Ordering groceries or a cab? Alexa can do that for you. Some even claim that the technology may become the industry standard for personal assistants everywhere. A truly fascinating prospect.
Given the above, I decided to do some quick research on Alexa and learn how to extend Alexa’s features by writing apps (called “Skills”), ultimately putting together a proof of concept in the form of a simple app. If we ignore the fact that the debugging part still needs some work, that testing with text chat doesn’t work, and that you’ll probably feel you’re going a little insane after talking to your computer for a couple of days, then the conclusion is that developing skills for Alexa is quite easy.
In the following tutorial, I’ll show you how to deploy a sample Alexa Skill called “Conf Room Manager” and explain how it works. Afterwards, you should be able to write your own Skills and use them in your project.
Most of the examples available online don’t use alexa-sdk (npm package), but I went with it because it provides a nice DSL that allows you to write less code. I will also use the default data storage for that SDK (DynamoDB) and AWS Lambda as a backend (Alexa can use any compatible backend via HTTP).
My example won’t include any real calendar integration because that’s beyond the scope of this piece. There are, however, many resources out there that will help you implement genuine Google Calendar integration. Additionally, I’m using DynamoDB only to save results, but you can read data in a similar manner and create more complex skills. One last thing: I haven’t used any JS transpiler, but it’s definitely a better idea to send one compiled JS file than to zip the entire directory ;-)
Getting Ready for Alexa Development
- Watch the “Developing Alexa Skills” video series. If you haven’t worked with Alexa before, this will help you get the gist of how it works.
- Get familiar with the developer console
- Sign in/up to your Amazon Developer account
- Sign in/up to the AWS console
- Install and configure AWS CLI
- Clone my repository with the example
Running the Sample
Creating a skill from the sample
- Go to Alexa Skills and click Create Skill.
- Provide a name and choose Custom type.
- In the skill editor, go to the JSON Editor and paste the contents of the skill.json file (we’ll go back to that later).
Creating a Lambda function from the sample
- Create a new DynamoDB table for persisting data
- Create a new AWS Lambda function named alexa_conf_rooms
- Add Alexa Skills Kit to triggers and Amazon DynamoDB to resources as pictured below:
- Deploy the code to Lambda using the ./deploy.sh script (assuming that the name of your function is alexa_conf_rooms and that you have AWS CLI configured).
Connecting everything together
- Check your AWS Lambda function ARN (top right corner of the image above)
- Check your Skill ID in Skill Editor -> Endpoint
- In the Lambda editor, set ENV variables: DB_TABLE_NAME (DynamoDB table name) and APP_ID (Skill ID)
- In Skill Editor -> Endpoint, set the Default Region to your Lambda function ARN
You're all set, let's talk
- Download the Alexa app from the store and install it on your mobile device
- Sign in to your developer account to access your skill
- Say “Alexa, open conf room manager” or “Alexa, ask conf room manager if there is a free conf room?” or “Alexa, ask conf room manager to book a conf room” or any other similar combination.
AWS permissions
During the steps above, you will probably run into a lot of AWS permission errors, missing groups or users, etc. This is because each feature (DynamoDB, Lambda, etc.) requires some permissions that are not granted by default. But don’t worry, just send the error message to your AWS admin and they’ll know how to help.
Let’s Dive Into the Details
An Alexa Skill comprises two parts: skill configuration in the skill editor and the endpoint (code on AWS Lambda or on your server).
Skill configuration explained
In the skill editor, you define:
- your invocation name: in our case “conf room manager.” It’s the name you’ll give to Alexa to indicate that you want it to perform a given skill.
- intents: all the functions that your skill will provide. In our case, we have two functions: checking whether there are any free conf rooms and booking rooms. In some tutorials, you will see that one function can be split into more intents or multiple functions can be handled by one “universal” intent. I guess that such an approach may be useful in very complex cases, but normally you shouldn’t do that.
- data types: each intent can have variables of a given type. There are many built-in data types for common cases, like numbers, dates or durations, but we can define our custom types as well. In our case, there is a ROOM_NAME type with predefined values (at Monterail, we have them set for “red,” “green,” “blue,” etc.) and aliases. We can add IDs to them. IDs are very useful because even if you specify a list of possible values, you may still receive a different value, and you cannot block that. Using IDs, however, you can check the received value’s ID: if it’s blank, the value is not on your list.
Intents
Let’s start with the FreeConfRooms intent.
It doesn’t have any variables. We just need to specify sample utterances in the skill editor that will trigger that intent, e.g. “Find a free conf room” or “Are there any free conf rooms now?” That’s it. Now, saying “Alexa, ask conf room manager if there is a free conf room” should trigger this intent and we just need to handle it in a Lambda function.
The second intent, RoomBooking, is a little bit more complex.
There are two variables (intent slots): period and room. Now, when we define sample utterances, we can use those variables, e.g. “Book the {room} conf room for {period}.” But we don’t have to use all variables in each utterance. Actually, we shouldn’t. We should define utterances with only some of them, e.g. “Book the {room} conf room,” or even without any variables at all, such as “Book a conf room.” Then, in Intent Slots -> Edit we can define questions and sample answers for each slot, e.g. “For how long?” “For {period}.” Those questions will be asked when you don’t provide the necessary values.
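The two intents described above could be laid out roughly as follows. This is an illustrative sketch written as a JavaScript object, not the contents of the sample's skill.json (which, in the JSON Editor, would additionally be wrapped in an interactionModel.languageModel object). The AMAZON.DURATION slot type, the IDs, and the synonyms are assumptions based on the description in this article. The small undeclaredSlots helper is hypothetical as well; it only checks that every {slot} used in an utterance is declared on its intent.

```javascript
'use strict';

// Illustrative language model; values are assumptions, not the sample's skill.json.
const languageModel = {
  invocationName: 'conf room manager',
  intents: [
    {
      name: 'FreeConfRooms',
      slots: [],
      samples: ['find a free conf room', 'are there any free conf rooms now'],
    },
    {
      name: 'RoomBooking',
      slots: [
        { name: 'room', type: 'ROOM_NAME' },
        { name: 'period', type: 'AMAZON.DURATION' },
      ],
      // Utterances with all, some, or none of the slots filled in.
      samples: [
        'book a conf room',
        'book the {room} conf room',
        'book the {room} conf room for {period}',
      ],
    },
  ],
  types: [
    {
      name: 'ROOM_NAME',
      values: [
        { id: 'RED', name: { value: 'red', synonyms: ['the red one'] } },
        { id: 'GREEN', name: { value: 'green', synonyms: [] } },
        { id: 'BLUE', name: { value: 'blue', synonyms: [] } },
      ],
    },
  ],
};

// Hypothetical sanity check: list every {slot} used in a sample utterance
// that is not declared in the slots of the same intent.
function undeclaredSlots(model) {
  const problems = [];
  for (const intent of model.intents) {
    const declared = new Set(intent.slots.map((slot) => slot.name));
    for (const sample of intent.samples) {
      for (const match of sample.matchAll(/\{(\w+)\}/g)) {
        if (!declared.has(match[1])) {
          problems.push(`${intent.name}: {${match[1]}}`);
        }
      }
    }
  }
  return problems;
}
```

The questions asked for missing slots (“For how long?”) are configured separately in the dialog part of the model and are left out here.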
The Lambda function explained
Now we need to write handlers for each intent in the AWS Lambda function (see handlers in index.js). Let’s start with the simplest one:
'FreeConfRooms': function() {
  this.emit(':tell', "Maybe there are...");
},
So now when you ask the conf room manager about free conf rooms, it will just answer “Maybe there are…”
Of course, in a real-life example, you’d have to perform some requests to the calendar API and then call this.emit(':tell', ...) in a promise. This is quite straightforward.
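A minimal sketch of that promise-based variant could look like this. The fetchFreeRooms function is a hypothetical stand-in for a real calendar API call (it just resolves with a fixed list), and describeFreeRooms is an illustrative helper; neither is part of the sample.

```javascript
'use strict';

// Hypothetical stand-in for a calendar API request.
// Resolves with the names of the rooms that are free right now.
function fetchFreeRooms() {
  return Promise.resolve(['red', 'blue']);
}

// Turn the list of free rooms into a sentence Alexa can say.
function describeFreeRooms(rooms) {
  if (rooms.length === 0) return 'Sorry, all conf rooms are taken right now.';
  if (rooms.length === 1) return `The ${rooms[0]} room is free.`;
  return `These rooms are free: ${rooms.join(', ')}.`;
}

const handlers = {
  'FreeConfRooms': function () {
    // Arrow functions keep `this` bound to the handler context,
    // so this.emit is still available once the promise settles.
    return fetchFreeRooms()
      .then((rooms) => this.emit(':tell', describeFreeRooms(rooms)))
      .catch(() => this.emit(':tell', 'Sorry, I could not reach the calendar.'));
  },
};
```

The important detail is that the response is emitted only after the promise settles; using a regular function inside then() would lose the handler's this.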
The handler for RoomBooking, however, is a little bit more complex and definitely not as straightforward.
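'RoomBooking': function() {
  const intent = this.event.request.intent;
  // ISO 8601 duration string; undefined if the slot is still empty
  const period = _.get(intent.slots, "period.value");
  // ID of the recognized room; falsy if no known room was matched
  const room = getRoomId(intent);
  if (period && room) {
    const periodMinutes = moment.duration(period).asMinutes();
    saveBooking({period, room}, this);
    this.emit(':tell', `Ok, I will book ${room} room for ${periodMinutes} minutes`);
  } else {
    // Let Alexa ask the questions defined for the missing slots
    this.emit(':delegate', intent);
  }
}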
First, we try to get the slot values from this.event.request.intent.slots. The period slot is a duration encoded in ISO 8601 format. Using getRoomId(), we read the room ID hidden deep inside the object.
Then, if both values are present, the handler converts the duration to minutes, saves the data to DynamoDB, and tells us that we’ve successfully booked the room. If not, we need to call this.emit(':delegate', intent) to instruct Alexa to ask you additional questions defined in the slot, such as “For how long?”
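To show what the duration conversion amounts to, here is a dependency-free sketch that handles simple ISO 8601 durations built from days, hours, minutes, and seconds. The handler itself relies on moment.duration(period).asMinutes(); durationToMinutes is a hypothetical helper for illustration only and deliberately ignores years, months, and weeks.

```javascript
'use strict';

// Convert a simple ISO 8601 duration (days, hours, minutes, seconds) to minutes.
// Returns null for anything this sketch does not understand.
function durationToMinutes(iso) {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  // Reject non-matching input as well as the empty forms "P" and "...T".
  if (!match || iso === 'P' || iso.endsWith('T')) return null;
  const [days, hours, minutes, seconds] = match
    .slice(1)
    .map((part) => Number(part || 0));
  return days * 24 * 60 + hours * 60 + minutes + seconds / 60;
}
```

For example, durationToMinutes('PT1H30M') returns 90 and durationToMinutes('PT45M') returns 45.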
Alexa won’t do this automatically. This allows you to react in a custom way when some slot values are missing, for example by setting default values. Additionally, if the room value doesn’t have an ID, getRoomId() removes it from the slots so that Alexa asks about the room again via :delegate.
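The behaviour described for getRoomId() can be sketched as follows. This is a hypothetical reimplementation, since the function body is not shown here; it assumes the slot is named room and that the request carries Alexa's entity resolution data (resolutions.resolutionsPerAuthority) for custom slot types.

```javascript
'use strict';

// Hypothetical version of getRoomId(): returns the ID of the matched room,
// or undefined (and clears the slot) if the spoken value is not on the list.
function getRoomId(intent) {
  const slot = intent.slots && intent.slots.room;
  if (!slot) return undefined;

  const authorities =
    (slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
  for (const authority of authorities) {
    const matched = authority.status && authority.status.code === 'ER_SUCCESS_MATCH';
    if (matched && Array.isArray(authority.values) && authority.values.length > 0) {
      return authority.values[0].value.id;
    }
  }

  // No known room was recognized: drop what was heard so that
  // :delegate makes Alexa ask for the room again.
  delete slot.value;
  delete slot.resolutions;
  return undefined;
}
```

Clearing the slot matters because :delegate only asks for slots that are still empty; leaving an unknown value in place would let it slip through.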
The last thing we need to do is implement handlers for some common intents:
- LaunchRequest: this is triggered when you call our skill without any intent, e.g. “Alexa, open conf room manager.” “Please say that again?” is the question Alexa asks when your response can’t be parsed.
- Unhandled: it’s not clear when this event is triggered (read this issue: https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs/issues/43), but it’s a good idea to implement it anyway and just ask some general questions to be sure that the execution won’t fail.
- SessionEndedRequest: triggered when, for example, a user closes the session unexpectedly.
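These three handlers could be sketched along the following lines. The wording of the prompts and the commonHandlers name are assumptions made for illustration; the ':ask' event with a speech text and a reprompt text is how alexa-sdk (v1) keeps the session open and waits for an answer.

```javascript
'use strict';

// Illustrative prompts; the sample may use different wording.
const HELP = 'You can ask me to find a free conf room or to book one. What would you like to do?';
const REPROMPT = 'Please say that again?';

const commonHandlers = {
  // "Alexa, open conf room manager" with no specific request.
  'LaunchRequest': function () {
    this.emit(':ask', HELP, REPROMPT);
  },
  // Fallback so that an unexpected request gets an answer instead of failing.
  'Unhandled': function () {
    this.emit(':ask', "Sorry, I didn't get that. " + HELP, REPROMPT);
  },
  // The session is already over, so there is nothing to say; just log why.
  'SessionEndedRequest': function () {
    console.log('Session ended:', this.event.request.reason);
  },
};
```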
Data storage explained
By using a storage adapter for alexa-sdk, we can easily save data by setting values in the this.attributes object. It will be saved in DynamoDB with the ID of a given user and when that user uses our skill again, we’ll have access to these attributes.
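As an illustration, saveBooking() from the RoomBooking handler might do little more than append to a list kept in this.attributes. The version below is a hypothetical sketch (the real function body is not shown here), and the bookings key and createdAt field are made-up names. It assumes the SDK has been given a DynamoDB table name, in which case the attributes are persisted by the SDK rather than by this function.

```javascript
'use strict';

// Hypothetical saveBooking(): records the booking in the user's attributes.
// It only mutates handlerContext.attributes; writing to DynamoDB is left to the SDK.
function saveBooking(booking, handlerContext) {
  const bookings = handlerContext.attributes.bookings || [];
  bookings.push({
    room: booking.room,
    period: booking.period,
    createdAt: new Date().toISOString(),
  });
  handlerContext.attributes.bookings = bookings;
  return bookings.length; // number of bookings stored for this user
}
```

Reading works the same way in reverse: on the user's next visit, this.attributes.bookings would contain whatever was stored earlier.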
Debugging
Unfortunately, all your console.logs will go to Amazon CloudWatch logs, so debugging is quite a pain. Additionally, if you are an introvert like me, you’ll probably feel exhausted after a couple of hours of talking to Alexa—the text chat debugging tool doesn’t work.
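One way to reduce the amount of talking is to call a handler directly with a fake request and look at what it emits. The sketch below is a hypothetical local harness, not an official tool: invokeLocally imitates only the small part of the alexa-sdk handler context used in this article (event, attributes, emit), so it can check handler logic but not real SDK or Alexa behaviour.

```javascript
'use strict';

// Run one handler with a minimal fake request and collect everything it emits.
function invokeLocally(handlers, intentName, slots) {
  const emitted = [];
  const fakeContext = {
    event: {
      request: {
        type: 'IntentRequest',
        intent: { name: intentName, slots: slots || {} },
      },
    },
    attributes: {},
    emit: (...args) => { emitted.push(args); },
  };
  handlers[intentName].call(fakeContext);
  return { emitted, attributes: fakeContext.attributes };
}

// Example with a trivial handler like the FreeConfRooms one shown earlier.
const sampleHandlers = {
  'FreeConfRooms': function () {
    this.emit(':tell', 'Maybe there are...');
  },
};

const result = invokeLocally(sampleHandlers, 'FreeConfRooms');
console.log(JSON.stringify(result.emitted)); // [[":tell","Maybe there are..."]]
```

Run with plain node on your machine, this lets you inspect any object with a regular debugger before deploying to Lambda.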
Conclusion
It seems that Amazon's Alexa features will become more ubiquitous in the near future. Before the end of 2018, Amazon is planning to release at least 8 new Alexa-powered devices, including a microwave oven, a receiver, a subwoofer, and car gadgets, making voice assistants a part of everyday life.
As Jeff Bezos, the CEO of Amazon, stated in July:
We want customers to be able to use Alexa wherever they are. There are now tens of thousands of developers across more than 150 countries building new devices using the Alexa Voice Service, and the number of Alexa-enabled devices has more than tripled in the past year.
So it's good to see that writing voice apps might be easier than you’d think. The biggest issue right now? During my development efforts, I was dreaming about the possibility of debugging my code in a manner similar to the browser developer console, where I can easily inspect any object during code execution.
Have you tried coding your own Alexa skills yet? We’d love it if you shared your experiences with us.
External links:
- Alexa Skills Kit: https://developer.amazon.com/alexa-skills-kit
- Alexa Developers Youtube channel: https://www.youtube.com/channel/UCbx0SPpWT6yB7_yY_ik7pmg/