From Siri to Cortana to Alexa, voice assistants are rapidly becoming part of everyday life. But have you ever wondered how these voice assistants know that you’re talking to them? The answer is that you simply call them by their name. Now you may be wondering how that is even possible. That’s where Keyword Spotting (KWS) comes in! In this blog post, let’s take a look at how it works and what you can do with it.

Keyword spotting, what is it?

Keyword spotting systems enable a hands-free speech experience by detecting a trigger phrase that starts the interaction with a device. You decide which words or phrases act as triggers and which action to take once they are detected. With voice assistants such as Siri or Alexa, all interaction takes place via voice input, which makes the accuracy of the KWS system crucial. A missed trigger means the assistant ignores the user, while a false trigger wakes the device when nobody addressed it. For example, a user might say “Alexa, play some music” or “Hey Siri, what’s the weather today?”

Speech transcription vs keyword spotting

Both speech transcription and keyword spotting are great tools to turn speech data into valuable analytics for your business. But if you already have speech transcription, why use KWS? Just transcribe all the speech you hear and then search for your keywords! To find the answer, why don’t we have a look at their differences?

How does keyword spotting work?

KWS begins where many speech-processing tasks begin: by extracting useful features from the audio. More specifically, Mel Frequency Cepstral Coefficients (MFCCs) are extracted. These compactly describe the frequency content of each short frame of audio, on a frequency scale modelled on human hearing. The MFCCs are then passed to a model, such as a neural network or another suitable classifier. The model outputs a set of probabilities, each indicating how likely it is that the audio contains a certain keyword.

To handle a continuous stream of audio, the model can work with a sliding window that moves over the input, so the model is called once per time step. The resulting sequence of probabilities is then refined with posterior handling. For example, if a keyword has a high probability for a few consecutive frames, it is very likely that the keyword was actually said, whereas a single isolated spike is more likely noise.
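To make the sliding window and posterior handling more concrete, here is a minimal Python sketch. It assumes the model returns, for every window, a dictionary that maps each label to a probability. The names `toy_model`, `hey_device` and `background`, the window sizes, and the 0.7 threshold are all hypothetical choices for illustration. In a real system the model would be a trained network fed with the MFCCs of each window rather than raw samples. Averaging the probabilities over a few time steps is just one simple posterior-handling strategy, not necessarily the one any particular product uses.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

# One posterior per time step: label -> probability.
Posterior = Dict[str, float]


def sliding_windows(samples: Sequence[float], win_len: int, hop: int) -> Iterator[Sequence[float]]:
    """Slide a fixed-length window over the audio, moving `hop` samples per time step."""
    for start in range(0, len(samples) - win_len + 1, hop):
        yield samples[start:start + win_len]


def detect_keyword(
    posteriors: List[Posterior],
    keywords: Sequence[str],
    smooth: int = 3,
    threshold: float = 0.7,
) -> Optional[Tuple[int, str]]:
    """Posterior handling: average each keyword's probability over the last
    `smooth` time steps and report the first (time step, keyword) whose
    average reaches `threshold`, or None if no keyword is detected."""
    for t in range(smooth - 1, len(posteriors)):
        recent = posteriors[t - smooth + 1 : t + 1]
        for kw in keywords:
            avg = sum(p.get(kw, 0.0) for p in recent) / smooth
            if avg >= threshold:
                return t, kw
    return None


def run_kws(
    samples: Sequence[float],
    model: Callable[[Sequence[float]], Posterior],
    keywords: Sequence[str],
    win_len: int = 4,
    hop: int = 2,
    smooth: int = 3,
    threshold: float = 0.7,
) -> Optional[Tuple[int, str]]:
    """Call the model once per window, then apply posterior handling."""
    posteriors = [model(w) for w in sliding_windows(samples, win_len, hop)]
    return detect_keyword(posteriors, keywords, smooth, threshold)


def toy_model(window: Sequence[float]) -> Posterior:
    """Hypothetical stand-in for a trained network: it simply treats
    high-energy windows as the keyword (a real model would look at MFCCs)."""
    energy = sum(x * x for x in window) / len(window)
    p = min(1.0, energy)
    return {"hey_device": p, "background": 1.0 - p}


# Usage: silence, a sustained "keyword" segment, then silence again.
audio = [0.0] * 8 + [1.0] * 12 + [0.0] * 8
print(run_kws(audio, toy_model, ["hey_device"]))  # → (5, 'hey_device')
```

Averaging over several time steps is what suppresses isolated spikes. In this toy setup, a "keyword" burst only two samples long never pushes the averaged probability above the threshold, while the sustained segment above does.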

Use cases for keyword spotting

So apart from being used in voice assistants, what other use cases are there? Here are a few ways you can use keyword spotting:

  • Employee training: Use keyword spotting to gain insights and find ways to help your employees improve their customer service.

  • Customer satisfaction: You can use keyword spotting to detect negative or positive words/phrases from your customers, which can help you evaluate customer satisfaction.

  • Product interest: By tracking mentions of your company’s product lines or product names, you can see which products are doing well and which ones may need some improvement.

  • Privacy-preserving analytics: Because a keyword spotting system only detects the keywords you are explicitly searching for, you can extract valuable insights from speech data without transcribing everything else that was said, which greatly reduces the amount of private information you handle.


As you can see, there are many ways you can use keyword spotting to extract useful information. Want to know if keyword spotting systems offer solutions that could benefit your business? Our Data Science team would be happy to help you out!

This blog is written in collaboration with Bas Göritzer – Junior Data Scientist at Mediaan Conclusion.