For many IoT manufacturers managing a fleet of devices, ensuring that you will be able to interact with, update, secure, and troubleshoot in-the-field devices is a critical part of business operations. The ability to remotely interact with devices can improve the user experience and save companies thousands to millions of dollars on device maintenance and additional development costs.
Some organizations, especially those with existing datacenter investments and applications infrastructure, may decide that on-premises platforms are more appropriate for their needs. But for many others, the scalability and other benefits of public cloud infrastructure are an exciting opportunity to accelerate product development and minimize startup costs.
At Witekio, we’ve helped a variety of clients select the right platform to suit their needs and helped configure IoT devices across a variety of cloud environments. In this article, we’ll be reviewing some of the most important considerations when selecting an IoT device management platform.
What is device management and why is it so important?
IoT device management or “device fleet management” is the tooling and processes used to operate, maintain and secure all your IoT devices. This can include everything from automated device application updates to remote device monitoring and troubleshooting.
As the platform that you select to manage your devices is a critical component of your business, it’s important to get a better understanding of your requirements before making a choice.
For example, imagine that as a device manufacturer we’ve already designed the hardware and peripherals of our devices and now need to quickly select an IoT device management vendor out of the crowd of vendors and platforms that exist today.
We can ask a few initial questions:
1. Does our device need to communicate bi-directionally with cloud services and APIs?
2. Does the end-user of the device require authentication?
3. Do we need to monitor the device logs for anomalies or events?
4. Do maintenance technicians need tooling to manage or maintain the fleet of devices?
These are just a few of the first basic questions we can ask when developing a set of requirements for the platform our devices are integrated with. Depending on our product and our product roadmap there are many other potential questions that we could ask that determine what platforms we should consider.
If we don’t ask these questions early on, then we end up in a situation where we may waste significant amounts of time and money:
2. Devices that might require end-user authentication require different authentication mechanisms that often integrate with web APIs. Not planning for this in advance can cause large development roadblocks.
3. A lack of monitoring and logging on our devices and device applications can drastically slow the debugging and development process.
4. Technicians who could have benefitted from remote access to in-the-field devices instead need to spend hours on work and travel.
With examples like these in mind, let’s look at a step-by-step approach to evaluating any device management platform. As part of this evaluation, we’ll work through some of the most important context and considerations we need to review when selecting a platform.
We’ll also use these considerations to contrast several common public cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud Platform – all of which our team at Witekio has used in different customer engagements.
1/ Identity and Authentication
If we’re working with a fleet of devices that we plan to communicate with over time, we’ll need to decide on a way to authenticate the devices and assign them unique identities.
For a hands-on example, I discuss many of the common forms of device authentication in the authentication section of this guide.
At a high level, there are a few options:
1. Using Symmetric Keys (not recommended for production)
2. Using X509 Certificates
3. Using Trusted Platform Modules (TPMs)
Symmetric keys are an authentication option that provides a simple on-device key string that allows the device to authenticate with the IoT platform. However, because they often lead to bad practices like reusing keys across devices or hard-coding keys within software, they are not recommended for production. For the same reason, not all platforms support symmetric keys, and even those that do (like Microsoft Azure) recommend using them only for prototyping purposes.
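To make the key-reuse problem concrete, here is a minimal sketch of one way to give every device its own symmetric key: derive it from a group key with HMAC-SHA256 over the device ID. This is similar to the derivation Azure documents for symmetric-key enrollment groups, but treat it as an illustration rather than a reference implementation. The group key and device IDs below are made up for the example.

```python
import base64
import hashlib
import hmac


def derive_device_key(group_key_b64: str, device_id: str) -> str:
    """Derive a per-device key as base64(HMAC-SHA256(group_key, device_id))."""
    group_key = base64.b64decode(group_key_b64)
    digest = hmac.new(group_key, device_id.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical group key and device IDs, for illustration only.
# The group key should stay in the factory or backend, never on a device.
GROUP_KEY = base64.b64encode(b"example-group-key-keep-off-devices").decode("ascii")

key_a = derive_device_key(GROUP_KEY, "sensor-0001")
key_b = derive_device_key(GROUP_KEY, "sensor-0002")
print(key_a != key_b)  # each device ends up with a different key
```

With this approach a leaked device key exposes only that one device, although the keys are still plain strings that must be stored carefully.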
X509 certificates are a much more secure and sophisticated identity and authentication tool. Essentially, you create a certificate chain, starting with either a self-signed root certificate or a certificate purchased from a public CA vendor. This certificate is then either used to sign lower-level intermediate certificates or is itself configured as trusted within the IoT platform. Once the platform trusts it, the root certificate (or an intermediate certificate it has signed) can be used to sign unique leaf certificates which live on individual devices. These leaf or device certificates are unique to each device and are used to authenticate the device’s identity to the IoT platform.
TPMs, or Trusted Platform Modules, are dedicated hardware components that perform cryptographic operations to enable authentication. During manufacturing, these TPMs can be registered and configured with the IoT platform in order to provide similar identity and authentication functionality. The main functional difference between a TPM and something like an X509 certificate is that the TPM is a physical component attached to your board that securely stores and manages device identity. The X509 option, on the other hand, is a set of files that can be copied onto (or off of) the device, so those files must be stored as securely as possible on the device.
Device management platforms will almost always integrate with one or more of these authentication mechanisms. Let’s look at how different public clouds support these features:
*Microsoft does not recommend using Symmetric keys in production and suggests only using them when prototyping.
The Authentication Lifecycle
When configuring device certificates, keys, and other possible authentication mechanisms, you’ll also need to have a plan for:
1. How to set up devices with authentication credentials (Provisioning)
2. How to rotate your authentication mechanisms (Rotation)
3. What to do if a device is compromised (Revocation)
How you manage these steps is another important consideration when evaluating identity and authentication options with a platform.
With symmetric keys for example, you might need to simply replace an environment variable or a file containing the symmetric key used to connect. But, you’d have to do this for every device. So, you will need a mechanism to update that value on each device either through a secure connection from the IoT platform or, less efficiently, through a manual connection to the devices.
The same situation comes up with X509 certificates. First, you need to provision a new device certificate from your intermediate or root certificate. Then, you need a way to add the new certificate on the device and shift your device over to using the new certificate.
Different platforms have different ways of allowing you to manage authentication mechanisms. For the most part, these steps are handled through the platform’s APIs and SDKs. When you need to do this at a mass scale, you’ll need to write your own code to automate the process.
Certificate Provisioning, the process of placing a certificate on a device, can happen in many ways depending on the manufacturing process. One possible workflow looks like this:
1. A self-signed root certificate is created OR a CA-signed certificate is purchased from a vendor
2. This top-level certificate is used to create an intermediate certificate
3. The intermediate certificate is registered with the cloud platform
4. The intermediate certificate is used during manufacturing to create unique device certificates
5. The devices use these certificates to connect to the cloud
Rotation occurs when an X509 certificate is close to expiring or needs to be replaced for other reasons such as certificate compromise. To carry out certificate rotation, you’ll need a way to reliably provide a new certificate. Depending on the platform, the process might require the device to reconnect and reprovision. Because a failed rotation can make a device inoperable, it is a high-risk operation. While rotation is required as certificates approach expiry, it should be done with the utmost caution, thoroughly tested, and rolled out slowly. One possible workflow looks like this:
1. A new leaf certificate is requested and provisioned for the device
2. The certificate is sent securely to the device
3. The device securely stores and
4. The device tests that new certificate and after confirming
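As a rough illustration of why rotation needs a fallback, the sketch below stages a new credential, tests it, and only then makes it the active one. It assumes the device keeps its old credential until the new one passes a test connection. The CredentialStore class, the credential strings, and the fake_connect function are all hypothetical stand-ins, not part of any platform SDK.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CredentialStore:
    """Hypothetical on-device store holding the active and a staged credential."""

    active: str
    pending: Optional[str] = None

    def stage(self, new_credential: str) -> None:
        # Keep the old credential in place until the new one is proven to work.
        self.pending = new_credential

    def commit_if_valid(self, can_connect: Callable[[str], bool]) -> bool:
        """Promote the staged credential only if a test connection succeeds."""
        if self.pending is None:
            return False
        if can_connect(self.pending):
            self.active, self.pending = self.pending, None
            return True
        # Test failed: discard the staged credential and keep the old one.
        self.pending = None
        return False


def fake_connect(credential: str) -> bool:
    """Stand-in for a real connection attempt to the IoT platform."""
    return credential.startswith("cert-v2")


store = CredentialStore(active="cert-v1")
store.stage("cert-v2-device-0001")
rotated = store.commit_if_valid(fake_connect)
print(rotated, store.active)
```

The important design choice is that a failed test leaves the device on its old, still-working credential instead of locking it out.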
Certificate revocation may be required when a device certificate is known to be compromised. Typically, this might involve patching a device and then providing it with a new certificate that it then uses to connect to the platform. However, if an intermediate certificate is thought to be compromised, perhaps because of a manufacturing process, you might need to do this at a larger scale.
So how do different platforms handle these processes? At a high level, here are some of the options:
Each of these platforms has its own APIs and SDKs that allow for customizable onboarding processes given some development overhead. But these options can require developers to write significant amounts of code to make sure that the devices provision properly. However, some also offer other methods for device onboarding and provisioning with different levels of customization and automation.
AWS has several options to onboard your devices. The majority of these involve writing some custom code and using the AWS Lambda service to register your devices when they connect with a trusted X509 certificate. This can either rely on a registration template or a more customized process in which Lambda functions process each registration more individually.
Microsoft Azure provides its own more managed service for onboarding devices: the Device Provisioning Service, or DPS. This service allows you to configure how you want your devices to be provisioned, on which IoT Hubs and with what sort of device twin properties, based on the enrollment groups they’re associated with in DPS. This can help avoid writing significant amounts of provisioning code. It also has additional benefits when it comes to credential rotation down the line, as DPS can be reused not just for the initial device registration but also for any re-registration when devices need to renew certificates or other credentials.
Azure also offers the Azure Sphere platform, which allows manufacturers to purchase devices with a managed hardware and software platform that streamlines securely connecting to the Azure platform without manually managing identity certificates.
2/ Authorization and Permissions
Authentication is only part of the device security story. You also want to control what authenticated devices, users, or applications can do within an environment. For this we have to talk about Authorization and permissions.
For some environments, you may want to control what devices can communicate with each other or how those devices can communicate with the cloud after they are authenticated.
At other times, you might have multiple end-users that you want to grant access to certain devices.
At present, the most developed system for managing permissions for individual devices or device groups is AWS IoT Core Policies. In combination with AWS IoT Device Groups, you can use it to control what actions a device is authorized to take – such as which devices can publish to certain MQTT topics.
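As a sketch of what such a policy can look like, the snippet below builds an AWS IoT policy document that lets a device publish only to a topic containing its own thing name. The region, account ID, and the devices/.../telemetry topic layout are placeholders chosen for this example, and a real policy would normally also need statements for connecting and, where relevant, subscribing.

```python
import json


def publish_policy(region: str, account_id: str) -> dict:
    """Build a policy letting a device publish only to its own telemetry topic."""
    # The ${...} part is an AWS IoT policy variable resolved by AWS at
    # evaluation time, so it is deliberately left out of the f-string.
    topic_arn = (
        f"arn:aws:iot:{region}:{account_id}:topic/"
        "devices/${iot:Connection.Thing.ThingName}/telemetry"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "iot:Publish", "Resource": topic_arn}
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = publish_policy("us-east-1", "123456789012")
print(json.dumps(policy, indent=2))
```

Because the topic is tied to the connecting device’s identity, one policy can be shared across a group of devices without letting them publish on each other’s topics.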
Azure IoT Edge Authorization Policies do allow Edge computing devices to be managed in a similar way to AWS IoT Core Policies, but apart from this, both Azure and Google Cloud Platform have less customizability in this area and instead rely on restricting a device to act only on cloud resources that are related to that device, such as device twins.
The authorization structures on each platform differ widely, so it doesn’t make sense to compare them side by side. Instead, let’s review a few common methods of authorization that come up when building IoT Applications on any cloud platform.
Cloud Provider Identity and Access Management
All the cloud providers have their own internal systems for determining which entities can access various cloud APIs. These are usually called AWS/Azure/GCP “Identity and Access Management” (in Azure this usually refers to the features of Azure Active Directory).
These systems allow each cloud to assign unique permissions to human users and non-human entities (like applications, not extraterrestrials). These permissions determine if the entity or user can access different cloud services, or the data stored within those services. When granted to human users or application entities there are often authentication methods outside of the methods described above for unique devices. These include things like access tokens and trusted SSH keys.
JSON Web Token Scopes
JSON Web Tokens (JWTs) are another common authentication *and* authorization tool you might encounter. They are issued after an entity has authenticated themselves to an Identity Provider (IdP) and contain information required to both authenticate and authorize themselves for an API.
The authentication is handled by a cryptographic verification of the token and its signature that is run against the identity provider that issues the token.
The authorization component happens when the “scopes” on the token are reviewed to decide what the entity is allowed to do. JWTs are used especially on the web-application side of things, in addition to the cloud-provider-specific credentials for accessing cloud services.
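The two steps can be sketched with nothing but the Python standard library. This example assumes an HS256 (shared secret) token and a space-separated "scope" claim. Identity providers often use asymmetric signatures instead, and production code should use a maintained JWT library and also check expiry, issuer, and the declared algorithm, so treat this purely as an illustration. The secret, subject, and scope names are made up.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Create a compact HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def is_authorized(token: str, secret: bytes, required_scope: str) -> bool:
    """Verify the signature (authentication), then check scopes (authorization)."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return False
        claims = json.loads(b64url_decode(payload_b64))
    except ValueError:
        return False
    return required_scope in claims.get("scope", "").split()


# Hypothetical secret and claims, for illustration only.
SECRET = b"example-shared-secret"
token = sign_token({"sub": "technician-42", "scope": "devices:read logs:read"}, SECRET)
print(is_authorized(token, SECRET, "devices:read"))
print(is_authorized(token, SECRET, "devices:write"))
```

A token with a valid signature but without the required scope is rejected, which is the difference between authenticating the caller and authorizing the action.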
3/ Picking Device to Cloud Communication Protocols
When selecting a cloud platform, we must pick a protocol for our IoT devices to use when communicating with the cloud.
Importantly, the protocols discussed here are for device-to-cloud communications. In edge device scenarios, where IoT devices may not connect directly to the cloud themselves, you may use a combination of protocols: one set that allows edge devices like lower-powered sensors to connect with a gateway device, and another set that connects the gateway devices to the cloud. Let’s take a quick look at some of the most common protocols for this latter group to use when connecting to the cloud:
MQTT – The standard in device to cloud communication. MQTT is a lightweight protocol designed for resource constrained devices. It is supported by all major platforms and because it is one of the most common protocols, you will likely find the most support and tooling available for it.
HTTP(S) – Another common option when working with IoT devices. While HTTP benefits from being a web standard, it has fewer IoT-specific features. It is also more resource intensive and lacks the resiliency of MQTT on unreliable networks. With that said, if you’ve got a bunch of mobile devices, tablets, or other personal devices relaying data, HTTP is also one of the easiest protocols to work with.
AMQP – A less common option that prioritizes deliverability and security. While it is also a lightweight protocol compared with HTTP, AMQP is less widely supported for direct device-to-cloud communication.
WebSocket – This protocol allows for two-way communication without forcing devices to use higher-bandwidth methods like HTTP polling. When implemented by cloud providers, it is used in combination with MQTT or AMQP.
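To give a feel for what “lightweight” means in practice, the sketch below builds the bytes of an MQTT 3.1.1 PUBLISH packet at QoS 0 and a minimal HTTP/1.1 POST carrying the same payload, then compares their sizes. The topic, host, and payload are made up, and the comparison covers only per-message framing: it ignores TCP, TLS, and the one-time MQTT connection handshake.

```python
def encode_remaining_length(length: int) -> bytes:
    """Encode the MQTT variable-length 'remaining length' field."""
    encoded = bytearray()
    while True:
        byte, length = length % 128, length // 128
        if length > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def mqtt_publish_packet(topic: str, payload: bytes) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet at QoS 0 (no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # 0x30 = packet type 3 (PUBLISH) with DUP, QoS, and RETAIN flags all zero.
    return b"\x30" + encode_remaining_length(len(body)) + body


def http_post_request(host: str, path: str, payload: bytes) -> bytes:
    """Build a minimal HTTP/1.1 POST carrying the same payload."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


# Hypothetical topic, host, and payload, for illustration only.
payload = b'{"temp":21.5}'
mqtt = mqtt_publish_packet("devices/sensor-0001/telemetry", payload)
http = http_post_request("iot.example.com", "/devices/sensor-0001/telemetry", payload)
print(len(mqtt), len(http))
```

Even with only the headers HTTP strictly needs, the request is noticeably larger than the MQTT packet, and real HTTP clients usually add several more headers per message.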
Depending on your project, you may already have some requirements for one of the above protocols. If so, review the chart and the documentation references below to ensure that the platform you are considering meets your requirements:
As shown above, most platforms will be able to support the most common protocols.
Noticeably absent from this list are protocols more common in edge computing environments like LoRa, Sigfox, Bluetooth, and others. These protocols are each used to optimize different project needs such as battery consumption and signal distance. However, these edge protocols will not be used to connect to cloud platforms directly. As such, they are more important to consider when selecting edge gateway hardware and peripherals.
Evaluating these considerations is a part of any good evaluation of Public Cloud platforms for IoT. Of course, for any project, there will be many other considerations to make depending on the project’s requirements. If you’d like help evaluating or developing your project then don’t hesitate to reach out to us for a discussion on how Witekio can help you with your IoT projects.