Which over-the-air (OTA) update solution for your embedded system? The Ultimate Guide


Find your path in the jungle of OTA update solutions

Welcome to the jungle of OTA (over-the-air) update solutions for IoT devices and connected systems. Why a jungle? Because there are several ways to update an IoT device, each with its pros and cons, and you will quickly realize that making the right choice for your device is not easy. “It depends” will be the most common answer to your questions, and you will have to answer many of them:

Are you using a microcontroller or a microprocessor?

Are you under Linux, Android, FreeRTOS?

Do you need A/B partition schemes for your updates?

All this makes it impossible to tackle such a vast subject in a single article. That’s why I decided to give you 3 articles! The first one will focus on IoT devices using microprocessors and Linux, the second one will be about Android, and finally the third one will be specifically about microcontrollers.

So, let’s get started!

What you should probably not do for your firmware over-the-air (FOTA) updates

If you are using a computer running Linux, you probably use a package manager to update your applications and OS.

Do not use a package manager to update your IoT devices running Linux.

Among the good reasons I see, let’s focus on the 2 most important: power failure and scalability.

  • Power failure:

In the event of a power failure during an update, the package manager might leave the system in an unknown state with a partially installed update. One of the key requirements for an update manager for IoT devices is that the device is always in a known state after an update: either the old version or the new one, never a mix. This property is called atomicity and can be found in almost any article about OTA updates for IoT devices 😊. Traditional package managers were never designed to perform atomic updates.
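To make atomicity concrete at the scale of a single file, here is a minimal sketch in Python (chosen only for illustration). The function name `atomic_write` is hypothetical. It assumes a POSIX system, where renaming a file within one filesystem is atomic: a reader sees either the old content or the new content, even if the power is cut in the middle.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new content to a temporary file on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to storage before the switch
        # The rename is the single atomic step: it either happened or it did not.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the switch.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a power loss (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A package manager that unpacks hundreds of files one after the other has no equivalent single switch-over point for the whole system, which is exactly the problem.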

Let’s shed some light on the possible consequences with an example.

After the first update, 20 out of 20,000 devices are not updated properly because of a power failure. These 20 devices still work, but their software is a mixture of old and new library versions. When the next update comes, these 20 devices all break down together, while 20 others end up in an unstable state. After one year, 10% of your devices are in an unrecoverable state and your only option is to send a technician into the field to update them manually. And that obviously could be a big deal…
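How fast the damage accumulates depends on how often you ship updates. As a rough sketch, assume (my assumption, not a measured figure) that each update independently hits the same share of devices, 20 out of 20,000. The share of the fleet hit at least once after n updates is then 1 - (1 - p)^n. The function name below is hypothetical.

```python
def fraction_affected(failure_rate: float, updates: int) -> float:
    """Share of devices hit by at least one interrupted update after `updates` rollouts."""
    return 1.0 - (1.0 - failure_rate) ** updates


# 20 failures out of 20,000 devices per update, as in the example above.
rate = 20 / 20_000

for n in (1, 12, 52, 105):
    print(f"after {n:3d} updates: {fraction_affected(rate, n):.1%} of the fleet")
```

Under this simple model, it takes on the order of a hundred updates to reach 10% of the fleet. A higher failure rate per update gets you there much sooner.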

  • Scalability:

Again, let me use an example to highlight the challenges. You are using a Raspbian distribution for your IoT devices. You can use the package manager to update the kernel, the libraries, and so on. But how do you automate this process? If you have 20 devices, you can do it manually, or semi-manually with a script that connects to each device over SSH. But what if you have 20,000 devices?

Do not use Docker containers as such.

An option could be to use Docker containers to update the different applications running on your device. This seems like a good idea, but this solution has one major drawback: delta updates.

An excellent feature of Docker is the ability to transfer only the differences between two versions of a container. This feature is based on union filesystems, which manage each container as a set of layers:

[Figure: a container as a set of layers]

If you add a new feature, Docker adds a new layer to the container that only includes the differences between the two versions of the container.

[Figure: a new layer added on top of the existing layers]
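The layering idea can be illustrated with a toy model. This is only a sketch: it is not how Docker or any union filesystem is implemented, and the file names and contents are made up. Each layer is a dictionary, and the union view looks in the newest layer first.

```python
from collections import ChainMap

# Each layer maps a file path to its content. Names and contents are made up.
base_layer = {"/bin/app": "app v1", "/etc/app.conf": "color=blue"}
feature_layer = {"/bin/app": "app v2", "/lib/feature.so": "new feature"}

# The union view looks in the newest layer first, then falls back to older ones.
container_v1 = ChainMap(base_layer)
container_v2 = ChainMap(feature_layer, base_layer)

# Only the new layer has to be shipped to a device that already has the base.
print(container_v2["/bin/app"])       # newest layer wins
print(container_v2["/etc/app.conf"])  # unchanged file comes from the base layer
print(sorted(feature_layer))          # the content of the delta
```

The delta is small because `feature_layer` only holds what changed. The weak point is that the stacked view exists only as long as the layer bookkeeping on disk stays consistent.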

What’s the catch? Union filesystems are not known for their stability or reliability in the event of a power failure… The situation has improved considerably with overlay2 (Docker’s newer storage driver, built on OverlayFS), but it’s not perfect yet, as the conversation here shows.

A brief summary of this thread: it doesn’t work well in case of a power failure. Sometimes, it can be fixed with a couple of command lines, which involve deleting all the persistent data used by the containers… But is this acceptable if you must update thousands of devices? I don’t think so.

Before moving forward, it is important to note that while Docker containers should not be used on embedded devices, most of the underlying technologies used by Docker could be. But we’ll come back to that later in this article.

Reasonable solutions for OTA update of IoT devices using microprocessor and Linux

Single copy update, a solution for some cases

Have you ever been sitting in front of your television, but instead of watching your favorite program on Amazon Prime or Netflix, you’ve just been waiting for:

[Figure: a television screen waiting for an update to finish]

This specific kind of updater is the simplest type. Basically, when a new update is available it offers to install it. If you say yes, your device will restart, then an application will download the update to the memory, and finally write it to the flash.

This approach brings several problems:

First, there is a UX issue: during the update process, your customer will not be able to use your device.

Second, the partition used to store the updated version of the OS is the same as the one used to run the OS. If the power fails while the update program is writing the update to the flash, your device will be… bricked.

[Figure: a bricked device]

“I beg your pardon, but what do you mean by bricked?” Well, if this scenario happens, there is no easy way to recover the device (in fact, you can, but you have to physically open it and solder a couple of wires and …). As a result, the device becomes as useful as a … brick. Software engineers coined the word and, for their convenience, turned it into a verb 😊.

Personally, I have never tried to unplug my Amazon Firestick during an update. Well, I use it to watch my favorite series and I would prefer not to brick it. But I suspect that they included a mechanism to automatically recover the device after a power failure. For instance, the software used to download and install the updates is never affected by the update itself and is automatically restarted when the power is restored. Then it simply downloads the update again and restarts the flashing process. That said, I could be wrong, so do not try that at home!

How to implement a single copy update for my IoT devices?

Even if it’s not perfect, it might be what you need for your projects. In this case, on Linux, you have three options:

  • Develop your own custom update system. Well, you should probably not do that. We are talking about Linux; everything is available for you to design your own implementation. But there are many options that have already been battle tested. You can choose one of them instead of reinventing the wheel.
  • SWUpdate: One of the most popular options for implementing a single copy update. It works very well, and we have used it at Witekio on several projects. If you want to know more about this tool, you should check its excellent documentation. And, if you want to give it a spin, follow the link to this great tutorial.
  • RAUC: Another option if you want to implement a single copy update. A big plus of this solution, compared to SWUpdate, is the possibility to generate differential patches with casync (still experimental).

Pros and Cons for choosing a single copy update

Let’s summarize the pros and cons of a single copy update system:

[Table 1: pros and cons of a single copy update]

As we said earlier, this approach is far from perfect. You may consider it as a good solution if your IoT devices are always connected to the internet and if a downtime of several minutes for each update is acceptable. Otherwise, an alternative solution might be to use a dual copy update.

Dual copy update, a better option for your OTA updates?

It is obvious that using a single copy update for your product is a risky choice. One way to improve reliability is to add a second partition. How does it work?

Suppose partition A is the active partition, which means that your Linux system with your applications is running on this partition.

[Figure: partition A is active and runs the current system]

When you receive an update for your IoT device, it is written to partition B.

[Figure: the update is written to partition B]

When the update is complete, the system automatically restarts and boots from partition B. If it fails to boot properly, a hardware watchdog reboots the system, and partition B is marked as invalid before the previous version of the software is restarted from partition A. If the boot succeeds, partition B is marked as functional and becomes the new active partition.

[Figure: the system boots from partition B after the update]
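The fallback logic described above can be sketched as a small state machine. This is a simplified model under my own assumptions: the class and method names are hypothetical, and real bootloaders and update tools store this state in their own formats (typically in a bootloader environment).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    name: str
    valid: bool = True


@dataclass
class BootState:
    """Hypothetical persistent boot state; real bootloaders store this differently."""
    slots: dict = field(default_factory=lambda: {"A": Slot("A"), "B": Slot("B")})
    active: str = "A"               # last slot known to work
    pending: Optional[str] = None   # freshly written slot waiting for its first boot

    def install_update(self) -> str:
        """Schedule the inactive slot for the next boot (called once it is fully written)."""
        target = "B" if self.active == "A" else "A"
        self.slots[target].valid = True
        self.pending = target
        return target

    def slot_to_boot(self) -> str:
        return self.pending or self.active

    def report_boot(self, success: bool) -> None:
        """Called once the new system is healthy, or after a watchdog reset on failure."""
        if self.pending is None:
            return
        if success:
            self.active = self.pending              # promote the new version
        else:
            self.slots[self.pending].valid = False  # mark bad, keep the old slot
        self.pending = None


state = BootState()
state.install_update()
print(state.slot_to_boot())          # the freshly written slot is tried first
state.report_boot(success=False)     # the watchdog fired: fall back
print(state.slot_to_boot(), state.slots["B"].valid)
```

The key design choice is that `pending` is only set after the inactive slot has been completely written, so a power cut during the write leaves the device booting from the untouched active slot.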

With this scenario, even if the power fails while partition B is being updated, the system will be able to restart from partition A. This scheme is bulletproof, but it means that for each update, you need to download a full Linux image.

Does it matter? It depends! Knowing that the size of a Linux image can range from a few MB to a few GB, how much bandwidth can you use to update your devices in the field? Is the internet connectivity always good?
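A quick back-of-the-envelope calculation helps to answer these questions. In the sketch below, the function names are hypothetical, and the 200 MB image and the 256 kbit/s link are made-up figures; only the 20,000 devices come from the earlier example. Plug in your own numbers.

```python
def transfer_hours(image_mb: float, link_kbps: float) -> float:
    """Time to download one image over a link of `link_kbps` kilobits per second."""
    bits = image_mb * 8 * 1_000_000
    return bits / (link_kbps * 1_000) / 3600


def fleet_traffic_gb(image_mb: float, devices: int) -> float:
    """Total data pushed to the fleet for a single update."""
    return image_mb * devices / 1_000


# Hypothetical figures: a 200 MB image, a 256 kbit/s cellular link, 20,000 devices.
print(f"{transfer_hours(200, 256):.1f} hours per device")
print(f"{fleet_traffic_gb(200, 20_000):.0f} GB per update for the whole fleet")
```

If the result is measured in hours per device or terabytes per rollout, a full-image update may simply be too expensive for your connectivity.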

How to implement a dual copy update for my IoT devices?

Well, you have three good options:

  • RAUC and SWUpdate: They can also be used to implement a dual copy scheme.
  • Mender: This solution is fully integrated. It includes the web front-end and the embedded software for your IoT devices. The project itself is fully open source and based on Yocto for the embedded part. It can be adapted to any hardware compatible with Yocto. You can either deploy the web server on your own premises or use the SaaS version offered by Mender. This is a great alternative to SWUpdate or RAUC if you don’t need a high degree of customization of your update solution.

Pros and Cons – Dual copy update

Let’s summarize the pros and cons of a dual copy update system.

[Table 2: pros and cons of a dual copy update]

To summarize, a dual copy update can be a perfectly bulletproof solution, but it has a serious drawback: it needs a lot of storage space and bandwidth. So, let’s discuss an alternative that addresses this issue: deploying updates as containerized applications.

Deploy OTA updates with containerized applications – let’s put everything into little boxes.

Let’s consider an embedded system. It is usually composed of a Linux OS and several applications. In a single or dual partition update system, these elements are treated as a single artifact. But each of them could be updated independently.

With such an approach, you would have several updates instead of one, and the size of each update would be significantly reduced. But on IoT devices, each of these updates needs to be atomic to avoid any unknown state. One way to overcome this challenge is to containerize each application. In this case, your applications are completely independent from each other and from the Linux OS.

[Figure: containerized applications running on a Linux host]

In this example, there are two applications. One of them runs a UI with Qt and communicates over a virtual network with another application running an MQTT broker. Both applications run on the same Linux host system. Each application is completely independent from the other and from the host system.

Thanks to the isolation, a trivial approach to atomically update a container could be:

  • A container with the latest version of the application is downloaded.
  • The system confirms that the container with the new version of the application is not corrupted.
  • The container with the old version of the application is stopped.
  • The container with the new version is started.
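The four steps above can be sketched as follows. This is a minimal illustration, not the API of any container runtime: the function names are hypothetical, and the download, stop and start actions are passed in as callbacks so that the sketch stays independent of a specific tool. I also assume the integrity check is a SHA-256 digest published alongside the image.

```python
import hashlib
from typing import Callable


def verify_image(data: bytes, expected_sha256: str) -> bool:
    """Check the downloaded container image against its published digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


def update_container(
    download: Callable[[], bytes],
    expected_sha256: str,
    stop_old: Callable[[], None],
    start_new: Callable[[bytes], None],
) -> bool:
    """Run the four steps; the old container keeps running unless the new image is valid."""
    image = download()                             # 1. download
    if not verify_image(image, expected_sha256):   # 2. integrity check
        return False                               # old version stays untouched
    stop_old()                                     # 3. stop the old container
    start_new(image)                               # 4. start the new one
    return True
```

The important property is the ordering: nothing touches the running container until the new one has been fully downloaded and verified.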

This approach is very close to a dual partition scheme, but applied to a containerized application. At the beginning of this article, I explained how Docker uses delta updates to update containers. Unfortunately, the solution chosen by Docker is not atomic. So, what are the other available solutions?

  • Balena: This is the first solution developed to use a container approach to update IoT devices. Their solution is based on Docker, but with a twist. They use a proprietary approach to perform the delta updates (which solves the issue described at the beginning of this article). But one key disadvantage of their solution is the way they update the Linux OS. They use a dual partition approach, with all the drawbacks described in the dual copy update section of this article.
  • FullMetalUpdate: This solution is based on runc containers, uses systemd to manage the container life cycle, and uses OSTree to update either the Linux OS or the containers:
    • runc is a container runtime compatible with the OCI standard. It is also the official runtime of Docker. Since it is based on a standard, runc is only one of the available implementations. That is a good thing, because if the runc developers stopped maintaining it tomorrow, you could switch to a different runtime that supports the same standard. Moreover, since this runtime is used by Docker, it can be considered stable and tested :).
    • OSTree is a tool to generate delta updates between different versions of a filesystem. It is based on a client-server architecture that can be compared to Git. Each update is fully atomic. Basically, it brings the benefits of Docker-style delta updates to embedded devices!
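To give an intuition of how a Git-like store yields delta updates, here is a toy model. It is not OSTree’s implementation or API: all the names are hypothetical, and I simply assume that every file is stored as an object identified by the hash of its content.

```python
import hashlib


def object_id(content: bytes) -> str:
    """Identify an object by the hash of its content."""
    return hashlib.sha256(content).hexdigest()


def commit(tree: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the hash of its content, like a snapshot of the filesystem."""
    return {path: object_id(content) for path, content in tree.items()}


def objects_to_fetch(device_commit: dict[str, str], server_commit: dict[str, str]) -> set[str]:
    """Objects present in the new commit that the device does not already store."""
    return set(server_commit.values()) - set(device_commit.values())


# Made-up file trees: only the application binary changes between the two versions.
v1 = commit({"/usr/bin/app": b"app v1", "/usr/lib/libfoo.so": b"libfoo"})
v2 = commit({"/usr/bin/app": b"app v2", "/usr/lib/libfoo.so": b"libfoo"})
print(len(objects_to_fetch(v1, v2)))  # number of objects the device must download
```

Because unchanged files hash to the same identifier, the device only downloads what is new, and it can keep the old tree intact until the new one is complete.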

By integrating these two open source solutions together, FullMetalUpdate tackles the major challenges of OTA updates on modern IoT devices. With FullMetalUpdate, you can update the Linux OS and your applications with the minimum possible overhead on mass storage and minimize the bandwidth usage!

Pros and Cons – Container update

Let’s summarize the pros and cons of a container update system:

[Table 3: pros and cons of a container update]

So, what is the ideal OTA update system for your IoT devices? FullMetalUpdate of course! It has been specially designed to address all the challenges described in this article. Learn more about our project in the FullMetalUpdate presentation article.

This article will give you an overview of the architecture of FullMetalUpdate and explain in detail why a new open source solution was needed.

And to get back to the original question, the right answer really depends on your use-case. Sometimes the simplicity of a single partition update system is the best solution. Sometimes it is not.

So, before you decide on the OTA update solution that you will use for your next project, the best answer is probably to discuss this topic with an expert. And what happens if we move on to Android? Do we have as many options? No. Even though Android is based on Linux, the way it can be updated is limited to A/B partitioning (or a recovery partition on older Android devices).

In the next article, we will describe how OTA updates are implemented on Android and, more interestingly, what mechanisms have been implemented to secure the process.

Cedric Vincent - Chief Technical Officer
15 January 2020