Data management
What if we could easily and massively gather measurable intelligence about the world? This is the promise behind every connected solution. But to extract business value and formulate concrete reactions from all the collected information, careful design of the data model is key. Let’s review the main questions that need to be answered.
The different kinds of data
Many telemetry streams belong to the category of “well-known” metrics in the IoT world. Classic sensors found on many objects monitor the immediate surroundings of a connected device, such as weather conditions (temperature, atmospheric pressure, humidity, wind speed), position in space (accelerometer, gyroscope, GNSS coordinates) or internal values (battery level, power consumption, storage space).
When a device is equipped with controls such as buttons, levers or a screen, it may be able to transmit traces of how a human interacts with it. Beyond data closely tied to numerical values, heavier payloads may be streamed, such as a video feed from a surveillance camera.
For configurable devices, a common way to manage them is to link their settings to a digital twin. To reflect the desired properties set by an administrator, the device needs to emit back what it has actually applied, in the form of reported properties. This kind of traffic is less verbose than telemetry.
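The desired/reported round-trip can be sketched as follows. This is a minimal illustration, not a specific twin API; all property names are hypothetical.

```python
# Minimal digital-twin reconciliation sketch (all names are illustrative).
desired = {"sampling_interval_s": 60, "log_level": "warning"}

def apply_settings(desired: dict) -> dict:
    """Apply each desired property and echo back what was actually set."""
    reported = {}
    for key, value in desired.items():
        # On a real device this would touch hardware or configuration files.
        reported[key] = value
    return reported

reported = apply_settings(desired)
# The device only re-sends reported properties when they change,
# which keeps this traffic far less verbose than telemetry.
```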
During debugging sessions, a remote operator may also want to fetch journal log entries or other non-consumer-facing data. The targeted device is expected to respond correctly to commands sent on the fly.
Starting from these initial observations, the size and emission frequency of the data determine a set of constraints that must be weighed against the chosen connectivity technology. Some collection patterns won’t be possible on a limited network with a small allocated bandwidth. The payload format and its associated compression may alleviate some of these limitations. For example, instead of repeating identical emission timestamps, they can probably be enumerated only once.
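The timestamp example can be made concrete with a small sketch: samples sharing one emission timestamp are folded into a single envelope before serialization. The field names are illustrative, not a defined wire format.

```python
import json

# Three readings captured in the same emission window (illustrative schema).
samples = [
    {"ts": 1700000000, "temp": 21.4},
    {"ts": 1700000000, "hum": 43.0},
    {"ts": 1700000000, "batt": 87},
]

def compact(samples: list[dict]) -> dict:
    """Factor out the shared timestamp instead of repeating it per value."""
    ts = samples[0]["ts"]
    values = {}
    for s in samples:
        values.update({k: v for k, v in s.items() if k != "ts"})
    return {"ts": ts, "values": values}

before = len(json.dumps(samples))
after = len(json.dumps(compact(samples)))
# The compact form is strictly smaller on the wire.
```

Real deployments would go further (binary encodings, delta-encoded timestamps), but even this trivial factoring shrinks the payload.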
Security around the communication channel should not be neglected. Many guarantees are strongly coupled with the chosen encryption scheme: confidentiality of exchanges, authenticity of emitters and receivers, and so on.
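As one common instance of such a scheme, a TLS client context maps directly onto those guarantees: encryption gives confidentiality, certificate verification gives authenticity. The sketch below uses Python’s standard `ssl` module; the certificate paths in the comment are placeholders.

```python
import ssl

# A client-side TLS context sketch for the device-to-cloud channel.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
ctx.check_hostname = True                     # authenticate the receiver
ctx.verify_mode = ssl.CERT_REQUIRED
# Mutual TLS would also authenticate the emitter (paths are placeholders):
# ctx.load_cert_chain("device.crt", "device.key")
```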
Time handling is central
The freshness of data can matter greatly for one category of devices while being totally irrelevant for others. Is it worth the trouble of sending already-stale data before the latest reading if no exhaustive history is needed? This kind of question has direct consequences on the device’s caching strategy.
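One possible answer, when no history is needed, is a latest-value-wins cache on the device: stale samples are silently superseded and only the freshest reading per metric is transmitted. This is a sketch of that strategy, not a prescribed design.

```python
class LatestValueCache:
    """Keep only the freshest reading per metric when history is not needed."""

    def __init__(self):
        self._latest = {}  # metric name -> (timestamp, value)

    def record(self, metric: str, ts: float, value: float) -> None:
        prev = self._latest.get(metric)
        if prev is None or ts > prev[0]:
            self._latest[metric] = (ts, value)  # stale samples are dropped

    def flush(self) -> dict:
        """Return what should be sent, emptying the cache."""
        out, self._latest = self._latest, {}
        return out
```

A device needing exhaustive historization would instead queue every sample, at the cost of bandwidth and on-device storage.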
A device may also operate in various modes and switch between them during its lifetime: sleeping (mute) between leaving the factory and being deployed in the field, passive monitoring during regular usage, or an active mode with high volumes of collection and emission, during emergencies for example.
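Those modes form a small state machine. The sketch below is illustrative only; the emission periods are made-up values, not recommendations.

```python
from enum import Enum

class Mode(Enum):
    SLEEPING = "sleeping"  # mute, e.g. not yet deployed in the field
    PASSIVE = "passive"    # regular low-rate monitoring
    ACTIVE = "active"      # high-volume collection during emergencies

# Illustrative emission periods per mode, in seconds (None = no emission).
EMIT_PERIOD_S = {Mode.SLEEPING: None, Mode.PASSIVE: 300, Mode.ACTIVE: 5}

def next_mode(current: Mode, deployed: bool, emergency: bool) -> Mode:
    """Pick the operating mode from the device's situation."""
    if not deployed:
        return Mode.SLEEPING
    return Mode.ACTIVE if emergency else Mode.PASSIVE
```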
Zooming out to the scale of the whole fleet, the notion of priority takes on another dimension. If a subset of devices requires increased attention during a given period, the platform should be able to ingest their telemetry first and limit the bottleneck this creates for regular ingestion.
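On the platform side, this kind of preferential ingestion is often implemented with a priority queue. A minimal sketch, assuming lower numbers mean higher priority:

```python
import heapq

class PriorityIngest:
    """Drain high-priority device telemetry before the regular backlog."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker keeps FIFO order within one priority

    def push(self, priority: int, message: dict) -> None:
        # Lower number = higher priority (0 drains first).
        heapq.heappush(self._queue, (priority, self._seq, message))
        self._seq += 1

    def pop(self) -> dict:
        return heapq.heappop(self._queue)[2]
```

A production pipeline would more likely use separate broker topics or weighted consumers, but the ordering principle is the same.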
The main challenges of data management
Ingestion pipelines
Various strategies can be devised to connect the micro-services along the way. Should the focus be on “Extract, Transform, Load” (ETL), where data validation and calibration come right at the beginning, or is “Extract, Load, Transform” (ELT) more suitable, with raw data rapidly accumulated in data lakes?
First, the data may need to be decoded, depending on the protocol and format chosen for communication at the network layer. Then the cleaning step, which removes non-conformant or corrupted values, can be coupled with a normalization phase to improve genericity across multiple generations of devices and ensure forward and backward compatibility.
Lifetimes
How long should the data stay in the system before being moved somewhere else or discarded entirely?
This question has implications for aggregation and compaction tactics. The notion of time to live, and the speed at which various types of information must be retrievable, guide decisions about tiers of hot and cold storage. Cold storage is perfect for archiving already-processed data that is not directly relevant to day-to-day operations. Meanwhile, a fast database or cache layer is the only way to serve services interested in recent data. Each tier comes at a different cost.
Note that in some industries, legislation may have the final say.
Generating business value with data management
For passive monitoring, the dashboards giving access to tables, histograms and charts must be both efficient and meaningful. If the selected visualization raises more questions than it answers because it does not fit the type of data being observed, the goal is not met.
Front-end applications
To provide the right level of intelligence, the needs of these front-end applications must be supported by back-end services gating the data (via REST APIs, GraphQL endpoints and the like). These servers in turn have to obtain the values from the storage layers.
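The gating role can be reduced to a tiny sketch: a read-only handler that mediates between clients and the store instead of exposing the database directly. The in-memory `STORE` and the device id are stand-ins, not a real API.

```python
# A hypothetical read-only gateway between front-ends and the storage layer.
STORE = {  # stand-in for the real database
    "device-42": {"temperature_c": 21.5, "battery_pct": 87},
}

def get_latest(device_id: str) -> tuple[int, dict]:
    """Shaped like a REST handler: returns (status_code, body)."""
    record = STORE.get(device_id)
    if record is None:
        return 404, {"error": "unknown device"}
    return 200, record
```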
Database type
Therefore, for optimal query efficiency, the right type of database must be put in place, usually column-oriented or time-series. The partitioning and sharding scheme must yield useful metrics at the fleet level while also being able to quickly narrow the focus to specific devices.
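A common way to satisfy both access patterns is a composite partition key of device id plus a time bucket. The day-sized bucket here is an assumption for illustration; the right granularity depends on ingest volume.

```python
from datetime import datetime, timezone

def partition_key(device_id: str, ts: float) -> tuple[str, str]:
    """Day-sized time buckets per device: fleet-wide scans stay sequential,
    while per-device queries can prune every other partition."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return device_id, day
```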
System alerts
In the case of active monitoring, it is really important to receive quasi-live feedback in order to react to alerts from the system. Again, the notion of priority comes into play. The final delivery medium (emails, notifications) depends on the criticality of the alert and the level of responsibility of administrators in the organization.
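That routing decision is often a simple criticality-to-medium mapping. The table below is an illustrative example, not a recommended policy.

```python
def route_alert(criticality: str) -> list[str]:
    """Map alert criticality to delivery mediums (mapping is illustrative)."""
    routes = {
        "info": ["dashboard"],
        "warning": ["dashboard", "email"],
        "critical": ["dashboard", "email", "push_notification"],
    }
    return routes.get(criticality, ["dashboard"])
```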
SUCCESS STORY
Manage Velan's IoT valves data
Velan needed support developing connected IoT valves for deployment in the nuclear energy industry. The valves provide reliable, up-to-date telemetry to customers. To make this possible, data from internal valve sensors had to be streamed in real time to a custom web application dashboard, in a secure and accessible manner.
Witekio can help with your device data management
Witekio simplifies device data management, enabling businesses to efficiently capture, analyze, and act on connected device data. From telemetry collection to secure data ingestion, Witekio ensures your devices transmit actionable insights by designing robust data models tailored to your needs. Our expertise spans protocol selection, data storage optimization, and real-time responsiveness, allowing for seamless integration of telemetry with business intelligence. With Witekio, transform raw data into measurable intelligence that powers informed decision-making and rapid response.