The Kernel Migration Field Guide: Backporting, Blobs, and Beyond


Regulatory pressure from the Cyber Resilience Act (CRA) has turned a simple truth into an unavoidable reality: keeping an embedded kernel “frozen” for years is no longer a viable strategy.

In the embedded world, jumping to a recent Long Term Support (LTS) version is almost never a plug-and-play update. You aren’t just moving code; you are digging through layers of accumulated debt, undocumented vendor hacks, and hidden dependencies.

To help navigate this, we’ve put together a structured playbook based on how we tackle these projects in the field:

  • The Pre-Flight Audit: Auditing the existing code base in depth to understand exactly what you are running.
  • Strategic Arbitration: Deciding between a full migration or surgical backporting.
  • Technical Execution: Managing API drift, orphan drivers, and build environments.
  • Validation: Implementing a robust test suite to go from “just booting” to proven functionality.
  • Organization & Documentation: Ensuring the project remains maintainable for years to come.

Engineering teams usually face a choice: upgrade and risk breaking the platform, or stay and accumulate security debt. Common triggers include:

  • Regulatory Compliance: CRA, NIS2, or industry-specific cybersecurity audits.
  • End-of-Life (EOL): Kernels that no longer receive stable upstream patches.
  • Hardware Evolution: New components requiring recent drivers or vendors dropping BSP support.
  • Maintenance Costs: The effort to maintain a heavily diverged kernel becomes untenable.

Phase 1: The Pre-Flight Audit

Before touching any code, you must understand what kernel you are truly running. Embedded kernels are often composed of multiple layers: Upstream Linux at the base, the silicon vendor BSP, OEM modifications, and finally product patches.

Binary Blobs & Drivers

Proprietary drivers (WiFi, GPU, DSP, hardware accelerators) are your hardest constraints. For each one, you must evaluate:

  • Is it ABI-locked via vermagic?
  • Does the vendor provide updates for the target version?
  • Can you legally redistribute a compatibility wrapper?

 

If a blob is ABI-locked, the vendor offers no update for the target version, and you cannot redistribute a compatibility wrapper, that single blob can block the entire migration. Contact your vendor early; some provide updated binaries upon request.
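Part of this triage can be scripted. The sketch below assumes you have saved the `modinfo` output for each vendor module as text. It extracts the `vermagic` field and compares its first token, which is the kernel release the module was built for, with your target release. The function names are illustrative. A matching release string is necessary but not sufficient: the remaining vermagic flags and, with CONFIG_MODVERSIONS, the symbol CRCs must also match. Treat a match as "needs a closer look", not as "compatible".

```python
from typing import Optional


def vermagic_of(modinfo_output: str) -> Optional[str]:
    """Return the vermagic value from `modinfo` output, or None if absent."""
    for line in modinfo_output.splitlines():
        # modinfo prints one "field:<padding>value" pair per line.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vermagic":
            return value.strip()
    return None


def triage_blob(modinfo_output: str, target_release: str) -> str:
    """Coarse triage of a prebuilt module against a target kernel release.

    Only the release (first vermagic token) is compared; SMP/preempt/modversions
    flags and symbol CRCs still have to be checked separately.
    """
    magic = vermagic_of(modinfo_output)
    if not magic:
        return "unknown: no vermagic field"
    built_for = magic.split()[0]
    if built_for == target_release:
        return f"release matches ({built_for}): verify flags and CRCs next"
    return f"blocked: built for {built_for}, target is {target_release}"
```

Running it over every `.ko` shipped in the root filesystem produces a first list of blobs to raise with vendors.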

Witekio Tip: In-Tree vs Out-of-Tree
One afternoon spent listing every out-of-tree module can save weeks of wasted work on a path that turns out to be dead.
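On a unit running the current kernel, /proc/modules is a quick starting point. Modules that taint the kernel list their taint flags in parentheses at the end of the line: O marks an out-of-tree module and P a proprietary license. The sketch below keeps the modules carrying either flag. It only sees modules loaded at that moment, so also check the root filesystem for `.ko` files that are loaded on demand.

```python
def out_of_tree_modules(proc_modules: str) -> dict[str, str]:
    """Map module name -> taint flags for modules marked out-of-tree (O)
    or proprietary (P) in /proc/modules text."""
    flagged: dict[str, str] = {}
    for line in proc_modules.splitlines():
        fields = line.split()
        if not fields:
            continue
        last = fields[-1]
        # Taint flags, when present, are the last field, e.g. "(POE)".
        if last.startswith("(") and last.endswith(")"):
            taints = last[1:-1]
            if "O" in taints or "P" in taints:
                flagged[fields[0]] = taints
    return flagged
```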

Patch Inventory

You must classify every commit that separates your kernel from upstream.

Patch Subject | Upstream Status | Action
Fix race in irq handler | Merged in v5.4 | Drop
Custom SoC power mgmt | Not upstream | Port
Workaround for DMA bug | Obsolete (hw rev B) | Drop
CVE-2021-XXXX fix | Merged in v5.10.42 | Verify

As you build this inventory, the table will inevitably grow. You will find undocumented patches, squashed commits, and modifications that nobody remembers adding. That is the audit working: this “software archaeology” is how you turn a legacy black box into a controlled system. Surfacing these shadow areas now is the only way to keep them from becoming critical blockers during the build phase.
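`git cherry -v` can pre-fill the Upstream Status column. It compares each commit on your product branch with a reference tree by patch-id. Lines starting with `-` have an equivalent change upstream (Drop candidates); lines starting with `+` do not. Patch-id only matches identical diffs, so hand-adapted backports still show up as `+`, and every line still needs a human decision. The sketch assumes output from a command such as `git cherry -v v5.10.42 product v4.14.98`. The two tags and the `product` branch name are placeholders for your target, your branch, and the base it forked from.

```python
from collections import defaultdict


def triage_cherry(cherry_output: str) -> dict[str, list[str]]:
    """Split `git cherry -v` output into drop-candidate and review buckets.

    "-" lines: an equivalent patch (same patch-id) exists upstream.
    "+" lines: no identical patch upstream; port it, or look for a reworked fix.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for line in cherry_output.splitlines():
        if not line.strip():
            continue
        mark, sha, subject = line.split(" ", 2)
        bucket = "drop_candidate" if mark == "-" else "review"
        buckets[bucket].append(f"{sha[:12]} {subject}")
    return dict(buckets)
```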

Userspace ABI

Interfaces exposed through /proc, /sys, and ioctls can change between versions. When they do, daemons can break silently, with no log entry and no panic.

Witekio Tip: Trace your critical services before you start. Knowing exactly how your applications interact with the kernel prevents weeks of blind debugging after the first boot.
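One way to do this is to run each critical service under strace on the current kernel and keep the log as a reference map of its kernel interfaces, for example `strace -f -e trace=openat,ioctl -o trace.txt <service>`. The parser below is a sketch that assumes strace's default decoded output. It collects the /proc and /sys paths a service opens and the ioctl requests it issues, which gives you a checklist to re-verify on the new kernel.

```python
import re

# Quoted path arguments that point into /proc or /sys.
PATH_RE = re.compile(r'"(/(?:proc|sys)/[^"]*)"')
# Request argument of an ioctl call: a decoded name, an _IOC(...) tuple, or a raw number.
IOCTL_RE = re.compile(r"\bioctl\(\d+,\s*([A-Z][A-Z0-9_]*|_IOC\([^)]*\)|0x[0-9a-f]+)")


def kernel_touchpoints(strace_log: str) -> tuple[set[str], set[str]]:
    """Collect the /proc and /sys paths and ioctl requests seen in an strace log."""
    paths: set[str] = set()
    ioctls: set[str] = set()
    for line in strace_log.splitlines():
        paths.update(PATH_RE.findall(line))
        match = IOCTL_RE.search(line)
        if match:
            ioctls.add(match.group(1))
    return paths, ioctls
```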

Phase 2: Migrate or Backport?

Many teams end up doing both: backporting for immediate needs and migrating in parallel. Fine, as long as it is a deliberate decision. Consider this:

  • Full migration makes sense when the kernel is truly EOL, multiple subsystems need updates, or the patch delta has grown so large that maintaining it costs more than migrating.
  • Targeted backporting makes sense when the exposure is limited to specific CVEs, a full upgrade would trigger re-certification, or the compliance deadline is next quarter and a full migration cannot be completed in time.

 

Phase 3: Surgical Execution

“Cherry-pick the upstream fix, rebuild, and ship.” That is the theory. In practice, execution is where you encounter the friction of reality. Success in this phase requires navigating three main obstacles:

  • the Dependency Trap, where a fix compiles but fails because a prerequisite structural change was missed;
  • API Drift, where internal kernel functions evolve or vanish between versions;
  • and Orphan Drivers, those critical legacy modules abandoned by vendors that you must now keep alive.

 

To move forward without breaking the platform, you need a surgical approach to code integration.

The Dependency Trap

In practice, many CVE fixes depend on prior refactoring commits. A null-deref fix in usb_submit_urb might need three earlier commits that restructured the error path. Applied alone, the fix may compile and pass basic tests while the vulnerability remains, because the prerequisite was missed.

  • Mapping the chain: Use git log --ancestry-path scoped to the affected files.
  • Short chains (3-5 commits): Apply these in chronological order.
  • Long chains (dozens of structural changes): Either adapt the fix manually or partially refactor the subsystem.
  • Documentation: Always document the rationale behind your choice.
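Here is a minimal sketch of the mapping step. It assumes a mainline clone in which `base`, the tag your kernel forked from, is an ancestor of the upstream `fix` commit. `git log --ancestry-path base..fix -- <files>` lists the commits on the path between the two that touch the affected files, and `--reverse` puts them in application order. The result is a candidate list, not a verified dependency chain: each commit still has to be read to decide whether the fix relies on it. The helper names are hypothetical.

```python
import subprocess


def chain_cmd(base: str, fix: str, paths: list[str]) -> list[str]:
    """Build the git command listing commits between `base` and `fix` that
    touch `paths`, oldest first (the fix itself is normally the last entry)."""
    return ["git", "log", "--oneline", "--reverse", "--ancestry-path",
            f"{base}..{fix}", "--", *paths]


def parse_oneline(output: str) -> list[str]:
    """Extract abbreviated SHAs from `git log --oneline` output."""
    return [line.split(" ", 1)[0] for line in output.splitlines() if line.strip()]


def candidate_chain(repo: str, base: str, fix: str, paths: list[str]) -> list[str]:
    """Run the query in `repo` and return candidate prerequisite SHAs in order."""
    result = subprocess.run(chain_cmd(base, fix, paths), cwd=repo,
                            capture_output=True, text=True, check=True)
    return parse_oneline(result.stdout)
```

The length of the returned list tells you which case you are in: a short chain to apply in order, or a long one that calls for adapting the fix instead.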

 

API Drift & Performance

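Internal APIs that changed between v4.x and v5.x (examples):

  • access_ok() lost its type param in 5.0
  • ioremap_nocache() removed in 5.6 (use ioremap())
  • tasklet_setup() introduced in 5.9 as the replacement for tasklet_init(); the callback now receives a struct tasklet_struct pointer
  • pci_set_dma_mask() removed in 5.18 (use dma_set_mask())
  • set_fs() removed per-arch: x86 in 5.10, ARM in 5.18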

 

Know the exact version boundary. When a cherry-pick conflicts, you need to tell whether the conflict is from the fix or from unrelated API drift.
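Keeping the boundaries as data makes that check mechanical. The sketch below encodes only the examples listed above; extend it with whatever your own drivers hit. It returns the changes crossed when moving from a source version to a target. If a conflicting hunk touches one of them, the conflict is probably API drift rather than part of the fix.

```python
# (major, minor) where each change landed, taken from the list above.
API_BOUNDARIES = [
    ((5, 0), "access_ok(): type argument removed"),
    ((5, 6), "ioremap_nocache(): removed"),
    ((5, 9), "tasklet_setup(): introduced to replace tasklet_init()"),
    ((5, 10), "set_fs(): removed on x86"),
    ((5, 18), "pci_set_dma_mask(): removed"),
    ((5, 18), "set_fs(): removed on ARM"),
]


def parse_version(version: str) -> tuple[int, int]:
    """'v4.19.150' -> (4, 19); the stable sublevel is irrelevant here."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def crossed(source: str, target: str) -> list[str]:
    """API changes whose boundary lies after `source` and at or before `target`."""
    src, dst = parse_version(source), parse_version(target)
    return [desc for ver, desc in API_BOUNDARIES if src < ver <= dst]
```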

Performance can drift too: a security patch that adds validation on a hot path can cost as much as 20% of throughput, so benchmark the affected paths before and after applying it.
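Comparing repeated benchmark runs before and after a patch keeps this measurable rather than anecdotal. The sketch assumes you collect several runs per configuration, for example iperf3 or fio results, and compares their means. The 5% default threshold is an assumption to tune against your own run-to-run variance. If the baseline noise is close to the threshold, you need more runs before drawing conclusions.

```python
from statistics import mean, pstdev


def throughput_drop(baseline: list[float], current: list[float]) -> float:
    """Relative drop in mean throughput; positive means the new kernel is slower."""
    base = mean(baseline)
    return (base - mean(current)) / base


def baseline_noise(baseline: list[float]) -> float:
    """Coefficient of variation of the baseline runs (population std dev / mean)."""
    return pstdev(baseline) / mean(baseline)


def is_regression(baseline: list[float], current: list[float],
                  threshold: float = 0.05) -> bool:
    """Flag a mean throughput drop at or above `threshold`."""
    return throughput_drop(baseline, current) >= threshold
```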

 

Orphan Drivers & Wrappers

What happens when a vendor stops maintaining a driver, but the hardware is still in every unit you ship?

Wrapper Strategy

Isolate the driver behind a compatibility shim in a single header file (e.g., driver_compat.h). This file contains all version‑specific macros, leaving the core driver code untouched and preserving test coverage. When you eventually reach a kernel version where the shims are no longer needed, simply delete the file.
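The rule that every version check lives in the shim can be enforced automatically. The sketch below scans driver sources, passed in as a path-to-text mapping, and reports any LINUX_VERSION_CODE use outside driver_compat.h. The sample header shows the kind of line that belongs in the shim. The paths and header contents are hypothetical.

```python
import os
import re

VERSION_CHECK = re.compile(r"\bLINUX_VERSION_CODE\b")

# Example of what belongs in the shim (hypothetical driver_compat.h content).
SAMPLE_SHIM = """\
#include <linux/version.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
/* ioremap_nocache() was removed in 5.6; plain ioremap() is uncached. */
#define ioremap_nocache(addr, size) ioremap(addr, size)
#endif
"""


def version_checks_outside_shim(sources: dict[str, str],
                                shim: str = "driver_compat.h") -> list[str]:
    """Return 'path:line' for each LINUX_VERSION_CODE use outside the shim header."""
    hits = []
    for path, text in sorted(sources.items()):
        if os.path.basename(path) == shim:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if VERSION_CHECK.search(line):
                hits.append(f"{path}:{lineno}")
    return hits
```

Run in CI, an empty result means the core driver files still contain no version-specific code.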

Change Only What Breaks

If a deprecated function is still present in the target kernel and merely triggers a warning, consider suppressing the warning rather than rewriting the call. Any functional change introduces a risk of regression; therefore, only address issues that cause compilation or runtime failures.

Legacy Bugs

New kernels often reveal memory bugs that older kernels silently masked. Enable KASAN, stress the driver through repeated load/unload cycles, and classify every reported issue.

Do not ship with unclassified KASAN findings; they will inevitably resurface as customer-filed bugs six months down the line.
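Classification is easier once duplicate reports are collapsed. The sketch below counts KASAN reports in a captured kernel log by bug type and faulting function, so each distinct finding gets exactly one triage entry. It assumes the usual "BUG: KASAN: <type> in <function>+<offset>" header line. By default KASAN reports only the first error it sees; to get every report during a stress run, boot with the kasan_multi_shot parameter. The triage record is a hypothetical team-maintained mapping.

```python
import re
from collections import Counter

# Header line of a KASAN report, e.g.
# "BUG: KASAN: slab-out-of-bounds in mydrv_rx+0x1a4/0x2f0 [mydrv]"
KASAN_RE = re.compile(r"BUG: KASAN: (.+?) in ([\w.]+)\+0x")


def kasan_findings(kernel_log: str) -> Counter:
    """Count KASAN reports per (bug type, function) in a dmesg/console log."""
    return Counter(KASAN_RE.findall(kernel_log))


def unclassified(findings: Counter,
                 triage: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """Findings that have no entry yet in the triage record."""
    return sorted(key for key in findings if key not in triage)
```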

 

Phase 5: Build & Validation

A successful migration isn’t defined by the first successful boot; it is defined by the stability of the system under production conditions. This phase transitions from “getting it to work” to “proving it works.” It requires a shift in mindset from development to rigorous quality assurance, ensuring that the new kernel meets or exceeds the performance and reliability established by its predecessor.

 

Toolchain & Build: The Foundation of Reproducibility

Before compiling the final image, you must secure your build pipeline. Inconsistencies in the build host or compiler versions are a classic source of “it works on my machine” bugs that are nearly impossible to track down in production.

  • Pin Everything: GCC version, cross-compiler, and host tools. Containerize the entire environment and commit the Dockerfile to your repository.
  • Version Sensitivity: When upgrading GCC, expect new warnings. Modern compilers (like GCC 12) flag code that older versions (like GCC 8) let slide, particularly under -Wimplicit-fallthrough or -Warray-bounds, and those warnings break the build once -Werror is in effect.
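A cheap complement to the container is a fail-fast check at the start of the build script. A drifting toolchain then stops the build instead of producing a subtly different image. The sketch assumes GCC 7 or newer, whose -dumpfullversion option prints the bare version (for example 12.2.0). The tool name and pinned version are placeholders for whatever your Dockerfile installs.

```python
import subprocess

# Placeholder pins: mirror the versions installed by your Dockerfile.
PINNED = {"aarch64-linux-gnu-gcc": "12.2.0"}


def reported_version(tool: str) -> str:
    """Ask a GCC-style compiler for its full version string."""
    result = subprocess.run([tool, "-dumpfullversion"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def mismatches(reported: dict[str, str],
               pinned: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map tool -> (pinned, reported) for every tool that drifted or is missing."""
    return {tool: (want, reported.get(tool, "missing"))
            for tool, want in pinned.items()
            if reported.get(tool) != want}
```

In the build entry point, fill `reported` by calling `reported_version()` for each pinned tool, and abort if `mismatches()` returns anything.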
Witekio Tip: Compilation noise
Enable comprehensive GCC warnings (e.g., -Wall -Wextra; for kernel code, build with make W=1) from day one. Fixing 200 warnings early is routine; doing it the week before release is a crisis.
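Tracking the warning count per flag from one build to the next turns "expect new warnings" into a number you can watch go down. The sketch below parses GCC diagnostics of the form `file:line:col: warning: message [-Wflag]` from a captured build log. Lines without a bracketed flag, such as build progress output, are ignored.

```python
import re
from collections import Counter

# GCC appends the controlling option, e.g. "[-Wimplicit-fallthrough=]".
WARNING_RE = re.compile(r": warning: .*\[(-W[\w-]+)")


def warnings_by_flag(build_log: str) -> Counter:
    """Count compiler warnings in a build log, keyed by the -W option that triggered them."""
    counts: Counter = Counter()
    for line in build_log.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```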

Validation: Beyond the First Boot

Verification must be proactive and data-driven. A kernel that boots is merely a starting point; a migrated kernel is one that has been stress-tested, benchmarked, and hardened against regressions.

  • Boot stability: Run 100+ reboot cycles automatically, via a systemd service enabled to run on every boot.
  • Network/storage performance: Measure network and storage throughput against pre-migration data over repeated runs. Treat a sustained drop of 5% or more as a regression to investigate rather than dismissing it as noise, and track down the bottleneck early.
  • 72-hour stress test: Run concurrent CPU, memory, and I/O loads. Race conditions and memory leaks rarely appear in short bursts; they are often only reproducible under high system stress.
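The reboot loop only needs a counter that survives each reboot. The sketch below is the step such a service might run once per boot. It increments a counter on persistent storage, flushes it to disk, and requests another reboot until the target is reached. The counter location and the systemd oneshot unit that calls this step are assumptions left to your platform. The reboot action is passed in as a function so the logic can be tested without rebooting anything.

```python
import os
from pathlib import Path
from typing import Callable


def reboot_cycle(counter: Path, target: int, reboot: Callable[[], None]) -> int:
    """Record one completed boot and trigger another until `target` boots are reached.

    Returns the updated boot count. A boot that never reaches this step leaves
    the counter unchanged, which is what flags a failed cycle.
    """
    count = int(counter.read_text()) if counter.exists() else 0
    count += 1
    with open(counter, "w") as f:
        f.write(str(count))
        f.flush()
        os.fsync(f.fileno())  # make sure the count survives the reboot
    if count < target:
        reboot()
    return count
```

On the target, the reboot action could be `lambda: subprocess.run(["systemctl", "reboot"], check=True)`, with the counter kept on a partition that is not reset between boots.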

Hard-won Tips & Pitfalls

Large Version Changes
Migrating across multiple major kernel versions in one move means that if a regression surfaces, the root cause could lie anywhere within a vast range of changes. Rather than searching blindly, use git bisect to identify the exact commit that broke the behavior. This turns a potentially chaotic debugging effort into a systematic, efficient process.
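`git bisect run` automates the search. At each step it runs a script and reads its exit code: 0 means good, 125 means "cannot test, skip", and other codes from 1 to 127 mean bad. The sketch below shows that mapping for a kernel bisection step. `build-and-flash.sh` and `run-regression-test.sh` are hypothetical stand-ins for your own build, deploy, and test steps.

```python
import subprocess

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(built: bool, passed: bool) -> int:
    """Map the outcome of one bisection step to a `git bisect run` exit code.

    Revisions that do not build are skipped rather than marked bad, so build
    breakage unrelated to the regression does not mislead the search.
    """
    if not built:
        return SKIP
    return GOOD if passed else BAD


def run_step() -> int:
    """One step: build and deploy the checked-out kernel, then run the test.

    Both scripts are hypothetical placeholders for your own tooling.
    """
    built = subprocess.run(["./build-and-flash.sh"]).returncode == 0
    passed = built and subprocess.run(["./run-regression-test.sh"]).returncode == 0
    return verdict(built, passed)
```

Save this as a script that ends with `sys.exit(run_step())` (for example a hypothetical `bisect_step.py`) and drive it with `git bisect run python3 bisect_step.py`. Skipping unbuildable revisions matters on long ranges, where intermediate commits may not build for your board.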

Silent DT Changes
Device Tree bindings evolve between kernel versions. A node accepted in v4.x may be ignored or partially parsed in v5.x without any warning. The hardware still probes, but initializes with incorrect parameters. Always run make dtbs_check against the target kernel’s bindings. Also note that many bindings moved from free-form .txt files to .yaml schemas between 5.0 and 5.10, so running dtbs_check requires the dt-schema tooling.

The oldconfig Trap
Running make oldconfig can silently reset renamed or split CONFIG_ options to their default values. Security features may disappear, drivers may be disabled, and subsystems may change behavior without any visible error. Always diff the configuration before and after.
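The kernel tree ships scripts/diffconfig, which prints the full difference between two .config files. The sketch below narrows that to the failure mode described here: options that were enabled or set in the old configuration and are unset or absent after oldconfig. A renamed option shows up as lost under its old name, so check each hit against the target Kconfig before deciding it is a real loss.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map CONFIG_ symbol -> value; explicitly unset symbols map to 'n'."""
    opts: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := SET_RE.match(line):
            opts[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            opts[m.group(1)] = "n"
    return opts


def lost_options(old_text: str, new_text: str) -> list[str]:
    """Options set in the old config that are unset or absent in the new one."""
    old, new = parse_config(old_text), parse_config(new_text)
    return sorted(k for k, v in old.items() if v != "n" and new.get(k, "n") == "n")
```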

Late Testing
Testing added in the final week finds bugs when there is no time left to fix them. Establish a test baseline before starting the migration, for example with LTP, and run it after each major step. Continuous validation prevents late surprises.
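With a baseline in hand, each run only needs comparing against it. The sketch below assumes the three-column results log that runltp writes with its -l option (Testcase, Result, Exit Value); other LTP runners produce different report formats, so the parser would need adapting. It reports tests that passed in the baseline and no longer pass, including tests that disappeared from the new run.

```python
# Result column values that may appear in a runltp results log.
RESULTS = {"PASS", "FAIL", "BROK", "CONF", "WARN"}


def parse_ltp_results(log: str) -> dict[str, str]:
    """Map test name -> result from 'Testcase  Result  Exit Value' rows."""
    results: dict[str, str] = {}
    for line in log.splitlines():
        fields = line.split()
        # Data rows have exactly three columns; header and separator rows are skipped.
        if len(fields) == 3 and fields[1] in RESULTS:
            results[fields[0]] = fields[1]
    return results


def regressions(baseline_log: str, current_log: str) -> list[str]:
    """Tests that passed in the baseline run but fail, break, or are missing now."""
    before = parse_ltp_results(baseline_log)
    after = parse_ltp_results(current_log)
    return sorted(t for t, r in before.items() if r == "PASS" and after.get(t) != "PASS")
```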

Phase 6: Organization & Documentation

A kernel migration is as much a project management challenge as it is a technical one. The complexity of the Linux kernel means that unforeseen issues are not a possibility; they are a certainty. Navigating this phase requires a framework that balances technical transparency with rigorous documentation to ensure the long-term maintainability of the platform.

Managing Expectations: The Reality of Scope

One of the most critical realizations during a migration is that scope grows. As the project progresses, you will inevitably uncover patches nobody knew about, blobs nobody documented, and services depending on interfaces nobody mapped.

Budget for this technical debt from day one. If your audit reveals 200 custom patches instead of the 50 initially estimated, that is the audit doing its job. It is far better to adjust the timeline and inform stakeholders early than to discover these dependencies during the final integration phase.

Strategic Control: Milestone Gates

To prevent “scope creep” from turning into a project failure, you must establish explicit Go/No-Go gates. These points of control ensure that you never move to the next stage of the migration without a stable foundation.

  • Post-Audit Gate: Is the migration path clear? Have we identified all blob blockers?
  • Post-Intermediate-LTS Gate: If following a stepped migration, does the LTP (Linux Test Project) suite pass? Are performance baselines met for this version?
  • Post-Target Gate: Is the full validation suite green?

Any failure in a quality gate must immediately pause the migration process. While a schedule delay creates internal pressure, shipping a regression affects customers and damages trust. That is far harder to recover from.

Vendor Relations: Ask Before You Build

If your platform depends on a vendor BSP, proactive communication is mandatory. Ask early whether they plan to support your target LTS version. If a proprietary blob appears to be a blocker, contact the vendor before you invest time in building a complex wrapper. In many cases, updated binaries or internal roadmaps are available to partners upon request, which can significantly alter your technical strategy.

Documentation: Defeating Technical Debt

The most overlooked aspect of kernel migration is the documentation of the “Why.” Teams rotate, but embedded products remain in the field for years. The engineer who wrote a specific patch today will likely not be there to explain it during the next security audit.

To prevent your kernel from becoming a black box of technical debt, every non-upstream patch must be documented with:

  • Upstream Origin: Which commit does this come from?
  • Rationale: Why was it adapted?
  • Alternatives: Which other approaches were rejected and why?
  • Measured Impact: What is the performance or stability cost?
  • Exit Condition: When can this patch be safely removed?
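These five questions can be checked in CI if they are recorded as trailers in each commit message. The field names below (Upstream, Rationale, Alternatives, Impact, Exit-Condition) are a hypothetical convention, not a kernel standard. Feed the function the output of something like `git log --format=%B -n 1 <sha>` for each non-upstream patch.

```python
# Hypothetical trailer names, one per question in the list above.
REQUIRED_FIELDS = ("Upstream", "Rationale", "Alternatives", "Impact", "Exit-Condition")


def missing_fields(commit_message: str) -> list[str]:
    """Return the required fields that are absent or empty in a commit message."""
    present = set()
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            present.add(key.strip())
    return [field for field in REQUIRED_FIELDS if field not in present]
```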

Without this context, you aren’t just migrating a kernel; you are passing an unexploded bomb to the next maintenance team. Proper documentation ensures that the knowledge survives after the project ends.

Conclusion

Kernel migration projects are rarely simple upgrades. They uncover years of accumulated technical debt, vendor modifications, and hidden dependencies that have built up over time. Yet the outcome justifies the difficulty. A platform aligned with modern Linux development becomes easier to maintain, easier to secure, and easier to evolve. Instead of maintaining a frozen kernel that grows increasingly fragile over time, engineering teams regain control of their platform lifecycle.

Sarah Arroudj
Embedded Software Engineer
Sarah is an Embedded Software Engineer working across application and system layers on embedded Linux platforms, with a focus on Qt/C++ development, Yocto-based systems, and migration projects.
