System integration and debug: Go incremental or go “all up”?

There are two somewhat conflicting approaches for integrating and debugging a system, and you never know which would have been better – until the project is done.

Bringing up and debugging a complex system with multiple functions and ICs is a challenge, no news there. You’ve got to make sure all the components play with each other at the most basic interconnect level and that the circuit’s system-wide operation also functions properly. Doing so often requires accounting for the unavoidable timing and interaction characteristics, and even the shortcomings, of individual functional blocks and ICs so that the top-level features and performance meet the requirements.

There are two strategies for verifying basic-level interaction and compatibility as well as system-wide performance. The first and more-obvious alternative is to incrementally build up your system “brick-by-brick,” verify that things work at each additional step, and keep going until you are done, or at least in pretty good shape.

If a problem develops as you add a new function or component, you usually have useful clues as to the likely source: it’s due to the “something we added.” This approach has been used successfully in many situations, but it has shortcomings and can become a series of small steps leading to misplaced full-system confidence. One reason is that the interaction among the many system functions is not a serial or linear process, while incremental build-up assumes it is.

The alternative is often referred to as “all up.” Here, you independently build and test every subassembly to the fullest extent possible. Then you assemble and integrate them (somewhat like a large puzzle) into the complete system as it will be in the final configuration and debug it from there.

If things don’t work as expected or desired – or don’t work at all, as may be the case – you start from some known point and work through the various corners to see what’s happening. Hopefully, you’ll eventually discover the source (or, in many cases, the sources) of your problem.

From a high-level perspective, all-up seems to be a hard way to go. It’s mainly a challenge in configurations where almost everything has to work – at least to some extent – for anything to work at all. That’s why, of the two choices, incremental system build-up and debug seems to make more sense and be less frustrating than the all-up approach.

But the reality is that the apparently “sensible” incremental approach brings a different set of potential problems. Each time you add something, there are multiple sets of new interactions, hardware loads, software initialization steps, and subtle hardware/software issues that can and do arise. That’s why, while incremental seems to be more logical and less risky, that may not be the case in many complex-system situations.
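As a rough illustration of why each added block brings more than one new thing to check, here is a minimal Python sketch that counts only pairwise interactions as blocks are brought up one at a time. The block names are borrowed from the examples later in this article, and the function names are hypothetical. Real systems also have three-way and higher-order interactions, shared loads, and initialization-order effects that this simple count ignores, so treat the numbers as a lower bound.

```python
from itertools import combinations

# Hypothetical block list, used only for illustration.
BLOCKS = ["sensor front end", "user interface", "data processing", "communications link"]


def new_pairings(existing, added):
    """Return the pairwise interactions created by adding one block."""
    return [(block, added) for block in existing]


def incremental_schedule(blocks):
    """For each bring-up step, pair the added block with the new pairings it creates."""
    return [(block, new_pairings(blocks[:i], block)) for i, block in enumerate(blocks)]


def total_pairings(blocks):
    """Count every pairwise interaction present in the fully assembled system."""
    return len(list(combinations(blocks, 2)))


if __name__ == "__main__":
    for step, (block, pairs) in enumerate(incremental_schedule(BLOCKS), start=1):
        print(f"step {step}: add {block!r}, {len(pairs)} new pairwise interaction(s)")
    print("all-up pairwise interactions:", total_pairings(BLOCKS))
```

With four blocks, the last incremental step alone introduces three new pairings, and the assembled system has six in total. The count per step grows with every block added, so the later "small steps" are not as small as the first ones.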

Start with very big

All-up has usually been used for physically large and extremely complicated projects, as shown by two well-documented cases. The first is from the excellent book, “Apollo: The Race to the Moon” by Charles Murray and Catherine Bly Cox.

Their insightful “step-back” perspective of Chapter 4 looks at the difference between aircraft engineers, early rocket engineers, and the NASA system engineers charged with making the moon landing happen. They note that NASA eventually realized that “you didn’t build brick-by-brick anymore; all that did was waste time. Whenever you added a new stage, the ground support equipment was different, the checkout procedures were different, the countdown was different, the hardware was different. You had to relearn everything anyhow.”

Another case is the development of the first nuclear-powered submarine, an effort driven by Admiral Hyman Rickover against a strong “it can’t be done” analysis (admittedly, with many valid reasons). When he argued for a nuclear sub, there weren’t even any land-based reactors in the world generating electricity full-time; there were only a few highly experimental ones.

Even after he sold his idea of a nuclear sub, the Nautilus, the conventional wisdom was that they should first build a large-scale, easy-to-access prototype reactor using the same principles as the sub-based one would use (as countless innovations were needed and many issues had to be resolved). Then, only after that one was operational and tested would they “scale it down” to fit in the sub.

Rickover said that was a bad idea in many ways, as noted in “The Rickover Effect: How One Man Made a Difference” by Theodore Rockwell.

He reasoned that a whole new set of problems would arise from scaling it down: piping, access, control, materials performance, reactor performance, fluid flow, thermal issues, safety, observability, and more. He insisted and prevailed, and the prototype model, with its enormous learning curve, was sized to fit and function the same way as the final unit would in the sub. In that way, the actual production unit for the sub could be built as a “copy” of the prototype and, in fact, was built almost concurrently, lagging each step of the prototype by only a few weeks.

Making the choice

Of course, most engineers are not working on projects having the scope of those two. But today’s design situation is that even physically small projects have many complicated functional blocks, such as a sensor front end, user interface, data processing, and communications link. As a result of the interaction among these blocks, all-up may be the most viable approach. This is especially the case for projects extending into the wireless range, since much of the actual (not modeled) performance is highly dependent on subtleties of the physical layout. Ironically, results from the incremental approach may actually be detrimental because they are not indicative of the final performance.

So, which is better: incremental test and debug, or the all-up approach? The answer is the same as with so many things in engineering: “it depends.” You never really know except with the benefit of hindsight.

To me, that’s a side-effect of the incremental vs. all-up decision because there are always after-the-fact people and committees who will look back with the benefit of such hindsight and proclaim, “maybe you should have done it the other way” to make themselves look good. The excellent book “Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems” by David J. Agans clarifies that debugging is hard, while after-the-fact “aha” insight is much easier.

Not surprisingly, it’s largely about tradeoffs and balance. In many cases, the best design, integrate, and debug strategy is some blend of both incremental and all-up, but with the dividing line between them to be determined, and with some iterative back-and-forth aspects as well.
