Someone once told me something I’ve found to be a good guide: you should always have a thorough understanding of one level deeper than where you’re mostly working.
That is a good rule of thumb, and certainly where I was coming from. Whilst I program in HLLs (Java these days), I have a very good understanding of machine architecture from coming up the other way, and it has helped me tackle some really obscure problems by dropping down and seeing what the machine is actually doing.
It’s a bit like me thinking I am (hopefully) a better System Engineer by first being an apprentice, electronics craftsman, research engineer, project support, project manager and then (after reverse op) back to system engineer. I know system engineers who have come straight in at that level after doing a qualification in it; some of them are OK, but some are hopeless. All of that background and those cross-domain skills give me a lot of insight.
Whilst nobody asked, I will quickly reminisce on my C application problem, which I only solved because I could “go one level deeper”. It was fun to think about again, and it’s the sort of challenge you will never forget!
It was a program that received data on a serial port from an external sensor every 50ms, decoded it, processed it, and displayed it as part of an operator-in-the-loop control system. The serial port handler was interrupt driven, which introduced an asynchronous nature to the code.
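For anyone who never did this on DOS, the set-up looked roughly like the sketch below. This is a minimal reconstruction from memory, assuming Microsoft C’s `_dos_setvect`/`_interrupt` conventions and COM1; names like `serial_isr` and `rx_buffer` are just illustrative, not the real code.

```c
/* Minimal sketch of an interrupt-driven serial receive set-up under
 * Microsoft C on MS-DOS.  Assumes COM1 (IRQ4 -> vector 0Ch, UART at
 * 0x3F8); names are illustrative, not the original program. */
#include <dos.h>
#include <conio.h>

#define COM1_DATA   0x3F8              /* UART receive data register  */
#define COM1_VECTOR 0x0C               /* IRQ4 interrupt vector       */

static volatile unsigned char rx_buffer[256];
static volatile unsigned char rx_head;

static void (_interrupt _far *old_handler)(void);

/* Called by the hardware whenever a byte arrives from the sensor. */
void _interrupt _far serial_isr(void)
{
    rx_buffer[rx_head++] = (unsigned char)inp(COM1_DATA);
    outp(0x20, 0x20);                  /* EOI to the 8259 controller  */
}

void install_serial_handler(void)
{
    old_handler = _dos_getvect(COM1_VECTOR);
    _dos_setvect(COM1_VECTOR, serial_isr);
    /* ...UART initialisation (baud rate, enable RX interrupt) omitted... */
}
```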
This was all set up and programmed in Microsoft Programmer’s Workbench (remember that?), with CodeView as the debugger.
It was all working nicely, other than a few times a day a random crash would happen, reported as a stack overflow error. That “bugged” me, as I thought the stack size was plenty: there were not that many nested function calls (and no recursion to blow the stack). So I increased the stack size, and the random crash was still there. Over a week I kept looking at it, increasing the stack size to the maximum (a 64K segment), and the crash was still there. If I changed to polling the serial port (not ideal), the crash did not occur.
Of course, we know that software has no random failure mechanism like hardware does. It is in fact 100% deterministic once the input conditions that trigger a bug are satisfied. This crash was obviously something to do with the interrupt-driven nature leading to some condition that caused a stack overflow.
But why? Well, I dropped down to debugging in assembler view to try and catch it, and one day, when I was stepping through the interrupt code at the assembler level, I noticed that the segmented stack frame address had changed completely from what I had seen in the previous run. I continued stepping and the program crashed.
I could then of course instrument the code and dump some diagnostics, and every time the program crashed, the stack pointer was not within the stack area that the C compiler normally sets up for the EXE in the 640K of user memory we had at the time. It was in memory normally reserved for MS-DOS!
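The instrumentation was essentially just grabbing SS:SP and logging it, something along these lines (again a sketch, assuming Microsoft C’s `_asm` inline assembler; the names are mine):

```c
/* Sketch of the kind of instrumentation used: capture SS:SP so it can
 * be dumped later and compared against the EXE's own stack segment.
 * Assumes Microsoft C 6.0-style _asm inline assembler. */
static unsigned captured_ss, captured_sp;

void capture_stack_pointer(void)
{
    _asm {
        mov captured_ss, ss
        mov captured_sp, sp
    }
    /* In this case the dump showed SS:SP sitting in the memory that
     * MS-DOS reserves for itself, not in the program's stack segment. */
}
```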
So, what was happening? To cut a long story short, as this took a while to discover(!), the crash never happened when the program was in my own C code. It happened when my code called an MS-DOS IO function, such as updating the display, which was achieved via INT functions. What was causing the crash is that MS-DOS, when it is in an INT function, caches the program’s stack pointer and sets up its own, restoring the program’s stack pointer when it completes. The MS-DOS stack area was something ridiculously small, like 5 bytes!! So if data arrived on the serial port and triggered my interrupt handler whilst the code was in an MS-DOS INT call, such as updating the user display, then my interrupt function’s local variables (about half a dozen, from memory) that are created on the stack were enough to overflow the MS-DOS stack!
Man, that was an “I do not believe it” moment. Why would MS-DOS do that? I never found the answer, of course, but the solution was, again at a low level with some inline assembler in the C code, to set up my interrupt handler to mimic what MS-DOS was doing: allocate its own stack on entry and restore whatever stack it interrupted (application or MS-DOS) on exit. I was then back in control of the stack size, could ensure it was large enough, and the failure mode was removed!
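From memory the fix looked something like the sketch below: save whatever SS:SP was interrupted, switch to a private stack that is big enough for the real work, and put the original stack back before returning. This is an illustrative reconstruction in Microsoft C style; the names, sizes and memory-model assumptions are mine, not the original code.

```c
/* Sketch of the fix: the interrupt handler switches to its own private
 * stack on entry, so it no longer matters whether it interrupted the
 * application's stack or MS-DOS's tiny internal one.  Illustrative
 * reconstruction; assumes static data is in DGROUP (small/medium model). */
#include <conio.h>

#define ISR_STACK_SIZE 1024

static unsigned char isr_stack[ISR_STACK_SIZE];
static unsigned      isr_stack_top;     /* offset of top of isr_stack  */
static unsigned      saved_ss, saved_sp;

/* Does the real work; its locals now live on the private stack. */
static void handle_serial_byte(void)
{
    /* decode / buffer the received byte ... */
}

void _interrupt _far serial_isr(void)
{
    /* Save the interrupted stack (application's or MS-DOS's) and switch
     * to our own.  Keep locals out of this function itself - they would
     * be allocated before the switch. */
    _asm {
        mov saved_ss, ss
        mov saved_sp, sp
        mov ax, ds                 ; private stack lives in our data segment
        mov ss, ax
        mov sp, isr_stack_top
    }

    handle_serial_byte();

    /* Put back whichever stack we interrupted before returning. */
    _asm {
        mov ss, saved_ss
        mov sp, saved_sp
    }
    outp(0x20, 0x20);              /* EOI to the interrupt controller */
}

void init_isr_stack(void)          /* call once before installing the ISR */
{
    isr_stack_top = (unsigned)(isr_stack + ISR_STACK_SIZE);
}
```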
That is one I do not think I ever would have caught without a knowledge of processor architecture, assembler, how a stack works, etc.
All good fun!
I haven’t looked at C++ for 20-odd years, so I do not know how it has evolved, but I certainly liked Java’s approach of taking C++ as the starting point and then simplifying the language, throwing out things like multiple inheritance, operator overloading, etc, as “clever features” that could cause a lot of confusion once you have been away from the code for a while, or are new to it with poor documentation. And the Java fathers argued that you could write perfectly good and more maintainable code without them. I certainly have not missed them!
Evolving languages is a double-edged sword. A decade ago I was programming in LabVIEW - a really nice paradigm to an ex-hardware engineer - and I was programming FPGAs for time-critical systems in LabVIEW FPGA (this was for a contract that brought me down under for a few years @brad ). It was really nice to think purely in data flows via virtual wires into functions (like wiring components that did things). Very neat. As I understand it, people are now turning away from LabVIEW as National Instruments have over-complicated it and turned it into a framework architecture that is too complex, and they have (I am told) lost the original paradigm that made LabVIEW popular in the first place.
PS @Brad good call to give this its own thread!