Hardware is ... hard

^ Hey that’s me! Don’t ask what I’m doing though … I promise it was important.

Most of my career has been spent right on the boundary between hardware and software. I’m talking bare metal microcontroller programming with custom hardware. I can recall plenty of times overhearing coworkers who were working on desktop or web projects say things like:

Was there anything in the logs?

What did the stack trace say?

and the ever present

I found a package that does that.

Meanwhile on my side of the room, you might hear phrases like

I can’t even get the debugger to connect.

Fuck! Memory leak…

and finally

HOLY SHIT. THE LED IS BLINKING!

The point I’m getting at is that the closer you get to the hardware, the harder software development can get and the fewer tools you have at your disposal.

Marcy says: Call the jelly school

He’s just jealous of you, web devs. I say he should get off the cross already.

— Marcy, Resident Cat

Alright yes, I’m jealous but I’m also … elitist? I think many embedded developers take pride in working and delivering under tight constraints like this and I’m no exception.

One thing I’ve noticed in my transition to product management is that being a successful PM of a hardware product means being aware of the constraints unique to the environment. Blindly following what the “best practices” are for PMs can be disastrous because many of them don’t apply. So let’s talk about what makes hardware different and how we can build better widgets that help people.

The gotchas

Let’s start with a few key things that make hardware products different from your traditional SaaS application.

Development is brutal

Writing firmware on a bare metal MCU is unforgiving to put it mildly. And to be clear, I’m talking about bare metal, not an embedded Linux device. No kernel. No OS to help you. Dereference a bad pointer? HardFault. Didn’t allocate enough stack memory for a thread? HardFault. Access a peripheral that isn’t enabled? HardFault.

Marcy says: You blew it

For the uninitiated: a HardFault is a catch-all exception. Throwing one means your tiny hooman brain made a mistake.

— Marcy, Resident Cat

Oh I’m sorry, were you hoping for a stack trace after that error? Probably not gonna happen if it was memory related. Want to log the error? Well you’re in a privileged interrupt so … get fucked.

Better start throwing breakpoints around to find the problem. Oh and by the way, you only have like, 3 of those…
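For what it’s worth, you’re not totally blind in there. On Cortex-M parts the core pushes eight registers onto the stack before it enters the handler, and the stacked PC usually points at or near the instruction that blew up. Here’s a minimal sketch of pulling those values out so you can stash them somewhere and look at them after a reset. The names `fault_frame_t` and `fault_frame_decode` are made up for illustration. The sketch assumes the basic 8-word frame (no FPU state) and that a small assembly shim, not shown, has already handed you the right stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the eight words a Cortex-M core stacks on exception
 * entry, in the order they sit in memory (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;   /* link register at the time of the fault */
    uint32_t pc;   /* stacked return address, at or near the faulting instruction */
    uint32_t xpsr; /* program status register */
} fault_frame_t;

/* Copy the stacked registers out of the raw stack words.
 * `stacked` must point at the first of the eight words (the stacked r0). */
fault_frame_t fault_frame_decode(const uint32_t *stacked)
{
    assert(stacked != NULL);

    fault_frame_t frame = {
        .r0   = stacked[0],
        .r1   = stacked[1],
        .r2   = stacked[2],
        .r3   = stacked[3],
        .r12  = stacked[4],
        .lr   = stacked[5],
        .pc   = stacked[6],
        .xpsr = stacked[7],
    };
    return frame;
}
```

On real hardware you’d call this from the HardFault handler and copy the result into RAM that survives a reset. That part depends on your chip and linker script, so it’s left out here.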

Nothing will make you understand memory management like writing some firmware in C for an ARM microcontroller. Mistakes are easy to make and hard to debug if you’re not careful. Speaking of mistakes…

Hardware mistakes are costly

Modern agile software development (and by extension SaaS product management) is often premised on the idea that rapid experimentation and the mistakes that go with it are the fastest path to delivering value. And to be clear, that’s true! Small iterations with tight feedback loops and quick pivots are the best way to build software. But there are two assumptions that underlie that:

Mistakes are easy to undo.
The consequences of mistakes are relatively limited.

If what you’re building is a SaaS application, those assumptions are usually true. You can deploy a canary version, measure some signals and roll back the change if needed. A small percentage of your users experienced a weird new feature and that’s it. All is forgiven.

Let’s contrast with how a hardware design (even a prototype) gets built:

You spend a few weeks creating the schematic, PCB layout, and BOM.
You place the PCB order for a few prototypes (5-10) from a fabricator.
You wait a few weeks for the design to be built and shipped to you.
You power on the board … nothing works.

It turns out you forgot to connect one of the main power pins on the microcontroller. It’s a BGA package so you can’t connect it easily with a flywire. Congratulations! You just wasted $10,000 and 3 weeks and have a wonderful green plastic brick on your hands.

Mistakes in hardware are painful and expensive.

Some decisions you have to live with … for years

This is similar to the mistakes problem but takes it to a whole nother level. Once a device is in the field, that’s it. It’s out there in someone’s house or on a job site or who knows where. Which has a few fun consequences:

You can’t change the hardware. No one wants to send their device back to you for an upgrade or a fix.
Software updates trickle out slowly device by device IF your device has connectivity. If not, forget it.
You always have to think “will this work on an older hardware revision?” It’s like having to support every version of the iPhone ever made simultaneously.
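For that last one, a pattern that can help is having the firmware check what board it’s running on before it touches anything revision-specific. Here’s a minimal sketch, assuming the board revision is readable at runtime somehow (strapping resistors on a few GPIOs, for example). The revision names and feature flags are invented for illustration and don’t come from any real product.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical board revisions, oldest first. */
typedef enum { HW_REV_A = 0, HW_REV_B = 1, HW_REV_C = 2, HW_REV_COUNT } hw_rev_t;

/* Hypothetical feature flags, one bit each. */
enum {
    FEAT_BLE           = 1u << 0,
    FEAT_EXT_FLASH     = 1u << 1,
    FEAT_BATTERY_GAUGE = 1u << 2,
};

/* Which features each revision actually has on the PCB. */
static const uint32_t hw_features[HW_REV_COUNT] = {
    [HW_REV_A] = FEAT_BLE,
    [HW_REV_B] = FEAT_BLE | FEAT_EXT_FLASH,
    [HW_REV_C] = FEAT_BLE | FEAT_EXT_FLASH | FEAT_BATTERY_GAUGE,
};

/* True only if every bit in `feature` is present on revision `rev`. */
bool hw_has_feature(hw_rev_t rev, uint32_t feature)
{
    assert(feature != 0u); /* asking for "no feature" is a caller bug */

    if ((unsigned)rev >= (unsigned)HW_REV_COUNT) {
        return false; /* unknown board: assume it has nothing */
    }
    return (hw_features[rev] & feature) == feature;
}
```

New code paths then get wrapped in something like `if (hw_has_feature(rev, FEAT_EXT_FLASH))`, so one firmware image can run across every revision in the field.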

But all is not lost

Despite all this … it’s doable! You can build and ship real things to people. But you gotta be careful. So here’s a few things you can do…

Instrument, instrument, instrument

The more diagnostic and debugging information you can build in from the ground up, the better. That means multiple logging outputs (serial, flash, API, etc.) with configurable severity levels for development and production. One of the best things I ever wrote was a decent logging system that could feed to multiple outputs depending on configuration.
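To give a rough idea of the shape of such a logger, here’s a minimal sketch in C. It is not the system described above, just one way to do it: each output is a “sink” with its own minimum severity, and every message is offered to every sink. All of the names (`log_add_sink`, `log_msg`, `log_counting_sink`, and so on) are hypothetical, and the counting sink stands in for a real UART or flash writer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR } log_level_t;

/* A sink is anywhere a message can go: serial port, flash, a network API. */
typedef void (*log_write_fn)(log_level_t level, const char *msg, void *ctx);

typedef struct {
    log_write_fn write;     /* function that emits the message */
    void *ctx;              /* sink-specific state, passed back to write() */
    log_level_t min_level;  /* messages below this severity are skipped */
} log_sink_t;

#define LOG_MAX_SINKS 4

static log_sink_t sinks[LOG_MAX_SINKS];
static size_t sink_count;

/* Register a sink. Returns 0 on success, -1 if the table is full. */
int log_add_sink(log_write_fn write, void *ctx, log_level_t min_level)
{
    assert(write != NULL);

    if (sink_count >= LOG_MAX_SINKS) {
        return -1;
    }
    sinks[sink_count++] = (log_sink_t){ write, ctx, min_level };
    return 0;
}

/* Offer a message to every sink whose threshold it meets. */
void log_msg(log_level_t level, const char *msg)
{
    for (size_t i = 0; i < sink_count; i++) {
        if (level >= sinks[i].min_level) {
            sinks[i].write(level, msg, sinks[i].ctx);
        }
    }
}

/* Stand-in sink for illustration: counts messages instead of writing
 * to hardware. `ctx` must point at an unsigned counter. */
void log_counting_sink(log_level_t level, const char *msg, void *ctx)
{
    (void)level;
    (void)msg;
    (*(unsigned *)ctx)++;
}
```

With this layout a development build might register the serial sink at `LOG_DEBUG`, while production keeps only a flash sink at `LOG_ERROR`, without touching any of the call sites.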

But it’s not just software. Put test points EVERYWHERE on your board. They’re free and they might save you! Not only for troubleshooting but also because they make great landing spots for jumper wires.

Pinout twice. Fab once

The most wonderful thing about software is that it’s easy to change. Look at this blog for example. If I make a typo or the layout is screwed up, I write a quick fix, push the code and boom. The site is updated for everyone, everywhere all at once. Sometimes hardware can’t be fixed and if it can, you may have to do a complex repair on every board you made.

So, you should be very cautious and take your time on hardware and system design. All of that move fast and break things bullshit has no place here. Are you sure that footprint of the IC is right? Why don’t you check the pinout of the MCU one more time. We definitely have enough UARTs right???

Each one of these things could cost tens of thousands of dollars and weeks or months of time. It’s worth taking the extra hour or two to double check.

There are one-way doors and two-way doors. Get good at knowing the difference

I know, I know. This one is a Jeff Bezos thing (at least I think it is), but it’s a great analogy. A two-way door is a decision that is easy to undo and has few long lasting consequences. A one-way door is just that: you can’t go back through it.

Hardware product development is a minefield of one-way doors. Once the hardware is in a customer’s hands, everything on that PCB is a closed door. There’s no going back.

And there’s even some more subtle ones in the firmware. Sure you have over the air updates, but what if the bootloader has a bug? Can you update the bootloader over the air? If not, you better be damn sure that thing is bulletproof.

Another tactic you can use is to do your best to turn one-way doors into two-way doors. Hedge your bets so to speak. For example:

Should we use chip A or B?

Maybe put the footprint for both on the PCB but only populate one. If chip A doesn’t work, you can easily switch to chip B.

Do we need to sell these in the EU or just the US?

Certify for both just in case. Fixing any problems now is way cheaper than retooling the manufacturing 6 years from now.

If you’re a PM, you need to get really good at identifying these risks and mitigating them. It could be the difference between a successful pivot and a flop.

But oh baby, when that LED starts blinking…

The struggle is real, but I can honestly say the juice is worth the squeeze, both personally and professionally. When that motor finally spins, or that device finally shows up on the dashboard, or the god damn green LED finally blinks … there is no greater feeling. The JavaScript world doesn’t know what it’s missing.