The Build vs Buy Dilemma: Early Startup Edition


Note: Most of my perspective in this post comes from working with a small team of full-stack engineers that can “build anything” @ Prepared911


Some of the biggest mistakes I’ve made while working on engineering @ Prepared911 came from thinking “we can build this ourselves”. I would see a cool, free open-source project that does “everything we want!” and justify using it by saying “we need to be scrappy and this is free!”

This was the start of what I now realize is the build vs buy dilemma and I had fallen into the build trap.

For a startup, where resources are limited, it’s not just building vs buying. It’s a choice between learning how to build the solution yourself, maintaining it, and solving novel-to-you problems in the process, or buying a product that has already solved most of those problems at scale.

Below is some of my thinking behind this dilemma for an early-stage startup and examples of decisions we’ve made, both good and bad, over the years to build or buy.


💡 Definitions:

  • I'm defining Building as creating something that could be time-consuming and may require continued maintenance.
  • I'm defining Buying as paying for that software offering as a service or using a prebuilt solution. `Buying` does not necessarily mean there is a cost associated.

Asking Yourself: “What are you in the business of?”


“A good plan violently executed now is better than a perfect plan executed at some indefinite time in the future.” ― George S. Patton Jr.

One of the engineering team’s core values is `Done > Perfect`.

We strive to put an emphasis on speed — constantly asking ourselves if there is a way to get 80% of the benefit for 20% of the work.

For every feature, we decide between quality and speed based on how confident we are that the feature will impact customers:

  • If we are “highly certain,” we build for higher quality.
  • If we are “not certain,” we build for speed.

I’ve seen many exceptional engineers gravitate towards the most technically correct solution. However, the most technically correct solution is not always the correct solution for us right now. We challenge each other to put decisions in the context of the company as a business, not just the ‘satisfaction’ of the engineering team.

Many times, the most technically correct solution is not the right solution for right now.

When an engineer proposes ‘building’, we ask ourselves: is there a solution we can just buy that will get it done quicker?

Many times, the answer is to “buy” that solution…


We buy —

  • to prove the value of an idea.
  • to decrease future headaches involved in maintaining and supporting an offering.
  • when we can get 80% of the value for 20% of the effort.
  • when ‘good enough’ will suffice.
  • when it’s quick to implement and strip away (a highly reversible, Type 2 decision).

We want to be fast and nimble, moving on to the next thing as quickly as possible, and buying regularly helps us prove value faster.

A simple question that I’ve found summarizes the tradeoff between building and buying is: What are we in the business of?

So what does that mean in practice?

What are we NOT in the business of?


It turns out we are not in the business of A LOT. Below are some examples of things I don't believe we are in the business of, including many times when we chose to build, should have bought from the beginning, and ended up scrapping the whole project for a paid solution.

🔑 Key: <br />
<span style="color: red;">[Red]: We incorrectly chose to Build first. </span> <br />
<span style="color: orange;">[Orange]: We started with Build but quickly switched to Buy.</span> <br />
<span style="color: green;">[Green]: We correctly chose to Buy.</span>

____ [2-Person Eng Team] ____

<span style="color: green;">[Green] Authentication with AWS Cognito:</span>
TLDR: We are not in the business of user authentication.

  • It covers the compliance requirements, so we offload that burden to them! Now a password breach is much less likely! Most of the gripes we have (like not being able to change emails without issuing new passwords) stem from security best practices.
  • It was amazing for initially getting the project out; we could not have built our own auth process back in the day. It got the auth portion out in hours so we could focus on the actual features.
  • <span style="color: red;">[Red] [4-Person Team] Switching Off of Cognito:</span> As we grew, we started getting annoyed with Cognito’s restrictive auth system and debated switching off of it to our own custom-built auth service. We started putting engineering effort towards building it, but after 3 weeks we scrapped the entire project because we realized that we were just re-implementing Cognito. By using the current solution we could accomplish all the same goals with less effort. Only after starting to build it ourselves did we understand the complexities that Cognito was handling for us and the things we had taken for granted that they had already solved.

____ [4-Person Eng Team] ____

<span style="color: orange;">[Orange] Infrastructure Monitoring with DataDog:</span>
TLDR: We are not in the business of hosting our own system analytics.

  • DataDog is expensive — like crazy expensive.
  • We originally saw the price of DataDog, got mad, and decided we could just host Grafana + Loki and get the same thing! This was a mistake… We spent a month of one engineer's time trying to get everything hooked up.
  • After that month, we cut the project and bought DataDog. We realized that the cost of developing it to work properly and maintaining it in the long run would just not be worth it.
  • An added benefit: using DataDog allows someone less familiar with how monitoring software works to navigate and answer questions. This empowers the less dev-ops-enabled engineers (like me!) to navigate the entire stack and to find and diagnose issues. Paid solutions like theirs also have a dedicated support team to solve your issues.
  • This is the only decision that is actually breaking the bank… and we regularly debate whether we should switch off (see The Sunk Cost Fallacy below).

<span style="color: green;">[Green] Product Analytics with Mixpanel: </span>
TLDR: We are not in the business of making our own product analytics.

  • Using Mixpanel for analytics empowers the product team to answer any question they may have without involving an engineer (or only rarely, when they need new data connected).

<span style="color: red;">[Red] UI Library + Storybook ⇒ MUI: </span>
TLDR: We are not nearly big enough to be in the business of making our own style system.

  • We made the initial decision to use custom components and Storybook too early on. This made it harder for other developers to get the hang of our systems, because we had to maintain Storybook while also building new interfaces that we were destined to scrap (and did scrap over the course of the year). When we switched to MUI, it was a boost for developers, who could now build simple interfaces quickly.

<span style="color: orange;">[Orange] Custom Onboarding Flows ⇒ AppCues </span>
TLDR: We are not in the business of building out bulky onboarding flows in code.

  • We leverage AppCues to empower customer success to build and edit training workflows without touching engineering.

____ [6-Person Eng Team] ____

<span style="color: green;">[Green] Customer Success Monitoring ⇒ Vitally</span>
TLDR: We are not in the business of creating customer health scores and notifying CS of those scores every week.

  • Instead of having engineering build customer health scores, we implemented Vitally to empower customer success.
  • We built "good enough" health scores inside of Mixpanel until we had a sufficient quantity of customers that the system was starting to break down. Implementing Vitally has been incredibly helpful for tracking successful implementations of our software.

So, what are we in the business of?


To determine “what we are actually in the business of”, I look towards the company as a whole: our mission, our roadmap, and our engineering invariants:

Our Mission: Provide every person with access to lifesaving technology.
Our Engineering Invariants:
Compliance, Security, Reliability

We are in the business of providing a highly secure, highly compliant, and highly reliable infrastructure to 911 centers, allowing them to receive information from 911 callers in real time. This means the only things we build in-house are the ones that fall within those areas. Below are some examples of systems where hosted 3rd party solutions exist, but we chose to build in-house:

<span style="color: green;">[Green] Live Video Providers:</span> We find live video to be core to our system, so we don’t outsource that functionality to a hosted provider like 100ms or Dyte. We also see offloading it to a third party as a barrier to the reliability and security standards we hold ourselves to. These providers are unable to meet our strict security, compliance, and reliability invariants, so we must build it in-house. Thus, we are in the business of implementing our own live video systems.

<span style="color: green;">[Green] Our Deployment Process and Infrastructure:</span> We are in the business of making sure our system is secure and reliable. Building it ourselves forces us to understand our systems inside and out, ensuring we know the gaps that exist.

  • These portions fell within all 3 engineering invariants, and because we took them seriously from day 1, we were able to identify gaps, fix them, and meet very specific compliance frameworks incredibly quickly. Writing our entire infrastructure in code gives us a significantly more auditable infrastructure and security posture.

<span style="color: orange;">[Orange] Automated QA system</span>

  • We originally decided to write our own QA library in-house to be able to do the very custom things we want to accomplish in our QA process. We built the whole thing because we correctly believed QA was core to our business.
  • However, we were soon overburdened with having to build new features in it, and realized it would be much easier to implement an 80% solution that was open source. We ended up scrapping the entire 6-week project in favor of building on top of an open-source solution.
  • We still build our QA process in-house, but leverage more common open-source solutions, and are ok accepting a solution that is not perfect in exchange for ease of use. This incentivizes engineers to actually write tests and adopt the framework.
  • Reliability is core to our company’s success as our software is used 24/7 in 911 centers across the country. We NEED to ensure every deployment does not introduce bugs or take down our systems. Thus, we are in the business of QA.

In general, we don’t offload any of the core functionality of our system to third parties as that would limit our understanding of core functionality, and our ability to modify it.

The Sunk Cost Fallacy


Once a decision has been made and effort has been put towards it, it is really hard, when things are not going well, to give up the time already spent and accept that there may be a better decision.

One thing I have been historically bad at is knowing when enough is enough. If an engineer you trust is telling you they are only a few days away, every day for 2 weeks, it is hard not to fall into the sunk cost fallacy. I’ve fallen into this trap many times. It may actually be faster to switch than to wait for the project in its current form to be done, and historically it has been for us.

We try to always be ok with the re-evaluation of a previous decision. For example, we regularly reevaluate the DataDog decision (see above). DataDog is expensive. However, each time we have evaluated the service, we realize switching to another provider would take too much time, be a worse experience for developers, and pull us away from our core competencies. Thus it would not be worth the cost. We are just NOT in the business of hosting our own system analytics.
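The arithmetic behind a re-evaluation like this can be sketched roughly as a break-even calculation. This is a minimal sketch, not our actual process: the function name, its parameters, and every number below are hypothetical and made up purely for illustration.

```python
from typing import Optional


def months_to_break_even(
    switch_engineer_weeks: float,
    weekly_engineer_cost: float,
    current_monthly_cost: float,
    alternative_monthly_cost: float,
    monthly_maintenance_hours: float = 0.0,
    hourly_engineer_cost: float = 0.0,
) -> Optional[float]:
    """Return how many months it takes for a switch to pay for itself.

    Returns None when the alternative never pays off, i.e. when its
    running cost (fees plus maintenance time) is not lower than the
    current bill.
    """
    # One-time cost: engineering time spent on the migration itself.
    one_time_cost = switch_engineer_weeks * weekly_engineer_cost

    # Ongoing cost of the alternative: fees plus the people maintaining it.
    alternative_running_cost = (
        alternative_monthly_cost
        + monthly_maintenance_hours * hourly_engineer_cost
    )
    monthly_savings = current_monthly_cost - alternative_running_cost

    if monthly_savings <= 0:
        return None
    return one_time_cost / monthly_savings


# Hypothetical numbers, for illustration only.
result = months_to_break_even(
    switch_engineer_weeks=4,
    weekly_engineer_cost=4000,
    current_monthly_cost=3000,
    alternative_monthly_cost=500,
    monthly_maintenance_hours=10,
    hourly_engineer_cost=100,
)
print(result)
```

Note that this only counts direct costs. It ignores the cost of pulling engineers away from core work and of a worse developer experience, which in practice are often the deciding factors.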

Some strategies I’ve found that help get out of the sunk cost fallacy are:

  1. Time-boxing decisions and trials for ‘building’ and ‘buying’ is super helpful. We call these SPIKES.
  2. Starting small: Releasing stuff following the Crawl —> Walk —> Run strategy really helps you realize when something is not worth the effort. If you try to get it out in the crawling stage and are unable, that may be indicative of the effort to make it walk or run in the future.
  3. Not growing attached to your work, and being comfortable giving it up and closing the PR (and not just leaving it unmerged…).
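To make the time-boxing idea concrete, here is a minimal sketch of how a spike's time box could be tracked so that "a few more days" becomes visible as an overrun. The `Spike` class, its fields, and the example dates are all hypothetical, and the sketch assumes the time box is counted in calendar days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Spike:
    """A time-boxed build-or-buy trial (hypothetical structure)."""

    name: str
    started: date
    time_box_days: int  # assumed to be calendar days, not working days

    def deadline(self) -> date:
        # Last day on which the spike is still within its time box.
        return self.started + timedelta(days=self.time_box_days)

    def is_over_budget(self, today: date) -> bool:
        # True once the current date has passed the deadline.
        return today > self.deadline()

    def days_over(self, today: date) -> int:
        # Number of days past the deadline, never negative.
        return max(0, (today - self.deadline()).days)


# Hypothetical example: a 10-day spike started on 1 January 2024.
spike = Spike(name="self-hosted monitoring", started=date(2024, 1, 1), time_box_days=10)

if spike.is_over_budget(date(2024, 1, 25)):
    print(f"{spike.name}: {spike.days_over(date(2024, 1, 25))} days over, time to re-evaluate")
```

The point of writing the deadline down up front is that going past it forces an explicit decision to continue, instead of a quiet drift.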

Conclusion


It has become easier and easier to ‘buy’ more and more portions of a startup’s business.

There has been a real commoditization of startups in recent years and decades, where almost every piece of your business can be purchased as a service. More often than not there are multiple vendors of the same service to choose from, each serving a different niche. Odds are, if you have a problem, there will be a service that can get you there faster and cheaper than building it yourself. I’ve found that the tools you buy tend to have a low learning curve and empower non-technical people to perform technical tasks through a powerful GUI (DataDog, Mixpanel).

More often than not, people are more expensive than any service you will use (except DataDog and cloud providers ☁️).

More and more recently, we have opted to buy things. This allows engineering and product to focus their effort on things that make our product wow customers and give us an edge over the competition. The less we build ourselves, the more we are able to focus on what matters.