Anatomy of an Enterprise Infrastructure Unicorn Product

I analyzed more than 150 unicorn enterprise infrastructure companies, both exited and private, to understand the kinds of products that can support a $1 billion+ business. Note that the dataset includes startups that have raised at a unicorn valuation or have exited (via IPO or acquisition) at that amount.

There are some interesting takeaways here. To simplify, we can boil most of these products down into a few key themes for founders looking to build a unicorn in this space:

Your product should touch production compute or production data, or secure them.

  • For compute products: Monitor traffic or support compute on bare metal

  • For data products: Store, transport, or govern mass amounts of data

  • Security and auth are needed for access to both compute and data

The Core Themes: Secure, Store, Run, Monitor, and Govern

The primary thread running through most successful enterprise infrastructure companies is a strong focus on one or more of the five core tenets: security, storing production data, running production compute, monitoring, and governing access to data or compute. These five aspects form the backbone of an effective infrastructure strategy.

Attributes of Unicorn Products:

As I was looking at the defining product for each company, I couldn’t help but notice that most of them had the following two attributes:

  1. Innovation moat around a paradigm shift. The product was/is a next-generation product that took a lot of engineering effort to make possible in the first place, usually released at inflection points where paradigms of compute were changing (on-prem to cloud, local apps to SaaS, etc.). Before Databricks (Spark), it was practically impossible to run transformations on massive amounts of data. Confluent (Kafka) introduced real-time streaming event infrastructure to the world. Snowflake was built as a new-paradigm, cloud-native data store with governance controls.

  2. They touch a large portion of production in some way; whether it’s monitoring all of a company’s services, or storing or analyzing exabytes of data. The scope is vast for most of these products.

Very few of these products are wrappers around another piece of technology that merely make the tech easier to implement or abstract it away. I also noticed a lack of products that focus on a specific niche of compute, for example ML model-only monitoring or framework-specific app hosting.

Core questions to ask:

  1. Is there currently a paradigm shift?

  2. If so, can I build a next-gen product around compute workload, security, data store, monitoring, or governance that is needed in this new paradigm?

Takeaways to Build Billion Dollar Products:

Each day exabytes of data are created. The more production data your product stores, the more valuable it is to the customer. Try to store as much production data as possible. Many data store companies in the dataset had some sort of unfair advantage in the way they ingest or process data that was novel at the time they were scaling. For example, MongoDB was differentiated as a next-generation NoSQL database.


Manipulating and governing data, along with providing user-friendly tools for data processing (such as pipelines, ETL, and workflows), is a massive market. Nobody wants to build internal data management and verification tools or infrastructure; devs would rather be building the product itself. Most of the unicorns built next-gen data management capabilities. For example, Databricks at its founding was a managed service on top of Spark (compute over distributed data sets).


Most of the observability companies in the dataset are monitoring a large proportion of production traffic, not just a subset (such as ML models only).


Successful CI/CD startups own the entire build and deploy stack. It’s probably not a big enough market for a product that only focuses on a subset of that stack.


When new paradigms of computing and data infrastructure emerge, founders should consider how changes in authentication and access to compute resources might introduce new opportunities to build an identity or auth startup. For example, the shift from local apps to cloud-native SaaS apps opened up a huge opportunity for Okta.


For AI models, focus on next-generation models. Companies that merely democratize non-deep-tech models struggle to achieve substantial success and a significant exit in this space.


Only a small percentage of infrastructure unicorns open-sourced their products, with data stores being the most common archetype. Entrepreneurs should take this into consideration when developing their business models.


Takeaways from Underrepresented Verticals:

Avoid focusing solely on non-production aspects of the developer lifecycle, such as ML experiments and local devtools. Successful products tend to prioritize solutions that directly address production needs.

I noticed that the CI/CD platforms on this list own the entire software development lifecycle: everything after code check-in, through build, test, deploy, and GitOps. Although they’re not running compute themselves, they need to own as much of the post-check-in developer lifecycle as possible to capture as much value as possible from customers. It’s likely not a big enough market to support a unicorn that focuses solely on test/build/deploy.


Don't limit the scope to a small portion of compute in an observability product. Consider comprehensive solutions that cover various aspects of observability, including computation and data.


Building a unicorn around a compute-only product, such as search, without owning the underlying data can be challenging. Search was an interesting vertical on the underperforming list (Algolia, Coveo, etc). Although they touch production data, they don’t really own that data themselves.

Lack of end-to-end managed compute products

My most surprising finding was that many managed cloud products tend to be “base layer” products (Docker, K8s, VMs, etc.) that focus on supporting compute on bare metal. There are very few products in the dataset that manage the entire application compute stack end-to-end (the most notable one here is Vercel).

From my time as a back-end engineer, I know that managing compute end-to-end is quite difficult to do. Each application is different, with different requirements for scalability, uptime, security, networking, and even application type. The application infrastructure will look different for an internal ML API vs. an external application with a 99.999% SLA. It’s difficult to build an infrastructure deployment platform that can account for all of these facets. As a result, the data shows it’s easier to build a big business around “base layer” compute products (Docker, Kubernetes via Red Hat) that enable compute on bare metal and can apply to any kind of workload.

Final Thoughts

Keep in mind that this list is largely a lagging indicator of what types of products have been successful. These are not hard-and-fast rules on what types of products may or may not support a unicorn-sized company. However, this exercise reveals some common themes that run through most of these successful products. These themes are essential for founders to keep in mind as they evolve both their product offering and their company into a titan in the enterprise infra space.
