Platforms Need New Building Blocks
Junction Labs’ mission is to empower platform builders with a service-to-service networking platform that bridges the developer-operator divide. This is essential to fixing how bogged down application teams have become by their infrastructure. In this post we explain why.
Applications Are Bogged Down By Infrastructure
Prior to founding Junction Labs, I spent four years as SVP of Core Engineering at Datadog. When I left, about a quarter of our 1500 engineers were building the infrastructure common to any SaaS company, such as compute runtimes, developer tools, and application frameworks. Beyond that large investment of engineering time in seemingly undifferentiated systems, the application engineering teams also spent inordinate amounts of time in service of infrastructure-driven initiatives such as new regions and platform migrations.
After leaving Datadog, I looked at this problem through a wider lens while co-authoring a book on Platform Engineering. Across the industry we're all suffering the same problems: security patches that should take hours consume weeks of engineering time; version upgrades that should be automatic become multi-quarter projects; routine infrastructure changes are only accomplished after organization-wide fire drills. All the while, we are making substantial investments in infrastructure and platforms that at best provide minimal relief and at worst compound the problem by adding costly migrations to the burdens of application teams.
In trying to understand the root cause of the problem, my co-author and I found that everyone’s infrastructure architectures looked somewhat like this:
In the book we call this state the "over-general swamp": every application team maintains its own slightly different version of infrastructure integration code, configuration files, and automation scripts, which we call “the glue”. The glue is easy to create but costly to maintain, and the cost isn't just in the DevOps/SRE engineers hired to maintain it. It's also in delayed product features, recurring reliability issues, security vulnerabilities, and lost market opportunities.
Architecturally the solution seems simple: platforms that consolidate and abstract the glue. However, platform teams are struggling. They make progress only through massive projects that require large migrations, and it seems few companies outside the FAANGs are big enough to make the investments needed to succeed in a timely manner. Everyone else is stuck choosing between no platforms and half-finished platforms, so the best they can do is spend more on observability and portals to help comprehend the result.
Why Are We Stuck?
In my view, we are stuck with the above architecture because the platform/infrastructure building blocks that come out of vendors and open source communities target one of two distinct groups to the exclusion of the other:
- Developers: End-to-end solutions like Heroku that emphasize ease of development experience
- DevOps/SREs: Building blocks like Kubernetes that emphasize simplicity in the system sense, that is, offering a lot of composability and control
This dichotomy isn't about IaaS versus PaaS. Rather, the industry has created two isolated ecosystems of building blocks and left teams to glue them together. So the pattern is:
- Infrastructure teams supply the developer-unfriendly building blocks as their “paved paths”.
- Frustrated application developer teams give up their developer-first solutions and glue the infra team’s building blocks together.
- Companies try to hire DevOps/SREs, but there are never enough to maintain each application’s glue.
- A few years later, platform teams are formed to clean up the mess.
No wonder that, at that point, platform teams resort to massive migration projects. And no wonder that those massive projects drag on for years and, more often than not, fail. Getting out of this swamp means realizing that you need more internal platforms, sooner. But it also requires the industry to realize that over a platform’s full lifetime, its owners are just as likely to be frustrated application developers as infrastructure operators, and so we need building blocks that cater to both.
Platform Builders are Developers
Over the last 15 years, the pendulum for DevOps/SRE-focused solutions has swung heavily toward composability-first APIs, in particular these two patterns:
- Declarative configuration: having developers describe desired state in lowest-common-denominator formats like YAML and JSON
- Reconciliation loops: autonomously resolving differences between actual and desired configuration state
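For readers less familiar with these two patterns, here is a minimal sketch of how they fit together. It is not taken from Kubernetes or any real controller; `ServiceSpec`, `reconcile`, and `run_loop` are hypothetical names, and the "one action per round" behavior is an assumption made to show gradual convergence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical desired state, as it might be parsed from YAML or JSON."""
    name: str
    replicas: int


def reconcile(desired: ServiceSpec, actual_replicas: int) -> list[str]:
    """Return the actions needed to move actual state toward desired state."""
    diff = desired.replicas - actual_replicas
    if diff > 0:
        return [f"start {desired.name}"] * diff
    if diff < 0:
        return [f"stop {desired.name}"] * (-diff)
    return []  # already converged, nothing to do


def run_loop(desired: ServiceSpec, actual_replicas: int, max_rounds: int = 10) -> int:
    """Re-run reconcile until no actions remain or max_rounds is reached."""
    for _ in range(max_rounds):
        actions = reconcile(desired, actual_replicas)
        if not actions:
            break
        # Assumption for illustration: only one action takes effect per round.
        if actions[0].startswith("start"):
            actual_replicas += 1
        else:
            actual_replicas -= 1
    return actual_replicas
```

The developer only ever writes the `ServiceSpec`; everything that happens afterward is decided inside the loop, which is where both the composability and the indirection come from.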
While these patterns have their place in managing true infrastructure, they are overused relative to other API patterns. Platform developers need building blocks that empower development, with:
- First class support for their preferred programming languages: Engineers should be able to build and extend platforms using the languages they already know and trust
- Integration with documentation and IDEs: Platform components should feel like natural extensions of developers' existing tools, with full IDE support, type safety, and inline documentation
- Familiar development-test workflows: Developers shouldn't need to context-switch between local development and platform-specific testing environments to iterate on their work
Consider Stripe: they've built a system handling millions of financial transactions daily, which in many ways are more complex than compute orchestration. Rather than expose that complexity to application developers as config and reconciliation machines, they instead provide intuitive, imperative, code-first APIs that abstract as much underlying complexity as they can.
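As a rough illustration of what "imperative, code-first" can mean for a platform building block, the sketch below uses an entirely hypothetical in-memory client. It is not Stripe's API or any real product's; `PlatformClient`, `Route`, and `set_route` are invented names. It shows a typed call that validates its arguments immediately, in the developer's own language and test workflow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """Hypothetical routing policy for one service."""
    service: str
    timeout_s: float
    retries: int


@dataclass
class PlatformClient:
    """Hypothetical in-memory stand-in for a platform's code-first API."""
    routes: dict[str, Route] = field(default_factory=dict)

    def set_route(self, service: str, *, timeout_s: float = 1.0, retries: int = 0) -> Route:
        """Validate and apply a routing policy in a single imperative call."""
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        if retries < 0:
            raise ValueError("retries must be non-negative")
        route = Route(service, timeout_s, retries)
        self.routes[service] = route
        return route


client = PlatformClient()
checkout = client.set_route("checkout", timeout_s=0.5, retries=2)
```

A bad value such as `timeout_s=0` raises at the call site, so the mistake shows up in an ordinary unit test or IDE session rather than after a configuration file has been merged.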
Creating platform solutions like this may mean that debugging system state is not quite as easy as when the only API is configuration expressed in flat files, where changes take effect when the files are checked into source control. But most people’s experience with Kubernetes, Terraform, and GitOps is that they are not exactly easy to debug either.
Platform Builders are Operators
Focusing solely on ease of developer experience risks creating "walled gardens", repeating the mistakes that doomed platforms like Heroku. Such solutions often bill themselves as making things “simple”, but instead they implement a naive form of making things easy, by not just reducing choice but completely neglecting composability and control. This means these solutions work beautifully for application teams seeking to quickly build applications using cutting-edge technologies in a single domain, but any type of integration is a challenge, and becomes an increasing burden as their coupled ecosystem ages in isolation.
There is a reason Kubernetes has succeeded despite its well-documented flaws. Its creators thought deeply about how to support a diverse range of applications and configurations, and how to make managing that easier for the DevOps and SRE engineers who specialize in managing system complexity. This means giving them the knobs and integration points not just to tune the system, but to take control of non-trivial facets based on the specific needs of their business and technical ecosystems.
Thus, if we really want “developer first” needs balanced with “DevOps/SRE first” ones, we need fewer walled gardens and more building blocks. These should balance easy development with system composability, providing controls for reliability, security, and efficiency that meet the broader architecture’s needs.
Where Junction Labs Comes In
Ben and I founded Junction Labs to tackle this problem, which had been prevalent and expensive at both of our last companies; if anything, things looked worse after the migration to Kubernetes than before. As we started investigating options, our focus quickly narrowed to service-to-service networking. The traditional solution of DNS is simple, but is showing its age. Modern infrastructure approaches like Service Mesh and gRPC have tended to be both too hard for developers to use directly and too complex for DevOps/SREs to manage at scale. Finally, on the developer-first side there are any number of “serverless” platforms, but they constantly struggle to succeed outside a niche or to integrate into broader application types and architectures.
Decomposing software through services is absolutely critical to scaled architectures and to scaled product engineering organizations. Yet the reason we continue to have the “microservice or monolith” discussion is that the industry has been so poorly served by the building blocks available in this space. To do better, we need new platform building blocks, and they need to balance easy development with operational control.
Interested in Hearing More?
If you're a platform engineer tired of stitching together building blocks that are neither simple nor easy, or a technical leader looking to reduce your organization's infrastructure complexity, we'd love to hear from you. We'll be publishing technical deep-dives on specific platform engineering challenges and our approaches to solving them.