Friday, July 24, 2020

Technical Debt, Target States and Local Maxima

There’s usually a tendency to select least-cost solutions incrementally, which typically means tweaking the legacy estate rather than doing anything more transformative. While this does minimize delivery risk within individual projects, it can shift risk onto the institution’s future and allow it to accumulate until the “burning platform” moment, which requires drastic, large-scale measures with no guarantee of success.

There needs to be a better way to weigh the pros and cons of solution options, balancing the short-term and the long term, the tactical and the strategic.

Technical Debt (TD) is a good framework for weighing the relative merits of different solution options, if it is defined as the spend that does not accrue to the strategic target state of the assets being changed.

I like a geometric explanation:

C is the Current State
T is the Target State – the true, uncompromising target one should aim for long-term – 3-5 year horizon 
r is the funding available, circumscribing potential solution space – anything implementable has to lie within the circle of radius r around the current state 

Solutions 1 & 2 have costs r1 and r2 and while 2 is cheaper (r2 < r1), they have radically different Tech Debt:
s1 is the strategically-aligned portion of 1, and its Tech Debt TD1 = (r1 – s1) - regrettable spend that does not accrue to the strategic target. 


It’s clear that s2 (being the projection of r2 onto the line C-T) is actually negative, so solution 2 is a net detractor from T: its Tech Debt TD2 = (r2 – s2) is larger than its entire cost r2.


Minimizing Tech Debt simply means maximizing alignment of next state solution with long-term target state.
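
To make the geometry concrete, here is a minimal sketch that treats states as points on a plane and cost as straight-line distance. The coordinates are made up for illustration, and the function names `strategic_portion` and `tech_debt` are my own; the only thing taken from the explanation above is TD = r – s, with s being the signed projection of the move onto the line C-T.

```python
import math

def strategic_portion(current, target, next_state):
    """Signed length s of the projection of the move C -> next_state onto the line C -> T."""
    ct = (target[0] - current[0], target[1] - current[1])
    move = (next_state[0] - current[0], next_state[1] - current[1])
    ct_len = math.hypot(*ct)
    # Dot product with the unit vector pointing from C to T.
    return (move[0] * ct[0] + move[1] * ct[1]) / ct_len

def tech_debt(current, target, next_state):
    """TD = r - s: the part of the spend that does not accrue to the target state."""
    r = math.dist(current, next_state)  # assumption: cost equals distance moved
    s = strategic_portion(current, target, next_state)
    return r - s

# Hypothetical coordinates, chosen only to mirror the two solutions discussed above.
C = (0.0, 0.0)
T = (10.0, 0.0)
solution_1 = (3.0, 4.0)    # r1 = 5, s1 = 3
solution_2 = (-1.0, 2.0)   # r2 = sqrt(5), s2 = -1 (moves away from T)

td1 = tech_debt(C, T, solution_1)
td2 = tech_debt(C, T, solution_2)
print(f"TD1 = {td1:.3f}, TD2 = {td2:.3f}")
```

With these numbers, solution 2 costs less than half of solution 1, yet carries more Tech Debt, because every unit spent on it points away from T.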

The reason the target state needs to be uncompromising is so that it can serve as a North Star to judge alignment against, or an etalon to weigh options by. Even if it’s not achievable for some years, it provides an important reference point.

This bit can be controversial, but it can be explained in terms of avoiding being locked into local maxima. When the target-state horizon is confined to the vicinity of the current state, one excludes significantly better but somewhat more distant possibilities:

Local Maxima diagram by Joshua Porter


What’s interesting is that typical corporate planning & funding horizons, which are rarely longer than a year, can be a very real limiting factor in being able to “see” far enough.
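
The effect of a short horizon can be shown with a toy greedy search. This is only an illustration under my own assumptions: the `landscape` values are invented, and `horizon` stands in for how far ahead planning is allowed to look.

```python
def climb(value, start, horizon):
    """Greedy search: keep moving to the best position visible within `horizon` steps."""
    position = start
    while True:
        lo = max(0, position - horizon)
        hi = min(len(value) - 1, position + horizon)
        best = max(range(lo, hi + 1), key=lambda i: value[i])
        if value[best] <= value[position]:
            return position  # nothing better in sight, so the search stops here
        position = best

# Hypothetical "value of the state" at each position: a small hill, a dip, a bigger hill.
landscape = [1, 3, 5, 4, 2, 1, 2, 6, 9, 12, 8]

short_sighted = climb(landscape, start=1, horizon=1)
far_sighted = climb(landscape, start=1, horizon=6)
print(short_sighted, landscape[short_sighted])
print(far_sighted, landscape[far_sighted])
```

Both searches start from the same place and use the same greedy rule. The only difference is how far each one can see, and the short-sighted one settles on the small hill because the dip hides the larger one.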

Tech Debt measures based on the true, long-term target state allow one to compensate for the short-termism of planning horizons. It is therefore the responsibility of architecture not to be bound by the scope or timelines of individual projects, but to look further ahead and provide a coherent, long-term vision that is aligned to technology strategy.

Tuesday, January 28, 2020

Agile & Bridge Building

There’s often persistent confusion about the relationship between architecture and agile delivery.

If you’re building anything more complex than a website or a mobile app with a cloud backend for the front end (BFF), then you need to identify and decide upfront the things that are going to be hard to change later. But this doesn’t mean you have to figure out every detail in advance. Architecture in agile is very much about finding and settling those few hard-to-change things, and they vary case by case.

A good analogy would be in building bridges: you have to decide what type of a bridge it will be, e.g. a suspension bridge:

Manhattan Bridge Construction, Wikimedia Commons


Or a movable bascule bridge:

© Alex Fedorov, Wikimedia Commons


The type of the bridge you’re building has to be figured out in advance since you can’t easily change one into the other mid-construction. It is a function of core constraints and requirements – the span, the height of the ships you want to be able to pass underneath it, etc.

However, what colour to paint the bridge or how to decorate it is a very flexible choice – no need to debate it upfront as you can change it at any time later (well, ideally before you have bought a million dollars’ worth of paint).

Monday, January 13, 2020

Distributed Systems: Orchestration, Invocation, Events


A very typical concern in distributed systems is the need to coordinate logic in different domains / bounded contexts. Let’s say we have two domains, A and B. When A performs a bit of logic ‘a’, B needs to perform corresponding logic ‘b’. There are generally three ways to accomplish this:

1. Create an orchestrator:


Pros:


  • A and B remain simple and agnostic of each other.

Cons:


  • Orchestrator absorbs the complexity of knowing about both A and B, interfacing with APIs exposed by A and B, implementing rollback and/or retry logic to mitigate failures. 
  • If either A or B changes, the Orchestrator has to accommodate those changes. 
  • If we want to add another version of B (let’s call it B′) or add another participant C, we have to modify the Orchestrator to deal with all such changes. 
  • Orchestrator is in danger of becoming a god service — one that has to know everything about everything and that has to stay in sync with changes throughout the system. 
  • This is a path to a distributed monolith.
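
A minimal sketch of what the orchestrator ends up owning, assuming two toy in-process services. The class and method names (`ServiceA`, `do_a`, `undo_a`, `ServiceB`, `do_b`, `fail_times`) are hypothetical, and a real orchestrator would be calling remote APIs rather than local objects.

```python
class ServiceA:
    def do_a(self, order_id):
        return f"a-done:{order_id}"

    def undo_a(self, order_id):
        return f"a-undone:{order_id}"

class ServiceB:
    def __init__(self, fail_times=0):
        self.fail_times = fail_times  # simulate this many failures before succeeding

    def do_b(self, order_id):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise RuntimeError("B unavailable")
        return f"b-done:{order_id}"

class Orchestrator:
    """Knows both A and B, the order of calls, the retry policy and the rollback."""

    def __init__(self, a, b, max_retries=2):
        self.a, self.b, self.max_retries = a, b, max_retries

    def run(self, order_id):
        log = [self.a.do_a(order_id)]
        for _ in range(self.max_retries + 1):
            try:
                log.append(self.b.do_b(order_id))
                return log
            except RuntimeError:
                log.append("b-failed")
        # Retries exhausted: roll back 'a' so the two domains stay consistent.
        log.append(self.a.undo_a(order_id))
        return log

recovered = Orchestrator(ServiceA(), ServiceB(fail_times=1)).run("42")
rolled_back = Orchestrator(ServiceA(), ServiceB(fail_times=5)).run("43")
print(recovered)
print(rolled_back)
```

Even in this toy form, `Orchestrator.run` is the only place that changes when B gains a new version or a participant C is added, which is exactly how it grows into a god service.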

2. Direct invocation:


Pros:


  • Simplicity.

Cons:


  • Tight coupling between A and B: A now has to understand the logic of B and interface with its APIs. If B changes, A might have to change. 
  • This effectively forces A to be the orchestrator, violating the single responsibility principle. 
  • A becomes stateful: it has to hold state until the invocation of B completes successfully, or handle the failure itself. 
  • While simple with just A and B, in the context of dozens or hundreds of services, point-to-point invocations quickly become difficult to manage, highly dependent on discovery and topology, etc. 


3.1. Event-Driven Architecture (Uncompensated): 


Pros: 


  • A and B are completely agnostic of each other 
  • Loose coupling: new publishers and subscribers can be added without modifying existing actors 
  • With event sourcing based on an appropriate message transport (e.g. Kafka), events can be replayed / re-processed 

Cons: 


  • Assumes reliability of message transport or ability by subscribers to compensate 
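
Here is a rough in-memory sketch of the idea. The `EventBus` class is a stand-in I made up for a real message transport; it is not Kafka’s API, and the topic name `a.done` and the handlers are hypothetical.

```python
from collections import defaultdict

class EventBus:
    """Toy stand-in for a durable transport: keeps a log and fans out to subscribers."""

    def __init__(self):
        self.log = []
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.log.append((topic, payload))  # retained so events can be replayed later
        for handler in list(self.subscribers[topic]):
            handler(payload)

    def replay(self, topic, handler):
        """Feed every previously logged event on `topic` to a handler."""
        for logged_topic, payload in self.log:
            if logged_topic == topic:
                handler(payload)

bus = EventBus()
b_results, c_results = [], []

# Domain B reacts to "a.done" without A knowing that B exists.
bus.subscribe("a.done", lambda event: b_results.append(f"b for {event['id']}"))

def domain_a(item_id):
    # A performs 'a' (omitted here) and only announces that it happened.
    bus.publish("a.done", {"id": item_id})

domain_a(1)
domain_a(2)

# A new participant C arrives later and catches up by replaying the log.
bus.replay("a.done", lambda event: c_results.append(f"c for {event['id']}"))
print(b_results, c_results)
```

Neither `domain_a` nor the B handler had to change for C to join. What the sketch does not cover is the con listed above: if delivery to B silently fails, nothing here notices.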


3.2. Event-Driven Architecture (with Observers): 


Pros: 


  • A and B are completely agnostic of each other 
  • Loose coupling: new publishers and subscribers can be added without modifying existing actors 
  • With event sourcing based on an appropriate message transport (e.g. Kafka), events can be replayed / re-processed 
  • Observer processes can be simple: they only match “done ‘a’” with “done ‘b’”, without knowing the details of ‘a’ and ‘b’, and flag missing correlates 

Cons: 


  • Requires a bit of extra sophistication to implement 
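
The observer’s matching job can be sketched in a few lines, assuming every event carries a correlation id shared by ‘a’ and its corresponding ‘b’. That id, the event names `a.done` / `b.done`, and the function name `find_missing_correlates` are my assumptions for illustration.

```python
def find_missing_correlates(events, trigger="a.done", response="b.done"):
    """Return correlation ids that have a trigger event but no matching response."""
    triggered = []
    responded = set()
    for kind, correlation_id in events:
        if kind == trigger:
            triggered.append(correlation_id)
        elif kind == response:
            responded.add(correlation_id)
    # The observer knows nothing about what 'a' or 'b' actually do.
    return [cid for cid in triggered if cid not in responded]

# Hypothetical event stream: 'b' never happened for order-2.
events = [
    ("a.done", "order-1"),
    ("b.done", "order-1"),
    ("a.done", "order-2"),
    ("a.done", "order-3"),
    ("b.done", "order-3"),
]

missing = find_missing_correlates(events)
print(missing)
```

A real observer would also need a deadline or time window before declaring a correlate missing, and some way to raise the flag (alert, retry event, dead-letter topic). That is the “extra sophistication” mentioned above.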


Conclusion:

Option 3.2 is clearly the best pattern, even though it requires familiarity with its implementation. I’d argue that if a team or org is embarking on building distributed systems, it should invest in building the requisite expertise to do it right. 

Luckily, implementations of this core pattern are increasingly available in the open-source world, without the need to build such plumbing from scratch.