Friday, July 24, 2020

Technical Debt, Target States and Local Maxima

There’s usually a tendency to select least-cost solutions incrementally, which typically means tweaking the legacy estate rather than doing anything more transformative. While this does minimize delivery risk within individual projects, it shifts risk onto the institution’s future and allows it to accumulate until the “burning platform” moment that requires drastic, large-scale measures with no guarantee of success.

There needs to be a better way to weigh the pros and cons of solution options, balancing the short term and the long term, the tactical and the strategic.

Technical Debt (TD) is a good framework for determining the relative merits of different solution options, if it is defined as the spend that does not accrue to the strategic target state of the assets being changed.

I like a geometric explanation:

C is the Current State
T is the Target State – the true, uncompromising target one should aim for over the long term (a 3-5 year horizon) 
r is the funding available, circumscribing potential solution space – anything implementable has to lie within the circle of radius r around the current state 

Solutions 1 & 2 have costs r1 and r2 and while 2 is cheaper (r2 < r1), they have radically different Tech Debt:
s1 is the strategically aligned portion of solution 1 (its projection onto the line C-T), and its Tech Debt TD1 = (r1 – s1) – the regrettable spend that does not accrue to the strategic target. 


It’s clear that s2 (being the projection of r2 onto the line C-T) is actually negative, so solution 2 is a net detractor from T.


Minimizing Tech Debt simply means maximizing the alignment of the next-state solution with the long-term target state.
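To make the arithmetic concrete, here is a minimal sketch in Python (the points and numbers are invented purely for illustration): it treats the current state, target state and a candidate next state as points, takes the solution’s cost as the length of the move, and takes its strategically aligned portion as the projection of that move onto the C-T direction.

```python
import numpy as np

def tech_debt(current, target, next_state):
    """Toy illustration: cost of a solution minus its strategically aligned
    component (the projection of the move onto the C->T direction)."""
    c, t, n = map(np.asarray, (current, target, next_state))
    move = n - c                                  # what the solution actually changes
    direction = (t - c) / np.linalg.norm(t - c)   # unit vector towards the target
    cost = np.linalg.norm(move)                   # r: spend required by the solution
    aligned = float(np.dot(move, direction))      # s: strategically aligned portion
    return cost - aligned                         # TD = r - s (can exceed r if s < 0)

# Solution 1 moves roughly towards T; solution 2 is cheaper but moves away from it.
C, T = (0, 0), (10, 0)
print(tech_debt(C, T, (3, 2)))   # small TD: most of the spend is aligned
print(tech_debt(C, T, (-2, 1)))  # TD larger than the spend itself: s is negative
```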

The reason why the target state needs to be uncompromising is so that it can serve as a North Star to judge alignment against, or an etalon (a benchmark) to weigh options by. Even if it’s not achievable for some years, it provides an important reference point.

This bit can be controversial, but it can be explained in terms of avoiding being locked into local maxima. When the target state horizon is confined to the vicinity of the current state, one excludes significantly better but somewhat more distant possibilities:

Local Maxima diagram by Joshua Porter


What’s interesting is that typical corporate planning & funding horizons, which are rarely longer than a year, can be a very real limiting factor in being able to “see” far enough.

Tech Debt measures based on the true, long-term target state allow one to compensate for the short-termism of planning horizons. It is therefore the responsibility of architecture not to be bound by the scope or timelines of individual projects, but to look further ahead and provide a coherent, long-term vision that is aligned to the technology strategy.

Tuesday, January 28, 2020

Agile & Bridge Building

Sometimes there’s persistent confusion about the relationship between architecture and agile delivery.

If you’re building anything more complex than a website or a mobile app with a cloud backend for the frontend (BFF), then you need to identify and decide upfront the things that are going to be hard to change later. But this doesn’t mean you have to figure out every detail in advance. Architecture in agile is very much about figuring out and deciding the key things that are hard to change later – and these vary case by case.

A good analogy is building bridges: you have to decide what type of bridge it will be, e.g. a suspension bridge:

Manhattan Bridge Construction, Wikimedia Commons


Or a movable bascule bridge:

© Alex Fedorov, Wikimedia Commons


The type of bridge you’re building has to be figured out in advance, since you can’t easily change one into the other mid-construction. It is a function of core constraints and requirements – the span, the height of the ships you want to be able to pass underneath it, etc.

However, what colour to paint the bridge or how to decorate it is a very flexible choice – there’s no need to debate it upfront, as you can change it at any time later (well, ideally before you’ve bought a million dollars’ worth of paint).

Monday, January 13, 2020

Distributed Systems: Orchestration, Invocation, Events


A very typical concern in distributed systems is the need to coordinate logic in different domains / bounded contexts. Let’s say we have two domains, A and B. When A performs a bit of logic ‘a’, B needs to perform corresponding logic ‘b’. There are generally three ways to accomplish this:

1. Create an orchestrator:


Pros:


  • A and B remain simple and agnostic of each other.

Cons:


  • The Orchestrator absorbs the complexity of knowing about both A and B, interfacing with the APIs exposed by A and B, and implementing rollback and/or retry logic to mitigate failures. 
  • If either A or B changes, the Orchestrator has to accommodate those changes. 
  • If we want to add another version of B (let’s call it B′) or add another participant C, we have to modify the Orchestrator to deal with all such changes. 
  • The Orchestrator is in danger of becoming a god service – one that has to know everything about everything and has to stay in sync with changes throughout the system. 
  • This is a path to a distributed monolith.

2. Direct invocation:


Pros:


  • Simplicity.

Cons:


  • Tight coupling between A and B: A now has to understand the logic of B and interface with its APIs. If B changes, A might have to change. 
  • This effectively forces A to be the orchestrator, violating the single responsibility principle. 
  • A becomes stateful, holding state until the invocation of B completes successfully, and handling failures when it doesn’t. 
  • While simple with just A and B, in the context of dozens or hundreds of services, point-to-point invocations quickly become difficult to manage and highly dependent on discovery and topology. 


3.1. Event-Driven Architecture (Uncompensated): 


Pros: 


  • A and B completely agnostic of each other 
  • Loose coupling, allowing adding new publishers and subscribers without having to modify existing actors 
  • With event sourcing based on appropriate message transport (e.g. Kafka), ability to replay/re-process events 

Cons: 


  • Assumes reliability of the message transport, or the ability of subscribers to compensate 


3.2. Event-Driven Architecture (with Observers): 


Pros: 


  • A and B completely agnostic of each other 
  • Loose coupling, allowing adding new publishers and subscribers without having to modify existing actors 
  • With event sourcing based on appropriate message transport (e.g. Kafka), ability to replay/re-process events 
  • Observer processes can be simple, only looking to match “done ‘a’” with “done ‘b’” without knowledge of details of ‘a’ and ‘b’, and flagging missing correlates 

Cons: 


  • Requires a bit of extra sophistication to implement 


Conclusion:

Option 3.2 is clearly the best pattern, even if it requires some familiarity to implement well. I’d argue that if a team or org is embarking on building distributed systems, it should invest in building the requisite expertise to do it right. 

Luckily, implementations of this core pattern are increasingly available in the open-source world, without the need to build such plumbing from scratch.
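As a rough illustration of option 3.2, here is a minimal, in-memory Python sketch (the bus, topic names and services are hypothetical stand-ins for a real transport such as Kafka): A and B publish “done” events without knowing about each other, and a simple observer matches them by correlation id and can flag missing correlates.

```python
from collections import defaultdict

class EventBus:
    """In-memory stand-in for a real message transport such as Kafka."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in list(self.subscribers[topic]):
            handler(event)


class Observer:
    """Matches "done 'a'" with "done 'b'" by correlation id, without knowing
    anything about the internals of A or B."""
    def __init__(self, bus):
        self.pending = set()
        bus.subscribe("a.done", self.on_a_done)
        bus.subscribe("b.done", self.on_b_done)

    def on_a_done(self, event):
        self.pending.add(event["correlation_id"])

    def on_b_done(self, event):
        self.pending.discard(event["correlation_id"])

    def missing_correlates(self):
        return self.pending  # 'a' happened, but no matching 'b' (yet)


def service_a(bus, correlation_id):
    # Domain A performs its logic 'a' and announces the fact; it knows nothing about B.
    bus.publish("a.done", {"correlation_id": correlation_id})


def make_service_b(bus):
    # Domain B reacts to A's event and announces its own completion.
    def on_a_done(event):
        bus.publish("b.done", {"correlation_id": event["correlation_id"]})
    return on_a_done


bus = EventBus()
observer = Observer(bus)
bus.subscribe("a.done", make_service_b(bus))

service_a(bus, correlation_id="txn-42")
print(observer.missing_correlates())  # empty set: every 'a' has a matching 'b'
```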

Sunday, December 23, 2018

Hot Topic: Multi-Cloud

Large enterprises and regulated global institutions are increasingly tackling the subject of utilizing multiple cloud providers - AWS, Microsoft Azure, Google Cloud Platform. This is a complex area with many competing positions and interpretations.


Why Multi-Cloud?


As investment into cloud-native implementations is ramping up, many large enterprises are resolving to tackle the question of multi-cloud adoption. Key concerns typically noted:
  1. Avoiding concentration risk - i.e. ability to redistribute workloads across providers, initially or rebalancing in the future, meeting prudential regulators’ concerns around the dependence of our systems on a single provider
  2. Retaining ability to utilize unique innovations from competing providers without migration that is too costly
  3. Retaining possibility of commercial leverage and arbitrage for commodity services
  4. Avoiding single vendor lock-in
Notably, achieving higher levels of resilience is not typically among these, as utilizing multiple availability zones and regions can in many geographies achieve availability targets without multi-cloud.

Dimensions


The biggest challenge in multi-cloud is avoiding having to descend to the lowest common denominator (VMs or IaaS-only) and losing the benefits of higher-level PaaS services (cloud databases, object stores, functions, event queues, etc.).

Still, it's worth outlining different angles from which multi-cloud can be approached:
  1. Code-level portability - avoiding having to rewrite a substantial amount of business logic or data transformation code in order to enable it to run on an alternate cloud provider's infrastructure
  2. IaaS portability - deployment and run concerns, from utilizing provider-independent images and container configurations to networking (e.g. BYOIP), load-balancing and block storage 
  3. Control plane portability - the hardest of them all. Notably includes monitoring/APM, alerts and end-to-end tracing, security & access management

A Measured Approach


I believe that code-level portability is the sweet spot of a multi-cloud approach for new services and applications.

IaaS-only portability, e.g. forklifting whole VMs into the cloud and eschewing the native/PaaS services cloud providers offer, is fine for moving legacy workloads off-premise without refactoring (if you must), or for packaged apps where one doesn’t maintain source code and therefore has no control over code portability. However, for custom-developed, new-code applications this approach would significantly reduce the value of the cloud proposition.

Control plane portability is today still quite difficult. Some of it can certainly be accomplished by utilizing common toolsets - e.g. OpenAPM, ELK, etc. across providers, but truly seamless control plane across multiple cloud providers is still either too difficult or too costly to achieve. Security and access management models vary across providers with few exceptions, e.g. BYOK has potential to broker root keys across cloud providers.

Code-level portability should be achievable more readily: after all, most PaaS dependencies - object stores, DBaaS, event queues and even functions/FaaS are being increasingly commoditized. It's possible to focus on common API subsets that eschew individual vendor tie-ins yet retain most of the power those native PaaS services offer. After all, why should the business or data transformation logic know whether they're publishing an event to Kafka, AWS Kinesis, Azure Event Hubs or Google Pub/Sub?

Even if common API subsets prove elusive in some cases, it should be possible to isolate dependencies into vendor-specific libraries/adapters, which can be shared across engineering teams. Dependency-injection and vendor-specific configurations added within build pipelines are another viable approach.
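As a sketch of what such isolation might look like (the class and method names here are hypothetical, not any vendor's actual SDK): business logic depends on a small vendor-neutral interface, and thin adapters chosen via configuration or the build pipeline do the vendor-specific work.

```python
from abc import ABC, abstractmethod

class EventPublisher(ABC):
    """Vendor-neutral seam: business logic depends only on this interface."""
    @abstractmethod
    def publish(self, topic: str, payload: dict) -> None: ...

class KafkaPublisher(EventPublisher):
    # A real adapter would wrap a Kafka producer client here.
    def publish(self, topic: str, payload: dict) -> None:
        print(f"[kafka] {topic}: {payload}")

class KinesisPublisher(EventPublisher):
    # A real adapter would wrap the AWS Kinesis SDK here.
    def publish(self, topic: str, payload: dict) -> None:
        print(f"[kinesis] {topic}: {payload}")

def settle_payment(publisher: EventPublisher, payment_id: str) -> None:
    # Business logic neither knows nor cares which transport sits behind the seam.
    publisher.publish("payments.settled", {"payment_id": payment_id})

# The concrete adapter is selected by configuration or the build pipeline.
settle_payment(KafkaPublisher(), "pay-123")
settle_payment(KinesisPublisher(), "pay-123")
```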

Historical Parallels


This in some ways parallels the path our industry travelled with respect to databases - from fully embracing vendor-specific features by writing business logic in stored procedures (with many good arguments marshalled in their favour) to a common SQL standard and DB abstractions such as ORMs that made cross-DB migrations much more feasible.

I fully expect multi-cloud to be an increasingly active area in open-source and open-core world with focus on common API abstraction for commodity PaaS.

Saturday, November 24, 2018

It’s time to build the bank of the future

Original article published on NAB Tech Blog

How can large retail banks tackle the monumental challenge of modernising core systems architecture without sacrificing their role as economic guardians?


When automatic teller machines (ATMs) started appearing in the late 1960s, there was a sense that we’d taken our first real step into the future of electronic banking.

As people discovered the freedom to access cash at all hours on demand and the convenience of card payments began to gradually replace cash, a golden era of mainframe computing helped banks manage the rising volume of financial transactions.

Jump forward to 2018 and most of our dollars are now dispensed digitally (through shopfronts real and virtual), as trips to the ATM rapidly become a thing of the past.

As banks, we’ve successfully evolved our systems and technology to meet customers’ fundamental needs, and they now feel very comfortable with the digital representation of money. But customer expectations have evolved far beyond these basics.

The next revolution


The truth is that many banks are only just waking up to the next revolution in their sector, and I predict that developments in coming years could well be just as disruptive as the rise of electronic banking.

Some drivers of this are evident in the rise of fintech start-ups developing innovative financial tools and services increasingly relying on real-time data. Of course, examples of cross-subsidization and regulatory arbitrage can be readily found as well in the fintech sphere.

Many fintechs do not attempt to replace the core services traditional banks provide, but augment their value proposition. These can only succeed, however, if bank platforms are open enough to allow API-based integration, event-based pub/sub data exchange and other modern approaches to creating flexible, extensible ecosystems.

The question is, how do system architects (who’ve focused so much of their work on the protection of customers’ financial assets) now set about opening these systems up to the wider world?

The scale of the task is significant.

A lot of today’s banking architecture rests on a giant network of tightly coupled legacy systems, each purpose-built for very specific business needs, with layers of bespoke integration, security, risk management and control implementations embedded throughout.

Many of those systems took years to evaluate and implement, but now the race is on to replace them with far more open, flexible and agile ecosystems — all while retaining the robustness and security needed to beat ever-more-sophisticated financial crime and fraud.

Defending our customers’ assets is a massive job in itself — and not one we can afford to fail. So, how do we transform our banking systems without sacrificing our role as economic guardians?

A robust, coherent and ambitious architectural vision is crucial in the transformation.

Thinking big


Modern architectural approaches are not new, but they sometimes take a surprisingly long time to become established as best practices within our industry. Services replacing monolithic applications, improving loose coupling via event pub/sub, avoiding distributed monoliths, employing eventual consistency where appropriate to improve availability and resilience to partial failures, reactive patterns, establishing coherent API ecosystems – none of these are new, but they require experience to get right and to avoid pitfalls. Increasingly, CQRS, event sourcing and real-time stream processing provide powerful ways to move the needle still further.

Systems, software and enterprise architecture should be about how the internals of systems work – which translates into how resilient, performant and scalable they are – not just what functional features they provide. It’s very much about what’s under the hood. However, in many institutions the practice has been focused on buying and integrating technologies, systems and applications rather than designing and building them.

Starting with a coherent strategy and a set of architectural principles – ones that aim to maximize flexibility, maintain resilience and security, yet accommodate the increasing velocity of change and the easier experimentation the business demands at the edge – is key. These principles are then translated into a concise set of patterns, technology choices and reference implementations that provide reliable guardrails to individual delivery teams aiming to maximize their velocity.

We should evaluate all systems — existing or new — according to these key principles and patterns, so none becomes an obstacle in achieving a flexible technology ecosystem.

Anything blocking or impeding movement toward our agreed target states definitionally becomes technical debt, to be managed proactively. Applied consistently, it also gives us a model to phase out legacy estates where it’s an obstacle to our goals.

People power


As is true for any organization — whether a startup or a large firm — the people we choose to drive our technological journey are as important as the architecture that supports it.

Financial services are experiencing a truly seismic shift, as we seek to solve some fundamentally complex problems with innovative ideas and best-of-breed tools.

And technology as a field is continuing to undergo rapid evolution, offering us new tools and choices, whether it be stream processing underpinning the transition from Big Data to Fast Data, machine learning utilizing GPU compute, or quantum computing promising solutions to previously computationally intractable problems at the price of upsetting the status quo in cryptography. It is a great time to be a practitioner in our field.

Distilling the right set of architectural principles and patterns and picking our technology building blocks wisely, tying these together into a coherent architectural vision in our complex, fast-evolving ecosystem — where failure simply isn’t an option — represents an exciting career challenge indeed.

If you’re interested in joining us, you can learn more about working in technology at NAB here.

Saturday, June 24, 2017

Cargo Cult Agile

It is instructive to consider human tendency towards magical thinking that finds a wonderfully didactic expression in Cargo Cults.

Magical thinking fundamentally expresses faith that the performance of certain rituals alters the status of the actor and can confer benefits that are not correlated with the substance of the ritual itself.

It is not difficult to find many examples where the application of Agile is an exercise in ritualistically following the form rather than focusing on the substance.

The original intent of Agile was to redress the balance of power between managers/stakeholders and engineers, affording more power to the latter, and achieving better results & productivity through their empowerment.

But it is not uncommon today to find instances where Agile has been co-opted as just another mechanism of control, without altering the power equation, resting on what amounts to a magical belief that performing the formal ceremonies and rituals of Agile will make the benefits accrue, even if such performance is entirely divorced from its underlying rationale.

The very same powers that demanded of engineers compliance with the rigors of waterfall-style development have simply switched the rulebook and now demand compliance with the rigors of Agile. Such hollowed-out Agile practice remarkably resembles the cargo cults I referred to at the start.

Sunday, April 2, 2017

On Well-Designed Event Models


I propose that events composing a well-designed event model should be of the form:

E^Ai_{x→y}{α', β'... μ'}

Where:
  • E^Ai_{x→y} is the event communicating the transition of aggregate/entity Ai from business state Sx to business state Sy 
  • Ai is the i-th instance of aggregate or entity A, consisting of attributes {α, β... ω} 
  • The state transition (Sx → Sy) is accompanied by changes in a subset of attributes of Ai, {α, β... μ}, to new values {α', β'... μ'}; these new values (if required, together with the previous values {α, β... μ}) are included in the data payload of the event E^Ai_{x→y}{α', β'... μ'}

This approach is in contrast with a trivial event model where events are simply attribute-level changes, e.g. instead of emitting E^Ai_{x→y}{α', β'... μ'}, emit the following series:

E^Ai_α{α'}, E^Ai_β{β'}, ... E^Ai_μ{μ'}
Ergo, developing a business state-centric event model requires starting with the business state transition diagram for the entity/aggregate A:


The benefit of a business state-centric event model is that it makes state transitions explicit (transitions typically involve changes to more than one attribute), instead of those transitions having to be inferred/reconstructed by the consumer from attribute-level events E^Ai_α{α'}.
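As a rough sketch of what such an event could look like in code (the aggregate, states and attributes below are purely illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class StateTransitionEvent:
    """One event per business state transition of an aggregate, carrying
    every attribute that changed as part of that transition."""
    aggregate_type: str                 # A
    aggregate_id: str                   # the i-th instance of A
    from_state: str                     # Sx
    to_state: str                       # Sy
    changed_attributes: dict = field(default_factory=dict)  # {α', β'... μ'}

# A single event announces the whole business fact...
order_shipped = StateTransitionEvent(
    aggregate_type="Order",
    aggregate_id="order-42",
    from_state="PACKED",
    to_state="SHIPPED",
    changed_attributes={"carrier": "DHL", "tracking_number": "JD0146"},
)
# ...instead of consumers having to reconstruct the transition from a series of
# unrelated attribute-level events ("carrier changed", "tracking number changed", ...).
```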

Saturday, January 21, 2017

The Why of Explainable AI

AI is all the rage today. And DARPA is investing in explainable AI - why?

I believe the key for us, humans, is to be able to have recourse to another human who, we viscerally feel, may understand our situation and have empathy, based on shared experience and understanding of the world and how we - mortal, fallible humans - interact with it.

So no matter how sophisticated the AI, will we ever trust it to understand us enough that we will not wish to have recourse to the judgement of another human with power to overrule the "Computer says NO"?

But, how would a human "appellate judge" be able to evaluate AI's decisions if they're a mystery concealed within the deep neural network, acting on more data than a human can readily deal with?

Hence the need for explainable AI.

I will leave it as an exercise to the reader to infer what types of use cases DARPA might be interested in, where robotic judgement might need to be subject to human appeal.

Saturday, September 10, 2016

Fast Data > Big Data

What is the next step after Big Data? Creating a fast lane for generation and utilization of insights from business data, increasing customer relevance and reducing irrelevant noise. Some call it Fast Data. What are the ingredients?

  • Ability to source data near-real-time from transactional/OLTP systems either by tailing logs or consuming events 
  • Augmenting traditional ETL data pipes with stream processing 
  • Enabling greater model development agility and optimization via machine learning 
  • Ability to feed resulting insights (e.g. segmentation changes) back to OLTP systems supporting customer interactions
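Putting the ingredients together, here is a toy sketch of the resulting loop (all names and the segmentation rule are invented for illustration): a change event arriving near-real-time from an OLTP system is scored by a streaming model, and the derived segment is pushed back so the next customer interaction can use it.

```python
# All names and the segmentation rule below are hypothetical.
def score_segment(features):
    # Stand-in for a continuously trained / online model.
    return "high_value" if features.get("monthly_spend", 0) > 1000 else "standard"

def process_change_event(event, oltp_client):
    # 'event' arrives near-real-time from an OLTP system (log tailing or pub/sub).
    features = {"monthly_spend": event["amount_30d"]}
    segment = score_segment(features)
    # Feed the derived insight back so the next customer interaction can use it.
    oltp_client.update_customer_segment(event["customer_id"], segment)

class FakeOltpClient:
    def update_customer_segment(self, customer_id, segment):
        print(f"{customer_id} -> {segment}")

process_change_event({"customer_id": "c-1", "amount_30d": 1250}, FakeOltpClient())
```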

Saturday, September 3, 2016

Rationality of Rules vs. Rationality of Ends

"The problem of bureaucracy is that it values the rationality of rules over the rationality of ends."
– Matthew Taylor (with echoes of Max Weber) 

This can serve as a nice litmus test for when technology governance is in danger of devolving into a bureaucracy. This is why it’s so important to focus on tangible value in areas such as governance and technology architecture.

Wednesday, August 3, 2016

CQRS – The Sweet Spot

Experience teaches all of us that in technology there’s no panacea. Every tool has its uses and, conversely, can be misused. To really understand a piece of technology one needs to understand its design intent, what problem its creators were trying to solve, and what trade-offs they selected.

It is a truism that every engineering decision involves trade-offs – selecting certain strengths at the price of accepting certain limitations (not just in software/systems engineering, but in all engineering disciplines) at a given level of investment. Thus, every instrument has a sweet spot for its application.

CQRS is an interesting example to consider. It builds on the strengths of a more vanilla version of EDA (event-driven architecture) – uncoupling domains from each other – and adds asymmetric read/write models and event sourcing.

By enabling updates at highly granular, attribute level, coupled with arbitrarily denormalized aggregate reads, CQRS creates opportunities for independently scaling the read and write/update stores – much more so than adding read replicas to traditional, symmetric-model domain implementations.

These strengths come at the price of additional complexity – managing highly granular event models, ensuring interoperability with non-CQRS-based domains, dealing with compacting of logs in event sourcing, etc.

It would seem that domains that would benefit most from this approach are those where multiple actors are attempting simultaneous updates to varying attributes of the same aggregate – i.e. domains demanding high-concurrency writes.
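A minimal sketch of the split (deliberately simplified, with invented names): the write side only appends fine-grained events, while the read side folds them into a denormalized view that can be scaled and rebuilt independently.

```python
from collections import defaultdict

# Write side: an append-only event log per aggregate (fine-grained updates).
event_log = defaultdict(list)

def handle_update(aggregate_id, attribute, value):
    event_log[aggregate_id].append({"attribute": attribute, "value": value})

# Read side: a denormalized view, folded from the events and independently scalable.
def project_view(aggregate_id):
    view = {"id": aggregate_id}
    for event in event_log[aggregate_id]:
        view[event["attribute"]] = event["value"]
    return view

# Two actors update different attributes of the same aggregate; the write side
# just appends, and the read side reconstructs the current picture.
handle_update("account-7", "email", "a@example.com")
handle_update("account-7", "phone", "+44 20 7946 0000")
print(project_view("account-7"))
```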

Friday, May 13, 2016

Big Data and Magical Thinking

Every business is drowning in data. Every business believes they should make better use of that data. Seemingly a long time ago, Big Data came onto the scene as The Answer. It became a buzzword and a cottage industry. In some places, it simply became a synonym for Hadoop.

The challenge is that simply having more data, or combining all of the business’ data into a common pool or ‘lake’ isn’t by itself going to unlock insights, as if by magic.

Rigor is required in managing the data sources and the meaning of various data elements, and equal rigor is required in applying proper mathematical techniques to the analysis of the data and the avoidance of misleading conclusions.

Things like the curse of dimensionality (applicable to sampling and anomaly detection, among other things), misuse of p-values, and implicit assumptions about the shape of the underlying probability distribution come to mind as some of the most common omissions.
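As a small illustration of the first of these, here is a toy experiment (uniform random data, numbers purely illustrative) showing how the relative contrast between the nearest and farthest neighbours collapses as dimensionality grows – which is exactly what undermines naive distance-based sampling and anomaly detection.

```python
import numpy as np

rng = np.random.default_rng(42)

for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))          # 1,000 uniform random points in [0,1]^d
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>4}: relative distance contrast = {contrast:.2f}")

# As d grows the contrast collapses: "nearest" and "farthest" become nearly
# indistinguishable, so naive distance-based reasoning quietly stops working.
```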

Wednesday, January 6, 2016

Clarity As Managerial Value Added

It is normally implicitly understood that people at more senior managerial or executive levels have to operate at (and be comfortable with) progressively higher levels of uncertainty. Often, one has to make choices being cognizant of a superposition of multiple future states that is yet to be resolved into one or the other (not unlike a quantum one).

It is also true that a key role of management is to enable people to be as productive as possible. Ergo, the manager’s job has to include a meaningful reduction in the uncertainty that’s being passed through to subordinates.

Sunday, October 11, 2015

Barclaycard, London

After 5 years at PayPal, most recently serving as Chief Architect for PayPal Credit, I decided to move to London, having accepted the role of CTO with Barclaycard.

You might wonder why a techie with a startup background would leave Silicon Valley to work at Britain's oldest bank.

The reason is simple: the financial industry as a whole, and consumer financial services in particular, are undergoing a transformation – some might say a seismic shift – that is significantly technology-driven.

A wave of change disrupting the status quo presents a great opportunity for innovation.

I'm tremendously excited to join Barclaycard technology team.

Canary Wharf Skyline 1, London UK - Oct 2012
Photo By Diliff (Own work) [CC BY-SA 3.0 or GFDL], via Wikimedia Commons

Tuesday, July 7, 2015

Investing in platforms

'Platform' is a word that one hears a lot nowadays in our industry, and it sometimes suffers from over-use. Here are a few thoughts on building platforms:

1. The art of being a platform is letting someone else be the product. I.e. platforms should underlie products, enabling diversity, not attempting themselves to be all things to all people.

2. Platforms are collections of capabilities, not baskets of features. Capabilities imply higher level of abstraction, enabling development of diverse feature sets without defining them exhaustively.

3. Platform business is hard. End users interface with products, not platforms. Most businesses compete on stickiness and profitability of their customer relationships. Catering to end users gives one greater scale, and so many businesses that begin by declaring themselves platforms eventually become very keen to own end-user relationships under their own brand, rather than letting third party product developers own them. Evolution of the Apple platform is a good case in point.

4a. For a product-oriented company of sufficient scale, factoring out the underlying platform/capability set is key to scaling across global markets with their diverse requirements. Without a good platform foundation, the feature flow demanded by differentiated global markets becomes more expensive; there's duplication of effort (and code), gradual accumulation of tangled dependencies, and an increase in brittleness.

4b. Exactly the same concern arises for SaaS companies that develop tailored solutions for anchor tenants of the platform and then need to migrate to a true multi-tenant platform.

5. However, a typical product manager is not incentivized to be platform-minded. Most corporate incentive structures are targeting delivery of specific, discrete, value-added features, not investment in broader capabilities that may yield dividends over the longer arc of time.

A humble proposal: 

For every investment, there should be an explicitly factored-out portion that genuinely accrues to platform building (with the rest going towards feature flow). If it's 0%, then at least it's explicitly 0%, rather than ambiguously implying platform investment or enablement where there is none.

Sometimes, making implied things explicit is all one needs to reach agreements or effect change.

Sunday, April 19, 2015

Factoring and pipelining complex projects

We're all familiar with the concept of minimum viable product.

In fintech, those minimum requirements include a lot of regulatory compliance, legal and other complex concerns that can have significant ramifications on how every product feature is supposed to work. Throw in requirements from key partners and customers, and one can easily end up with an MVP with a large scope, and a lot of uncertainty and interdependencies among requirements/stories.

It's very easy then to start spending a lot of time reconciling it all together in an attempt to arrive at a coherent picture all teams can align on in detail.

Finally, large investments are typically in the opposite of a skunk works/'fly under the radar until you've got something good to show' situation. Large investments imply commitments - in time as well as revenue. As in - in which quarter can it be delivered/ramped/booked. So for very good business reasons one ends up with a defined budget & timeline on one hand and a quivering mass of stories on the other. What to do?

The common mistake is to attempt to mitigate uncertainty by figuring it all out upfront in sufficient detail. Then everyone is spending most of their effort drawing and redrawing the complete picture and overall progress is very slow.

And, of course, every requirement/story has a product owner behind it, and no PO worth their salt will say 'uh, you know - the thing I'm responsible for - don't worry about it for now'.


Factoring:

In my experience, the key has been to factor the whole into subgroups of mutually influencing requirements, with fewer fine-grained dependencies between the subgroups. Architecture can serve in a leading or analytical capacity for this factoring.

Then decide by fiat which subgroup goes first and start cranking away at it. Yes, without the other subgroups the 'MVP' won't see the light of day, and attacking other subgroups later doesn't mean they're less important (as their POs will be quick to worry). But one has to start somewhere.

Pipelining:

And then, guess what - while subgroup 1 is being delivered, [C]POs, architects, dev leads and others with responsibilities beyond a single agile team or sprint can free up time to focus on subgroup 2, 3, n, n+1... creating a pipeline.

Seemingly simple, but it requires persuading passionate people that being in nth place in the delivery pipeline doesn't imply being in the same place in terms of importance.

Saturday, November 8, 2014

Microservices Architecture - Finding The Sweet Spot

Microservices architecture has been more in the spotlight lately, and with good reason.

As I wrote before in Loose coupling and integration - the pendulum swings, splitting execution into multiple processes that require remote invocation isn't free, however.

In my recent discussions with many practitioners, I noticed that microservices architecture is typically put forth in opposition to a more coarse-grained approach where multiple domains  are lumped together, sometimes resulting in a 'God service' or 'everything service' anti-pattern.

However, it's dangerous to posit that more services is always better and create incentives to which teams and managers might respond - e.g. how many services did your team create last quarter? Nice, but that other team created twice as many!

So how to find the sweet spot in this spectrum? My answer is not to lump conceptually unrelated domains together. Another rule of thumb would be to look at persistence domains - if data/persistence architecture segregates domains properly, they should be exposed via separate service interfaces.

Fundamentally, services are simply public interfaces for domains. Each domain should have at least one. But creating too much fragmentation within a domain – exposing separate services for interrelated concerns – shifts the burden onto clients in terms of discovery, orchestration and invocation latency.

Ultimately, the goal of service interface design should be to allow clients to reason about the domain the service represents and integrate with it as simply and efficiently as possible.
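To make the "public interface for a domain" idea concrete, here's a small illustrative sketch (the domain and operations are invented): one coherent interface for an Orders domain, rather than a separate service per interrelated concern.

```python
from abc import ABC, abstractmethod

class OrdersService(ABC):
    """A single public interface for a (hypothetical) Orders domain: clients can
    reason about orders as a whole instead of stitching together many tiny services."""
    @abstractmethod
    def place_order(self, customer_id: str, items: list) -> str: ...

    @abstractmethod
    def get_order(self, order_id: str) -> dict: ...

    @abstractmethod
    def cancel_order(self, order_id: str) -> None: ...

# The fragmented alternative would expose OrderCreationService, OrderQueryService
# and OrderCancellationService separately, pushing discovery, orchestration and
# extra network hops onto every client of the domain.
```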

Martin Fowler provides an  excellent write-up on microservices here.

Friday, May 2, 2014

Emergent software design?


Software seems to be the only area where we often try to combine designing a system with building it. Or declare that one should just start building it and the design will emerge.

Perhaps this is because in the material world, limitations of physics, materials properties and manufacturing processes are intuitively understood by non-specialists.

Building a house is a good example: a house is a composite of multiple subsystems: structural, electrical, plumbing, HVAC and so on. These subsystems won't just evolve independently to end up well-enmeshed in a coherent whole, and no future homeowner would assume such a thing and tell their contractors to just start building the house and not worry about blueprints.

Yet software has very real limitations in terms of complexity, physical resource utilization efficiency and so on, but these are not intuitively obvious to non-technologists. It is very easy to build something that appears functionally correct but is crippled in terms of scalability, maintainability, stability and so on.

So while not every detail has to be figured out and specified in advance, overall structure/architecture and how different subsystems should interact does need some forethought and definition.

Sunday, October 13, 2013

Startup pivots - technology angle

This is what I've seen throughout my career as a typical [successful] startup lifecycle:

1. Build a product really fast - focus on experience, hopefully decent app code, basic persistence abstraction, monolithic database to keep things fast & simple.

2. If the product takes off, rapidly add features, variations for markets/segments, etc. Code grows more complex & tangled, resulting in greater brittleness and more bugs, database backend shows signs of strain.

3. An increase in the number of users leads to the business placing greater value on availability and scalability, as well as performance, graceful degradation and similar matters.

Depending on the length of the ramp-up between stages 2 & 3, and the quality of tech design decisions at stage 1 despite the pressure to get things out the door as fast as possible, techies at this point will be looking at a pile of technical debt sized somewhere between a molehill and a mountain.

What to do? The business still requires new features and capabilities to be added at a rapid clip. The more you build on the old without investing in refactoring or rewriting, the more the technical debt compounds (just like financial debt), increasing the tension between improving the internals and delivering user-facing features.

Hopefully, as some money is now being made, reinvestment in improving the technology platform is possible: break up the monolithic database (hopefully not too much logic is in the DB layer - stored procedures, etc.), try to break up the increasingly tangled codebase, attempt to separate operations demanding synchronous, strong consistency guarantees from those that can be intermediated by async events. Words like SOA and EDA start to bounce around. If the business demands availability levels that require multiple datacenters, the CAP theorem comes into play, eventual consistency strategies become critical, and so on.

In the meantime, some of the self-made components & modules may now have equivalent (or better) counterparts available as open-source or commercial software.

And it's at this point that investment in architecture, with focus on decomposition and encapsulation, as well as build/buy analysis unbiased by who wrote original in-house modules, becomes critical.

One often hears the term 'pivot' in startup parlance these days, usually referring to change in business model or product vision. I think the above illustrates that technology-wise, startups also have to execute one or more pivots to avoid being bound by the initial technical decisions that, just like business and product decisions, can rarely support the business forever.

Should we perhaps place greater emphasis on technology pivots? I think so.

Saturday, September 14, 2013

iPhone as a platform

Following Apple's announcement of iPhones 5C and especially 5S, there's been much discussion about the merits of their various features and components.

Some have regretted that Apple didn't come out with any new 'whiz bang' features. Others questioned the M7 motion co-processor in the 5S or outright called it a gimmick.

I think such arguments reveal a superficial understanding of Apple's strategy. A few years back, everyone in the Internet industry couldn't stop talking about 'platforms'. Unsurprisingly, despite the hype, few companies actually figured out how to make money on their 'platform'.

Now, I think Apple has figured out how to both create a hardware+software platform AND make money on it - this is the entire iOS ecosystem.

So when Apple introduces a new capability to its platform - whether it's the 64-bit A7 or the low-power M7 - it's not about what those things do by themselves, it's about what they enable millions of developers to do. That's what happened with the addition of the accelerometer, compass and other capabilities in the past.

Finally, from my own experience with various apps that attempt to track your activity (like the number of steps you've taken) throughout the day, they consume way too much power to run all the time, reducing battery life unacceptably. The M7 should be an awesome solution for that. Also, most avid joggers I know use their iPhones as fitness trackers and media players in one, and would be thrilled at the prospect of not having to recharge their iPhones more often.