The importance of change
From time to time, every development team has to make technology choices, ranging from architecture to new tooling. This is a direct consequence of the fast pace at which frameworks, tooling and especially business needs change.
Such change is imperative: from a business point of view, an organisation that does not incorporate new technologies would soon find that its cost per feature increases relative to its competition. This may result in a snowball effect where the cost of errors increases, making us less willing to take new risks and more inclined to stick with the old and familiar.
As this is an important, recurring problem, we must ask ourselves: how do we make such a decision?
What affects our choices?
First, we must recognize that our industry is filled with buzzwords, appealing buzzwords. We heard that X is an amazing new technology, so we want to know X and gain experience with it. A friend of mine called this JIOP: Job Interview Oriented Programming.
The problem is that we aren’t always aware this is the case. After all, we heard that X is amazing (and maybe it is!).
Second, we need to recognize that we, as (non-rational) humans, have biases. Most notably, we tend to favor our own ideas over someone else’s.
Third, there might be ego at play: the “I know better” approach.
We should keep all of that in mind and make sure none of these affect the decision-making process.
What the decision-making process should look like: a case study
In this section we will consider a case study from my team: we had many stateful services doing processing work and we needed to scale them. More about that here.
Change is always the result of a need for an improvement (e.g. a new feature, better X), and to make a good decision we should first have an end goal with a list of parameters and guidelines to consider.
In our case we stated our goal as follows: we want k instances of a stateful service to handle k times the load of a single instance. We defined the following list of metrics to consider:
- The amount of time that it takes to migrate an existing service.
- The difference in the time that it takes to write a new scalable stateful service compared to the existing approach (which leads to a non-scalable service).
- Fault tolerance: the amount of traffic lost when one of the k instances fails.
It is important to emphasize that all of these are quantitative metrics. It’s not “I think that”, nor is it “I feel that”; rather, it is a choice we can make based on numbers and data. This is also key to preventing deadlocks and disputes.
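To make this concrete, here is a minimal Python sketch of how the scaling goal and the fault-tolerance metric could be expressed as numbers. The function names and the sample figures are made up for illustration; they are not measurements from our project.

```python
def scaling_efficiency(single_instance_throughput: float, k: int,
                       measured_throughput: float) -> float:
    """Fraction of the ideal k-times throughput that k instances achieved.

    1.0 means the goal is met exactly: k instances handle k times the load
    of a single instance.
    """
    if k < 1 or single_instance_throughput <= 0:
        raise ValueError("k must be >= 1 and the baseline throughput positive")
    return measured_throughput / (k * single_instance_throughput)


def traffic_lost_fraction(total_requests: int, lost_requests: int) -> float:
    """Share of the traffic that was lost while one instance was down."""
    if total_requests <= 0 or not 0 <= lost_requests <= total_requests:
        raise ValueError("need 0 <= lost_requests <= total_requests, total > 0")
    return lost_requests / total_requests


# Hypothetical numbers: one instance handles 1000 req/s, four handle 3600.
print(scaling_efficiency(1000, 4, 3600))
# Hypothetical failure test: 5000 of 200000 requests were lost.
print(traffic_lost_fraction(200_000, 5_000))
```

Whether these exact definitions are the right ones matters less than the fact that everyone agrees on them before the options are compared.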
At this stage you may ask: where is this data coming from? We can only know the exact numbers after we are done implementing and testing. And you are right, we don’t know the exact numbers, but it is better to make decisions based on estimates than decisions based on feelings.
We then build a table with a row for every option we consider and a column for every metric.
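Such a table can live in a spreadsheet, but as a rough sketch it could also look like this in Python. The option names and numbers are placeholders, and the metric names are my shorthand for the three metrics listed earlier. Rejecting estimates for metrics outside the agreed list is a design choice of this sketch, not something we actually automated.

```python
from typing import Dict, List

# Agreed metrics, in column order. The names are illustrative shorthand.
AGREED_METRICS = ("migration_days", "extra_days_per_new_service", "traffic_lost_pct")


def build_table(estimates: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Return a header row followed by one row per option."""
    rows = [["option", *AGREED_METRICS]]
    for option, values in estimates.items():
        unknown = set(values) - set(AGREED_METRICS)
        missing = set(AGREED_METRICS) - set(values)
        if unknown or missing:
            raise ValueError(
                f"{option}: unknown metrics {sorted(unknown)}, "
                f"missing metrics {sorted(missing)}"
            )
        rows.append([option, *(str(values[m]) for m in AGREED_METRICS)])
    return rows


def render(rows: List[List[str]]) -> str:
    """Left-align every column to the width of its longest cell."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


# Placeholder estimates, not the numbers from the case study.
estimates = {
    "option_a": {"migration_days": 3, "extra_days_per_new_service": 1, "traffic_lost_pct": 25},
    "option_b": {"migration_days": 10, "extra_days_per_new_service": 0, "traffic_lost_pct": 5},
}

print(render(build_table(estimates)))
```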
At this stage we can make an initial decision and start with a POC (proof of concept) to verify our estimates and see if we are satisfied with the proposed solution.
If not, we can eliminate this solution. If time permits, we can try more than one option.
Still, it is crucial that the final decision is based on the agreed metrics. If during the process we come to believe that the set of metrics should change, we should update our table accordingly. A word of caution: someone who advocates choosing X may be tempted to suggest a new metric on which X beats the alternatives. If we allow such metrics to pop in, we will end up in disagreements. To prevent this, we should agree in advance on the conditions under which new metrics can be considered.
In our case we had a lot of existing services to upgrade, so many options such as “replace every cache with Redis” were easy to eliminate, even though they had some strong advocates. It was simply not possible to endorse this solution with the given set of metrics. Redis has advantages such as low latency, but latency was not one of our agreed metrics (it wasn’t that important for us). In a standard Advantage/Disadvantage table, something like low latency would have gone into the advantage column. The use of metrics focuses us on the relevant advantages only and thus helps the decision-making process.
I am keen to hear about cases you have had in your career: how did you make a decision and what was the outcome?
Let me know what you think in the comments.
Until next time,