Software quality is cheap
Is this only a provocative title, or is it yet another way in which software engineering differs counterintuitively from traditional fields?
This article focuses on the building side of commercializing a software product.
First, keep in mind that estimating the real cost of building software is not easy: it is not limited to writing the code, but encompasses many other activities, mainly:
- activities directly related to building the software itself, such as architecture, design, development, testing and integration
- activities focused on operations, such as IT, support (help, compensation, ...) and documentation
- management, the glue between all the people involved in the associated processes
TinyCorp, a company among others
Through the fictional, chronological story of a small company building a software product, we will try to reproduce the naive but most common way software companies "solve" their problems.
A fictional chronological story about developing software
- TinyCorp is a small company whose small development team built its product, TinySoft.
- Some customers are now using TinySoft, and a few of them run into issues, so they contact TinyCorp for help and to get the bugs fixed.
- TinyCorp's developers are contacted directly, and support takes up more and more of their time, so TinyCorp hires a support team to let the developers do their main job.
- The support team filters the requests and forwards to the developers all the real issues that have to be solved.
- TinyCorp has more and more customers expecting more and more features, so the development team grows, and so does the support team.
- As TinySoft becomes big and complex, the support team complains about having too many disappointed customers and too much work for its size. The company reacts by hiring a team dedicated to testing and validating the software's behaviour before updates are released to customers.
- TinyCorp gets more customers. TinySoft gets bigger and more complex. New developers join the company, and consequently the test and support teams also get new hires. Managers are promoted in every department. And so on.
- One day, a small corporation called OtherTinyCorp starts selling a product that has roughly the same features at half the price (a fifth of the price on Black Friday!). TinyCorp does not understand how its competitor can sell at such a price and finds itself in a difficult situation: its customers have started to leave TinySoft for OtherTinySoft.
What can TinyCorp do at this point? The price of TinySoft cannot be reduced without reducing the size of the company. Which team should be cut?
- Cutting the development teams will disappoint the customers who are waiting for new features or bug fixes. Moreover, new features could make a difference against the competitor.
- Cutting the support teams will disappoint the customers who have issues and risks driving them away from the product. If the support team becomes overloaded, its members may also leave the company of their own accord until no one remains.
- Cutting the test teams will overload the support team and, in the end, disappoint the customers.
What a stalemate!
Can you see what went wrong? Can you see when it went wrong?
We all make mistakes
In fact, the same mistake was made several times here.
Risk management
Risk management is a daily concern for everyone.
Dealing with a known risk
When you know a problem may occur, you have two main ways to manage the situation:
- You may accept that it can happen and deal with the problem afterwards through curative actions
- You may ensure that the problem cannot occur through preventive actions
Hybrid approaches, which reduce the probability that the problem appears or lessen its effects, are also possible.
Depending on the kind of problem, either solution may be preferred. Dealing with this question is called risk management [2].
- Usually low risks with low financial impacts are treated curatively.
- High risks with high financial impacts are treated preventively.
- High risks with low impacts and low risks with high impacts are less clear-cut. It is a matter of choice, like any insurance you may take out (is it worth having collision insurance on an old car?).
The adopted strategy is often driven by the costs.
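To make this cost-driven choice concrete, here is a minimal sketch. It assumes a deliberately simple model that is not part of the reasoning above: the curative option is priced as probability times impact, then compared with the price of prevention. The Risk class, its fields and all the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A known risk. Every figure used below is an illustrative assumption."""
    name: str
    probability: float      # chance that the problem occurs over the period (0..1)
    impact_cost: float      # cost of dealing with it afterwards (curative)
    prevention_cost: float  # cost of ensuring it cannot occur (preventive)


def expected_curative_cost(risk: Risk) -> float:
    """Average cost of accepting the risk and fixing things afterwards."""
    return risk.probability * risk.impact_cost


def choose_strategy(risk: Risk) -> str:
    """Pick the cheaper option under this simplified expected-cost model."""
    if risk.prevention_cost < expected_curative_cost(risk):
        return "preventive"
    return "curative"


old_car = Risk("collision on an old car", 0.02, 1_500, 400)
data_loss = Risk("production data loss", 0.30, 200_000, 5_000)

print(choose_strategy(old_car))    # curative
print(choose_strategy(data_loss))  # preventive
```

A real decision would also weigh factors this model ignores, such as safety or reputation.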
This is similar to how bugs are fixed. They are given a priority level that depends on criteria such as the impact on the overall product, the number of affected customers and/or the relative importance of these customers. The problem is that you cannot forecast which bugs you will face. Fortunately, most of them will have a small impact on customers or will be easy to diagnose and fix. However, the remaining bugs tend to follow a Pareto rule [1]: 20% of them will take 80% of your fixing time.
Samples
You are responsible for a mountain road that gets slippery every winter night. The ditch along the road is shallow, so injuries and damage are limited; nonetheless, it is deep enough to trap some of the cars that fall into it.
Sample 1, curative actions: You do nothing until a car is trapped. Then, you help to tow the car to a nearby garage.
Sample 2, preventive actions: You install safety barriers so no car can fall into the ditch.
Addressing an issue
So, you opted for the curative way, and the problem you expected to never occur suddenly arose.
Assuming you want to handle it (ensuring a return to normal), there are, again, two ways to do so:
- fixing or changing the context so the problem cannot occur anymore and/or resolves by itself
- dealing directly with the consequences, so the problem is not solved but compensated for
Both may require extra actions to manage the side effects caused by the issue (for example, fixing a process that corrupts data restores the expected behaviour, but the data already corrupted remains).
Once more, depending on the kind of problem, either solution may be preferred.
Samples
You live not far from a river, and several particularly rainy days have led to a flood. The cellar will soon be flooded and everything stored inside will get wet.
Sample 1, curative actions on consequence: The problem is the accumulation of water in the basement. It wets the walls and your belongings. You install a pump to expel the water flowing into the basement. You leave the wet belongings to dry afterwards.
Sample 2, curative actions on root cause: The problem is that the water can enter the house. You build a quick dike with sandbags in front of your house.
In practice, when dealing with a root cause you may find several, especially if you look for the root cause of the root cause you are examining, and so on. Some of them may be easier to handle or may mitigate the situation.
Sample 2, curative actions on root cause+: The problem is that water flows outside the riverbed. You may channel the river itself or focus on a weak point.
Sample 2, curative actions on root cause++: The problem is that there was a tremendous quantity of water in the river. You may construct a stormwater retention pond or redirect the water to an uninhabited area.
What happens if the pump breaks down or if you do not have enough sandbags?
No doubt you had other ideas on how to deal with this flood. These samples illustrate that you may find different ways to solve the same problem, especially when you try to fix it upstream. The important thing to note is that the further upstream an issue is handled, the more freedom, security and time to react you get.
What is the risk of having a bug in software?
You probably already know: it is high, very high in fact. Estimates are around 50 bugs per 1000 lines of code [3], or more [4]. It is easy to imagine how much these bugs annoy customers and disrupt the developers' schedule; besides, at such a rate, a serious issue will most likely happen.
Fixing a bug that has been released to production costs 30 times more [5] than fixing it at an early stage.
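To get a feel for these orders of magnitude, the two figures can be combined in a small sketch. The 50 bugs per 1000 lines and the factor of 30 are the estimates quoted above; the function names, the 20 000-line code base and the unit cost of an early fix are assumptions chosen purely for illustration.

```python
BUGS_PER_KLOC = 50              # estimate quoted above
PRODUCTION_FIX_MULTIPLIER = 30  # estimate quoted above


def expected_bugs(lines_of_code: int, bugs_per_kloc: float = BUGS_PER_KLOC) -> float:
    """Rough number of bugs to expect in a code base of the given size."""
    return lines_of_code / 1000 * bugs_per_kloc


def fix_cost(bug_count: float, early_fix_cost: float, in_production: bool) -> float:
    """Total fixing cost, in the same unit as early_fix_cost."""
    multiplier = PRODUCTION_FIX_MULTIPLIER if in_production else 1
    return bug_count * early_fix_cost * multiplier


bugs = expected_bugs(20_000)  # hypothetical 20 000-line product
print(bugs)                                        # 1000.0
print(fix_cost(bugs, 1.0, in_production=False))    # 1000.0 units
print(fix_cost(bugs, 1.0, in_production=True))     # 30000.0 units
```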
Some mistakes cost a lot
Let us return to the analysis of the small story about TinyCorp.
Instead of building software that is bug-free by design, TinyCorp chose to hire a support team. The problem is not having a support team in itself (which can, in certain cases, be a good fit) but ignoring the real cause of the problem, bugs and fragile foundations, and therefore not investing in better foundations for the product.
What happened next? Developers were asked to fix the bugs. It seems legitimate. They did.
To get an idea of the cost, we will make a few assumptions:
- A developer works 8 hours a day, and on average each day:
  - 10% of the time (48 minutes) is spent on activities that are not directly linked to development or bug fixing (emails, project tracking, meetings, ...)
  - 60% of the time (4.8 hours) is spent on development itself
  - 30% of the time (2.4 hours) is spent on bug fixing
A bit more than half of the time goes to the main task. Can you deduce another statistic from these figures?
You got it. The 30% is a share of the whole day (100%), not a share of the 60% actually spent on development.
30% is half of 60%, so the time dedicated to bug fixing is 50% of the time spent developing the features. For every 2 minutes of development, 1 more minute will be lost later on bug fixing.
From a performance perspective, this is a double penalty:
- when the bug-fixing time ratio increases, the development time ratio decreases, so the project moves more slowly
- worse, it means that quality itself is dropping, since each unit of development time now requires more bug-fixing time
To measure the performance, you may use the formula performance = dev_ratio / (dev_ratio + bug_fix_ratio), and to measure the cost overhead factor you may use cost_factor = 1 / performance.
The previous chart shows how quickly the cost of production of a software increases when the ratio of bug fix time increases. The lines are for 0%, 10% and 20% of extra-activities (meetings, emails, project tracking, ...).
When the factor is 1, there is no overhead and the performance is optimal. However, you can quickly see that, in our scenario, a 30% bug-fixing time leads to a cost overhead of about 50%, as expected.
Keep in mind that this cost overhead is a minimum, since it only takes into account the time developers spend on their main development task (that is, it relates to the developers' salaries only). It includes neither the other financial impacts, nor the new hires for the support team, nor the customers' disappointment, nor the delay in bringing to market the new features that need the developers' time...
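The two formulas can be sketched in a few lines of Python. The function names are ours, and the way dev_ratio is derived (what remains of the day once extra activities and bug fixing are subtracted) is an assumption consistent with the scenario above.

```python
def performance(bug_fix_ratio: float, extra_ratio: float = 0.10) -> float:
    """dev_ratio / (dev_ratio + bug_fix_ratio), all ratios as fractions of the day."""
    # Assumption: development gets whatever is left of the day.
    dev_ratio = 1.0 - extra_ratio - bug_fix_ratio
    if dev_ratio <= 0:
        raise ValueError("no time left for development")
    return dev_ratio / (dev_ratio + bug_fix_ratio)


def cost_factor(bug_fix_ratio: float, extra_ratio: float = 0.10) -> float:
    """Cost overhead factor: 1 means no overhead, 1.5 means 50% overhead."""
    return 1.0 / performance(bug_fix_ratio, extra_ratio)


# One row per level of extra activities, one column per bug-fixing ratio.
for extra in (0.0, 0.10, 0.20):
    factors = [round(cost_factor(bug / 100, extra), 2) for bug in (0, 10, 30, 50)]
    print(f"extra activities {extra:.0%}: {factors}")
```

With 10% of extra activities and 30% of bug fixing, cost_factor returns about 1.5, which is the 50% overhead discussed above.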
Steps further
So far so good. The company gains customers and, to meet the demand for new features, hires new developers and, proactively, support staff. Bigger software tends to have more defects. The number of developers working on a piece of software also tends to increase the bug ratio. Furthermore, newcomers to a project are slow, and their lack of knowledge of the product may introduce extra bugs. Meanwhile, the focus factor (the time actually spent on the main task) decreases in favor of meetings and other requirements that appear as the department grows.
What were the issues? The delay in delivering features to customers. The development team overloaded by new requirements. Are these causes or consequences? They result from a 50% time overhead on the development team, so they are consequences. Again, a curative approach was chosen, and since the causes are neither identified nor dealt with, the consequences will grow more serious.
Then come the complaints of the support team, which handles the customers' disappointment. Maybe the quality of the software should be improved?
A team of testers is hired to identify the bugs before the product goes to production.
Did the company finally find the solution? Did the company address the cause of the problem?
Sadly, not exactly. Testing after the fact detects some errors that occur under some conditions, but it cannot ensure the correctness of a program; this is called negative assurance [6]. In addition, because of combinatorial explosion, only a small subset of cases can be tested this way. It does not scale.
It will help, because it reduces what overloads the support team: too many bugs. Nonetheless, bugs are the consequence of poor-quality software development; the root cause remains.
The time to market has increased. There is a false sense of security, because patching the bugs that are found does not improve the overall code. The underlying code quality is still declining, and so is development productivity. At least, from the customers' and the support team's perspective, the product will appear more stable, with fewer bugs.
As time goes by, the development of the product slows down. The company hires new developers and reinforces the other teams. New managers are promoted in each of them. The consequences are the same as above: absolute productivity increases while relative productivity weakens.
In fact, as a general rule, productivity on a project slows down over time. Without going into details, the bigger a code base is, the harder it is to add new features.
The challenger
One day a challenger arrives on the market with very competitive prices. The company is dumbfounded. There is no way to bring its prices into line with its rival's: its software production is too expensive.
How did the challenger do it?
The challenger opted for software quality. What did that change?
- The rival company probably started its product a little slower (but not much slower), on better foundations.
- Facing no (or almost no) bugs, the development team was fully dedicated to adding new features.
- When customers started to take an interest in the new alternative, the company had the budget to expand the dev team, since no need for a support team had emerged yet.
- The newcomers were few and were integrated faster into the team, which stayed small and maintained its productivity with a high focus factor.
- The customer portfolio kept broadening.
- Since adding new teammates did not significantly degrade the software quality, thanks to the good foundations and the good practices still applied, the company was able to grow the team again.
- The need for a support team came, but the team remained small because most of its work was helping customers and was not related to software bugs.
- The need for a test team arose late, when the company wanted to verify what was outside the scope of the developers' work (high-level tests); that team also remained small.
A few things made a difference, and they are consequences of building quality software:
- a better focus factor on creating value in the product, at both the development and the company level, which allowed it to make up the lost ground
- smaller teams and fewer levels of hierarchy, which reduces latency and eases the processes
Building quality software structured the whole company. Fewer employees and a better focus factor allowed aggressive but sustainable pricing.
As mentioned earlier, the bigger a project is, the harder it is to add new features. This means that the same feature will not cost the same on a project of 10 000 lines as on one of 1 million lines. For the same task taken out of context (for example on a small or new project), developing a feature with better software quality is usually more expensive than developing a poor-quality one, whereas with a tough feature the reverse is true. One of the reasons why the challenger was able to catch up with its rival is that the slowdown that comes with a growing project also exists for quality software, but is less steep, for numerous reasons: no code duplication to maintain, separation of concerns, type safety, more generic and reusable code, code that is easier to reason about and understand, documentation, code coverage, ...
After the kind sample, the reality
Take a seat.
We took 30% as a basis for the defect-fixing ratio because it is a small and convenient number, but reality may be very different depending on the project. In practice, bug-fixing time (even when tracked) is often largely underestimated, and studies report that 50% to more than 90% [4] of the time is spent on it, especially when dealing with a legacy code base.
Taking the lower bound of 50%, this would mean that only 40% of the effort really goes into writing the feature; the rest is used to make it work. Maintenance of software is estimated to cost at least 3 times as much as building it. Some studies even estimate the maintenance cost at 9 times the initial cost [7].
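As a rough illustration of these figures, the sketch below computes which share of a product's lifetime cost goes to the initial build when maintenance costs 3 or 9 times the build, and the cost factor of the 50% scenario using the performance formula given earlier, dev_ratio / (dev_ratio + bug_fix_ratio). The function names are assumptions; the ratios come from the text.

```python
def build_share(maintenance_multiplier: float) -> float:
    """Share of the lifetime cost (build + maintenance) spent on the build."""
    return 1 / (1 + maintenance_multiplier)


def cost_factor(dev_ratio: float, bug_fix_ratio: float) -> float:
    """Inverse of dev_ratio / (dev_ratio + bug_fix_ratio)."""
    if dev_ratio <= 0:
        raise ValueError("dev_ratio must be positive")
    return (dev_ratio + bug_fix_ratio) / dev_ratio


print(f"maintenance x3: build is {build_share(3):.0%} of the lifetime cost")  # 25%
print(f"maintenance x9: build is {build_share(9):.0%} of the lifetime cost")  # 10%

# Lower bound quoted above: 40% development, 50% bug fixing.
print(round(cost_factor(0.40, 0.50), 2))  # 2.25
```

In other words, with half of the time spent on bug fixing, each feature costs more than twice what it would cost without defects.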
What can be expected of quality software?
It depends on the methods and practices that are used. For example, with TDD, experiments at Microsoft showed that teams achieved a significant decrease in defects, ranging from 62% to 91% [8].
Combined with a stronger type system, functional programming and other sound engineering practices, quality software may have zero or almost zero bugs. Bugs are not inevitable, whatever you may have heard until now.
Feeling skeptical? Just ask yourself why some companies (like ours, but also others you may find on the web) are prepared to fix, for free, the bugs that may arise from their developments. Make no mistake: in business, nothing is ever free. Nevertheless, something that is not likely to happen has no cost at all. Even if it does happen, say once a year, it is not significant enough to be a relevant financial issue.
Beyond the development
We mainly focused on the development side to get an idea of the cost overhead induced by poor-quality software. This overhead grows as the ratio of non-developer staff to developers increases. This does not mean you should only hire developers, but that you should be aware of when these roles are a good fit and when they are not. Having one tester for 10, 5 or 3 developers raises the cost by 10%, 20% and 33% respectively (assuming comparable salaries), without adding any feature, with only one motivation: to make it work (which is the developer's responsibility).
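These percentages follow from a simple ratio, sketched below. It assumes that a tester costs the same as a developer; the relative_cost parameter, like the function name, is a hypothetical addition that lets you relax this assumption.

```python
def staffing_overhead(developers: int, non_developers: int = 1,
                      relative_cost: float = 1.0) -> float:
    """Extra cost of non-developer staff, as a fraction of the developers' cost."""
    if developers <= 0:
        raise ValueError("at least one developer is required")
    return non_developers * relative_cost / developers


for devs in (10, 5, 3):
    print(f"1 tester for {devs} developers: +{staffing_overhead(devs):.0%}")
```

This prints +10%, +20% and +33%, matching the figures above.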
We did not mention the other financial impacts of poor-quality software, but you may also add many knock-on consequences, for example: bad press, legal costs in case of a serious issue, refunds for disappointed customers, demotivated employees, difficulty attracting and keeping good people, ...
If you develop a software product, the software development department should be at the heart of your concerns.
Why this article?
I wrote this article because I am dismayed by the appalling general mediocrity in computer science. It is especially noticeable on social networks like LinkedIn, where top posts about computer science (outside communities focused on software quality) give bad advice, with comments suggesting practices that are even worse. The origin is often the same: fixing unexpected or unwanted effects with another dubious practice instead of questioning the foundations. For example, faced with a code base that is hard to reason about, someone will suggest using the browser's console.log (and other console.* functions), whereas another will recommend using the debugger, ...
Please don't! Use neither the former nor the latter, or only on rare occasions. A developer's job is not to debug.
Nowadays, many developers complain that their job does not meet their expectations. The less fortunate experience a loss of meaning (brown-out). Numerous developers change occupations after a few years, disillusioned. This is revolting and sad because most software engineers have, in fact, never experienced engineering at all (engineering is the use of scientific principles [9]). For years, they have missed the point.
This article is designed to challenge people to think about what their daily software development job is like, and whether it matches their expectations and enables them to grow and find happiness. If you think that it does not, just remember that you are not a debugger, that you are worth more than that, and that your job should be much more interesting and a source of pride. Not a single bug in tens of thousands of lines of production code is far from impossible. Moreover, when things are done rigorously and according to the rules, even what looks like a boring feature has a taste of achievement.
This article also aims to make companies aware that, when they chase quick profits by neglecting quality at the source, they dig themselves into debt through the very design of their organization.
Then, for all of us, it is a reminder to always keep an eye on how we deal with issues, so that we do not build an uncontrolled stream of issues caused by a wrong way of solving each of them. Always be willing to revisit what you have learned and to learn once more.
Finally, please do not take this article at its word; better, try software engineering yourself and form your own opinion.
All men make mistakes, but only wise men learn from their mistakes. [10]
Footnotes and references
- [1] Pareto for bugs: https://www.crn.com/news/security/18821726/microsofts-ceo-80-20-rule-applies-to-bugs-not-just-features.htm
- [2] Risk management: https://en.wikipedia.org/wiki/Risk_management
- [3] 50 bugs per KLOC: https://labs.sogeti.com/how-many-defects-are-too-many/#:~:text=According%20to%20Steve%20McConnell's%20book,(1000%20lines%20of%20code).
- [4] 70 bugs per KLOC: https://coralogix.com/log-analytics-blog/this-is-what-your-developers-are-doing-75-of-the-time-and-this-is-the-cost-you-pay/
- [5] Fixing bug released: https://deepsource.io/blog/exponential-cost-of-fixing-bugs/
- [6] Negative assurance: https://en.wikipedia.org/wiki/Negative_assurance
- [7] Maintenance cost: http://blog.lookfar.com/blog/2016/10/21/software-maintenance-understanding-and-estimating-costs/
- [8] Microsoft TDD experiments: https://blog.octo.com/des-chiffres-sur-le-roi-des-tests-unitaires/
- [9] Engineering: https://en.wikipedia.org/wiki/Engineering
- [10] Winston Churchill
Mathieu Prevel
CEO at Dedipresta
Software engineering enthusiast and functional programming addict.
published on Feb 14, 2021