Understanding software development approaches and differences
Different developer or programmer profiles, skills and approaches to solving a problem or designing software lead to very different solutions. We will look at these approaches through an example, along with their pros and cons.
Different approaches to software conception
Low-code development
This activity is the farthest from what is usually called programming, since the aim is to use as little code as possible. It is called "low-code" (or "no-code") development, and you have probably already practiced it if you have used a spreadsheet application. More and more tools now let you create your own software through interfaces with pre-built components: boxes that you fill, arrange, order and bind to one another to obtain the design or application you expect.
No-code development allows quick application development when the need for custom behaviours is small and/or when the kind of application and its requirements are well known, as with websites, video games, workflows, ETLs, ... More and more developers are low-coders thanks to the progress of the underlying tools, especially due to artificial intelligence.
This article will not dive deeper into this profile and activity, since our common thread will be a code implementation.
Software development
Software development [1] includes many different activities, but we will simplify its definition as follows. Software development is the activity of building and shaping software using known algorithms and processes; a software developer is thus a software craftsman who mixes empirical practices with their own experience. The resulting algorithms rely mainly on basic code structures (loops, conditional statements, ...).
Most programmers are software developers (whatever their degree or job title).
Software engineering
Engineering [2] is the use of scientific principles and processes to design and build a system. Software engineering [3] is thus the activity of building and shaping software using these scientific principles, and a software engineer can be seen as a scientist who uses models and scientific knowledge to shape and build software.
Computer science has its roots in mathematics, so using mathematics helps to solve algorithmic problems while reaching a higher level of abstraction. Software engineering is much less common because it requires wider knowledge and experience.
Approach differences through an example
To illustrate how the same problem can be solved in different ways, we will take a trivial example that you may already have encountered.
The problem we will solve is to compute the total pocket money received per child.
- Each month, a child may or may not receive pocket money.
- A month is represented by a map associating a child's name (key) with a positive integer amount in euros (value).
- We assume the names are unique, so multiple occurrences (between different months) refer to the same child.
- The expected result is a map associating the name of each child with the total amount of pocket money received over the given months.
For example, we may start with 2 months (January and February; March will be used later) with the following data:
Pocket money for the month of January:
val jan = Map(
"Nolan" -> 10,
"Wendy" -> 15,
"Tuck" -> 10,
"Meddi" -> 10
)
Pocket money for the month of February:
val feb = Map(
"Mandy" -> 12,
"Quentin" -> 15,
"Nolan" -> 10,
"Wendy" -> 15
)
Pocket money for the month of March:
val mar = Map(
"Nolan" -> 10,
"Wendy" -> 15,
"Tuck" -> 10,
"Meddi" -> 10,
"Mandy" -> 12,
"Quentin" -> 15
)
The expected result for the merge of the maps jan and feb is:
Map(
"Nolan" -> 20,
"Wendy" -> 30,
"Tuck" -> 10,
"Meddi" -> 10,
"Mandy" -> 12,
"Quentin" -> 15
)
If you are a programmer, accept this challenge and take a few minutes to think of a solution (or better, code it and measure the time needed to solve this problem).
Software development approach
The most common approach is to split a big problem into smaller ones. So we will focus on merging 2 maps (January and February), then generalize this behaviour (because we may want to sum the pocket money for a whole year, for example).
The imperative way
To merge the 2 maps we will use a straightforward algorithm that accumulates key/value pairs in a third map. First we iterate over the first map and, for each key, add its value to the value from the second map (or to 0 when the key is absent). Then we iterate over the second map to add to the third map the keys that were not in the first map. The only risk in this simple algorithm is forgetting the second loop.
Algorithm for 2 maps
import scala.collection.mutable

// we create a map where the result will be stored
val result = mutable.Map.empty[String, Int]
// we add all children from the january map
// and for all of them we sum the received pocket money during january and february
for ((name, amount) <- jan) {
  result += name -> (feb.getOrElse(name, 0) + amount)
}
// we also have to add the children that are in the
// february map but not in the january one
for ((name, amount) <- feb) {
  if (!(jan contains name)) {
    result += name -> amount
  }
}
Generalized algorithm
Now that we have a working solution for 2 months we will generalize this solution for a list of months.
A mergeAll function will take a list of maps as input and use the merge function.
// we wrap the previous solution in a function
// and rename jan/feb to m1/m2
def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] = {
  val result = mutable.Map.empty[String, Int]
  for ((name, amount) <- m1) {
    result += name -> (m2.getOrElse(name, 0) + amount)
  }
  for ((name, amount) <- m2) {
    if (!(m1 contains name)) {
      result += name -> amount
    }
  }
  result.toMap // convert the mutable accumulator to the immutable Map declared as return type
}
// we create another function that will take a list of months in input
def mergeAll(months: List[Map[String, Int]]): Map[String, Int] = {
  if (months.isEmpty) { // the list may be empty
    Map.empty[String, Int]
  } else if (months.length == 1) { // or the list may contain only one month
    months.head // take the first element
  } else { // otherwise there are 2 or more maps in the list
    var result = Map.empty[String, Int]
    for (month <- months) {
      result = merge(result, month) // merge the current month with the accumulated result and overwrite the previous result
    }
    result
  }
}
Did you think of the edge cases? The sum of pocket money for 0 or 1 month only?
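As a side note on these edge cases, here is a minimal sketch (not the version used in the rest of the article) suggesting that the explicit branches for 0 and 1 month are optional: merging the empty map with a month gives that month back, so the loop alone covers them. The object name MergeAllSketch is hypothetical and only there to keep the snippet self-contained.

```scala
import scala.collection.mutable

object MergeAllSketch {
  // Same behaviour as the two-map merge above, written compactly.
  def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] = {
    val result = mutable.Map.empty[String, Int]
    for ((name, amount) <- m1) result += name -> (m2.getOrElse(name, 0) + amount)
    for ((name, amount) <- m2 if !m1.contains(name)) result += name -> amount
    result.toMap
  }

  // No special case for 0 or 1 month: the loop body simply runs 0 or 1 time.
  def mergeAll(months: List[Map[String, Int]]): Map[String, Int] = {
    var result = Map.empty[String, Int]
    for (month <- months) result = merge(result, month)
    result
  }
}
```

The explicit branches of the original version remain a legitimate choice when the intent should be visible to the reader.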
Test
Testing imperative code with loops and conditional statements is often hard and time-consuming, especially when it comes to choosing and writing the test data.
As expected, the merge function is not easy to test and will probably be tested with a sample that has been "computed" by hand (or worse, by the function itself!). Such a test only ensures that the selected sample, and only this one, leads to the expected result.
The previous data is reused to keep the test concise.
assert(merge(Map(), Map()) == Map())
assert(merge(jan, Map()) == jan)
assert(merge(Map(), jan) == jan)
val janAndFeb = Map(
"Nolan" -> 20,
"Wendy" -> 30,
"Tuck" -> 10,
"Meddi" -> 10,
"Mandy" -> 12,
"Quentin" -> 15
)
assert(merge(jan, feb) == janAndFeb)
Testing the mergeAll function is even harder since it uses a more complex input. Moreover, this function also has the same testing drawbacks we have seen with the merge function.
val janAndFebAndMar = Map(
  "Nolan" -> 30,
  "Wendy" -> 45,
  "Tuck" -> 20,
  "Meddi" -> 20,
  "Mandy" -> 24,
  "Quentin" -> 30 // 15 in February + 15 in March
)
assert(mergeAll(List()) == Map())
assert(mergeAll(List(Map())) == Map())
assert(mergeAll(List(Map(), Map())) == Map())
assert(mergeAll(List(jan)) == jan)
assert(mergeAll(List(jan, feb)) == janAndFeb)
assert(mergeAll(List(feb, jan)) == janAndFeb)
assert(mergeAll(List(jan, feb, mar)) == janAndFebAndMar)
The functional way
To show an alternative method, the following code uses a functional implementation. The aim is to keep it simple, so it involves no currying and no local mutability to improve performance (this is not the point here). To be honest, however, this way of merging maps is a level above the previous implementation and requires more knowledge and practice. It may be seen as an intermediate solution between the software development approach and the software engineering one. This is not surprising, since functional programming principles also have their roots in mathematics.
Algorithm for 2 maps
Here the merge has been split into 2 small functions. The first function computes the sum of the values of two maps for a single key and returns a pair associating the key (the child's name) with the sum. This function has been extracted only for readability (it is pure, hence referentially transparent, so any call could be replaced with its body). The second function iterates over the keys and accumulates the pair for each child and amount in an immutable map.
These functions were separated because this is already a generalization. Indeed, a map can be seen as an iterable of two-element tuples, where the first element of each tuple is the key and is unique, while the second element is the associated value.
// build a key/ value pair and sum the values of two maps on the given key
def pairSum(key: String, m1: Map[String, Int], m2: Map[String, Int]) = key -> (m1.getOrElse(key, 0) + m2.getOrElse(key, 0))
// merge 2 maps using an iteration on keys and an accumulative map
def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
(m1.keySet ++ m2.keySet).foldLeft(Map.empty[String, Int])((acc, key) => acc + pairSum(key, m1, m2))
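The remark that a map can be seen as an iterable of key/value tuples suggests another formulation, sketched here as an alternative rather than as the article's method: concatenate the pairs of both maps, group them by key and sum each group. It assumes Scala 2.13 or later, where groupMapReduce is available on collections; the names PairsSketch and mergePairs are hypothetical.

```scala
object PairsSketch {
  // Treat both maps as sequences of (name, amount) pairs,
  // group the pairs by name and sum the amounts of each group.
  def mergePairs(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
    (m1.toSeq ++ m2.toSeq).groupMapReduce(_._1)(_._2)(_ + _)
}
```

For instance, PairsSketch.mergePairs(jan, feb) should give the same map as merge(jan, feb).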
Generalized algorithm
The mergeAll function was written so that the code looks like the previous version; nonetheless, its body is concise enough to be used directly if preferred.
def mergeAll(months: List[Map[String, Int]]): Map[String, Int] =
months.foldLeft(Map.empty[String, Int])(merge)
We have written 3 functions: pairSum, merge and mergeAll. The body of pairSum can be inlined into the merge function, whereas mergeAll is not needed at all.
def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
(m1.keySet ++ m2.keySet).foldLeft(Map.empty[String, Int])((acc, key) => acc + (key -> (m1.getOrElse(key, 0) + m2.getOrElse(key, 0))))
List(jan, feb, mar).foldLeft(Map.empty[String, Int])(merge)
It is, indeed, a 3-line solution.
Test
These functions are much easier to test than the imperative version of this algorithm. We can test them with minimal predefined assumptions. It is possible to write very generic tests that never explicitly define edge cases (for example, that the list should have at least 2 elements).
To do so, we will use property-based testing, which generates random data according to the input types expected by the function. The underlying framework will generate tens, hundreds or more samples (depending on the configuration) and ensure that all of them satisfy the assertions.
To test the pairSum function we will check that, whether or not the 2 maps contain the provided key, we are able to build a tuple associating the key with the computed sum.
As you can see, no assumption is made about the data itself. The expectation is to have two random maps with String keys and Int values. A few updates are applied to the generated data to be able to verify some equalities: we must ensure that a key is present in or absent from a map, so this is enforced by adding the key (overwriting it) to the generated data or removing it.
"pairSum" should "associate a key to the sum of its values from 2 given maps" in {
forAll { (m1: Map[String, Int], m2: Map[String, Int], k: String, n1: Int, n2: Int) =>
pairSum(k, m1 - k, m2 - k) shouldBe (k -> 0) // m1 and m2 do not contain the key
pairSum(k, m1 - k, m2 + (k -> n2)) shouldBe (k -> n2) // m1 does not contain the key
pairSum(k, m1 + (k -> n1), m2 - k) shouldBe (k -> n1) // m2 does not contain the key
pairSum(k, m1 + (k -> n1), m2 + (k -> n2)) shouldBe (k -> (n1 + n2)) // m1 and m2 contain the key
}
}
To test the merge function we will check that, for each output key/value pair, the value is equal to the sum of all the values associated with the same key in the input maps. It is important to also check that no key is lost: if all of them were lost, the previous check would always be true.
"merge" should "sum the values by key for 2 maps" in {
  forAll { (m1: Map[String, Int], m2: Map[String, Int]) =>
    // matchers are used so that a failed comparison actually fails the test
    merge(m1, m2).keySet shouldBe List(m1, m2).flatMap(_.keySet).toSet // ensure the keys are preserved
    merge(m1, m2).foreach { case (key, sum) => // ensure that the sum associated with each key is valid
      sum shouldBe List(m1, m2).flatMap(_.get(key)).sum
    }
  }
}
To test the mergeAll function we only have to slightly update the test used for the merge function. The test itself makes it obvious that merge and mergeAll are deeply related and that the latter is a generalization of the former.
"mergeAll" should "sum the values having the same key from a list of maps" in {
  forAll { list: List[Map[String, Int]] =>
    // matchers are used so that a failed comparison actually fails the test
    mergeAll(list).keySet shouldBe list.flatMap(_.keySet).toSet // ensure the keys are preserved
    mergeAll(list).foreach { case (key, sum) => // ensure that the sum associated with each key is valid
      sum shouldBe list.flatMap(_.get(key)).sum
    }
  }
}
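To make the idea behind property-based testing more concrete without any framework, here is a hand-rolled sketch using only the standard library. It is a simplification: a real framework also shrinks failing samples and reports them. The names PropertySketch, randomMap and mergeHolds, as well as the generator settings (up to 5 entries, 4 possible keys), are hypothetical choices.

```scala
import scala.util.Random

object PropertySketch {
  // Functional merge from the article, repeated so the sketch is self-contained.
  def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
    (m1.keySet ++ m2.keySet).foldLeft(Map.empty[String, Int]) { (acc, key) =>
      acc + (key -> (m1.getOrElse(key, 0) + m2.getOrElse(key, 0)))
    }

  // Hypothetical generator: 0 to 5 entries, keys drawn from 4 names so that maps overlap.
  def randomMap(rnd: Random): Map[String, Int] =
    List.fill(rnd.nextInt(6))(s"child${rnd.nextInt(4)}" -> rnd.nextInt(100)).toMap

  // Returns true when both properties hold on `samples` random pairs of maps.
  def mergeHolds(samples: Int, seed: Long): Boolean = {
    val rnd = new Random(seed)
    (1 to samples).forall { _ =>
      val (m1, m2) = (randomMap(rnd), randomMap(rnd))
      val merged = merge(m1, m2)
      val keysPreserved = merged.keySet == (m1.keySet ++ m2.keySet)
      val sumsValid = merged.forall { case (key, sum) =>
        sum == List(m1, m2).flatMap(_.get(key)).sum
      }
      keysPreserved && sumsValid
    }
  }
}
```

Calling PropertySketch.mergeHolds(200, 42L) checks 200 generated pairs; the fixed seed keeps the run reproducible.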
Software engineering approach
If you know category theory [4], you probably recognized that our problem involves a (commutative) monoid [5], i.e. an algebraic structure with a single associative binary operation and an identity element.
If you're not familiar with this notion, assume that the associative operation is called x, so:
- there is an identity element Id such that a x Id = Id x a = a
- this operation is associative, such that a x (b x c) = (a x b) x c
In our case, the identity element is the empty map, and the associative binary operation merges 2 maps into a single one by summing the values per key, so the result is another map of the same shape (the operation is closed).
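To make these two laws tangible, here is a minimal sketch that spells out the structure by hand and checks the laws on the sample months. The Monoid trait below is a hypothetical illustration written for this example, not the API of any library, and checking three samples is of course no proof.

```scala
object MonoidSketch {
  // Hypothetical minimal type class, named after the algebraic structure.
  trait Monoid[A] {
    def empty: A               // identity element (Id)
    def combine(x: A, y: A): A // associative binary operation (x)
  }

  // Instance for the pocket money maps: identity = empty map, operation = sum per key.
  val pocketMoney: Monoid[Map[String, Int]] = new Monoid[Map[String, Int]] {
    def empty: Map[String, Int] = Map.empty
    def combine(x: Map[String, Int], y: Map[String, Int]): Map[String, Int] =
      y.foldLeft(x) { case (acc, (name, amount)) =>
        acc.updated(name, acc.getOrElse(name, 0) + amount)
      }
  }

  val jan = Map("Nolan" -> 10, "Wendy" -> 15, "Tuck" -> 10, "Meddi" -> 10)
  val feb = Map("Mandy" -> 12, "Quentin" -> 15, "Nolan" -> 10, "Wendy" -> 15)
  val mar = Map("Nolan" -> 10, "Wendy" -> 15, "Tuck" -> 10, "Meddi" -> 10, "Mandy" -> 12, "Quentin" -> 15)

  // a x Id = Id x a = a
  val identityHolds: Boolean =
    pocketMoney.combine(jan, pocketMoney.empty) == jan &&
      pocketMoney.combine(pocketMoney.empty, jan) == jan

  // a x (b x c) = (a x b) x c
  val associativityHolds: Boolean =
    pocketMoney.combine(jan, pocketMoney.combine(feb, mar)) ==
      pocketMoney.combine(pocketMoney.combine(jan, feb), mar)
}
```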
Assuming you are using a known monoid (so its instance is implicitly provided by the library), the merge of the two maps can be computed quite easily.
import cats.implicits._ // assumed library: Cats, whose syntax matches the calls below
jan |+| feb // if you like symbols
jan combine feb // if you prefer the ascii alias
It can also be generalized to a list of maps without any effort.
List(jan, feb, mar).combineAll
Notice how abstract this solution is: it relies only on mathematical properties.
Test
Since this monoid instance is already provided by the library we are using, it is already tested, and adding our own test is unnecessary. Nonetheless, for the reader's benefit, we will check that the mathematical properties hold.
Like the tests seen previously, it relies on property-based testing to validate the properties on a large amount of data. The test name is Map[String, Int], and the laws of a commutative monoid will be verified on random data of type Map[String, Int].
checkAll("Map[String, Int]", CommutativeMonoidTests[Map[String, Int]].commutativeMonoid)
Running the test will output:
[info] - Map[String, Int].commutativeMonoid.associative
[info] - Map[String, Int].commutativeMonoid.collect0
[info] - Map[String, Int].commutativeMonoid.combine all
[info] - Map[String, Int].commutativeMonoid.combineAllOption
[info] - Map[String, Int].commutativeMonoid.commutative
[info] - Map[String, Int].commutativeMonoid.is id
[info] - Map[String, Int].commutativeMonoid.left identity
[info] - Map[String, Int].commutativeMonoid.repeat0
[info] - Map[String, Int].commutativeMonoid.repeat1
[info] - Map[String, Int].commutativeMonoid.repeat2
[info] - Map[String, Int].commutativeMonoid.right identity
It may appear confusing because we did not test the result we expect; instead, we tested that the data type we are using satisfies the laws we need (and more).
A step further
So far so good. However a living project is likely to be updated sooner or later.
What happens if we decide to change the pocket money amount from the type Int to the type Double or BigDecimal, or to replace the name with an identifier of type Long?
The first two versions of the algorithm will require an update to match the data type changes.
Nevertheless, notice how the abstract algorithm using the monoid properties never exposes the underlying data type.
As long as the new data type fulfills the same laws, no code update will be needed.
As you guessed, Map[String, Double], Map[String, BigDecimal] and Map[Long, Double] are also known monoids!
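Why these other maps are monoids too can be sketched generically: a map is a monoid as soon as its values form one, whatever the key type. The code below is an illustration under that assumption, with a hypothetical hand-written Monoid type class rather than a library one; it shows Int and BigDecimal values (a Double instance would follow the same pattern).

```scala
object GenericMonoidSketch {
  // Hypothetical minimal type class (an illustration, not a library API).
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  object Monoid {
    implicit val intSum: Monoid[Int] = new Monoid[Int] {
      def empty: Int = 0
      def combine(x: Int, y: Int): Int = x + y
    }

    implicit val bigDecimalSum: Monoid[BigDecimal] = new Monoid[BigDecimal] {
      def empty: BigDecimal = BigDecimal(0)
      def combine(x: BigDecimal, y: BigDecimal): BigDecimal = x + y
    }

    // Any Map[K, V] is a monoid as soon as its values V form one.
    implicit def mapMonoid[K, V](implicit values: Monoid[V]): Monoid[Map[K, V]] =
      new Monoid[Map[K, V]] {
        def empty: Map[K, V] = Map.empty
        def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
          y.foldLeft(x) { case (acc, (key, value)) =>
            acc.updated(key, values.combine(acc.getOrElse(key, values.empty), value))
          }
      }
  }

  // The same function serves every type that has an instance.
  def combineAll[A](items: List[A])(implicit m: Monoid[A]): A =
    items.foldLeft(m.empty)(m.combine)
}
```

With this sketch, switching from Map[String, Int] to Map[Long, BigDecimal] changes the data passed to combineAll but not the call itself.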
To monoid or not to monoid, that is the question
The availability of a monoid instance is checked at compile time. If the code no longer compiles after a data type update, a simple question should arise.
Is it expected that no monoid instance is found?
Yes, the data type is not a monoid!
- Great, you probably found a bug (at compile time), because there is probably no good reason to lose these properties.
- Interesting, maybe some properties are no longer preserved (e.g. the structure is now a semigroup instead of a monoid).
- Sad, the requirements changed so much that you cannot use any abstraction and have to write your own algorithm.
No, the data type is a monoid!
- It is a custom data type, so you will have to create a monoid instance.
Pros and Cons
It is time to compare the different implementations we made.
Implementation time perspective
If you accepted the challenge of solving this simple problem, you know how much time it takes to write an implementation that probably looks like one of the above. It varies slightly, but some reference times could be:
- 30 minutes or more for the imperative version
- 10 to 15 minutes for the functional version
- 30 seconds or less for the abstract version
In terms of implementation time, the difference between these 3 versions is quite impressive. The functional approach is at least twice as fast to write as the imperative one, while the version using a monoid instance is at least 60 times faster.
Data type update perspective
This aspect was studied when we discussed a data type update.
- the first algorithm requires 8 updates, in the function signatures but also in the branches of the conditional statements
- the second algorithm requires roughly half as many, namely 5 updates
- the third algorithm requires 0 updates
Regarding maintainability under a data type update, code abstraction leads to more flexible implementations and fewer updates.
Test maintainability perspective
Testing software is an important part of the developer's job. The aim of the tests is not only to ensure the correctness of the developed feature but also to ensure that future updates do not break previously implemented functionality.
- the testability of the first algorithm is low since its test is based on a single sample arbitrarily selected by the developer; moreover, the test is tightly coupled with the code itself and will probably need an update every time the main code is updated
- the second algorithm introduced a little abstraction, so the test itself can gain in abstraction; the test data no longer needs updating when the main function changes, and we can focus on testing the pre-conditions and post-conditions, which are unlikely to change
- the last algorithm requires no test at all here, since it uses properties that are already tested by the underlying library; however, if we absolutely want to test it ourselves, it takes a single line of code
Programmers often complain about the time spent writing and maintaining tests. The main argument is that the ratio of the time spent developing new features to the time spent updating all the impacted tests decreases as the project grows. There is some truth in this but, as we have just seen, it is mainly a matter of software design and implementation. The more abstract your code is, the fewer test updates you will need.
Overall code maintainability perspective
Actually, the overall code maintainability depends on many more factors than those we have just seen. We will assume that best practices were adopted (docs, comments, single responsibility, ...). Consequently, we will only look at the number of lines, which is an underestimated metric.
When a new team member joins a project, the overall size of the project does matter. When an update or a fix is required and you are looking for the exact location to patch, the size of the feature implementation does matter. Moreover, the productivity on a project decreases as the project grows. [6] With the hindsight provided by decades of data and studies, we also know that the number of defects increases with the number of lines of code [7] and, perhaps more interestingly, that the average number of lines of code written per day per programmer is very low (10 to 50 lines a day) and does not depend on the language. [8]
Back to the main topic, though.
- the imperative algorithm takes 36 lines of code plus 36 more lines for the test data, which makes a 72-line implementation
- the functional approach uses 29 lines of code
- the last implementation is a single line of code (with a total of 2 lines if you write your own test)
This is significant because the smallest implementations are not only the quickest to write; their low line count also means they do not become a future debt.
Understanding the distribution of programmers
For Alan Turing, the future of computer science was expected to lie in mathematics. [9] So why is the immense majority of programmers doing software development rather than software engineering?
The reason is related to the complexity of business needs.
The number of developers doubles every 5 years. [10] What does such an exponential increase mean (given how bad humans are at understanding the exponential function [11])?
- more developers are needed than the available professors can teach properly
- the graduation level has to be accessible to enough people to satisfy the needs of the industry
- the average experience is low
The first point implies that the quality of education may vary a lot between developers due to the lack of educational guidance.
The second point is that when you select the best elements from a finite set (here a pool of students to graduate or hire, but it could be the size of fish to harvest, ...), the selection can only be enlarged by lowering the overall requirements. This is the exact opposite of improving the knowledge base.
The last point is about the lack of experience and mentoring. This growth in the number of programmers means that programming is, by construction, a job dominated by young people. One in two has less than 5 years of experience, 75% have less than 10 years of experience, ... This also means that if you are a programmer reading this article, there is only a 25% chance that you are 33 years old or older, and a 12.5% chance that you are 38 years old or older...
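These percentages follow from the doubling hypothesis. A short derivation, under two simplifying assumptions that are not stated in the sources: nobody leaves the profession, and a career starts around the age of 23 (the starting age that the figures above imply).

```latex
% N(t): number of programmers at time t (in years), doubling every 5 years
N(t) = N_0 \cdot 2^{t/5}
% Share of today's programmers who were already programming x years ago
\frac{N(t - x)}{N(t)} = 2^{-x/5}
% x = 5:  2^{-1} = 50\%    (at least 5 years of experience)
% x = 10: 2^{-2} = 25\%    (at least 10 years, i.e. 33 years old or older)
% x = 15: 2^{-3} = 12.5\%  (at least 15 years, i.e. 38 years old or older)
```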
With these reasons in mind we realize why software development prevails and why software engineering remains a minority in the industry.
In addition, choosing the road of software engineering is not an easy path for a company: the number of available candidates is relatively small and many companies compete for them, so candidates tend to have higher expectations (salary, quality of the work, working environment, ...). It is also hard to go back, since a less qualified and skilled team member will not be able to understand the project and take part in it.
Fortunately, it is well worth the money. We have only followed a single simple example, but mathematics offers many properties that can be useful when writing a program.
Through the use of these properties you may substantially improve the quality of your software by preventing errors by design, which is a major concern when at least 75% of the cost of an average piece of software is dedicated to its maintenance. [12] Thus, committing resources to boost quality from the early stages of a project may reduce its total cost [13] and improve customer satisfaction.
As our example suggests, abstraction improves productivity and limits the number of expensive refactorings. Besides, numerous studies over more than 50 years have repeatedly found that, counter-intuitive as it seems, there is no relationship between the performance of a developer and their years of experience. Worse, the difference between a poor and a good performer is at least one order of magnitude. [14] Investing in better performers with wider programming skills is therefore a valuable strategy.
To conclude, if you are a programmer who wishes to improve your skills but find that what you have read seems too hard, just remember that it always seems impossible until it's done. [15]
Footnotes and references
- [1] Software development: https://en.wikipedia.org/wiki/Software_development
- [2] Engineering: https://en.wikipedia.org/wiki/Engineering
- [3] Software engineering: https://en.wikipedia.org/wiki/Software_engineering
- [4] Category theory: https://en.wikipedia.org/wiki/Category_theory
- [5] Monoid: https://en.wikipedia.org/wiki/Monoid
- [6] https://blog.codinghorror.com/diseconomies-of-scale-and-lines-of-code/
- [7] https://www.mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio
- [8] https://dzone.com/articles/programmer-productivity
- [9] Alan Turing: Lecture to the London Mathematical Society on 20 February 1947
- [10] https://blog.cleancoder.com/uncle-bob/2014/06/20/MyLawn.html
- [11] https://en.wikipedia.org/wiki/Albert_Allen_Bartlett
- [12] http://blog.lookfar.com/blog/2016/10/21/software-maintenance-understanding-and-estimating-costs/
- [13] https://brainhub.eu/blog/cost-of-quality-in-software-development/
- [14] https://accelerateddevelopment.blogspot.com/2013/03/it-no-experience-required.html
- [15] Nelson Mandela
Mathieu Prevel
CEO at Dedipresta
"Software engineering enthusiast and functional programming addict."
published on Mar 23, 2020