Software Philosophy: Data at the outer layer

Posted by Vasco de Krijger on April 28th, 2021

Introduction

Recently I had the opportunity to work on the 'darker side' of a codebase. It was a system full of hacks that let it cope with ever-changing business requirements. Because it never had the chance to properly adapt to those requirements, the codebase was littered with 'one-off' statements.

This sparked something in me. I started thinking about possible ways to prevent this, and I kept coming back to something I picked up while teaching myself Scala (more about this later on).

In this blog post, I'll first explain the general gist of the codebase I worked on. After that, I'll go into more detail about the 'philosophy' I came up with, describe its pros and cons, and end by defining when (and, more importantly, when not) I use it.

The system that started it all

As I mentioned before, I recently had the opportunity to work on a system which could use a bit of love. The goal of this system basically boils down to:

Given a set of selections from a user, find the set of rules that applies to that user.
(The user's selections can have overlapping segments, so a selection from A could also be relevant to a selection from D.)

The last part of the above description is exactly where the hacks got introduced. Whilst the system initially had a fairly straightforward architecture, it could not keep up with the ever-changing requirements. As a result, the codebase slowly but surely filled up with exceptions here and there, which made the system very hard to manage.

To illustrate this a bit better, let's assume the user has the following options:

A:1, A:2, A:3, A:4, B:1, B:3, B:4, C:1, C:2, C:3, C:4

The user could select any combination of the above options to get the rules that apply to them. At its core this sounds fairly straightforward (e.g. if the user selects A:1, show Rule:12, and so on for every possible combination). Over time, however, the system became more and more complicated, because business started requesting more difficult requirements. One example was that every time the user selected B:1, we also had to select A:3 (but not vice versa). Initially this was done by adding a bunch of if statements wherever the code checked whether A:3 was selected, so that it would also look at B:1. Whilst this worked for a 'one-off' case, such cases quickly multiplied, since more and more modifications were requested and the system never got the chance to adapt.
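To make the problem concrete, here is a minimal PHP sketch of what such a hack can look like inside a core rule lookup. The function `findApplicableRules`, the rule table and the `'A:3'` string format are hypothetical; only the "B:1 also selects A:3, but not vice versa" requirement and the A:1 / Rule:12 pairing come from the description above.

```php
<?php

/**
 * Hypothetical core lookup: returns the IDs of the rules that apply to the selection.
 *
 * @param string[]              $selected  Selected options, e.g. ['A:1', 'B:1'] (assumed format).
 * @param array<string, string> $ruleTable Maps an option to a rule ID (assumed format).
 * @return string[]
 */
function findApplicableRules(array $selected, array $ruleTable): array
{
    $rules = [];

    foreach ($ruleTable as $option => $ruleId) {
        $isSelected = in_array($option, $selected, true);

        // One-off hack: selecting B:1 must also count as selecting A:3 (but not vice versa).
        // Every other place in the codebase that checks for A:3 needs this same condition.
        if ($option === 'A:3' && in_array('B:1', $selected, true)) {
            $isSelected = true;
        }

        if ($isSelected) {
            $rules[] = $ruleId;
        }
    }

    return $rules;
}

// Hypothetical rule table; only A:1 => Rule:12 comes from the text.
$ruleTable = ['A:1' => 'Rule:12', 'A:3' => 'Rule:7', 'B:1' => 'Rule:3'];
$rules = findApplicableRules(['B:1'], $ruleTable); // ['Rule:7', 'Rule:3']
```

The lookup itself is trivial; the trouble is that the special case is invisible from the outside and has to be repeated in every function that asks "is A:3 selected?".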

This got me thinking about ways to tackle this problem (if not for this system, then for future ones), and I thought back to the time when I was teaching myself Scala. While learning Scala, I encountered various patterns that are more common in the functional world. One of these patterns was to keep 'IO at the outer layer' (i.e. no IO, database calls, etc. inside the 'core' of the application). This pattern stems from the functional world's desire to keep code pure, meaning free of side effects. Since IO interacts with the world outside the function (and can fail for reasons outside the system's control), a function that performs IO is considered 'impure'. You might now be wondering how you would write an application with only pure functions, and you're right to wonder: that is practically impossible, since even a database call is a side effect. Therefore the functional world has a strong preference for pushing all these 'impure' functions to the outermost layer of the system.
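A minimal PHP sketch of 'IO at the outer layer', assuming the user's selections arrive as a JSON file. The function names `lookupRules` and `runFromFile` and the JSON format are hypothetical; the point is only where the side effects live.

```php
<?php

// Pure core: the result depends only on the arguments, and nothing outside is touched.
function lookupRules(array $selected, array $ruleTable): array
{
    $rules = [];
    foreach ($selected as $option) {
        if (isset($ruleTable[$option])) {
            $rules[] = $ruleTable[$option];
        }
    }
    return $rules;
}

// Impure outer layer: all IO (reading and decoding the file) happens here, and the
// failures that IO can cause are handled here instead of inside the core.
function runFromFile(string $path, array $ruleTable): array
{
    $json = @file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Could not read selections from {$path}");
    }

    // Throws JsonException on malformed input (PHP 7.3+).
    $selected = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($selected)) {
        throw new RuntimeException('Expected a JSON array of selected options');
    }

    return lookupRules($selected, $ruleTable);
}
```

Because `lookupRules` is pure, it can be tested with plain arrays without touching the filesystem; only `runFromFile` needs a real (or temporary) file.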

Initially this might sound a bit extreme, but it did get me to start thinking differently about the architecture of the above system.

The philosophy

With the 'IO at the outer layer' pattern in mind, I started thinking about possible future improvements for the aforementioned system. After taking a quick inventory of all the 'one-off' hacks in the codebase, I noticed that every one of them was related to the data that was passed into and handled inside the application.

This got me thinking, and I started tinkering with the idea of refactoring the system so that, instead of building and modifying the data internally, it would have the required data injected. Instead of hiding the one-off hacks throughout the code, we could then expose them and handle them at the outermost layer of the system. With these changes in place we would still have the hacks, but they would be better defined and a lot easier to manage (causing far less technical debt and allowing us to postpone a full refactor of the system).

In a nutshell, the idea of the 'effects at the outer layer' philosophy is to prepare all the data and dependencies at the outermost layer of the system. That data can then be passed into the 'core' of the system, which can use it without having to know how it was obtained in the first place.
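Applied to the B:1 / A:3 requirement from earlier, a minimal PHP sketch could declare the one-off rules as data in the outer layer and expand the raw selection before the core ever sees it. The names `IMPLIED_SELECTIONS`, `prepareSelections` and `matchRules` and the rule-table format are hypothetical.

```php
<?php

/**
 * Outer layer: the one-off business rules, declared as data in a single place.
 * Each key implies the listed options (B:1 implies A:3, but not vice versa).
 */
const IMPLIED_SELECTIONS = [
    'B:1' => ['A:3'],
];

// Outer layer: expand the raw user input into the selection the core should see.
// Note: implications are expanded one level deep only; chained rules would need a loop.
function prepareSelections(array $rawSelected, array $impliedSelections): array
{
    $selected = $rawSelected;
    foreach ($rawSelected as $option) {
        foreach ($impliedSelections[$option] ?? [] as $implied) {
            $selected[] = $implied;
        }
    }
    return array_values(array_unique($selected));
}

// Core: a plain lookup with no knowledge of B:1 implying A:3.
function matchRules(array $selected, array $ruleTable): array
{
    $rules = [];
    foreach ($ruleTable as $option => $ruleId) {
        if (in_array($option, $selected, true)) {
            $rules[] = $ruleId;
        }
    }
    return $rules;
}

$ruleTable = ['A:1' => 'Rule:12', 'A:3' => 'Rule:7', 'B:1' => 'Rule:3'];
$rules = matchRules(prepareSelections(['B:1'], IMPLIED_SELECTIONS), $ruleTable); // ['Rule:7', 'Rule:3']
```

The hack still exists, but it is now a single, visible entry in `IMPLIED_SELECTIONS` rather than a condition repeated in every function that checks for A:3.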

Apart from keeping 'effects' (e.g. IO or data) at the outer layer, the philosophy also integrates nicely with dependency injection. With dependency injection, the preference is to construct all the dependencies at the outer layer and then pass them into the application, which closely resembles keeping effects at the outer layer.
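A minimal sketch of that overlap using constructor injection, assuming PHP 7.4+ for typed properties. `RuleSource`, `RuleMatcher` and `InMemoryRuleSource` are hypothetical names: the core depends on an interface, and the outermost layer (the composition root) decides which implementation to pass in.

```php
<?php

// Hypothetical abstraction the core depends on; it does not know where rules come from.
interface RuleSource
{
    /** @return array<string, string> Maps an option (e.g. 'A:1') to a rule ID. */
    public function ruleTable(): array;
}

// Core service: receives its dependency through the constructor instead of creating it.
final class RuleMatcher
{
    private RuleSource $source;

    public function __construct(RuleSource $source)
    {
        $this->source = $source;
    }

    /** @param string[] $selected */
    public function match(array $selected): array
    {
        $rules = [];
        foreach ($this->source->ruleTable() as $option => $ruleId) {
            if (in_array($option, $selected, true)) {
                $rules[] = $ruleId;
            }
        }
        return $rules;
    }
}

// Outer-layer implementation. A real system might query a database here;
// this one returns a fixed array so the sketch runs on its own.
final class InMemoryRuleSource implements RuleSource
{
    private array $rules;

    public function __construct(array $rules)
    {
        $this->rules = $rules;
    }

    public function ruleTable(): array
    {
        return $this->rules;
    }
}

// Composition root (outermost layer): the only place where concrete classes are chosen.
$matcher = new RuleMatcher(new InMemoryRuleSource(['A:1' => 'Rule:12', 'C:4' => 'Rule:9']));
```

Swapping the in-memory source for a database-backed one only changes the composition root; `RuleMatcher` stays untouched.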

Example

Since I've mostly been talking in the abstract, this might be a bit difficult to follow. So let's look at a very simple example of a situation where we apply the 'effects at the outer layer' philosophy.

<?php

/**
 * buildHomepageBreadcrumb() and buildBreadcrumb() are assumed to be defined elsewhere.
 *
 * @param UserSelections $selections A data object containing all the selected 'rows' from the user.
 *
 * @return Breadcrumb[] An array containing the breadcrumbs to display on the page.
 */
function buildBreadcrumbs(UserSelections $selections): array
{
    $breadcrumbs = [];

    foreach ($selections->getSelections() as $singleSelection) {
        // Example of a 'random' hack. Checks like this end up littered all over the codebase,
        // since the same special case might also change, for example, the returned SEO data.
        if ($singleSelection->isA1Selected() && $singleSelection->isB2Selected()) {
            // If the user has selected both A1 and B2 in a single 'selection', we
            // can't build the breadcrumb the normal way. Instead we have to build
            // a breadcrumb that points to the homepage of the site.
            $breadcrumbs[] = buildHomepageBreadcrumb();
            continue;
        }

        $breadcrumbs[] = buildBreadcrumb($singleSelection);
    }

    return $breadcrumbs;
}

Example #1: Without the philosophy applied.

As you can see in the above example, there is a 'hack' in place which requires us to add a different Breadcrumb. Whilst the example only shows this hack in a single place, you can probably imagine that the same check will be repeated in a bunch of other locations (e.g. when building the SEO data).

Now let's look at the same example with the 'effects at the outer layer' philosophy applied:

<?php

/**
 * Normally we would define the functions below in their own classes.
 * However, to keep the example simple, they're all defined in a single file.
 * Helpers such as isAlternativeHomePageSelected() are assumed to be defined elsewhere.
 */

/**
 * @param UserSelections $selections A data object containing all the selected 'rows' from the user.
 *
 * @return BreadcrumbDataObject The data object which knows the {@see Breadcrumb}s we need to build.
 */
function createBreadcrumbDataObject(UserSelections $selections): BreadcrumbDataObject
{
    $breadcrumbData = new BreadcrumbDataObject();

    foreach ($selections->getSelections() as $singleSelection) {
        // The hack is still present, but it is now centralized: it lives outside the 'core'
        // of the application, which makes it easy to keep all the hacks in a single location.
        // (This is not the best example, since the check could be moved into $singleSelection,
        // but it sketches a simplified situation.)
        if (isAlternativeHomePageSelected($singleSelection)) {
            $breadcrumbData->addBreadcrumbData('/', 'Homepage');
            continue;
        }

        $breadcrumbData->addBreadcrumbDataFromSelection($singleSelection);
    }

    return $breadcrumbData;
}

/**
 * @param BreadcrumbDataObject $breadcrumbData A data object containing all the details required 
 *                             to build the required {@see Breadcrumb}s.
 * 
 * @return Breadcrumb[] An array containing the breadcrumbs to display on the page.
 */
function buildBreadcrumbs(BreadcrumbDataObject $breadcrumbData): array
{
    return array_map(function (BreadcrumbData $breadcrumbDataEntry): Breadcrumb {
        return $breadcrumbDataEntry->toBreadcrumb();
    }, $breadcrumbData->getEntries());
}

Example #2: With the 'effects at the outer layer' philosophy applied.

Whilst the above isn't the best example (e.g. you could move functions around to turn the hack into a feature instead of a 'one-off case'), I do hope it gives some substance to the things I've mentioned earlier.

As you can see in the second example, we've vastly simplified the Breadcrumb building. This allows us to keep all the logic in the buildBreadcrumbs function pure, whilst the exceptions are dealt with elsewhere (in the outer layer: createBreadcrumbDataObject). All the one-off cases can then be centralized in this outer layer, leaving the inner layers (i.e. the core of the system) to deal only with what they were originally designed to do, free of impure one-off cases.
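The post doesn't define `BreadcrumbDataObject`, `BreadcrumbData` or `Breadcrumb`, so here is one hypothetical, minimal way they could look (PHP 7.4+), enough to run the core `buildBreadcrumbs` from Example #2. `addBreadcrumbDataFromSelection` is left out because it depends on the selection classes, which are not shown either.

```php
<?php

// Hypothetical value object the page renders.
final class Breadcrumb
{
    public string $url;
    public string $label;

    public function __construct(string $url, string $label)
    {
        $this->url = $url;
        $this->label = $label;
    }
}

// One prepared entry: plain data, no knowledge of user selections or hacks.
final class BreadcrumbData
{
    private string $url;
    private string $label;

    public function __construct(string $url, string $label)
    {
        $this->url = $url;
        $this->label = $label;
    }

    public function toBreadcrumb(): Breadcrumb
    {
        return new Breadcrumb($this->url, $this->label);
    }
}

// Collection filled by the outer layer and read by the core.
final class BreadcrumbDataObject
{
    /** @var BreadcrumbData[] */
    private array $entries = [];

    public function addBreadcrumbData(string $url, string $label): void
    {
        $this->entries[] = new BreadcrumbData($url, $label);
    }

    /** @return BreadcrumbData[] */
    public function getEntries(): array
    {
        return $this->entries;
    }
}

// Core from Example #2, written with an arrow function.
function buildBreadcrumbs(BreadcrumbDataObject $breadcrumbData): array
{
    return array_map(
        fn (BreadcrumbData $entry): Breadcrumb => $entry->toBreadcrumb(),
        $breadcrumbData->getEntries()
    );
}
```

Because the core only consumes prepared data, it can be exercised by filling a `BreadcrumbDataObject` by hand, without constructing any user selections.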

Pros and Cons

Now that you hopefully have a picture of the philosophy described here, let's look at some pros and cons. These can help you make an informed decision on whether or not to use this philosophy for your next (big) project.

Pros:

- The core system is a lot less susceptible to change, because you only have to update the core when a new feature is introduced. Requirements for different data flows can all be handled at the outermost layer.
  - As a result, the system should be easier to follow as well.
- It improves the testability of the code (see my other blog post for more information about this).
- Depending on how heavily the data is integrated in the system, it can be fairly simple to refactor a system to make use of 'effects at the outer layer'.
- If a hack becomes more of a feature than a one-off case, you could add a function to the data object and then check that function inside the inner layers (e.g. areMultipleConditionsSelected()).
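A minimal sketch of that last point, assuming PHP 7.4+. `SelectionData`, `createSelectionData` and `pageTitle` are hypothetical, and "multiple conditions" is assumed to mean "both A:1 and B:2 are selected", echoing Example #1. The outer layer computes the flag once; the core only asks the named question.

```php
<?php

// Hypothetical data object passed into the core.
final class SelectionData
{
    /** @var string[] */
    private array $selected;
    private bool $multipleConditionsSelected;

    public function __construct(array $selected, bool $multipleConditionsSelected)
    {
        $this->selected = $selected;
        $this->multipleConditionsSelected = $multipleConditionsSelected;
    }

    /** @return string[] */
    public function getSelected(): array
    {
        return $this->selected;
    }

    public function areMultipleConditionsSelected(): bool
    {
        return $this->multipleConditionsSelected;
    }
}

// Outer layer: the only place that knows what "multiple conditions" means.
function createSelectionData(array $rawSelected): SelectionData
{
    $multiple = in_array('A:1', $rawSelected, true) && in_array('B:2', $rawSelected, true);

    return new SelectionData($rawSelected, $multiple);
}

// Core: branches on the named feature, not on specific option IDs.
function pageTitle(SelectionData $data): string
{
    if ($data->areMultipleConditionsSelected()) {
        return 'Homepage';
    }

    return implode(', ', $data->getSelected());
}
```

If the definition of "multiple conditions" changes, only `createSelectionData` has to change; every core function that calls `areMultipleConditionsSelected()` keeps working as-is.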

Cons:

- You do have to be careful not to create a God object. It's easy to get carried away and put everything into one big 'bag' of data, which results in a tightly coupled application.

When to use it

Whilst the philosophy is flexible and usable in a lot of systems, there are still cases where I would prefer a different approach. Which cases those are varies from person to person, so your mileage may vary. Below I've listed when I would, and would not, use this philosophy:

When would I use it?

- Personally, I would use it for systems that handle complex selections of user input.
- If a system currently has a lot of one-off cases, I would see whether I could apply this philosophy.

When would I not use it?

- If a system is already in place and it works, it might not be worth refactoring it to make use of this philosophy. In that case your time is probably better spent elsewhere.
- Systems that are designed for machine-to-machine communication (and thus don't really deal with user input, or might not have control over the input they receive).

Conclusion

Nice to see that you've made it to the end of this (relatively) long blog post! Have you used this philosophy before? Or is this an entirely new philosophy that you can add to your engineering toolbelt? Please let me know below, I'm eager to hear from you!
