In Short

At Albert Heijn I started off with a POC on large-scale data gathering and reporting on software development performance metrics, the so-called DORA and SPACE metrics. That POC was very successful, so we started to build a team around the initiative, pivoted slightly to automate internal audit steps around software development and deployment, and swapped various initial tools in the stack for more scalable, better-performing options.

Details of the Role

Within the global Ahold Delhaize IT organization, the Dutch Engineering Enablement Platform (EEP) is best known for being responsible for the infrastructure behind the Albert Heijn online grocery ordering platform (AH Online). It has been celebrated within the Netherlands for the way it pioneered Azure, innersourcing and general DevOps tooling and practices.

As such, it’s a very innovative corner of the organization with a lot of ambition. As a central team, EEP is tasked with building the platforms that enable the organization as a whole to improve its software development, software delivery and SRE capabilities. See also this Dutch article, and this other Dutch article, displaying the director’s ambitions.

Within EEP, many of those ambitions turned into initiatives and projects, including: Terraform building blocks for Azure to replace older ARM- and Bicep-based building blocks, an internal Backstage implementation central to the internal development platform ambitions, centrally managed and maintained GitHub pipelines, Azure-Managed-Databricks-as-a-Service and, last but not least, my project: automated collection and reporting of software development performance data, initially called Agile Metrics.

I executed the initial POC solo, which took about two months in total. The POC was mostly set up to collect data from GitHub, Jenkins, Jira, SonarQube, Opsgenie and observability events involving infrastructure changes. As mentioned before, the end goal was to measure the four DORA metrics, directly or through proxies: Deployment Frequency, Lead Time for Changes, Change Failure Rate and Time to Restore Service. After the initial success of the POC, and several brainstorm sessions with the data, we decided to turn the POC into a project within the EEP portfolio and started building a team.
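To give an idea of what these four metrics boil down to once the data has been collected, here is a minimal Python sketch. It is an illustration only, not the platform's actual code: the `Deployment` record and the `dora_metrics` function are hypothetical names, the record shape is an assumption, and the choice of the median as the aggregate is mine.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical, simplified record of one production deployment."""
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # did it cause a production failure?
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics over a reporting period (assumed aggregates)."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    total = len(deployments)
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved contribute a restore time.
    restore_times = [
        d.restored_time - d.deploy_time
        for d in failures
        if d.restored_time is not None
    ]

    return {
        "deployment_frequency_per_day": total / period_days,
        "lead_time_for_changes": median(
            d.deploy_time - d.commit_time for d in deployments
        ),
        "change_failure_rate": len(failures) / total,
        "time_to_restore_service": median(restore_times) if restore_times else None,
    }
```

In practice the hard part is not this arithmetic but deciding which events from the source systems count as a deployment, a failure or a restore, which is where the proxies mentioned above come in.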

The first goal of that team was to further improve the performance of the POC, by swapping several components for more performant ones and moving to better tools for downstream data analysis and visualization. For this we hired several engineers, devised roadmaps, aligned with future stakeholders and did a lot more of what you typically do when you want to land an internal product within the organization. Next to the DevOps teams, the most important stakeholder was internal audit, which was most interested in automating the typical internal control functions around change management and IT infrastructure reliability. The needs of both groups could be served with the same data. Personally, I also wanted to spot signals of employee stress or burnout early on, for which the data might only give subtle hints.

Architecture of the Platform

The initial project involved Kubernetes, Argo Workflows, Django, Postgres and the Prometheus stack for both monitoring and reporting. Later on we switched to Strimzi Kafka on Kubernetes, Go-based Kafka consumers and producers, dbt, Databricks and Power BI. If I were designing the same platform again today, some of those components might be different.