Risky Business

What could possibly go wrong? You have your visions, plans, roadmaps and a great programme board backed up with lots of exhilarating dialogue. The risks have been ROAMed and everything is under control. In the words of Captain Picard, “Make it so”.

And so you begin. I go back now to past experiences with risk management. The project manager didn’t want us young eager things to worry about impending disaster, so the risk register was on her laptop, managed by her and communicated to us only when the need to duck and cover arose. Risk management was exactly that: a matter for management. As I sit here writing this, I am telling you there has to be a better way.

Risks need to be transparent and communicated. They must be reviewed regularly and acted on as needed. There is a whole science about the identification, recording and analysis of risks. The objective is to understand questions such as how likely the risk is to occur, how much impact it will have, and so on. The mistake I have seen is to assume that the identification and management of risks is something that only the project manager can do. Now move time forward and we have agile, DevOps and frameworks such as SAFe. How should risk management look now?

Well, the first thing to say is that risks are everyone’s business. In fact, the person most likely to identify a risk is a team member, although they may simply think of it as some kind of worry bead that they are struggling with. Now recall the Scrum Guide and the SAFe values: transparency and courage are the keys here. The Scrum Guide links low transparency to increased risk, and SAFe brings risk management into the PI sessions, making the risks transparent.

Once these risks are identified, and the initial ROAMing judgements made, what then? This is now the responsibility of the scrum masters/team coaches and the RTE. We also differentiate the idea of team level risks and ART level risks.

Let’s look at the team level risks, or worries. While these can be identified or felt by anyone, the team coach is the person who needs to both record and progress them. They are called team risks because nobody from outside the team is needed to deal with them. I always like to have visuals in the team area and one of them will be the risk/worry backlog, owned by the team coach. Any changes to this backlog get a quick mention at the standup, and that mention falls to the team coach. The idea of ROAMing risks is still in scope, with the team coach owning many of the items on this backlog.

A risk or worry that happens can result in a problem, and problems in software usually surface as delays, which can impact other teams. For that reason the risk/worry backlog should be shared at the scrum of scrums. A risk that has a knock-on effect on other teams is an ART risk. A discussion with the RTE will explore whether the risk should be promoted to an ART level risk, or left at the team level and monitored. The question here is whether the team can deal with the risk within itself or whether ART level, or other, resources are needed to mitigate or own the risk.
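To make the promotion decision a little more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not something SAFe prescribes: the `Risk` fields, the `should_promote` rule and the team names are all my own assumptions. The rule simply says that a team risk becomes a candidate for the ART backlog if it touches another team or needs help from outside the team.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Roam(Enum):
    RESOLVED = "resolved"
    OWNED = "owned"
    ACCEPTED = "accepted"
    MITIGATED = "mitigated"


class Level(Enum):
    TEAM = "team"
    ART = "art"


@dataclass
class Risk:
    # All field names here are hypothetical, chosen for illustration.
    title: str
    raised_by_team: str
    roam: Optional[Roam] = None  # None until the risk has been ROAMed
    affected_teams: set[str] = field(default_factory=set)
    needs_outside_help: bool = False
    level: Level = Level.TEAM


def should_promote(risk: Risk) -> bool:
    """Assumed rule: promote if other teams are hit or outside help is needed."""
    other_teams = risk.affected_teams - {risk.raised_by_team}
    return bool(other_teams) or risk.needs_outside_help


def review_at_scrum_of_scrums(backlog: list[Risk]) -> list[Risk]:
    """Mark qualifying team risks as ART level and return the promoted ones."""
    promoted = []
    for risk in backlog:
        if risk.level is Level.TEAM and should_promote(risk):
            risk.level = Level.ART
            promoted.append(risk)
    return promoted


backlog = [
    Risk("Unfamiliar framework in this iteration", raised_by_team="Team A"),
    Risk("Shared API may slip", raised_by_team="Team A", affected_teams={"Team B"}),
]
promoted = review_at_scrum_of_scrums(backlog)
print([r.title for r in promoted])  # ['Shared API may slip']
```

In practice the conversation with the RTE matters more than any rule. The sketch only shows that the team backlog and the ART backlog can hold the same kind of item, with the level changing as the picture becomes clearer.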

Just as the team level risks are managed by the team coach, the RTE manages ART level risks. This starts with the ROAMed risks from the PI planning event, but is a living backlog after that. The RTE will also make sure that the other sync meetings are informed of the risk backlogs and also that any new items are included.

I refer to the backlog items as risks/worries and this is a result of my personal experience with risk management as a younger engineer. A mention of the word risk to some project managers resulted in a feeling that this wasn’t my concern, or that I shouldn’t use language like that as it would scare the grownups. They would inform me that risks had to be managed and that there was a cost in this. They also needed the risk professional to get involved, which meant not only more cost but also a feeling that the project would attract further scrutiny, something the project manager was against. The agilist in me pushes back against this as it goes against my need for transparency and involvement. This is why I use the term worry.

A worry is something that is bothering a team member. It may or may not be a true risk but it does need to be surfaced and discussed. Some worries reflect the normal areas of unknowns that exist in user stories and some may be true risks. Consider the following worries.

I don’t feel well, I may be coming down with a cold
I am not getting good vibes about a dependency we have with another team
There is a lot of new technology in the iteration backlog
This code is worse than I expected it to be
I don’t like my laptop
I want to go to the XYZ conference so that I can learn about our new technology
Someone keeps stealing my lunch
Someone outside the building asked me some questions about our systems here

And so on. Some of these worries may simply need a quiet word and others should be on the register. As a servant leader the team coach should be able to progress this and any other list knowing which parts need to be shared and progressed. Preparing mitigation for a risk is real work and does require that an enabler be placed into the backlog for the team or ART that is doing the work.

So how do we make decisions about risks and the mitigations that apply? Let’s see if the SAFe principles provide any help here.

1: Take an economic view: This seems a good choice as the likelihood and impact of risks have clear economic implications. If a risk is not very likely and has low impact, then accept would seem a good choice. But other values here imply we should invest in mitigations and resolutions.

2: Apply systems thinking: This is a must. A risk is unlikely to be contained and may be involved in a number of dependencies. Think in terms of the big picture when analysing the impact.

3: Assume variability and preserve options: The mitigation should apply set-based thinking and, where possible, keep a number of options open.

8: Unlock the intrinsic motivation of knowledge workers: Encourage people to use their creativity when planning a way through the risks.

9: Decentralise decision making: Let a team level risk stay at the team level. It is the team’s responsibility to inform the ART and escalate if needed.

While you may be able to relate risks to all of the principles these are the ones I feel are more important to this matter. Recall that practices evolve from these principles, therefore informing the way you approach risk management.

To take the discussion further I asked ChatGPT about software risk management. This is the result.

Software risk management is the process of identifying, assessing, and controlling risks associated with software development and maintenance. It is an important aspect of software engineering that helps to ensure the successful completion of software projects and the delivery of high-quality software products.

The software risk management process typically includes the following steps:

Risk identification: The first step in software risk management is to identify potential risks associated with the software project. This can be done through brainstorming sessions, risk checklists, and other methods.
Risk assessment: Once the risks have been identified, they must be assessed in terms of their likelihood and potential impact on the project. This information can be used to prioritize risks and develop mitigation strategies.
Risk mitigation: The next step is to develop and implement strategies to mitigate the identified risks. This may involve modifying the software development process, adding additional resources, or taking other actions to reduce the likelihood or impact of the risk.
Risk monitoring and control: Finally, the software development team must monitor the project and take action to control risks as they arise. This may involve modifying the risk mitigation strategies, adjusting project schedules, or taking other actions to ensure the successful completion of the project.

Effective software risk management requires a combination of technical expertise, project management skills, and risk assessment and analysis capabilities. By identifying and addressing potential risks early in the software development process, teams can minimize the likelihood of project delays, cost overruns, and other problems that can arise when risks are not properly managed.

While the word project may be outdated, much of this is what we would expect. Let’s simply comment on some of this.

Risk identification is recorded during PI planning and is an ongoing process as described above. I take this from the stage of a worry right up to the more serious issues, such as the office space being consumed by fire. If the environment uses home working then we still need to concern ourselves with issues such as cyber attacks of many sorts. Risks may be different in your environment, but they are still very real.

Risk assessment can be done by the teams, but this is where some formal risk management skills can be useful. The results of this are applied to the risk backlog before items are placed in the team or ART backlogs. This is where ROAMing can be applied, as it helps categorise and prioritise risks. Any risk in the accepted category would be low priority because the analysis found its likelihood and impact to be low.
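A common way to turn likelihood and impact into a priority is to score each on a small scale and multiply them. The sketch below, in Python, assumes a 1 to 5 scale for both and a cut-off of 4 for suggesting the accepted category. The scales, the cut-off and the example risks are all assumptions for illustration and should be tuned to your own environment.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    title: str
    likelihood: int  # 1 (rare) to 5 (almost certain); assumed scale
    impact: int      # 1 (minor) to 5 (severe); assumed scale

    @property
    def exposure(self) -> int:
        # Simple likelihood x impact score used to rank risks.
        return self.likelihood * self.impact


ACCEPT_THRESHOLD = 4  # hypothetical cut-off for suggesting "accepted"


def suggest_accept(assessment: Assessment) -> bool:
    """Low likelihood and low impact: a candidate for the accepted category."""
    return assessment.exposure <= ACCEPT_THRESHOLD


def prioritise(assessments: list[Assessment]) -> list[Assessment]:
    """Order risks from highest to lowest exposure."""
    return sorted(assessments, key=lambda a: a.exposure, reverse=True)


risks = [
    Assessment("Laptop complaints", likelihood=2, impact=1),
    Assessment("Cross-team dependency slips", likelihood=4, impact=4),
    Assessment("Office fire", likelihood=1, impact=5),
]
for a in prioritise(risks):
    print(a.title, a.exposure, "accept?" if suggest_accept(a) else "act")
# Cross-team dependency slips 16 act
# Office fire 5 act
# Laptop complaints 2 accept?
```

The score is only a conversation starter. A rare but severe risk such as the office fire stays above the cut-off here, which is a reminder that a low likelihood alone is not a reason to accept.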

Risk mitigation is the result of the stories or enablers that are identified during the risk assessment. As with any story this will involve the whole team to work through. Not all risks will have mitigations, but those that do will involve work from the teams which will result in demos being delivered. Consider who the stakeholders are for these demos.

The various sync meetings and the DSUs are places where risks are monitored and where meet afters can be identified for further analysis. I did enjoy the article by Rebecca Davis on Agile risk management here https://scaledagileframework.com/agile-risk-management/ and can recommend it as further reading. Not many organisations will have a risk master role, but the job of risk master belongs to the RTE and team coaches.

In general, risk management is seen as a boring subject. I disagree and, given the disasters that are lurking out there, encourage a robust and repeatable approach. Hey, it is part of software engineering after all.
