By Songhua, Senior Expert at Alibaba
It goes without saying that effective defect analysis reduces the number of bugs in your code. But are defects bad? Defects identified during software development are extremely valuable, but many organizations suffer the cost and consequences of defects without tapping into their value.
The value of defects lies in the opportunity for learning and growth when the defects are found and corrected. Learning from defects can quickly improve an organization's ability, reduce the possibility of future defects, save on costs, and facilitate success. Nonetheless, effective defect analysis and the consequent follow-up actions require effective methods and the necessary support from the organization.
Recently, we hosted a workshop on defect analysis.
"Is it a good thing to have defects?" This was the first question I asked the participants when the workshop started.
"Of course it's a bad thing."
"Good or bad, there will still be defects. I don't think it matters whether it is good or not. It just happens."
"That sounds about right. However, defects cause trouble. I don't like things that give me trouble."
It is true that no one enjoys defects. Defects increase development costs and prolong the development cycle. However, defects are inherent in software development and will always exist. Instead of focusing on whether a piece of software has defects, teams should focus on how many defects occur and why they happen.
Software development is a process of eliminating uncertainties. It is quite different from traditional manufacturing processes used for industrial production. Industrial production aims to achieve its goal of zero defects gradually by eliminating all uncertainties during the production process. Therefore, concepts like Six Sigma have been very effective in industrial production.
The process of software development is exactly the opposite. Uncertainty is a given in every software development project. It is quite common for all the relevant issues and details to become apparent only towards the end of a project. Therefore, instead of pursuing zero defects, it makes more sense for us to focus on reducing the impact of defects. For example, if a defect is identified right when it occurs (at the time of or even before injection), the cost of the defect is almost zero and it basically has no impact on the project.
If the Six Sigma approach, widely used in industrial production, was created for production systems, what is an equivalent approach that can help avoid defects in development? Such an approach surely needs to encapsulate the processes and tools of software development. However, in my opinion, its most important element should be people, the core entity involved in the making of software. Constantly learning from defect analysis is the only way to avoid squandering the opportunities presented by defects.
Many development teams already have some sort of defect analysis in practice. I studied the defect analysis conducted by a certain team and found that defects were analyzed as follows:
I think the reviewers who determined the causes of the above defects truly believed that the code was not good enough, code review was not performed effectively, or the business scenarios were not analyzed thoroughly. The only problem? Such analysis will not lead to actionable solutions.
Was the problem really caused by the failure to consider more details? Or was it the result of incompetence in perceiving the details? If one didn't review the code effectively, what should be done to ensure the effectiveness of the next review? If one didn't analyze the business scenarios comprehensively, what would a comprehensive analysis entail?
These statements of causes are too broad to produce practical and effective improvements. Therefore, the same defects are likely to occur again. Every repeated defect is a new mistake to be learned from.
However, going too far into the specifics when analyzing defects has its own problems. For example, a defect analysis may read something like "Service A should not call Service B", or "Service A should consider abnormal conditions when calling Service B". Such comments tell you how to fix one specific defect, but do not lead to generalizable practices that can improve the quality of the code.
Therefore, the most effective type of defect analysis should yield insights that lead to systematic and actionable results. Good defect analysis should be able to be applied to avoid similar situations in the future.
We have come up with five key points to help you turn defect analysis into a learning opportunity. They are:
Although defect analysis is very important, the developers are too busy to spare time for frequent analysis. How about we do one analysis every two months?
Don't worry. The most economical approach is to conduct the analysis promptly. Setting aside as little as 15 minutes for each defect, as soon as it has been dealt with, is enough.
The best time for defect analysis is when a defect has been repaired. Analyze the problem when your memory is fresh and learn from it as soon as possible. If you analyze a defect from two months ago, analysis will be much more difficult because you will have to refresh your memory of the issue and its context. You might not even remember the matter correctly.
Therefore, it is essential that we find a way to conduct defect analysis as soon as possible. A relatively effective approach is to set checkpoints in the process. When a defect is set to the "Repaired" or "Closed" state, the next mandatory checkpoint in the process should be defect analysis. Such a mechanism can promote prompt analysis work.
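One way to picture such a checkpoint is a small state machine in which a repaired defect cannot be closed until analysis notes have been recorded. This is only a sketch: the `Analyzed` state, the `TrackedDefect` class, and the decision to place the gate between "Repaired" and "Closed" are assumptions made for illustration, not a description of any particular defect tracker.

```python
from enum import Enum


class DefectState(Enum):
    OPEN = "Open"
    REPAIRED = "Repaired"
    ANALYZED = "Analyzed"  # hypothetical checkpoint state
    CLOSED = "Closed"


# Allowed transitions: a repaired defect must pass through analysis before closing.
ALLOWED_TRANSITIONS = {
    DefectState.OPEN: {DefectState.REPAIRED},
    DefectState.REPAIRED: {DefectState.ANALYZED},
    DefectState.ANALYZED: {DefectState.CLOSED},
    DefectState.CLOSED: set(),
}


class TrackedDefect:
    def __init__(self, title):
        self.title = title
        self.state = DefectState.OPEN
        self.analysis_notes = None

    def move_to(self, new_state, analysis_notes=None):
        """Change state, enforcing the mandatory analysis checkpoint."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"Cannot move from {self.state.value} to {new_state.value}"
            )
        if new_state is DefectState.ANALYZED:
            if not analysis_notes:
                raise ValueError("Analysis notes are required at this checkpoint")
            self.analysis_notes = analysis_notes
        self.state = new_state
```

With this arrangement, an attempt to go straight from "Repaired" to "Closed" raises an error, so the analysis happens while the fix is still fresh.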
Who is responsible for defect analysis? The individual who causes the defect, or the entire team?
On the one hand, it is sometimes too much trouble and effort to convene the entire team for defect analysis. For example, if a team of eight members only analyzes the online defects in the later stage of the project and spends a mere 15 minutes on each defect, analyzing 100 defects would take a staggering 12,000 person-minutes (8 people × 15 minutes × 100 defects), or 200 person-hours. In this case, the effort is out of proportion to the benefit.
On the other hand, the one who repairs a defect understands the defect best. However, this approach runs the risk of overly specific analysis, which reduces the effectiveness of defect analysis.
Our solution is: perform defect analysis in a team of two.
Have the employee who solved the defects be the leader, assisted by a partner. Working in pairs, coworkers can complement each other with their respective abilities. In addition, they will have to come to an agreement on the issue, which will increase the quality of the analysis. If necessary, the pair can decide to bring other colleagues into the process.
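The difference in cost between the two arrangements is easy to check. The sketch below reuses the figures from the whole-team example (15 minutes per defect, 100 defects) and assumes, purely for comparison, that a pair also spends 15 minutes per defect; the function name is hypothetical.

```python
def analysis_cost_minutes(participants, minutes_per_defect, defect_count):
    """Total person-minutes spent when every participant attends every analysis."""
    return participants * minutes_per_defect * defect_count


whole_team = analysis_cost_minutes(8, 15, 100)
pair = analysis_cost_minutes(2, 15, 100)  # assumes the same 15 minutes per defect

print(whole_team, whole_team / 60)  # 12000 200.0
print(pair, pair / 60)              # 3000 50.0
```

Under these assumptions, working in pairs costs a quarter of what whole-team sessions would.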
Regular Team Discussion
Periodic team discussions of important defects and their analysis results serve as acceptance of the analysis pairs' work and spread knowledge across the team for mutual learning.
The purpose of defect analysis is to improve work performance and the key is to address "unknown unknowns". Obviously, not every defect merits an in-depth analysis. However, if we judge the need for defect analysis on a case-by-case basis, this will involve difficult decisions and the conclusions may be unreliable. Therefore, our approach is to analyze all defects and use a do-not-analyze list to rule out the defects that will not be analyzed. In other words, any defect that is not on the do-not-analyze list must be analyzed. The do-not-analyze list varies from team to team, so each team should maintain its own list. For example, a list may include:
This process is similar to panning for gold. You must specify what you don't need to avoid being overwhelmed by unimportant defects and get better at recognizing important defects. In fact, every defect analysis process will expand the do-not-analyze list and reduce the number of defects that require analysis in the future. This helps developers focus more on important issues and work more efficiently.
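The selection rule itself is simple enough to express in a few lines. In the sketch below, the category names on the list are hypothetical placeholders (each team decides its own entries), and `DefectRecord` and the helper functions are illustrative names only.

```python
from dataclasses import dataclass

# Hypothetical entries; each team maintains its own do-not-analyze list.
DO_NOT_ANALYZE = frozenset({"duplicate-report", "ui-text-typo"})


@dataclass
class DefectRecord:
    defect_id: str
    category: str


def needs_analysis(defect, do_not_analyze=DO_NOT_ANALYZE):
    """Anything not explicitly excluded must be analyzed."""
    return defect.category not in do_not_analyze


def select_for_analysis(defects, do_not_analyze=DO_NOT_ANALYZE):
    """Return the defects that still require analysis, in their original order."""
    return [d for d in defects if needs_analysis(d, do_not_analyze)]
```

The default is to analyze. A defect is skipped only when its category has been deliberately added to the list, which is what keeps the decision from being made case by case.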
The key to defect analysis is to generate insights that are valuable and useful. There are many sophisticated methods for generating profound insights. The most typical one is the "Five Whys" method. In addition, famous tools such as the fishbone diagram also come in handy. Due to space constraints, this article will use only one example to illustrate how we generate deep insights in real-world defect analysis. We cannot introduce the different methods in detail.
The defect is as follows (this example is moderately abstracted without affecting the defect description):
A user's virtual device has certain affiliated resources. When the user deletes the device, its affiliated resources should be released. However, in a specific scenario, the affiliated resources are not released.
Here is the code:
void releaseResources(resource_id) {
    if (failedOfHardwareResourceRelease(resource_id)) {
        // The failure is only logged; nothing else is done about it.
        writeLog("resource release failed");
    }
}
The conversation regarding this defect went as follows:
"What is the cause?"
"We did not consider scenarios in which the resources were not released successfully during the requirement analysis phase."
"OK. Requirement analysis is the issue and an area of improvement. But more importantly, when was the last and the most direct opportunity to identify this problem?"
"When we wrote the code."
"Did we notice this issue when we wrote the code?"
"We did. That's why it was logged, but we didn't think carefully and figure out what to do with it. This shows that the responsibility for this piece of code was not clearly defined."
"Maybe we can add a rule to the programming specification to stipulate that, when an exception occurs, it should be logged and the details and the solution of the exception should also be clarified with the individual in charge. In the future, we will know there is an issue when an error is logged but no other actions are taken."
The most direct way to evaluate how thoroughly an analysis is conducted is to see whether the result is "actionable". If a result is not actionable, it often means the analysis did not abstract the issue properly.
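To make the rule from the conversation concrete, here is a Python sketch of what the release function might look like once someone has decided what should happen on failure. The follow-up chosen here (recording the resource in a cleanup queue for a later retry) is only an assumed example of such a decision; `release_hardware_resource` and `cleanup_queue` are hypothetical names, and the real handling would be whatever the owner of the code agrees on.

```python
import logging

logger = logging.getLogger("resources")


def release_resources(resource_id, release_hardware_resource, cleanup_queue):
    """Release a resource; on failure, log it AND hand it to a follow-up path.

    release_hardware_resource: callable returning True on success (hypothetical).
    cleanup_queue: list of resource IDs that still need releasing (hypothetical).
    """
    if release_hardware_resource(resource_id):
        return True
    # Log the failure, as before ...
    logger.error("resource release failed: %s", resource_id)
    # ... and take the agreed action instead of stopping at the log entry.
    cleanup_queue.append(resource_id)
    return False
```

The point is not the retry queue itself but that the failure branch now has an explicit, agreed outcome beyond the log line.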
It is not easy to establish a learning-oriented organization. In addition to the mental model and methods discussed above, an organizational mechanism is usually the key to success. Here are some tips:
These tips are quite straightforward. However, with regard to intellectual assets, I still need to emphasize that the specific analysis results may be improvements to processes, programming habits, programming specifications, code review checklists, design capabilities, or the introduction of some new engineering practices, such as the requirement for instantiation. However, the results can be generally divided into two categories:
Both the introduction of instantiation practices and establishment of an automated testing mechanism can be considered short-term actions. Such specific short-term actions require defined accountability and schedules. In addition, to ensure they are put into practice, these actions must be managed as other work, such as requirement management.
These are practices that entail continuous attention, such as making a list of common issues used in code review or adopting certain design ideas, such as design by contract or defensive programming. Such rules require continuous attention and must be constantly maintained as part of the team's assets. If possible, such rules should be developed into tools as soon as possible to reduce the mental demands on employees and improve operability.
The more assets of this type a team maintains, the fewer defects it needs to analyze in the future. Of course, this is common among all assets.
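As an illustration of turning a rule into a tool, the sketch below uses Python's standard `ast` module to flag exception handlers that do nothing except log, which is the pattern found in the example defect. It is a rough heuristic under stated assumptions: any attribute call named like a logging method (for example `logger.error(...)`) counts as logging, and the function names are hypothetical.

```python
import ast

# Method names treated as logging calls (a heuristic, not an exhaustive list).
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def _is_log_call(stmt):
    """True if stmt is a bare call such as logger.error(...)."""
    if not isinstance(stmt, ast.Expr) or not isinstance(stmt.value, ast.Call):
        return False
    func = stmt.value.func
    return isinstance(func, ast.Attribute) and func.attr in LOG_METHODS


def find_log_only_handlers(source):
    """Return the line numbers of except handlers whose body only logs."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and all(
            _is_log_call(stmt) for stmt in node.body
        ):
            flagged.append(node.lineno)
    return sorted(flagged)
```

A check like this could run during code review or in a build step, so reviewers no longer have to remember the rule themselves.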
The real world is complicated, so there is no universal approach that can solve all of the problems associated with software development. However, there are common mental models, rules, and ideas that can be useful. And in this article, we have discussed several tips focusing on learning through defect analysis.
With an appropriate approach and controllable investment of time, defect analysis can help an organization accumulate valuable skills and experience. What is learned in this process can prove of immense value later. Being too busy is just an excuse. Taking the time to learn from one bug now, rather than waiting for the same problem to recur in the future, will ultimately save you time.
Through defect analysis, we can produce the following outputs:
Most importantly, by eliminating various blind spots, we can build up our abilities, achieve smoother development, and advance toward the goal of zero defects.