Fuzzy system for Defect Prioritization
Intro
It is normally the case that a mix of words and phrases from general spoken language and domain-specific vocabulary is used to formulate rules that express knowledge and facts in a certain domain. While quite expressive and easily understandable, this linguistic form of expression is hard to model using Boolean logic, which follows the algebra of classical (crisp) set theory, where membership of a set is determined by whether a statement is True or False. In this domain, an example of such a statement could be "if a software defect is severe and occurring frequently, then it should be fixed immediately."
In the example above, the terms severe, frequently, and immediately cannot be sharply defined, although we humans can easily make sense of this type of information and use it in our decision-making based on our judgement of the situation. These so-called "fuzzy notions" are the opposite of crisp terms such as occurs more than 5 times a day or within 48 hours, which are what we ordinarily deal with when we create subsets in mathematical terms. However, when we speak of the subset of severe or frequent defects in a given set of defects, it is not always possible to decide whether a given defect belongs to that subset.
Generally, in these use cases a Yes/No response is applied to the problem, which leads to "information loss" because the degrees of severity, frequency, or priority are not taken into account. So, even though reality has been described in imprecise terms since time immemorial, the prevalent method has been to model it with precise True/False statements. In contrast, Fuzzy logic provides a mechanism to mimic human reasoning, and it is applied here to prioritize defects based on the fuzzy notions of severity and frequency of occurrence.
Fuzzy logic is based on fuzzy set theory, which provides a mechanism for approximate reasoning when observations are expressed in linguistic terms. It provides the mathematical objects that model this vagueness by introducing partial degrees of membership in a set: instead of being either in or out of a set, an element belongs to it to a degree between 0 and 1.
Target Domain - software defect prioritization
The notion of severity and density of occurrence of a software defect at any stage of software development is quite imprecise, and a defect affects the service provided by the software to a varying degree. Determining severity depends on the judgement of a domain expert or quality analyst. After the severity is determined, the next important attribute to assess is the number of occurrences or reports (affected users). Again, the perception of density of occurrence depends on the scale at which the software is used: complaints from a few users of a large-scale application used by hundreds of thousands of users carry a very different weight than the same complaints about a much smaller application.
Based on these criteria, a fix priority is assigned: the defect is either treated as a Priority 1 (P1) defect and shipped as a hotfix, assigned a medium priority and fixed as part of the current release, or, for less severe defects, scheduled for a future release.
Since Fuzzy logic formalizes approximate reasoning over uncertain ideas expressed by linguistic variables, it is a good fit for software defect prioritization. For most systems, fixing all defects in the near term would be expensive, and some defects might never be addressed. Quality analysts must therefore choose wisely among competing defects, especially in financially constrained environments such as early-stage startups. A Fuzzy system can go a long way toward building such a decision model.
Advantage of using Fuzzy system in this domain
There are primarily three forces influencing recent trends in software systems and methods:
1. Agility, which is the capability to adapt.
2. The need to deliver new products, features, and services quickly and frequently (e.g., nightly builds).
3. Striving to be cost-effective on a continuous basis.
Software quality assurance must continuously balance these forces. To that end, continuous monitoring, assessment, and prioritization of software defects based on the formalized concepts of Fuzzy logic can drive the policy decision model by codifying the knowledge of domain experts, which can make the process considerably more efficient.
Since Fuzzy logic allows the membership functions to be tuned to the needs and variability of the domain, creating and maintaining such a decision model is both agile and cost-effective.
Identifying Linguistic variables
A Fuzzy system in the software quality assurance domain can use the ubiquitous linguistic terms Low, Medium, and High for both input variables, defect severity and occurrence, and the priority scale P1, P2, and P3 for the output. These linguistic variables and domain concepts have been used to develop the membership functions.
Application of the Fuzzy system in this domain
Severity of a software defect is the degree of its impact surface, defined by the type of defect (e.g., a surprising bug in the user interface, an API defect, or something severe like data loss) and the range of user roles that the bug affects.
If defect types are assigned values based on their impact, they can be grouped into five types:
- \(\text{0.10}\) = Cosmetic errors, Spelling mistakes etc.
- \(\text{0.25}\) = User Interface defects, surprising behavior in the User Interface based on some inputs or navigation etc.
- \(\text{0.50}\) = API issues such as crashes (e.g., returning HTTP 500), race conditions, performance issues (both in the User Interface and in API responses) etc.
- \(\text{0.75}\) = Missing features, blockers, and show-stoppers.
- \(\text{1.00}\) = Data loss, Security violation etc.
The affected group is the ratio of the number of roles affected to the total number of roles in the system.
So, $$\text{affected\_group}= \frac{\text{roles\_affected}}{\text{total\_number\_of\_roles}}$$ and $$\text{impact\_surface}= \text{affected\_group}\times\text{defect\_type}$$ where \(\text{defect\_type}\) takes a value from the set \(\{0.10, 0.25, 0.50, 0.75, 1.00\}\) according to the defect types defined above.
Since both factors lie between 0 and 1, \(\text{impact\_surface}\) also produces a value between 0 and 1, which is described by the linguistic terms low, medium, and high severity.
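As a minimal sketch of the two formulas above, the function below multiplies the affected-group ratio by the defect-type weight. The category keys in `DEFECT_TYPE_WEIGHTS` and the function name `impact_surface` are hypothetical labels chosen for illustration; only the numeric weights come from the list of defect types.

```python
# Weights for the five defect types listed above.
# The dictionary keys are illustrative labels, not names from any real tool.
DEFECT_TYPE_WEIGHTS = {
    "cosmetic": 0.10,               # cosmetic errors, spelling mistakes
    "ui": 0.25,                     # user interface defects
    "api": 0.50,                    # API crashes, race conditions, performance
    "blocker": 0.75,                # missing features, blockers, show-stoppers
    "data_loss_or_security": 1.00,  # data loss, security violations
}


def impact_surface(defect_type: str, roles_affected: int, total_number_of_roles: int) -> float:
    """Return affected_group * defect_type, a value in [0, 1]."""
    if total_number_of_roles <= 0:
        raise ValueError("total_number_of_roles must be positive")
    if not 0 <= roles_affected <= total_number_of_roles:
        raise ValueError("roles_affected must lie in [0, total_number_of_roles]")
    affected_group = roles_affected / total_number_of_roles
    return affected_group * DEFECT_TYPE_WEIGHTS[defect_type]


# An API defect affecting 3 of 4 roles: 0.75 * 0.50
print(impact_surface("api", 3, 4))  # 0.375
```

The input checks keep `affected_group` within [0, 1], which is what guarantees that the product stays inside the range covered by the membership functions.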
Now, the following membership functions can be defined on the notion of impact surface, starting with \(\textit{low}\):
$$\text{impact\_surface}_{low}(x) = \begin{cases} 1, &x <= 0.25 \\ (0.5-x)/0.25, &0.25 < x < 0.5 \\ 0, &x >= 0.5 \end{cases}$$
The membership functions for \(\textit{medium}\) and \(\textit{high}\) \(\text{impact\_surface}\) are as follows:
$$\text{impact\_surface}_{medium}(x) = \begin{cases} 0, &x <= 0.25 \\ (x-0.25)/0.25, &0.25 < x <= 0.5 \\ (0.75-x)/0.25, &0.5 < x < 0.75 \\ 0, &x >= 0.75 \end{cases}$$
$$\text{impact\_surface}_{high}(x) = \begin{cases} 0, &x <= 0.5 \\ (x-0.5)/0.25, &0.5 < x < 0.75 \\ 1, &x >= 0.75 \end{cases}$$
The plot for the \(\text{impact\_surface}\) functions above is shown in Fig. 1 below.
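The three \(\text{impact\_surface}\) membership functions can be transcribed directly into code. This is a minimal Python sketch; the function names are hypothetical and simply mirror the equations, and the `dict[str, float]` annotation assumes Python 3.9 or later.

```python
def impact_surface_low(x: float) -> float:
    # 1 up to 0.25, linear fall to 0 at 0.5
    if x <= 0.25:
        return 1.0
    if x < 0.5:
        return (0.5 - x) / 0.25
    return 0.0


def impact_surface_medium(x: float) -> float:
    # Triangle: rises from 0.25, peaks at 0.5, falls to 0 at 0.75
    if x <= 0.25 or x >= 0.75:
        return 0.0
    if x <= 0.5:
        return (x - 0.25) / 0.25
    return (0.75 - x) / 0.25


def impact_surface_high(x: float) -> float:
    # 0 up to 0.5, linear rise to 1 at 0.75
    if x <= 0.5:
        return 0.0
    if x < 0.75:
        return (x - 0.5) / 0.25
    return 1.0


def fuzzify_impact_surface(x: float) -> dict[str, float]:
    """Map a crisp impact_surface value to its degree in each fuzzy set."""
    return {
        "low": impact_surface_low(x),
        "medium": impact_surface_medium(x),
        "high": impact_surface_high(x),
    }


print(fuzzify_impact_surface(0.375))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

Because the ramps of neighboring sets cross at 0.5, the three degrees sum to 1 for every input, so an impact surface of 0.375 is exactly half low and half medium.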
Similarly, the occurrence of a defect in production is also taken into account, as explained above, through the following membership functions for \(\text{occurrence}\), starting with \(\textit{low}\):
$$\text{occurrence}_{low}(x) = \begin{cases} 1, &x <= 2 \\ (3-x), &2 < x < 3 \\ 0, &x >= 3 \end{cases}$$
The membership functions for \(\textit{medium}\) and \(\textit{high}\) \(\text{occurrence}\) are as follows:
$$\text{occurrence}_{medium}(x) = \begin{cases} 0, &x <= 2 \text{ or } x >= 5 \\ (x-2), &2 < x < 3 \\ (5-x), &4 < x < 5 \\ 1, &3 <= x <= 4 \end{cases}$$
$$\text{occurrence}_{high}(x) = \begin{cases} 0, &x <= 4 \\ (x-4), &4 < x < 5 \\ 1, &x >= 5 \end{cases}$$
Fig. 2 below shows the plot of the \(\text{occurrence}\) membership functions defined above.
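Each \(\text{occurrence}\) membership function is a trapezoid (with an open shoulder for low and high), so one possible implementation is a single `trapezoid` helper parameterized by four corner points read off the equations. The helper and the `OCCURRENCE_SETS` table are illustrative names, not part of the original design, and `x` is in whatever unit of occurrence the equations use.

```python
import math

INF = math.inf

# (a, b, c, d) corners of each trapezoid, read off the occurrence equations.
# A left shoulder uses a = b = -inf; a right shoulder uses c = d = +inf.
OCCURRENCE_SETS = {
    "low": (-INF, -INF, 2.0, 3.0),
    "medium": (2.0, 3.0, 4.0, 5.0),
    "high": (4.0, 5.0, INF, INF),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """0 outside (a, d), 1 on [b, c], linear ramps on (a, b) and (c, d)."""
    if x < b:
        return 0.0 if x <= a else (x - a) / (b - a)
    if x > c:
        return 0.0 if x >= d else (d - x) / (d - c)
    return 1.0


def fuzzify_occurrence(x: float) -> dict[str, float]:
    """Degree of x in each occurrence fuzzy set."""
    return {name: trapezoid(x, *corners) for name, corners in OCCURRENCE_SETS.items()}


print(fuzzify_occurrence(2.5))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

The same helper could express the other membership functions in this section by changing the corners; a triangle such as medium \(\text{impact\_surface}\) is the special case b == c.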
Pipeline
The Fuzzy system in the target domain of Software Quality is a Multiple Input Single Output system. The Pipeline of the Fuzzy system consists of the following parts:
1. Linguistic Variable inputs.
2. Fuzzification of the inputs into Fuzzy sets by using Membership Functions.
3. Creating Fuzzy IF-THEN rules based on the knowledge of the domain experts.
4. A Fuzzy Inference System that applies the Fuzzy rules to the Fuzzy sets, using the MIN function for AND and the MAX function for OR.
5. After the Fuzzy Inference System maps a region of the Output based on the IF-THEN rules and the Fuzzy output membership functions, the output is Defuzzified.
6. Defuzzification uses the Center of Gravity (centroid) function to generate a Crisp output.
Fig. 3 illustrates the Pipeline explained above.
Output of the Fuzzy system in this domain and IF-THEN Rules
The Output of the Fuzzy System applied to the target domain is the Priority of a defect. The Priority value decides whether the defect will be fixed as a HotFix in Production, fixed in the Release from the Current Sprint (usually 10 working days), or put in the Backlog for future consideration.
Accordingly, the domain concept of P1, P2, and P3 has been applied to create the following Membership functions for the Output.
$$\text{priority}_{P1}(x) = \begin{cases} 1, &x <= 5 \\ (10-x)/5, &5 < x < 10 \\ 0, &x >= 10 \end{cases}$$
$$\text{priority}_{P2}(x) = \begin{cases} 0, &x <= 5 \text{ or } x >= 20 \\ (x-5)/5, &5 < x < 10 \\ (20-x)/5, &15 < x < 20 \\ 1, &10 <= x <= 15 \end{cases}$$
$$\text{priority}_{P3}(x) = \begin{cases} 0, &x <= 15 \\ (x-15)/5, &15 < x < 20 \\ 1, &x >= 20 \end{cases}$$
The Plot for the Output Membership functions is shown in Fig. 4 below.
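The output membership functions can be sketched the same way. This minimal Python version assumes the P1 ramp is divided by its width of 5, so that it falls from 1 at x = 5 to 0 at x = 10 and mirrors the P2 and P3 ramps; the function names and the `PRIORITY_SETS` table are hypothetical.

```python
def priority_p1(x: float) -> float:
    # 1 up to 5, linear fall to 0 at 10
    if x <= 5:
        return 1.0
    if x < 10:
        return (10 - x) / 5
    return 0.0


def priority_p2(x: float) -> float:
    # Trapezoid: rises on (5, 10), flat on [10, 15], falls on (15, 20)
    if x <= 5 or x >= 20:
        return 0.0
    if x < 10:
        return (x - 5) / 5
    if x <= 15:
        return 1.0
    return (20 - x) / 5


def priority_p3(x: float) -> float:
    # 0 up to 15, linear rise to 1 at 20
    if x <= 15:
        return 0.0
    if x < 20:
        return (x - 15) / 5
    return 1.0


PRIORITY_SETS = {"P1": priority_p1, "P2": priority_p2, "P3": priority_p3}

# Midway through the P1/P2 overlap, both sets have degree 0.5.
print({name: f(7.5) for name, f in PRIORITY_SETS.items()})  # {'P1': 0.5, 'P2': 0.5, 'P3': 0.0}
```

Lower crisp values correspond to more urgent priorities here, since P1 occupies the low end of the output axis.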
IF-THEN Rules
IF \(\text{impact\_surface}\) is low and \(\text{occurrence}\) is low then \(\text{Priority}\) is P3
IF \(\text{impact\_surface}\) is low and \(\text{occurrence}\) is medium then \(\text{Priority}\) is P3
IF \(\text{impact\_surface}\) is low and \(\text{occurrence}\) is high then \(\text{Priority}\) is P3
IF \(\text{impact\_surface}\) is medium and \(\text{occurrence}\) is low then \(\text{Priority}\) is P3
IF \(\text{impact\_surface}\) is medium and \(\text{occurrence}\) is medium then \(\text{Priority}\) is P2
IF \(\text{impact\_surface}\) is medium and \(\text{occurrence}\) is high then \(\text{Priority}\) is P1
IF \(\text{impact\_surface}\) is high and \(\text{occurrence}\) is low then \(\text{Priority}\) is P2
IF \(\text{impact\_surface}\) is high and \(\text{occurrence}\) is medium then \(\text{Priority}\) is P1
IF \(\text{impact\_surface}\) is high and \(\text{occurrence}\) is high then \(\text{Priority}\) is P1
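The nine rules form a complete 3 x 3 table, so they can be stored as a lookup keyed by the two input terms. The sketch below is a minimal, assumption-labeled evaluation: it takes each rule's firing strength as the MIN of its two antecedent degrees (the AND), and treats rules sharing a consequent as OR-ed, combining them with MAX. The names `RULES` and `infer` are hypothetical, and the example inputs are the degrees that the membership functions above give for \(\text{impact\_surface} = 0.375\) and \(\text{occurrence} = 2.5\).

```python
# (impact_surface term, occurrence term) -> priority term, as listed above.
RULES = {
    ("low", "low"): "P3",
    ("low", "medium"): "P3",
    ("low", "high"): "P3",
    ("medium", "low"): "P3",
    ("medium", "medium"): "P2",
    ("medium", "high"): "P1",
    ("high", "low"): "P2",
    ("high", "medium"): "P1",
    ("high", "high"): "P1",
}


def infer(impact: dict[str, float], occurrence: dict[str, float]) -> dict[str, float]:
    """Return the activation of each priority term.

    AND in a rule antecedent -> min; rules sharing a consequent are OR-ed -> max.
    """
    strength = {"P1": 0.0, "P2": 0.0, "P3": 0.0}
    for (impact_term, occurrence_term), priority in RULES.items():
        firing = min(impact[impact_term], occurrence[occurrence_term])
        strength[priority] = max(strength[priority], firing)
    return strength


# Fuzzified inputs for impact_surface = 0.375 and occurrence = 2.5
impact = {"low": 0.5, "medium": 0.5, "high": 0.0}
occurrence = {"low": 0.5, "medium": 0.5, "high": 0.0}
print(infer(impact, occurrence))  # {'P1': 0.0, 'P2': 0.5, 'P3': 0.5}
```

Keeping the rules as data rather than nested conditionals makes it easy for domain experts to review or retune the table without touching the inference code.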
The output will be Defuzzified using a Center of Gravity function to get a Crisp Output.
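The Center of Gravity step can be sketched as follows. It assumes Mamdani-style implication (each output set is clipped at its rule activation with MIN), MAX aggregation, and a centroid computed numerically on an evenly sampled grid. The text does not bound the output axis, so the universe [0, 25] is an assumption, as are the names `clamp01` and `defuzzify_centroid`; the clamped lambdas are equivalent rewrites of the P1, P2, and P3 equations, with the P1 ramp taken as (10 - x)/5.

```python
def clamp01(v: float) -> float:
    """Limit v to the interval [0, 1]."""
    return max(0.0, min(1.0, v))


# Output membership functions from the equations above, rewritten in clamped form.
PRIORITY_SETS = {
    "P1": lambda x: clamp01((10 - x) / 5),
    "P2": lambda x: clamp01(min((x - 5) / 5, (20 - x) / 5)),
    "P3": lambda x: clamp01((x - 15) / 5),
}

# Assumption: the output axis is not bounded in the text; [0, 25] is used here.
UNIVERSE_MIN, UNIVERSE_MAX = 0.0, 25.0


def defuzzify_centroid(strength: dict[str, float], samples: int = 2501) -> float:
    """Clip each output set at its activation (MIN), aggregate with MAX,
    and return the centroid of the aggregated shape on an evenly spaced grid."""
    step = (UNIVERSE_MAX - UNIVERSE_MIN) / (samples - 1)
    numerator = denominator = 0.0
    for i in range(samples):
        x = UNIVERSE_MIN + i * step
        mu = max(min(strength[name], f(x)) for name, f in PRIORITY_SETS.items())
        numerator += x * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired, so the centroid is undefined")
    return numerator / denominator


# Hypothetical rule activations: P2 and P3 each fired at 0.5, P1 did not fire.
crisp = defuzzify_centroid({"P1": 0.0, "P2": 0.5, "P3": 0.5})
print(round(crisp, 2))  # 15.61
```

With P2 and P3 each activated at 0.5, the crisp output lands near the P2/P3 boundary. Because P3 has an open right shoulder, the result depends on the assumed upper bound: widening the universe pulls the centroid toward larger values, that is, toward lower urgency.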