Software measurement

Defining Software Measures

How to accurately define software measures (according to the ISO/IEC/IEEE 15939:2017)

9 min read · Jun 17, 2022

OK, all. Prepare for a dense article because I’m about to summarize everything you should know about software measurement definitions. Most of this content was taken from ISO/IEC/IEEE 15939:2017, “Systems and software engineering — Measurement process”. But truth be told, most of this standard came from the Practical Software Measurement book and the GQM (Goal Question Metric) approach, so I guess it’s all good. :-)

This standard covers the software measurement process and the software measurement definitions. However, it doesn’t say much about which measures you should choose for your organization. I think that practices such as KPIs and OKRs cover that very well, and I intend to write about them very soon. For now, I want to talk about how to define a measure, because many organizations end up measuring inaccurately due to poor measure definitions. I want to emphasize that when I talk about accuracy, I’m not talking about being extremely precise but about having a good idea of how imprecise your numbers are.

Another common mistake I see is companies measuring different things at different moments while thinking they are the same (like comparing story-point velocities of different teams, or measuring the performance of a process whose executions sometimes leave out significant activities).

Without extending much more, I bring you a summary of what I think are the most important parts of this excellent standard, focusing on what can help you when defining and documenting measures.

Definitions

Derived measure: measure that is defined as a function of two or more values of base measures.
Base measure: measure defined in terms of an attribute and the method for quantifying it.
Indicator: measure that provides an estimate or evaluation of specified attributes derived from a model with respect to defined information needs.
Measurement method: logical sequence of operations, described generically, used in quantifying an attribute with respect to a specified scale. The type of measurement method depends on the nature of the operations used to quantify an attribute. Two types can be distinguished:
subjective: quantification involving human judgment; and
objective: quantification based on numerical rules.
Scale: ordered set of values, continuous or discrete, or a set of categories to which the attribute is mapped.

Note that ISO doesn’t use the term “metric”. Each source material uses metric and measure with different meanings, so you must pay attention to the definitions. Always try to map to the concepts of base and derived measures here, and it will be much easier to understand other texts.
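
To make the relationship between these terms more concrete, here is a minimal sketch in Python. The class and field names (BaseMeasure, DerivedMeasure, method_type, and so on) are my own illustration and do not come from the standard; the only rule borrowed from the definitions above is that a derived measure is a function of two or more base measure values.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BaseMeasure:
    """An attribute plus the method used to quantify it (hypothetical model)."""
    name: str
    attribute: str
    method: str        # how the attribute is quantified
    method_type: str   # "objective" or "subjective"
    scale_type: str    # "nominal", "ordinal", "interval" or "ratio"
    unit: str


@dataclass(frozen=True)
class DerivedMeasure:
    """A function of two or more values of base measures."""
    name: str
    inputs: Sequence[BaseMeasure]
    function: Callable[..., float]

    def __post_init__(self) -> None:
        if len(self.inputs) < 2:
            raise ValueError("a derived measure needs two or more base measures")

    def compute(self, *values: float) -> float:
        # One value is expected per base measure, in the same order as `inputs`.
        if len(values) != len(self.inputs):
            raise ValueError("expected one value per base measure")
        return self.function(*values)


requirements = BaseMeasure(
    name="Project X Requirements", attribute="User stories",
    method="Count acceptance criteria items", method_type="objective",
    scale_type="ratio", unit="criteria",
)
effort = BaseMeasure(
    name="Project X Days of Effort", attribute="Worked days",
    method="Add the worked days of all people", method_type="objective",
    scale_type="ratio", unit="day",
)
productivity = DerivedMeasure(
    name="Project X Productivity",
    inputs=(requirements, effort),
    function=lambda criteria, days: criteria / days,
)

print(productivity.compute(120, 60))  # illustrative numbers, not real data
```

Writing the definition down as data like this forces you to fill in every field, which is exactly where ambiguous measures tend to show themselves.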

Scale

The type of scale depends on the nature of the relationship between values on the scale and the measurement method. For example, subjective measurement methods usually support only ordinal or nominal scales. Four types of scale are commonly defined:

Nominal: the measurement values are categorical. For example, the classification of defects by their type does not imply order among the categories. Operations supported: Counting values.
Ordinal/Likert: the measurement values are rankings. For example, the assignment of defects to a severity level is a ranking. Many people try to do math with this scale, attributing numbers to the values. You have probably seen this with “ratings”. Imagine you are researching how much people like a particular wine. They taste it and rank it among “Strongly dislike”, “Dislike”, “Indifferent”, “Like”, and “Strongly like”. Then some magician turns these into numbers from 1 to 5 and calculates an average, reaching the conclusion that everyone liked it 3.5 out of 5. Well, first of all, nothing says that the distance between any two adjacent values is the same as the distance between any other two. Equal distances belong to another scale type (Interval). Second, having most people rate your product around the average is very different from having a group of lovers and a group of haters, even though both can produce the same average. Operations supported: Counting values and sorting values.
Interval: the measurement values have equal distances corresponding to equal quantities of the attribute. For example, cyclomatic complexity has a minimum value of one, but each increment represents an additional path. The value of zero is not possible. Operations supported: Counting values, sorting values, and since it has equidistant intervals, you can also perform addition and subtraction.
Ratio: the measurement values have equal distances corresponding to equal quantities of the attribute where the value of zero corresponds to none of the attribute. For example, the size in terms of the number of requirements is a ratio scale because the value of zero corresponds to no requirements and each additional requirement defined represents an equal incremental quantity. Operations supported: Counting values, sorting values, addition, subtraction, division, and multiplication.
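
The wine example can be reproduced with a few lines of Python. This is just a sketch with made-up responses: two groups of eight tasters get the same "magician's average" while telling completely different stories. The function name summarize and the sample data are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean, median_low

LEVELS = ["Strongly dislike", "Dislike", "Indifferent", "Like", "Strongly like"]
RANK = {label: position for position, label in enumerate(LEVELS, start=1)}


def summarize(responses):
    """Summarize ordinal responses with operations the scale supports."""
    counts = Counter(responses)
    ranks = [RANK[r] for r in responses]
    return {
        # Counting and sorting are valid on an ordinal scale.
        "counts": {label: counts.get(label, 0) for label in LEVELS},
        "median": LEVELS[median_low(ranks) - 1],
        # The mean assumes equal distances between levels, which an
        # ordinal scale does not guarantee. Shown only for comparison.
        "naive_mean": mean(ranks),
    }


# Hypothetical survey results, invented for this example.
centered = ["Indifferent"] * 4 + ["Like"] * 4
polarized = ["Strongly dislike"] * 3 + ["Strongly like"] * 5

for name, responses in (("centered", centered), ("polarized", polarized)):
    print(name, summarize(responses))
```

Both groups end up with a naive mean of 3.5, but the counts and the median show that one group is lukewarm and the other is split between lovers and haters.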

Criteria for selecting measures

Many different combinations of base measures, derived measures, and indicators may be selected to address a specific information need. You can measure how much of the software was built using story points, use case points, function points, lines of code, etc. This is just one example of how many different measures you can employ for one single information need. When deciding which one you want to employ, you should consider some criteria such as:

relevance to the prioritized information needs;
feasibility of collecting the data in the organizational unit;
ease of data collection;
extent of intrusion and disruption of staff activities;
availability of appropriate tools;
protection of privacy;
potential resistance from data provider(s);
number of potentially relevant indicators supported by the base measure;
evidence (internal or external to the organizational unit) as to the measure’s fitness for purpose or information need, and its utility; and
the costs of collecting, managing, and analyzing the data at all levels. Costs include the following:
Measures utilization costs: associated with each measure are the costs of collecting data, automating the calculation of the measure values (when possible), analyzing the data, interpreting the analysis results, and communicating the information products;
Process Change Costs: the set of measures may imply a change in the development process, for example, through the need for new data acquisition;
Special Equipment: system, hardware, or software tools may have to be located, evaluated, purchased, adapted or developed to implement the measures; and
Training: the quality management/control organization or the entire development team may need training in the use of the measures and data collection procedures. If the implementation of measures causes changes in the development process, the changes need to be communicated to the staff.

Measurement method

I know I said I was going to focus on defining the measures, not actually measuring as a process or method. However, how you measure will impact what you are measuring and vice-versa. As I said before, there are many alternatives to fulfill an information need. It’s important to understand which measures will make it easier to ensure some key aspects of your measurement process.

Accuracy of a measurement procedure

Accuracy is the extent to which the procedure implementing a base measure conforms to the intended measurement method. An accurate procedure produces results similar to the true (or intended) value of the base measure.

Measurement procedures implement the measurement methods described for base measures. These procedures may produce results different from what was intended due to problems such as a systematic error in the procedure, random error inherent in the underlying measurement method, and poor execution of the procedure.

The actual human procedure or automated implementation of a base measure may depart from the measure’s definition. For example, a static analysis tool may implement a counting algorithm differently from how it was originally described in the literature. Discrepancies also may be due to ambiguous definitions of measurement methods, scales, units, etc. Even good measurement procedures may be inconsistently applied, resulting in the loss of data or the introduction of erroneous data.

Subjective methods depend on human interpretation. The formulation of questionnaire items, for example, may leave respondents uncertain about the question and even bias the responses. Clear and concise instructions help to increase the accuracy of surveys.

Accuracy can be enhanced by ensuring that, for example:

the extent of missing data is within specified thresholds;
the number of flagged inconsistencies in data entry is within specified thresholds;
the number of missed measurement opportunities is within specified thresholds (e.g., the number of inspections for which no data were collected);
all base measures are well-defined and those definitions are communicated to data providers.

Poorly defined measures tend to yield inaccurate data. The repeatability and reproducibility of the underlying measurement method (see below) may also limit the accuracy achievable by a measurement procedure.
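
A threshold check like the ones listed above is easy to automate. The sketch below covers only the missing-data case, and everything in it is an assumption of mine: the function names, the 10% default threshold, and the sample data. The standard asks for specified thresholds but does not tell you what they should be.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of measurement opportunities for which no data was collected."""
    if not values:
        raise ValueError("no measurement opportunities recorded")
    return sum(value is None for value in values) / len(values)


def within_threshold(values: Sequence[Optional[float]],
                     max_missing: float = 0.10) -> bool:
    """True when the extent of missing data is within the chosen threshold."""
    return missing_fraction(values) <= max_missing


# Hypothetical defect counts for ten inspections; None marks an inspection
# for which no data was collected.
inspection_defects = [4, None, 7, 2, None, 5, 3, 6, 1, 2]

print(missing_fraction(inspection_defects))
print(within_threshold(inspection_defects))
```

With two of ten inspections missing, this data set fails a 10% threshold, which is a signal to fix the collection procedure before trusting any indicator built on it.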

Repeatability of a measurement method

Repeatability is the degree to which the repeated use of the base measure in the same Organizational Unit following the same measurement method under the same conditions (e.g., tools, individuals performing the measurement) produces results that can be accepted as being identical. Subjective measurement methods tend to experience lower repeatability than objective methods. Random measurement error reduces repeatability.

Reproducibility of a measurement method

Reproducibility is the degree to which the repeated use of the base measure in the same Organizational Unit following the same measurement method under different conditions (e.g., tools, individuals performing the measurement) produces results that can be accepted as being identical. Subjective measurement methods tend to experience lower reproducibility than objective methods. Random measurement error reduces reproducibility.
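
One simple way to look at both properties is to measure the same thing several times and check how far the results drift apart. The sketch below is only an illustration: the tolerance that decides whether results "can be accepted as being identical" is something your organization has to define, and the numbers are invented.

```python
from typing import Sequence


def accepted_as_identical(results: Sequence[float], tolerance: float) -> bool:
    """True when all repeated results fall within `tolerance` of each other.

    `tolerance` is an assumption of this sketch: the acceptable difference
    has to be agreed on by whoever uses the measure.
    """
    if len(results) < 2:
        raise ValueError("need at least two results to compare")
    return max(results) - min(results) <= tolerance


# Hypothetical size counts for the same artifact.
same_conditions = [42, 42, 43, 42]       # same person, same tool (repeatability)
different_conditions = [42, 47, 39, 45]  # other people or tools (reproducibility)

print(accepted_as_identical(same_conditions, tolerance=1))
print(accepted_as_identical(different_conditions, tolerance=1))
```

In this made-up case the method is repeatable but not reproducible, which usually points to a measurement method that leaves too much room for interpretation.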

Establishing a measure

If a picture speaks a thousand words, an example may speak a couple of hundred at the very least. This is a good example of how to document a measure.

Information Need: Estimate productivity of a future project

Measurable Concept: Project productivity

Relevant Entities:

Requirements implemented by past projects
Effort expended by past projects

Attributes:

User Stories (requirements)
Worked days (effort)

Base Measures:

Project X Requirements
Project X Days of Effort

Measurement Method:

Count the acceptance criteria items in all user stories
Add the worked days of all people for Project X

Type of Measurement Method:

Objective
Objective

Scale:

Integers from zero to infinity
Real numbers from zero to infinity

Type of Scale:

Ratio
Ratio

Unit of Measurement:

Criteria
Day

Derived Measure: Project X Productivity

Measurement Function: Divide Project X Requirements by Project X Days of Effort

Indicator: Average productivity

Model: Compute mean and standard deviation of all project productivity values.

Decision Criteria: Computed confidence limits based on the standard deviation indicate the likelihood that an actual result close to the average productivity will be achieved. Very wide confidence limits suggest a potentially large departure and the need for contingency planning to deal with this outcome.
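
The measurement function, the model, and the decision criteria of this example can be sketched in Python as follows. The project data is invented, and the limits are computed as the mean plus or minus k standard deviations with k = 2, which is my own simplification (it assumes roughly normally distributed productivity values). The example above only says that the limits are based on the standard deviation.

```python
from statistics import mean, stdev

# Hypothetical past projects: (acceptance criteria implemented, days of effort).
past_projects = {
    "Project A": (120, 60.0),
    "Project B": (90, 50.0),
    "Project C": (200, 80.0),
}


def productivity(requirements: int, effort_days: float) -> float:
    """Derived measure: requirements divided by days of effort."""
    if effort_days <= 0:
        raise ValueError("effort must be greater than zero")
    return requirements / effort_days


def average_productivity(projects, k: float = 2.0):
    """Indicator: mean, standard deviation, and rough limits (mean +/- k * sd)."""
    values = [productivity(reqs, days) for reqs, days in projects.values()]
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation, needs at least two projects
    return avg, sd, (avg - k * sd, avg + k * sd)


avg, sd, (lower, upper) = average_productivity(past_projects)
print(f"average: {avg:.2f} criteria/day, limits: {lower:.2f} to {upper:.2f}")
```

The wider the limits are relative to the average, the less you should trust the average as an estimate for the next project, and the more contingency you should plan for.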

As you can see, choosing which measure to employ and documenting it may be pretty straightforward once you know the aspects that need to be considered and the attributes that you need to define. This will help tremendously with interpreting it correctly and ensuring consistency over time across different organizational units, departments, and teams. I’d say this is the easy part of software measurement. Deciding WHAT to measure is the big problem.

I recently wrote a nice article about unexpected negative outcomes of measurement that you will want to take a look at. It’s intended to help you avoid pitfalls (what you should not measure) but does not provide a lot of insight into what you should measure. Stay tuned for more articles on this matter, as I intend to write about software measurement and organizational alignment soon.

If you like this story, hit the clapping hands at the end so I know what you want to read about.
I don’t make a dime with the blog. If you want to support the creation of more content, share the blog with your coworkers and follow it to be notified of new stories!

Cheers!