Use Historical Data for More Accurate Software Project / Release / Sprint Estimation

Software cost estimation is difficult and still not a genuine profession. Many organizations still use the ‘expert estimate’ in some form, which is basically an opinion without a solid basis in data. The practice of planning poker – assigning story points to backlog items – is also a form of expert estimate. Story points are very useful at the team level, but not suitable for high-level project planning and forecasting. Unfortunately, human estimates are very likely to be optimistic, and many projects, agile and traditional, start with unrealistic estimates: the team too small, the duration too short, and the effort/cost estimate too low. See for instance the research of Daniel Kahneman (Thinking, Fast and Slow).

Steve McConnell’s great book ‘Software Estimation: Demystifying the Black Art’ shows that (optimistic) expert estimates are likely to result in non-linear overruns of cost and schedule. Reasons for this include the extra management attention required, stress in the team (more defects, lower maintainability), and the fact that ‘adding people to a late project only makes it later’. Pessimistic estimates result in linear extra costs due to Parkinson’s law (people will find some way to use the extra hours when the work could have been completed earlier). Parametric estimates, based on functional size, relevant historical data and parametric models, result in a more realistic estimate, and therefore in no or limited extra costs. The next figure shows these ideas.
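The core of a parametric estimate can be sketched in a few lines. All numeric parameters below (delivery rate, productive hours per month, duration power-law coefficients) are illustrative assumptions, not figures from ISBSG or any other benchmark:

```python
# Sketch of a parametric estimate: effort follows from functional size and a
# delivery rate taken from historical data; duration follows a power law of
# person-months. Every number here is a placeholder for calibrated values.

def parametric_estimate(size_fp, pdr_hours_per_fp,
                        hours_per_month=140.0,   # assumed productive hours/month
                        c=3.0, e=0.33):          # assumed duration power-law params
    """Return (effort in hours, duration in calendar months)."""
    effort_hours = size_fp * pdr_hours_per_fp
    person_months = effort_hours / hours_per_month
    duration_months = c * person_months ** e     # schedule grows sub-linearly
    return effort_hours, duration_months

effort, duration = parametric_estimate(size_fp=300, pdr_hours_per_fp=10.0)
```

The point is not these particular coefficients but the shape of the calculation: the estimate is reproducible, can be challenged field by field, and can be recalibrated as an organization’s own project history grows.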


Even in the world of agile software development, overruns are very common, although they are disguised by the fact that the scope of the delivered functionality is variable while cost and schedule are fairly fixed. The overrun becomes evident when the desired functionality, or minimum viable product, is not ready in time and extra sprints are required to deliver the minimum functionality. Great news for suppliers, as the risk they used to run in fixed-price projects is gone, but not such good news for customer organizations.

As most companies in the IT industry have not yet reached the estimation process maturity necessary to use parametric estimation, overruns remain very common. The following model, constructed by Dan Galorath, makes clear that organizations need to reach at least maturity level 2 before they can use parametric estimation.

The International Software Benchmarking Standards Group (ISBSG) helps organizations improve their estimation process maturity by providing industry data on completed projects, releases and sprints. The current version of the database contains over 8000 projects and is available as an Excel spreadsheet. It helps organizations that don’t have their own historical data to get an idea of the average productivity for their type of project. Once the size of a new project is known, the ISBSG data can be filtered so that only relevant, similar projects are listed, and the most likely productivity can be chosen based on those.
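That filtering step can be sketched as follows. The field names, project rows and the 50% size band are made up for illustration; the real ISBSG spreadsheet has many more columns and selection criteria:

```python
from statistics import median

# Hypothetical rows mimicking a few ISBSG-style fields (illustrative data only).
projects = [
    {"type": "new development", "size_fp": 250, "effort_hours": 2400},
    {"type": "new development", "size_fp": 310, "effort_hours": 3400},
    {"type": "enhancement",     "size_fp": 120, "effort_hours": 1500},
    {"type": "new development", "size_fp": 270, "effort_hours": 2900},
]

def likely_pdr(projects, project_type, size_fp, band=0.5):
    """Median project delivery rate (hours per function point) of similar projects.

    'Similar' here means: same project type, and size within +/- band
    (50% by default) of the new project's size.
    """
    lo, hi = size_fp * (1 - band), size_fp * (1 + band)
    rates = [p["effort_hours"] / p["size_fp"]
             for p in projects
             if p["type"] == project_type and lo <= p["size_fp"] <= hi]
    return median(rates) if rates else None

pdr = likely_pdr(projects, "new development", size_fp=280)
```

Multiplying the chosen delivery rate by the functional size of the new project then yields an effort estimate grounded in comparable history rather than opinion.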

Check the ISBSG site for a free sample of the data available. As the data is collected from industry, not all fields are filled in, but the main fields such as effort, duration, size and defects are usually available.