Pantheon Integrated Solutions
Leveraging R to extract insights in Earned Value Management
by: Evan Hubener
In project management, Cost Performance Reports (CPRs) are one of the tools that can be utilized to track the performance of a project throughout its lifecycle. While CPR contents can vary across industries, typically they contain Earned Value Management (EVM) metrics, presented for each of the project’s work activities, which are tied to cost and schedule performance; additional information provided in the CPR may be specific to the project contract, like the target price, management reserve, or overhead costs. For Department of Defense programs, one purpose of the CPR is to report and keep managers informed about their cost and schedule objectives, but for analysts and cost estimators, CPRs are a valuable source of data that can be used to inform cost and schedule estimates for both current and future projects.
Tracking performance over time or using historical data to estimate cost or schedule performance for comparable projects is not easy to do, given the ridged structure of a CPR. It requires an entirely different data structure, but provided that the CPRs are organized in a stable format, a programming language can be used to develop a data processing system. The primary advantages of utilizing a programing language for this type of task are twofold – speed and reliability, and one way to achieve this is by utilizing the R programming language.
R is an open-source programming language, developed in the early 1990s, primarily as a tool for statisticians. Since then, there have been many innovations in the packages available for R, as well as the release of Integrated Development Environments (IDEs), like R Studio, that have made the language quite popular among data users. In addition, the language allows users to create custom functions, and these can be combined with functions from existing packages to process data efficiently and accurately.
The basics of cost reporting
As previously mentioned, general contract and project data are in the CPR header or footer, but below the header, the individual project elements are provided, and they typically are presented in a table format, with an example shown in table 1.
This type of layout, while easy to read, is not easy to analyze, especially as the table may be presented as part of a larger program or a collection of projects. These may be split across multiple vendors, each with different inputs; this can result in tens of pages worth of data and many additional elements that need to be captured. For an analysis to be performed, the data must first be read and processed. Then, once the data is tidy, it can be loaded into a database, and metrics can be created.
Utilizing R’s functions to read and process data
If CPRs are provided in a spreadsheet, the data can be read using one of the functions from R’s readxl package. This package is a collection of functions that can be used to read both .xls and .xlsx formats. File names, sheet names, cell ranges, and column names can all be passed as arguments to ensure that the correct data is read into R.
The apply functions belong to the base-R language and are similar in functionality to for-loops, but they are optimized to work with vectors, and in addition, they are generally faster and less code intensive. They are also flexible and can be used in custom functions that enable R to read along many spreadsheet tabs to gather the necessary data.
Once the data are loaded into the R work environment, the power of the dplyr package can be deployed to wrangle and tidy the data. This package has dozens of useful functions, but the core functions are familiar to anyone who has worked with data: select(), filter(), arrange(), mutate(), and summarize(). These functions help programmers extract or calculate exactly the data that they need. Depending on how the data are reported, validation procedures can be built into the code, ensuring that the totals equal the sum of the parts; these internal checks can be used to inform project managers if there are discrepancies in the reporting.
Communicating the results
Once the data for a given CPR has been processed, it can be loaded into any database of choice. From there, reporting metrics can be created, which can show current or to-date performance, historical performance for similar projects, or highlight areas of concern and performance changes across time. Examples are shown in figures 1 and 2, which were created with the ggplot2 package using fictional data. Figure 1 presents monthly hours over time, and figure 2 tracks cumulative project progress across time, presenting the Earned Value, Planned Value, and Actual Cost, along with standard CPI and SPI performance indicators.
While there are many ways to build a data infrastructure, by combining flexibility, versatility, and efficiency, the R programming language is an excellent tool for making sense of Cost Performance Reports. From custom functions, to powerful statistical tools and sharp graphics, the language is a tool that helps analysts process data and communicate results, providing program and project managers with timely and actionable insights.