Abstract Title

Development and Implementation of a Very Efficient and Scalable Program Slicing Approach

Abstract

Program slicing is a commonly used approach for understanding and detecting the impact of changes to software. The idea is simple: given a variable and its location in a program, identify which other parts of the program are affected by that variable. The approach has been used successfully for many years in a variety of software maintenance tasks. For example, slicing was used to help address the Y2K problem by identifying the parts of a program that could be impacted by changes to date fields. The concept of program slicing was originally introduced by Weiser as a debugging aid. He defined a slice as an executable program that preserves the behavior of the original program. Weiser’s algorithm traces data and control dependencies by solving data-flow equations to determine the directly and indirectly relevant variables and statements. Since then, a number of different slicing techniques and tools have been proposed and implemented. These techniques are broadly distinguished by the type of slice they produce: static versus dynamic, closure versus executable, inter-procedural versus intra-procedural, and forward versus backward.

The calculation of a program slice is, with few exceptions, based on a Program Dependence Graph (PDG) or one of its variants, e.g., a System Dependence Graph (SDG). Unfortunately, building the PDG/SDG is quite costly in both computation time and space. As a result, slicing approaches generally do not scale well, and although there are some (costly) workarounds, generating slices for a very large system can often take days of computing time. Additionally, many tools impose a strict upper bound on the size of the program they can slice.

The work here addresses this limitation by eliminating the time and effort needed to build the entire PDG. In short, it combines a text-based approach with a lightweight static analysis infrastructure that computes dependence information only as needed (i.e., on the fly) while computing the slice for each variable in the program. The slicing process is performed using the srcML format for source code, which was developed here at Kent State University. The srcML format provides direct access to abstract syntactic information to support static analysis. While this lightweight approach will typically never match the accuracy of generating a full PDG/SDG with full pointer analysis, it provides a fairly accurate picture of a program slice in a comparatively very short time (we found speedups of up to four orders of magnitude for large systems). These techniques have been realized in a software tool called srcSlice, which produces program slices for every variable in an entire software system. The tool works for both the C and C++ programming languages and is highly scalable (e.g., it can produce slices of a 2 million line program in minutes).
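To make the idea of computing dependence information on the fly more concrete, the following is a minimal sketch in Python. It is not the srcSlice implementation: it assumes straight-line code made up only of simple assignments, works on raw text rather than srcML, and ignores control flow, pointers, and function calls. The function name forward_slices and the regular expressions are hypothetical and exist only for this illustration. The point it shows is that a single forward pass can record, for every variable, the lines it affects without first building a dependence graph.

```python
import re
from collections import defaultdict

# Hypothetical patterns for this sketch: an optional type word, a target
# name, '=', and a right-hand side ending in ';'.
ASSIGN = re.compile(r"^\s*(?:[A-Za-z_]\w*\s+)?([A-Za-z_]\w*)\s*=\s*(.+?);\s*$")
IDENT = re.compile(r"[A-Za-z_]\w*")


def forward_slices(source):
    """Return {variable: set of line numbers it defines or affects}.

    Assumption: 'source' is straight-line code of simple assignments.
    """
    slices = defaultdict(set)  # variable -> lines in its forward slice
    depends_on = {}            # variable -> variables its current value derives from

    for lineno, text in enumerate(source.splitlines(), start=1):
        match = ASSIGN.match(text)
        if not match:
            continue  # anything that is not a simple assignment is skipped
        target, rhs = match.group(1), match.group(2)

        # Everything the right-hand side depends on, directly or indirectly.
        sources = set()
        for used in IDENT.findall(rhs):
            sources.add(used)
            sources |= depends_on.get(used, set())

        # This line belongs to the slice of each of those variables.
        for var in sources:
            slices[var].add(lineno)

        # The definition also belongs to the target's own slice, and a new
        # assignment replaces whatever the target depended on before.
        slices[target].add(lineno)
        depends_on[target] = sources

    return dict(slices)


EXAMPLE = "\n".join([
    "int a = 1;",
    "int b = a + 2;",
    "int c = 5;",
    "int d = b * c;",
    "a = 7;",
    "int e = a;",
])

if __name__ == "__main__":
    for name, lines in sorted(forward_slices(EXAMPLE).items()):
        print(name, sorted(lines))
```

In this sketch the slice is keyed by variable name across the whole input, so line 4 appears in the slice of a because d is computed from b, which was computed from a. A real tool would also have to handle control dependence, scopes, and calls, which this example deliberately leaves out.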

Specifically, my research has been to enhance the slicing capabilities and improve the accuracy of the tool and approach. The srcSlice tool currently has no type resolution capabilities, that is, no functionality to determine the types of variables. In program analysis, the final type of a variable is crucial for accuracy. The challenge has been to develop a separate component of the tool that provides more complete type resolution for every variable in the program, along with other details about that variable. This involved developing algorithms to efficiently extract the type information and integrate it into the computation of the slice(s).
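As a rough illustration of what type resolution over marked-up source might look like, the sketch below builds a small symbol table in Python. It is an assumption-laden example, not the component described above: the XML fragment is hand-written and only loosely modeled on srcML-style markup (the element names are not checked against the actual srcML schema, and namespaces are omitted), and the function names type_text and build_symbol_table are hypothetical. It collects typedef aliases, then maps each declared variable to its final type by following alias chains.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; element names are assumptions.
FRAGMENT = """<unit>
<typedef>typedef <type><name>long</name></type> <name>ticks_t</name>;</typedef>
<decl_stmt><decl><type><name>ticks_t</name></type> <name>elapsed</name></decl>;</decl_stmt>
<decl_stmt><decl><type><specifier>const</specifier> <name>double</name> <modifier>*</modifier></type> <name>ratio</name></decl>;</decl_stmt>
</unit>"""


def type_text(type_element):
    """Flatten a <type> element to a single whitespace-normalized string."""
    return " ".join("".join(type_element.itertext()).split())


def build_symbol_table(xml_text):
    """Map each declared variable name to its type with typedefs resolved.

    Simplification: an alias is resolved only when the whole type string
    equals the alias name.
    """
    root = ET.fromstring(xml_text)

    # Alias name -> the type it stands for.
    aliases = {
        typedef.find("name").text: type_text(typedef.find("type"))
        for typedef in root.iter("typedef")
    }

    table = {}
    for decl in root.iter("decl"):
        declared = type_text(decl.find("type"))
        seen = set()  # guards against cyclic alias chains
        while declared in aliases and declared not in seen:
            seen.add(declared)
            declared = aliases[declared]
        table[decl.find("name").text] = declared
    return table


if __name__ == "__main__":
    for name, resolved in build_symbol_table(FRAGMENT).items():
        print(name, "->", resolved)
```

A table like this could then be consulted while slicing, for example to tell a pointer from a plain value. How the resolved types are actually integrated into the slice computation is not specified here.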

This very fast and scalable, yet slightly less accurate, slicing approach is useful for a number of reasons. Developers will have a low-cost, practical means of estimating the impact of a change within minutes rather than days. This is important for planning the implementation of new features and for understanding how a change relates to other parts of the system. It will also provide an inexpensive test to determine whether a full, deeper, and more expensive analysis of the system is warranted. Lastly, we feel a fast slicing approach could open up new avenues of research in metrics and in the mining of histories based on slicing. That is, slicing can now be conducted on very large systems and on entire version histories in practical time frames. This opens the door to a number of experiments and empirical investigations that were previously too costly to undertake.

Research Category

Computer Science/Mathematics

Primary Author's Major

Computer Science

Mentor #1 Information

Dr. Jonathan Maletic

Presentation Format

Poster

Start Date

March 11, 2015, 1:00 PM

End Date

March 11, 2015, 5:00 PM

Research Area

Computer Sciences | Physical Sciences and Mathematics
