Defining and implementing refactorings is a nontrivial task since it is difficult to define preconditions to guarantee that the transformation preserves the program behavior. Therefore, refactoring engines may apply incorrect transformations in which the resulting program does not compile, preserve behavior, or follow the refactoring definitions. These engines may also prevent correct transformations due to overly strong preconditions. We find that 84% of the test suites of Eclipse and JRRT are concerned to detect those kinds of bugs. However, the engines still have them. Researchers have proposed a number of techniques for testing refactoring engines. Nevertheless, they may have limitations related to the bug type, program generation, time consumption, and number of refactoring engines necessary to evaluate the implementations. We propose and implement a technique to scale testing of refactoring engines. We improve expressiveness of a program generator and use a technique to skip some test inputs to improve performance.
Moreover, we propose new oracles to detect behavioral changes using change impact analysis, overly strong preconditions by disabling preconditions, and transformation issues. We evaluate our technique in 28 refactoring implementations of Java (Eclipse and JRRT) and C (Eclipse) and find 119 bugs. The technique reduces the time in 96% using skips while missing only 6% of the bugs.
Additionally, it finds the first failure in general in a few seconds using skips. Finally, we evaluate our proposed technique by using other test inputs, such as the input programs of Eclipse and JRRT refactoring test suites. We find 31 bugs not detected by the developers.