Overview
In program debugging, finding a failure is only the first step; what about locating and correcting the fault? Can we automate the whole process? The AutoFix project targets the automatic generation and validation of fixes for software faults. Since program faults are discrepancies between the implementation and the specification, a fix may involve changes to the implementation, the specification, or both. AutoFix therefore exploits dynamic program analysis techniques to understand the cause of a fault, synthesize potential fixes to either the implementation or the contracts, and validate the synthesized fixes. In our experimental evaluations, AutoFix was able to automatically propose high-quality fixes to real-world program faults.

AutoFix is a joint project between the Chair of Software Engineering of ETH Zurich and the Software Engineering Chair of Saarland University.
Fixes to the implementation
Figure 1.a shows a faulty routine in class TWO_WAY_SORTED_SET. The routine move_item takes an argument v, which must be a non-Void element stored in the set, and moves it to the position to the left of the internal cursor (attribute index). To this end, the routine saves the cursor position into a local variable idx, moves index to where v occurs in the set, removes this occurrence of v, restores index to its initial position, and inserts v to the left of the cursor. The implementation shown fails with a violation of go_i_th(idx)'s precondition whenever the internal cursor initially points to the last valid cursor position count + 1 (see Figure 2). In fact, the removal of one element from the set invalidates the position stored in idx, which equals count + 2 with respect to the value of count after the removal. A closer look reveals that the routine is incorrect even in cases where this precondition violation does not occur: if v initially appears to the left of the internal cursor, the sequence of operations moves v to an incorrect "off-by-one" position. Figure 1.b shows a fix suggested by AutoFix; the fix corrects the routine even for the cases that do not trigger a precondition violation when invoking go_i_th(idx).

Figure 1. A real fault from TWO_WAY_SORTED_SET.move_item: (a) faulty routine; (b) fixed routine.
Figure 2. The effect of calling remove in move_item when index = count + 1 and after hold initially: the subsequent invocation of go_i_th(idx) triggers a precondition violation.

Figure 3 depicts the steps AutoFix takes to propose fixes to the implementation. First, AutoFix runs AutoTest to automatically generate passing and failing test cases for the input Eiffel classes. It then extracts a set of expressions from the text of the classes' routines and computes the expression dependence score edep between expressions, measuring their syntactic similarity, and the control dependence score cdep between program locations, measuring how close they are on the control-flow graph. Next, it computes the dynamic score dyn between expressions, measuring how much more often an expression is mentioned in failing rather than passing test cases, and combines the three scores (edep, cdep, dyn) into the score fixme, which determines a global ranking of expressions. Finally, AutoFix enumerates possible fixing actions for the expressions with the highest fixme scores and generates candidate fixes by injecting the fix actions into the faulty routine. The candidate fixes that pass the entire regression test suite are considered valid.
Figure 3: How AutoFix works.
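The ranking step above can be sketched in a few lines of Python. The exact formula AutoFix uses to combine the three scores is not given on this page, so the product combination, the score values, and the function name fixme_ranking below are purely illustrative assumptions:

```python
# Hypothetical sketch of AutoFix's expression ranking.
# edep: syntactic-similarity score of each expression
# cdep: control-flow proximity score of each expression's location
# dyn:  how much more often the expression appears in failing tests

def fixme_ranking(expressions, edep, cdep, dyn):
    """Rank expressions by a combined suspiciousness score (highest first)."""
    def fixme(e):
        # Simple product combination -- an assumption for illustration only.
        return edep[e] * cdep[e] * dyn[e]
    return sorted(expressions, key=fixme, reverse=True)

scores_edep = {"idx": 0.9, "index": 0.7, "count": 0.4}
scores_cdep = {"idx": 0.8, "index": 0.9, "count": 0.3}
scores_dyn  = {"idx": 0.95, "index": 0.5, "count": 0.2}

ranking = fixme_ranking(["idx", "index", "count"],
                        scores_edep, scores_cdep, scores_dyn)
print(ranking)  # ['idx', 'index', 'count']
```

In this toy run, the expression idx ranks first (0.9 * 0.8 * 0.95 = 0.684), so fixing actions would be enumerated for it before the others.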
We applied AutoFix to over 200 faults in four code bases, including both open-source software developed by professionals and student projects of varying quality. AutoFix successfully fixed 86 (or 42%) of the faults; inspection shows that 51 of these fixes are genuine corrections of quality comparable to those competent programmers would write. More information about the experimental evaluation of AutoFix can be found in our TSE 2014 publication.
Fixes to the specification
While most debugging techniques focus on patching implementations, there are numerous bugs whose most appropriate corrections consist in fixing the specification to prevent invalid executions, such as by defining the correct input domain of a function. We present here a fully automatic technique that fixes bugs by proposing changes to contracts (simple executable specification elements such as pre- and postconditions). The technique relies on dynamic analysis to understand the source of buggy behavior, to infer changes to the contracts that correct the bugs, and to validate the changes against general usage. We have implemented the technique in a tool called AutoFix, which works on programs written in Eiffel, and evaluated it on 44 bugs found in standard data-structure libraries. Manual analysis by human programmers found that AutoFix suggested repairs that are directly deployable for 25% of the faults; in most cases, these contract repairs were preferred over fixes for the same bugs that change the implementation.
Let us briefly demonstrate how AutoFix proposes fixes to contracts using an example. The example targets a bug in routine (method) duplicate of class CIRCULAR, which is the standard Eiffel library implementation of circular array-based lists. To understand the bug, Figure 4 illustrates a few details of CIRCULAR's API. Lists are numbered from index 1 to index count (an attribute denoting the list length), and include an internal cursor that may point to any element of the list. Routine duplicate takes a single integer argument n, which denotes the number of elements to be copied; called on a list object list, it returns a new instance of CIRCULAR with at most n elements copied from list starting from the position pointed to by cursor. Since we are dealing with circular lists, the copy wraps over to the first element. For example, calling duplicate (3) on the list in Figure 5 returns a fresh list with elements <C, D, A> in this order.
  class CIRCULAR [G]

    make (m: INTEGER)
      require m >= 1
      do ... end

    duplicate (n: INTEGER): CIRCULAR [G]
      do
        create Result.make (count)
        ...
      end

    count: INTEGER -- Length of list

Figure 4. Some implementation details of CIRCULAR.

Figure 5. A circular list of class CIRCULAR: the internal cursor points to the element C at index 3.
(a) Patching the implementation:

  make (m: INTEGER)
    require m >= 1

  duplicate (n: INTEGER): CIRCULAR [G]
    do
      if count > 0 then
        create Result.make (count)
      else
        create Result.make (1)
      end
      ...

(b) Strengthening the specification:

  make (m: INTEGER)
    require m >= 1

  duplicate (n: INTEGER): CIRCULAR [G]
    require count > 0
    do
      create Result.make (count)
      ...

(c) Weakening the specification:

  make (m: INTEGER)
    require m >= 0

  duplicate (n: INTEGER): CIRCULAR [G]
    do
      create Result.make (count)
      ...

Figure 6. Three different fixes for the bug of Figure 4.
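The wrap-around behavior that duplicate is meant to implement can be modeled with a small Python sketch. The class structure, names, and the 1-based cursor convention below are assumptions based on the description above; the real class is written in Eiffel:

```python
# Illustrative Python model of CIRCULAR.duplicate's intended behavior.
# In the real Eiffel class, Result is created via make (count), whose
# precondition m >= 1 is what fails when the list is empty.

class Circular:
    def __init__(self, items):
        self.items = list(items)   # elements occupy indexes 1..count
        self.index = 1             # internal cursor position (1-based)

    @property
    def count(self):
        return len(self.items)

    def duplicate(self, n):
        # Copy at most n elements starting at the cursor, wrapping around.
        result = []
        for k in range(min(n, self.count)):
            pos = (self.index - 1 + k) % self.count   # 0-based wrap-around
            result.append(self.items[pos])
        return Circular(result)

lst = Circular(["A", "B", "C", "D"])
lst.index = 3                       # cursor on C, as in Figure 5
print(lst.duplicate(3).items)       # ['C', 'D', 'A']
```

Running duplicate (3) on the four-element list of Figure 5 yields <C, D, A>, matching the example in the text; the model glosses over the empty-list case, which is exactly where the Eiffel original breaks.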
How should we fix this bug? Figure 6 shows three different possible repairs, all of which can be generated completely automatically. An obvious choice is patching duplicate's implementation as shown in Figure 6a: if count is 0 when duplicate is invoked, allocate Result with space for one element; this satisfies make's precondition in all cases. AutoFix targets fixes of implementations and in fact suggests the patch in Figure 6a. This fix is acceptable, since it makes duplicate run correctly, but it is not entirely satisfactory: CIRCULAR's implementation looks perfectly adequate, whereas the ultimate source of the failure seems to be an incorrect or inadequate specification.

A straightforward alternative is adding a precondition to duplicate that only allows calling it on non-empty lists. Figure 6b shows such a fix, which strengthens duplicate's precondition, thus invalidating the test case exposing the bug. The strengthening fix has the advantage of being textually simpler than the implementation fix, and hence probably also simpler for programmers to understand.

However, both fixes in Figures 6a and 6b are partial, in that they remove the source of faulty behavior in duplicate but do not prevent similar faults (deriving from calling make with m = 0) from happening. A more critical issue with the specification-strengthening fix in Figure 6b is that it may break clients of CIRCULAR that rely on the previous, weaker precondition. There are cases, such as computing the maximum of an empty list, where strengthening produces the most appropriate fix; in the running example, however, strengthening is arguably not the optimal strategy.
A look at make's implementation (not shown in Figure 4) reveals that the creation procedure's precondition m >= 1 is unnecessarily restrictive, since the routine body works as expected also when executed with m = 0. This suggests a fix that weakens make's precondition as shown in Figure 6c. This is arguably the most appropriate correction to the bug of duplicate: it is very simple, it fixes the specific bug as well as similar ones originating in creating an empty list, and it does not invalidate any clients of CIRCULAR's API. AutoFix generates both specification fixes in Figures 6b and 6c but ranks the weakening fix higher than the strengthening one. More generally, AutoFix outputs specification-strengthening fixes only when they do not introduce bugs in available tests, and it always prefers the least restrictive fixes among those that are applicable.
AutoFix works completely automatically: its only input is an Eiffel program annotated with simple contracts (pre- and postconditions and class invariants) which constitute its specification. After going through the steps described below, AutoFix's final output is a list of fix suggestions for the bugs in the input program. Figure 7 gives an overview of the components of the AutoFix technique. AutoFix is based on dynamic analysis, and hence it characterizes correct and incorrect behavior by means of passing and failing test cases. To provide full automation, we use the random testing framework AutoTest to generate the tests used by AutoFix. The core of the fix generation algorithm applies two complementary strategies: weaken (i.e., relax) a violated contract if it is needlessly restrictive; or strengthen an existing contract to rule out failure-inducing inputs. AutoFix produces candidate fixes using both strategies, possibly in combination. To determine whether the weaker or stronger contracts remove all faulty behavior in the program, AutoFix runs candidate fixes through a validation phase based on all available tests. To avoid overfitting, some tests are generated initially but used only in the validation phase (and not directly to generate fixes). If multiple fixes for the same fault survive the validation phase, AutoFix outputs them to the user ordered according to the strength of their new contracts: weaker contracts are more widely applicable, and hence are ranked higher than more restrictive stronger contracts.
Figure 7. An overview of how AutoFix works.
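The validation phase described above can be illustrated with a simplified Python sketch. The function names, the test representation, and the toy model of make are all hypothetical; a candidate precondition survives only if it admits every input that must work and rejects (or safely admits) the rest:

```python
# Simplified sketch of contract-fix validation (all names are assumptions).
# A candidate fix replaces a routine's precondition; it survives validation
# only if every available test behaves correctly under the new contract.

def validate(candidate_pre, tests, run):
    """candidate_pre: predicate over a test input.
    tests: list of (input, should_pass) pairs from the validation suite.
    run(x): True if the routine body executes without failure on x."""
    for x, should_pass in tests:
        if candidate_pre(x):
            if not run(x):          # admitted input still fails
                return False
        elif should_pass:           # rejected an input that must work
            return False
    return True

# Toy model of the CIRCULAR example: make's body actually works for m >= 0.
run_make = lambda m: m >= 0
tests = [(5, True), (1, True), (0, True)]   # 0 arises from duplicate's bug

strengthened = lambda m: m >= 1   # rejects m = 0, breaking a needed input
weakened     = lambda m: m >= 0   # admits m = 0, which runs correctly

print(validate(strengthened, tests, run_make))  # False
print(validate(weakened, tests, run_make))      # True
```

The sketch mirrors the ranking rule above: among surviving candidates, the weaker contract (here, m >= 0) is preferred because it is more widely applicable.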
We performed a preliminary evaluation of the behavior of AutoFix by applying it to 44 bugs of production software. The overall goal of the evaluation is to corroborate the expectation that, for bugs whose "most appropriate" correction is fixing the specification, AutoFix can produce repair suggestions of good quality. A more detailed evaluation taking into account aspects such as robustness and readability of the produced fixes is left to future work.
All the experiments ran on a Windows 7 machine with a 2.6 GHz Intel 4-core CPU and 16 GB of memory. We selected 10 of the most widely used data-structure classes of the EiffelBase (rev. 92914) and Gobo (rev. 91005) libraries, the two major Eiffel standard libraries. We ran AutoTest for one hour on each of the 10 classes. This automatic testing session found 44 unique faults consisting of pre- or postcondition violations. We ran AutoFix on each of these faults individually, using half of the test cases (generated for each fault in the one-hour session) to generate the fixes and the other half in the validation phase. AutoFix produced valid fixes for 42 faults and proper fixes (i.e., fixes that correct the fault without introducing new faults) for 11. The average figures per fault are: 106.4 minutes of testing time and 7.2 minutes of fixing time (minimum: 4.1 minutes, maximum: 30 minutes, median: 6.2 minutes). The testing time dominates since AutoTest operates randomly and thus generates many test cases that will not be used (such as passing tests of routines without faults).
User's manual
The user's manual of AutoFix is available here.
Downloads
AutoFix is part of the EVE integrated verification environment, the research branch of EiffelStudio.
- Instructions to install and compile EVE from source
- Binary distributions of EVE
- Virtual machine: an Ubuntu virtual machine with Eiffel development environment and EVE executable is available here. You will need VMWare Player (6.0.2 build-1744117 or later) to launch the virtual machine. (User account: AutoFix-Demo; Password: autofix)
- Screencast: a short video that demonstrates the usage of AutoFix is available here.
Funding
The following organizations have contributed to funding the AutoFix project:
- Hasler-Stiftung (Grant no. 2327)
- Deutsche Forschungsgemeinschaft (Ze509/4-1)
- Swiss National Science Foundation (Project 200021-134976)
Publications
- Automated Program Repair in an Integrated Development Environment. Yu Pei, Carlo A. Furia, Martin Nordio, and Bertrand Meyer. ICSE 2015.
- Automated Fixing of Programs with Contracts. Yu Pei, Carlo A. Furia, Martin Nordio, Yi Wei, Andreas Zeller, and Bertrand Meyer. IEEE Transactions on Software Engineering, 2014.
- Automatic Program Repair by Fixing Contracts. Yu Pei, Carlo A. Furia, and Martin Nordio. FASE 2014.
- Code-based Automated Program Fixing. Yu Pei, Yi Wei, Carlo A. Furia, Martin Nordio, and Bertrand Meyer. ASE 2011.
- Automated Fixing of Programs with Contracts. Yi Wei, Yu Pei, Carlo A. Furia, Lucas S. Silva, Stefan Buchholz, Bertrand Meyer, and Andreas Zeller. ISSTA 2010.