On Static Code Analysis

Introduction

For a long time I thought that the static code analysis for a Java project was a waste of time. We know how to code correctly and nicely, we have the compiler, we have the unit tests, integration tests etc. why need to bother with something else?

When I started to get interested in this domain, many people in my team kept telling me about Sonar. I looked into it and I found it interesting how this tool manages to put together all sorts of aspects of the code quality. And then I dug deeper to see how the guys are actually using it. Because it’s cool to see the numbers and charts, but it’s even better if you act on them. And there I realized that the tool is somehow misused – people kept looking at those numbers, talked to their manager about the increasing “technical debt”, but still doing nothing, because the delivery of the project on time always took priority. And the debt kept increasing until people completely lost interest in the tool because it was too depressing.

My first experience with PMD, Findbugs and Checkstyle

I was amazed to see that finally there are mainly 3 tools that are used by everybody: Checkstyle, PMD and Findbugs. They are also ultimately used by Sonar as well.  Furthermore, the tools seem to overlap in different areas, so in the you have to use the three of them and make sure that the common areas are configured similarly.

Once I figured out what is the purpose of all three, I went back to my original problem – how to make use of them so that my colleagues would not get back to the same behavior: look at the number of detected problems -> see it’s too big -> do nothing.

The solution was to activate all the three plugins via Maven. The rules for all three were put in an external artifact, so people would add it in their pom file. Once activated for a project, the build is broken when an error is found. So this time people HAVE to fix the problem like any compilation problem, not to procrastinate it for a “stabilization phase”, (well it’s not 100% true, because you can still disable each of the tools for a given part of the code – maybe this should not be possible as well).

Everytime I suggested to the guys to enable it for some old project, people were afraid that it might take ages until the code gets compliant to all the rules. So to have a better idea of how long this endeavor may take, I did it for the open source project I work on – ST-JS.

  • Findbugs: 42 errors – 2h10 to fix
  • Checkstyle: 133 errors – 1h25 to fix
  • PMD: 311 errors – 11h15 to fix

So in total: 14h50 (almost two days)

Just to give you an idea of the code size of the project (SLOC using the linux command sloccount) : 7757

Even though the project has decent test coverage, the experience allowed me to find two corner cases that were potential bugs (using Findbugs) and some other dangerous situations.

Fixing the problems signaled by PMD  was the most painful of all, especially the part regarding the Cyclomatic Complexity. I confess we were having even 18 complexity for a given method (!).  Even though the parts of code seemed small and not very complex , several of these blocks together in the same method looked quite ugly. But that’s why I believe that you need something that would break your build, because we didn’t go to 18 in the first version, but incrementally by fixing one bug after another (or adding a new small feature).

When you fix these bugs, it’s important to have a good test coverage because when you start refactoring to lower the complexity, it’s incredibly easy to break the code! So be careful!

Now hoping that I convince you that it’s a good fight to take time to add static code analysis to your projects, let’s explore some thoughts on how we can be even better at it.

A word about ST-JS

We built ST-JS to add compile time error detection to JavaScript – so you write your code in Java that’s translated in JavaScript. But with static code analysis we can go even further, much further than where similar tools for JavaScript would take you.

Bug By Example

Findbugs and PMD are great to help you to asses the current status of a project and keep a certain level of confidence in your code’s quality. But in my opinion they both fail in making the life easy for the teams who’d like to continuously improve the quality of their code in a sustainable manner.

Creating new rules in Findbugs seems very difficult. In PMD it seems a bit easier using the XPATH framework, but it’s still quite hard and unnatural. Additionally even existing XPATH rules have problems because they match on class’ simple name instead of the full name. And generics are not well taken into account (try having for example a class called Statement or Thread in any package you want 🙂 )

While looking at the PMD documentation, you see that most of the rules come with examples. So I came up with the following idea: why not describing the rules by using examples? (bug by example).

The idea is to write compilable code snippets using a specific library to describe the bugs and then to detect those patterns in your code.

import static org.bugby.wildcard.Wildcards.someTypedValue;
import org.bugby.annotation.BadExample;

@BadExample
public class DontCallThreadRun {
 public void someCode() {
 someTypedValue(Thread.class).run();
 }
}

Making easy for the developers to create new rules is like creating unit tests for code yet to be written! Can you imagine delivering your libraries together with some rules that would statically check if it’s wrongly used!

Currently there are around 700-800 rules in Findbugs + PMD. But I believe that if teams and the community contribute to the rule database it shouldn’t be unusual to see enterprise projects with 5000-10000 rules enabled!

The Bugby project is still in its inception, proof of concept phase. Your help to move faster would be greatly appreciated!

%d bloggers like this: