Monday, July 19, 2010

Optimizing, Obfuscating, and Shrinking your Android Applications with ProGuard

Obfu-what? Right, there are a lot of technical terms there, and you may not know what they mean. I'm going to describe a way to cut the size of your Android applications roughly in half, optimize them to run faster, and obfuscate them to make it harder for others to reverse engineer your code.

What we'll do is use a Java program called ProGuard to apply its magic to your program's code, during the build process. To do this we'll use an Ant script to build the program, and add our extra steps into the regular build process.

Why do it?

The short answer is that your code will be smaller and faster. How much depends on the application, but in general you'll find your code will be a lot smaller and a little faster. ProGuard performs three key functions, described in the sections below. Much of the text below is taken from the ProGuard website.

You may hear the term obfuscation used to describe all three processes. Strictly speaking, obfuscation is just one of the things that a program such as ProGuard does. Still, instead of saying "shrink, obfuscate, and optimize" every time, I'll use the simple term obfuscation to describe all three in this post.

Shrinking

Java source code (.java files) is typically compiled to bytecode (.class files). Bytecode is more compact than Java source code, but it may still contain a lot of unused code, especially if it includes program libraries. Shrinking programs such as ProGuard can analyze bytecode and remove unused classes, fields, and methods. The program remains functionally equivalent, including the information given in exception stack traces.

For a realistic example, take the following code:
    if (Config.LOGGING)
    {
        TestClass test = new TestClass();
        Log.d(TAG, "[onCreate] testClass=" + test);
    }
The above code is a typical scenario during development. You create code like this to help debug and test your application. Before releasing the final product, though, you set Config.LOGGING to false, so it doesn't execute. The problem is that this code is still in your application. It makes the application bigger, and it may cause security issues by including code which should never be seen by a snooping hacker.

Shrinking the code solves this problem beautifully. The code is completely removed from the final product, leaving the final package safer and smaller.
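
To make the idea concrete, here is a rough, self-contained sketch in plain Java with no Android classes. The names Config, TestClass, and ShrinkDemo are stand-ins of my own, and an in-memory list stands in for Log.d so that it runs on a desktop JVM. Because LOGGING is a constant set to false, the guarded block can never run, nothing else refers to TestClass, and a shrinker is free to drop the whole class.

```java
import java.util.ArrayList;
import java.util.List;

final class ShrinkDemo {
    // Stand-in for android.util.Log so the sketch runs without Android.
    static final List<String> LOG = new ArrayList<>();

    static String onCreate() {
        if (Config.LOGGING) {
            // Unreachable while LOGGING is false, so TestClass goes unused.
            TestClass test = new TestClass();
            LOG.add("[onCreate] testClass=" + test);
        }
        return "created";
    }

    public static void main(String[] args) {
        System.out.println(onCreate() + ", log entries: " + LOG.size());
    }
}

// Hypothetical stand-in for the build-time configuration class.
final class Config {
    // Compile-time constant: set to true for debug builds.
    static final boolean LOGGING = false;
}

// Hypothetical debug-only helper; only the logging block above uses it.
final class TestClass {
    @Override
    public String toString() {
        return "TestClass";
    }
}
```

With LOGGING left at false, the log list stays empty, which is exactly the situation where the debug-only class is dead weight in the package.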

Obfuscation

By default, compiled bytecode still contains a lot of debugging information: source file names, line numbers, field names, method names, argument names, variable names, etc. This information makes it straightforward to decompile the bytecode and reverse-engineer entire programs. Sometimes, this is not desirable. Obfuscators such as ProGuard can remove the debugging information and replace all names by meaningless character sequences, making it much harder to reverse-engineer the code. It further compacts the code as a bonus. The program remains functionally equivalent, except for the class names, method names, and line numbers given in exception stack traces.

Optimizing

Apart from removing unused classes, fields, and methods in the shrinking step, ProGuard can also perform optimizations at the bytecode level, inside and across methods. Thanks to techniques like control flow analysis, data flow analysis, partial evaluation, static single assignment, global value numbering, and liveness analysis, ProGuard can do things such as perform over 200 peephole optimizations, like replacing x * 2 with x << 1. The positive effects of these optimizations will depend on your code and on the virtual machine on which the code is executed. Simple virtual machines may benefit more than advanced virtual machines with sophisticated JIT compilers. At the very least, your bytecode may become a bit smaller.
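
As a small illustration of the x * 2 example, the sketch below checks that the multiplication and the shift give the same int result, including for negative values and on overflow. The class and method names are mine, and whether ProGuard applies this particular rewrite to your code depends on your configuration, so treat it as a demonstration of the equivalence only.

```java
final class PeepholeDemo {
    // The form you would naturally write in source code.
    static int timesTwo(int x) {
        return x * 2;
    }

    // The cheaper form an optimizer may substitute.
    static int shiftLeftOne(int x) {
        return x << 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 21, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Both forms wrap around identically when the result overflows.
            System.out.println(x + ": " + (timesTwo(x) == shiftLeftOne(x)));
        }
    }
}
```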

Using Ant to build your project

When you create your Android application, or build it, there are many steps involved. First, a Java compiler compiles the source files (i.e. the textual .java files) into Java bytecode (i.e. .class files). Then, a tool in the Android SDK turns the Java bytecode into Dalvik bytecode (i.e. .dex files). Finally, all of the resources and code are packaged into a single ZIP file, which is the .apk file. Since ProGuard works with Java bytecode, we want to run ProGuard on the class files that are created by the Java compiler, before the build process converts the Java bytecode into Dalvik bytecode. This isn't possible with the regular Eclipse method of creating Android packages (at least, not that I know of), but it's a cinch if you use Ant to build your application. It doesn't take long to create an Ant build script to build your existing Android application. See the instructions on my blog post here. Or, you can just download the sample at the end of this post.

Adding ProGuard to the Ant build script

Download the latest ProGuard distribution. Inside, find the library, and put it in a convenient location in your project directory, such as proguard/. For example, in the latest version as of this writing (4.5.1 distribution), I copied lib/proguard.jar from the distribution ZIP file into my source tree as proguard/proguard.jar. Now, we add the script to the Ant build file, build.xml.

<!-- ================================================= -->
<!-- Obfuscation with ProGuard -->
<!-- ================================================= -->

<property name="proguard-dir" value="proguard"/>
<property name="unoptimized" value="${proguard-dir}/unoptimized.jar"/>
<property name="optimized" value="${proguard-dir}/optimized.jar"/>

<target name="optimize" unless="nooptimize">
    <jar basedir="${out.classes.dir}" destfile="${unoptimized}"/>

    <java jar="${proguard-dir}/proguard.jar" fork="true" failonerror="true">
        <jvmarg value="-Dmaximum.inlined.code.length=16"/>
        <arg value="@${proguard-dir}/config.txt"/>
        <arg value="-injars ${unoptimized}"/>
        <arg value="-outjars ${optimized}"/>
        <arg value="-libraryjars ${android.jar}"/>
    </java>

    <!-- Delete source pre-optimized jar -->
    <!--delete file="${unoptimized}"/-->

    <!-- Unzip target optimization jar to original output, and delete optimized.jar -->
    <delete dir="${out.classes.dir}"/>
    <mkdir dir="${out.classes.dir}"/>
    <unzip src="${proguard-dir}/optimized.jar" dest="${out.classes.dir}"/>

    <!-- Delete optimized jar (now unzipped into bin directory) -->
    <delete file="optimized.jar"/>
</target>
To have the build call the optimize Ant target between the Java compiler and the dex compiler, we change the dex target like so:

Android SDK version 7 and below:
<!-- Converts this project's .class files into .dex files -->
<target name="-dex" depends="compile,optimize">
Android SDK version 8 and above: uncomment the -post-compile target, and add this:
<target name="-post-compile">
  <antcall target="optimize"/>
</target>

Configuring ProGuard

Now we'll tell ProGuard how it can work with our Android application. Create a file called proguard/config.txt, which is referenced in the above Ant script. The following is taken from the ProGuard manual, although -libraryjars, -injars, and -outjars are passed in via the Ant build script instead of here.
-target 1.6 
-optimizationpasses 2 
-dontusemixedcaseclassnames 
-dontskipnonpubliclibraryclasses 
-dontpreverify 
-verbose 
-dump class_files.txt 
-printseeds seeds.txt 
-printusage unused.txt 
-printmapping mapping.txt 

# The -optimizations option disables some arithmetic simplifications that Dalvik 1.0 and 1.5 can't handle. 
-optimizations !code/simplification/arithmetic 

-keep public class * extends android.app.Activity 
-keep public class * extends android.app.Application 
-keep public class * extends android.app.Service 
-keep public class * extends android.content.BroadcastReceiver 
-keep public class * extends android.content.ContentProvider 

-keep public class * extends android.view.View {
    public <init>(android.content.Context);
    public <init>(android.content.Context, android.util.AttributeSet);
    public <init>(android.content.Context, android.util.AttributeSet, int);
    public void set*(...);
}

# Also keep - Enumerations. Keep the special static 
# methods that are required in enumeration classes.
-keepclassmembers enum  * {
    public static **[] values();
    public static ** valueOf(java.lang.String);
} 
Note that we have added a few configurations which make extra output, such as -verbose and -printusage unused.txt. You may remove these if you don't like the extra output cluttering your build process.
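
One of those extra outputs is worth keeping: mapping.txt, written by -printmapping, records which original names were renamed to which obfuscated names, and you will want it when reading stack traces from an obfuscated build. ProGuard ships a ReTrace tool for that job. The sketch below is only my simplified illustration of the idea, assuming class entries in the file look like "original.Name -> obfuscated.name:" with member entries indented beneath them. The class name MappingDemo and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MappingDemo {
    // Builds an "obfuscated name -> original name" table from class lines.
    static Map<String, String> parseClassMappings(List<String> lines) {
        Map<String, String> obfuscatedToOriginal = new HashMap<>();
        for (String line : lines) {
            // Skip blank lines and indented member lines; class lines end with ':'.
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0)) || !line.endsWith(":")) {
                continue;
            }
            String[] parts = line.substring(0, line.length() - 1).split(" -> ");
            if (parts.length == 2) {
                obfuscatedToOriginal.put(parts[1], parts[0]);
            }
        }
        return obfuscatedToOriginal;
    }

    public static void main(String[] args) {
        // Made-up sample content, not output from the real project.
        List<String> sample = Arrays.asList(
                "com.example.Foo -> a:",
                "    int count -> a",
                "com.example.Bar -> b:");
        System.out.println(parseClassMappings(sample).get("a"));
    }
}
```

For real stack traces, use ReTrace rather than a hand-rolled parser like this one, since it also handles member names and line numbers.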

Results

Now we're ready! When you run ant release from the command line, you will see the optimizer run. Here is the output from the test project, included below, when the build property config.logging is true:
>ant release
...
     [java] Shrinking...
     [java] Printing usage to [blog\obfuscation\proguard\unused.txt]...
     [java] Removing unused program classes and class elements...
     [java]   Original number of program classes: 8
     [java]   Final number of program classes:    2
Because we configured ProGuard with -printusage unused.txt, we can see what was removed from our code:
com.androidengineer.obfu.Obfuscation: 
    private static final java.lang.String TAG 
com.androidengineer.obfu.R 
com.androidengineer.obfu.R$attr 
com.androidengineer.obfu.R$drawable 
com.androidengineer.obfu.R$layout 
com.androidengineer.obfu.R$string 
com.androidengineer.obfu.TestClass: 
    private static final java.lang.String TAG 
That's pretty cool. You can also look at proguard/unoptimized.jar and proguard/optimized.jar. We can see that it removed many classes which were just placeholders for constants, and it removed the string TAG variables used by our logging code by replacing the references with the actual string constants.

In addition, if we build the application by changing the build property config.logging to false, we get an even further reduction in size. The best part about it is, it removes all of our debugging code.
>ant release
...
     [java]   Original number of program classes: 8
     [java]   Final number of program classes:    1
You can see that one more class was removed, TestClass. Because it is only used when Config.LOGGING is true, it is completely removed from the final build during the obfuscation process. So feel free to leave all of the debugging code you want in your source, because it can be removed during the build.

Of course, with our simple test project, our results are skewed, because it is not a typical Android application. proguard/unoptimized.jar is 4,959 bytes and proguard/optimized.jar is 646 bytes. But on an application I work with, which has over a thousand classes, I've seen a literal 50% reduction of code size. Well worth the trouble of setting this build up, in my opinion.
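
For the curious, the arithmetic behind the sample project's numbers can be sketched in a few lines. The class and method names here are mine; the two byte counts are the jar sizes quoted above, and they work out to roughly an 87% reduction.

```java
final class SizeReport {
    // Percentage by which the optimized size is smaller than the original.
    static double reductionPercent(long originalBytes, long optimizedBytes) {
        return 100.0 * (originalBytes - optimizedBytes) / originalBytes;
    }

    public static void main(String[] args) {
        // Jar sizes of unoptimized.jar and optimized.jar for the sample project.
        double percent = reductionPercent(4959, 646);
        System.out.println(Math.round(percent) + "% smaller");
    }
}
```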

ClassNotFoundExceptions

There may be some cases where you get a ClassNotFoundException when running your application which has been obfuscated with ProGuard. In this case, you need to edit the config.txt file to tell ProGuard to keep the class in question. For example,

# Keep classes which are not directly referenced by code, but referenced by layout files. 
-keep class com.androidengineer.MyClass {
    <init>(...);
}
This scenario is rare. I have seen it happen when I reference the child of an Android View class in an Android layout file, such as MyButton extends Button, but the class is not referenced in regular code. More information can be found in the ProGuard documentation.
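
The underlying reason is that a class named only in a layout file is looked up by its name at run time, which ProGuard cannot see when it analyzes your bytecode. The sketch below shows the same kind of lookup in plain Java, using Class.forName as a stand-in for whatever the Android framework does internally. ReflectionDemo and the MyButton class name are hypothetical.

```java
final class ReflectionDemo {
    // Looks a class up by its name at run time, with no compile-time reference.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // This is what happens when the class was renamed or removed.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canLoad("java.lang.String"));
        // Hypothetical view class that is not on the classpath.
        System.out.println(canLoad("com.example.widget.MyButton"));
    }
}
```

If the name in the layout file no longer matches a class in the package, the lookup fails in just this way, and a -keep rule for that class is the fix.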

Update for Android SDK versions 7 and above

Google updated the Ant scripts in the later SDK versions. They changed the name of a key variable, ${android-jar}, to ${android.jar}. This caused the builds to break. The solution is to define them both if they do not exist:
 <!-- In newer platforms, the build setup tasks were updated to begin using ${android.jar} instead of ${android-jar}.  This makes them both compatible. -->
<target name="target-new-vars" unless="android-jar">
<property name="android-jar" value="${android.jar}"/>
</target>

<!-- Be sure to call target-new-vars before anything else. -->   
<target name="config" depends="target-new-vars,clean">

The sample project file below has been updated.


Update for Android SDK version 8

Well, it turns out Google changed the Ant build files again. This time, though, they actually made it pretty darn easy. The build.xml file is much smaller this time. They've added a nifty new section:
<!-- extension targets. Uncomment the ones where you want to 
     do custom work in between standard targets -->
<!--
    <target name="-pre-build">
    </target>
    <target name="-pre-compile">
    </target>

    [This is typically used for code obfuscation.
     Compiled code location: ${out.classes.absolute.dir}
     If this is not done in place, override 
     ${out.dex.input.absolute.dir}]
    <target name="-post-compile">
    </target>
-->
All we have to do is uncomment the -post-compile Ant target, and add our obfuscation Ant target to it.
<target name="-post-compile">
    <antcall target="optimize"/>
</target>
The complete build file is here.

Using Google's License Verification Library (LVL)

For those of you using Google's licensing service and its License Verification Library, you will want to keep one additional class in that library from being obfuscated. Be sure the following is in your proguard/config.txt file.
-keep class com.android.vending.licensing.ILicensingService

Sample Application

The sample application is a simple Hello World application, but it includes the custom build script and ProGuard library as described in this tutorial. First, you must run "android update project -p ." from the command line in the project's directory to let the tools set the SDK path in local.properties. Then you can turn on and off logging by changing the value of config.logging in build.properties. Finally, run ant release to build the application, which will create the obfuscated and signed .apk file. If you have any trouble, you may want to review the previous blog post about setting up Ant builds.

Project source code - obfuscation.zip (600 Kb)

Build file for Android API level 8 and above: build.xml (4.52 Kb)