Saturday, April 16, 2011

Performance Optimization of Silverlight Applications using Visual Studio 2010 SP1

Today I decided to work on some miscellaneous performance optimization tasks for my Sterling NoSQL Database, which runs on both Silverlight and Windows Phone 7. The database will serialize any class to any level of depth without requiring you to use attributes or derive from special base classes. Despite doing this while also tracking keys and relationships between classes, you can see it manages to serialize and deserialize almost as fast as native methods that don't provide any of the database features:
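To make "no attributes, no base classes" concrete, here is a minimal sketch of how a plain class gets registered with a key. The names BaseDatabaseInstance, ITableDefinition, and CreateTableDefinition are my recollection of the Sterling API rather than a quote from the source, so treat the sketch as illustrative.

using System.Collections.Generic;

// A plain POCO: no attributes, no special base class required.
public class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Illustrative database definition; member names are from memory and may
// differ slightly from the Sterling release you are using.
public class ContactDatabase : BaseDatabaseInstance
{
    public override string Name
    {
        get { return "ContactDatabase"; }
    }

    protected override List<ITableDefinition> RegisterTables()
    {
        return new List<ITableDefinition>
        {
            // The lambda tells Sterling how to extract the key for the table.
            CreateTableDefinition<Contact, int>(c => c.Id)
        };
    }
}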

Note: click on any smaller-sized image in this post to see a full-sized version in case the text is too small for the sample screenshots.

Obviously the query column is lightning fast because Sterling uses in-memory indexes and keys. I decided to start focusing on some areas that can really help speed up the performance of the engine. While there are a lot of fancy profiling tools on the market, there really isn't much need to look beyond the tools built into Visual Studio 2010. To perform my analysis, I decided to run the profiler against the unit tests for the in-memory version of the database. This takes isolated storage out of the picture and helps me focus on issues in the application logic itself rather than artifacts of blocking file I/O.

The performance profiling wizard is easy to use. With my tests set as the start project, I simply launched the performance wizard:

I chose CPU sampling. This gives me an idea of where most of the work is being performed. While it's not as precise as measuring the actual time spent in each function, when the sample count for a function goes down you can be reasonably assured that the function is executing more quickly and less time is being spent there.

Next, I picked the test project.

Finally, I chose to launch profiling so I could get started right away and finished the process.

The profiler ran my application and I had it execute all of the tests. When done, I closed the browser window and the profiler automatically generated a report. The report showed CPU samples over time, which wasn't as useful to me as the section that reads "Functions doing the most individual work." That is where most of the time is being spent. Some of the methods weren't a surprise because they involved reflection, but one was a list Contains method, which implies issues with looking up items in a collection.

Clicking on the offending function quickly showed me where the issues lie. It turns out that a lot of time is being spent in my index collections and key collections. This is expected as it is a primary function of the database, but this is certainly an area to focus on optimizing.

Clicking on this takes me directly to the source in question, with color-coded lines of source indicating time spent (number of samples):

Swapping my view to the functions view, I can bring the functions with the most samples to the top. A snapshot paints an interesting picture. Of course the Contains call is a side effect of having to walk the list, which uses the Equals and GetHashCode members. Sure enough, you can see Sterling is spending a lot of time evaluating these:
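As an aside, here is a small, self-contained illustration (not Sterling code) of why an overridden Equals dominates under a list lookup: List<T>.Contains defers to the default equality comparer, which calls the element's Equals override once for every element it walks past until it finds a match.

using System.Collections.Generic;

public class SampleKey
{
    private readonly int _value;

    public SampleKey(int value)
    {
        _value = value;
    }

    // List<SampleKey>.Contains lands here once per element it scans,
    // so an expensive Equals is multiplied by the length of the list.
    public override bool Equals(object obj)
    {
        return obj is SampleKey && ((SampleKey)obj)._value == _value;
    }

    // Hash-based collections (dictionaries and indexes) rely on this.
    public override int GetHashCode()
    {
        return _value;
    }
}

public static class ContainsDemo
{
    public static bool Lookup(List<SampleKey> keys, SampleKey target)
    {
        // Linear scan: calls Equals up to keys.Count times.
        return keys.Contains(target);
    }
}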

So diving into the code, the first thing I looked at was the Equals method. For the key, it basically said "what I'm being compared to must also be the same type of key, and then the value of the key I hold must also be the same."

public override bool Equals(object obj)
{
    return obj is TableKey<T, TKey> && ((TableKey<T, TKey>) obj).Key.Equals(Key);
}

Because these keys are mainly being compared via LINQ functions and other queries on the same table, I'm fairly confident there won't be an issue with the types not matching. So, I removed that condition from the keys and indexes and ran the performance wizard again to see if there were any significant gains.
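For clarity, the intermediate version (my reconstruction based on the description above, not a copy from the changeset) would simply drop the type test and keep the key comparison:

// Reconstructed intermediate version after removing the type check;
// the unguarded cast will throw if a key of a different type ever shows up.
public override bool Equals(object obj)
{
    return ((TableKey<T, TKey>) obj).Key.Equals(Key);
}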

This time the list stayed on the top, but the table index compare feature also bubbled up:

One nice feature of the profiler is that you can compare two sessions. I highlighted both sessions and chose the comparison feature:

The report actually showed more samples now for the index, which remains one of the most problematic areas of code:

Obviously, my issue is with the comparison of keys. In Sterling you can define anything as your key, so Sterling has no way of necessarily optimizing the comparison. Or does it? One comparison guaranteed to be fast is integer comparison. All objects have a hash code. Instead of always comparing the key values, why not evaluate the hash code for the key, then compare that first to eliminate any mismatches? Then only if the hash codes are equal, have Sterling compare the actual keys themselves as well to be sure.

I made those tweaks, and here is the new Equals function:

public override bool Equals(object obj)
{
    // Cheap integer comparison against the cached hash code first; only
    // fall through to the full key comparison when the hash codes match.
    return obj.GetHashCode() == _hashCode && ((TableKey<T, TKey>) obj).Key.Equals(Key);
}

Notice the fast check against the hash code first, dropping down to the actual key comparison only if the hash code check succeeds. What's nice about this approach is that most comparisons are mismatches; the extra key comparison only takes place when the "interesting event" of a hash match happens. All of the boring mismatches now benefit from a faster comparison because they are quickly rejected.
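One detail not shown above: the _hashCode field is the key's hash code captured once up front, so every comparison starts with a plain integer check instead of recomputing anything. The actual Sterling TableKey<T, TKey> has more to it than this; the sketch below only shows the caching idea, and the member names are my own.

// Sketch of the caching idea only; the real constructor and members differ.
// The point is that the hash code is computed once and reused.
public class CachedKey<TKey>
{
    private readonly int _hashCode;

    public TKey Key { get; private set; }

    public CachedKey(TKey key)
    {
        Key = key;
        _hashCode = key.GetHashCode(); // computed once, reused on every Equals call
    }

    public override int GetHashCode()
    {
        return _hashCode;
    }
}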

That's the theory, anyway. What did the profiler say? I executed a final profiler run. This time there was definite improvement in the equality function and the list lookup:

Of course, the only caveat here is that the samples aren't statistically significant. The improvement is slight, with a significant margin of error. Fortunately, I was able to run the profiler multiple times against both the old and the new code and consistently saw improvements, so there is some gain, and even a little counts because this is the area where Sterling spends most of its time. After evaluating this, a HashSet<T> may outperform the list, but it isn't available on the Windows Phone 7 version, so I have some more tweaking and testing to do.
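For reference, the reason a HashSet<T> is tempting is that its Contains is a hash bucket lookup rather than a linear scan, so Equals is only called on candidates that already share a hash code. A trivial comparison of the two lookups (illustration only, not Sterling code):

using System.Collections.Generic;

public static class LookupComparison
{
    // O(n): walks the list, calling Equals until it finds a match.
    public static bool InList(List<int> keys, int target)
    {
        return keys.Contains(target);
    }

    // O(1) on average: hashes the target, probes a bucket, and calls
    // Equals only on elements whose hash codes collide.
    public static bool InHashSet(HashSet<int> keys, int target)
    {
        return keys.Contains(target);
    }
}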

I also managed to find a few other nuggets, including converting the reflection cache to use dynamic methods and taking an expensive operation that was evaluating types and caching the lookup in a dictionary instead. After tweaking and checking these improvements in, next up will be looking at the isolated storage driver and potential bottlenecks there.
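The dictionary cache is the familiar memoization pattern: perform the expensive type evaluation once, then serve every later request from a lookup. A hedged sketch of the pattern, with names of my own invention rather than Sterling's:

using System;
using System.Collections.Generic;

public static class TypeEvaluationCache
{
    private static readonly Dictionary<Type, bool> _cache = new Dictionary<Type, bool>();

    // Evaluate a type once with the expensive check, then answer from the cache.
    // Note: not thread-safe; add a lock if this is hit from multiple threads.
    public static bool Evaluate(Type type, Func<Type, bool> expensiveCheck)
    {
        bool result;
        if (!_cache.TryGetValue(type, out result))
        {
            result = expensiveCheck(type); // e.g., reflection-heavy inspection
            _cache[type] = result;
        }
        return result;
    }
}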

As you can see, the built-in profiling is very powerful and a useful tool to help improve application performance and identify issues before they become serious.

Jeremy Likness