Thursday, September 10, 2009

Pragmatic Reflection on Singletons

So today I was wading through some code that gets called quite a bit. It is in a process that might be hit thousands of times per second. It uses a pipeline pattern so there are several objects to "new up" and place in the pipeline.

Being performance-minded, I was originally tempted to follow the singleton pattern so that I didn't have to create those objects on every call and then chain them together. Creating them each time would be quite expensive, right?

Of course, singletons done right can be powerful, but done wrong can create issues ... you must be very careful about wiring up your properties correctly and understand synchronization between threads, etc. Unfortunately, more often than not (and I've been guilty of this myself) I see the pattern used as a poor man's cache rather than for a real business purpose. The question really becomes, when does this pattern truly make sense? ... and when is it misused or even overkill?

The first thing to do is address the idea that creating objects is expensive. I created a simple console application that iterates millions of times and simply creates an array out of two properties accessed from a class. This is purely an artificial construct just to get to the root of what we do every day: we get a class somehow (through a factory, an inversion of control container, a singleton pattern, or even the good old-fashioned new keyword), then we do something with that class (access properties, execute methods, etc).

One pass follows the singleton pattern. I do something ugly on purpose because this is a quick and dirty example: it essentially makes a singleton, then makes a nested singleton that is referenced by the parent. The point is that on every iteration, when I get the singleton, I'm getting the same set of nested objects.

The next pass news up the objects every time: I create a class with a nested class, reference their properties, and make a new copy of both classes on each pass. Just for kicks and giggles I even added another pass that uses Activator.CreateInstance to activate the classes, to show the expense of reflection.

Again, keep in mind this is an artificial case, but it helps us better understand the cost of either accessing a singleton or newing up an object each time.

Here is my test:

using System;

namespace Activation
{
    public interface INestedWidget
    {
        INestedWidget NestedWidget { get; set; }
        string Identifier { get; }
    }

    public class WidgetA : INestedWidget
    {
        private static readonly WidgetA _instance; 

        static WidgetA()
        {
            _instance = new WidgetA();
        }

        public virtual INestedWidget GetInstance()
        {
            return _instance;
        }

        public INestedWidget NestedWidget { get; set; }
        
        public virtual string Identifier
        {
            get { return "Widget A"; }
        }        
    }

    public class WidgetB : WidgetA
    {
        private static readonly WidgetB _instance; 

        static WidgetB()
        {
            _instance = new WidgetB();
        }

        public override INestedWidget GetInstance()
        {
            return _instance;
        }
        public override string Identifier
        {
            get
            {
                return "Widget B";
            }
        }
    }

    class Program
    {
        static void Main()
        {
            const int ITERATIONS = 99999999; 
            Console.WriteLine("Here we go...");

            DateTime start = DateTime.UtcNow;
            WidgetA parent = new WidgetA();         
            parent.GetInstance().NestedWidget = new WidgetB().GetInstance();
            
            for (int x = 0; x < ITERATIONS; x++)
            {
                INestedWidget widget = parent.GetInstance();
                string[] identifier = new string[] {widget.Identifier, widget.NestedWidget.Identifier};                
            }
            DateTime finish = DateTime.UtcNow;
            TimeSpan interval = finish - start;

            long ms = interval.Ticks/TimeSpan.TicksPerMillisecond;

            Console.WriteLine("Took me {0} using singletons, {1} per ms", interval, ITERATIONS/ms);                

            start = DateTime.UtcNow; 
            for (int x = 0; x < ITERATIONS; x++)
            {
                INestedWidget iParent = new WidgetA() {NestedWidget = new WidgetB()};
                string[] identifier = new string[] { iParent.Identifier, iParent.NestedWidget.Identifier };                
            }
            finish = DateTime.UtcNow;
            TimeSpan secondInterval = finish - start;

            ms = secondInterval.Ticks/TimeSpan.TicksPerMillisecond;

            Console.WriteLine("Took me {0} using new objects, {1} per ms", secondInterval, ITERATIONS/ms);

            start = DateTime.UtcNow;
            for (int x = 0; x < ITERATIONS; x++)
            {
                INestedWidget iParent = (WidgetA) Activator.CreateInstance(typeof (WidgetA));
                iParent.NestedWidget = (WidgetB) Activator.CreateInstance(typeof(WidgetB));
                string[] identifier = new string[] { iParent.Identifier, iParent.NestedWidget.Identifier };
            }
            finish = DateTime.UtcNow;
            TimeSpan thirdInterval = finish - start;

            ms = thirdInterval.Ticks/TimeSpan.TicksPerMillisecond;

            Console.WriteLine("Took me {0} using activation, {1} per ms", thirdInterval, ITERATIONS/ms);

            Console.ReadLine();
        }
    }
}

When I run it, it's pretty much as expected. On my machine, it takes 4 seconds to spin through the singletons, 7 seconds (wow, over 50% longer) to new up the objects, and a whopping 30 seconds (over 6 times as long) to use reflection. That proves without a doubt that creating the objects is way too expensive and I should use that singleton, right?

Well ... maybe not.

The real question here is whether or not 99,999,999 objects is a realistic test case.

For me, the more important piece of information is the rate ... remember, I'm getting thousands of requests per second, so what can I do in a millisecond? As it turns out, quite a bit. Even when creating a new object every time, I can create over 13,000 of those nested instances every millisecond. That means if my test case ended there, I should be able to handle millions of requests per second without faltering ... even without resorting to the singleton.
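To make that arithmetic concrete, here is a rough sketch that applies the same integer division the test program uses to the rounded timings quoted above. The 4, 7 and 30 second figures are approximations, and the class and method names are mine, so treat the output as ballpark numbers only.

```csharp
using System;

// Back-of-the-envelope throughput from the rounded timings quoted in the post.
// ThroughputEstimate and PerMillisecond are illustrative names, not part of the test program.
public static class ThroughputEstimate
{
    // Same arithmetic as the test program: iterations divided by elapsed milliseconds.
    public static long PerMillisecond(long iterations, long elapsedMs)
    {
        return iterations / elapsedMs;
    }

    public static void Main()
    {
        const long Iterations = 99999999;

        // Elapsed times are the rounded 4, 7 and 30 second figures.
        Console.WriteLine("singletons:  {0} per ms", PerMillisecond(Iterations, 4000));
        Console.WriteLine("new objects: {0} per ms", PerMillisecond(Iterations, 7000));
        Console.WriteLine("activation:  {0} per ms", PerMillisecond(Iterations, 30000));
    }
}
```

With those rounded inputs the "new objects" pass works out to roughly 14,000 per millisecond, in line with the "over 13,000" figure, and even the activation pass stays above 3,000 per millisecond.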

Even the activator gives me a good run ... and I included that for the dependency injection fans because ultimately what a lot of the engines are doing is reflection-based.

So here's the crunch ... what I really need to worry about when I'm managing my requests is everything that happens between the request coming in and the point where I'm done with it. This might be code logic, making calls to my data access layer, etc. In fact, if I truly break down what's inside of a "request" I might find that newing the object or even activating it is really 1% of the entire request. So why am I using static classes just to address 1% of the problem when there are so many other potential pitfalls?

This is where I see the common mistake ... most likely, I have some semi-static data I've loaded and want to keep, so the singleton gives me a convenient cache. If the database calls take 500ms then suddenly I get REAL big savings keeping a copy of the class around rather than making it new every time.

But there is the real rub ... is that the right solution? I would say ... NO.

The cache is a concern that belongs somewhere else. Depending on my architecture it may "live" in the data access layer or in one of my providers, but the point is that my "consuming" class shouldn't be concerned and shouldn't have to be implemented differently out of a concern for how the underlying data is persisted and retrieved. In other words, I should be able to make a new class every time if my class is the "do something fantastic" class and simply leans on the "do something with the database class."

I really have a few options here. My ORM framework may supply a cache layer, in which case I will always go out and make my data request but sometimes it will come from the local cache and sometimes it will come from disk and take longer. My provider layer that sits on top of the data layer might manage this for me.
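As a minimal sketch of the provider-layer option, assuming a hypothetical IWidgetRepository contract (none of these names come from a real framework), the cache can wrap the data access class while the consuming class is created fresh on every request:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data-access contract; the names are illustrative only.
public interface IWidgetRepository
{
    string GetDescription(int id);
}

// Stand-in for the slow database call.
public class SlowWidgetRepository : IWidgetRepository
{
    public int Calls { get; private set; }

    public string GetDescription(int id)
    {
        Calls++; // a real implementation would hit the database here
        return "Widget #" + id;
    }
}

// The cache lives in the data layer, wrapped around the real repository.
public class CachingWidgetRepository : IWidgetRepository
{
    private readonly IWidgetRepository _inner;
    private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();
    private readonly object _sync = new object();

    public CachingWidgetRepository(IWidgetRepository inner) { _inner = inner; }

    public string GetDescription(int id)
    {
        lock (_sync) // coarse lock keeps the sketch thread-safe
        {
            string value;
            if (!_cache.TryGetValue(id, out value))
            {
                value = _inner.GetDescription(id);
                _cache[id] = value;
            }
            return value;
        }
    }
}

// The "do something fantastic" class: cheap to create and cache-ignorant.
public class WidgetReport
{
    private readonly IWidgetRepository _repository;

    public WidgetReport(IWidgetRepository repository) { _repository = repository; }

    public string Render(int id) { return "Report for " + _repository.GetDescription(id); }
}

public static class CachingDemo
{
    public static void Main()
    {
        var slow = new SlowWidgetRepository();
        IWidgetRepository repository = new CachingWidgetRepository(slow);

        for (int request = 0; request < 1000; request++)
        {
            // A new consumer on every request; only the repository is long-lived.
            new WidgetReport(repository).Render(42);
        }

        Console.WriteLine("Database calls: {0}", slow.Calls); // prints "Database calls: 1"
    }
}
```

The sketch never expires entries, so a real cache would also need an invalidation policy. The point is only that WidgetReport does not change whether or not the cache is there.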

Finally, I might even take a look at aspect-oriented programming and think about the cache as a "cross-cutting" concern. Perhaps I have a policy that controls the caching ... but that's a story for a different day.

The bottom line is, if I am using the pattern, I had better know what I'm using it for and why ... using it for a cache when it's not the class that is fetching the data violates separation of concerns, because now the class is concerned not just with the data it works with, but with how that data is fetched and held onto. That belongs in the data layer. Perhaps THAT layer might use a singleton somewhere.

I'm very interested in your comments/feedback about where you feel the singleton pattern makes sense and why, and more importantly ... what are some ways you manage the concept of singleton? Is that a design aspect, or an implementation aspect? In other words, does it make sense to have a "GetInstance" method on my interface? Or should I make everything "singleton-ignorant" then give it a lifetime policy using a dependency injection framework? If you are mostly using it for caching/performance reasons, who really should own the cache, and how does one truly separate caching of data as a concern?
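To picture the "singleton-ignorant" option, here is a toy sketch of a registry that applies a lifetime policy from the outside. TinyContainer, Lifetime, PipelineStep and SharedSettings are all hypothetical names for illustration. This is not the API of any real dependency injection framework, and it assumes .NET 4 or later for Lazy<T>.

```csharp
using System;
using System.Collections.Generic;

public enum Lifetime { Transient, Singleton }

// Toy registry for illustration only, NOT a real DI framework.
public class TinyContainer
{
    private readonly Dictionary<Type, Func<object>> _factories =
        new Dictionary<Type, Func<object>>();

    public void Register<T>(Func<T> factory, Lifetime lifetime) where T : class
    {
        if (lifetime == Lifetime.Singleton)
        {
            // Lazy<T> runs the factory once, thread-safely, on first use.
            var lazy = new Lazy<T>(factory);
            _factories[typeof(T)] = () => lazy.Value;
        }
        else
        {
            // Transient: run the factory on every Resolve call.
            _factories[typeof(T)] = () => factory();
        }
    }

    public T Resolve<T>() where T : class
    {
        return (T)_factories[typeof(T)]();
    }
}

// Plain classes with no GetInstance method; they know nothing about their lifetime.
public class PipelineStep { }
public class SharedSettings { }

public static class LifetimeDemo
{
    public static void Main()
    {
        var container = new TinyContainer();
        container.Register(() => new PipelineStep(), Lifetime.Transient);
        container.Register(() => new SharedSettings(), Lifetime.Singleton);

        // Transient: a different instance each time, so this prints False.
        Console.WriteLine(object.ReferenceEquals(
            container.Resolve<PipelineStep>(), container.Resolve<PipelineStep>()));

        // Singleton: the same instance each time, so this prints True.
        Console.WriteLine(object.ReferenceEquals(
            container.Resolve<SharedSettings>(), container.Resolve<SharedSettings>()));
    }
}
```

In this arrangement the lifetime is an implementation decision made at registration time, so switching a class between the two policies means changing one line rather than the class itself.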

Jeremy Likness