Tuesday, September 15, 2009

Ref Keyword for Reference Types

The Ref keyword is well known. It indicates that you are passing a reference, not a value, to a method. That means that if the method modifies the value, the changes will be apparent to the calling method as well.

Where I see a lot of confusion, however, is what happens when dealing with reference types. It is common to say that methods pass objects by reference, but that's not entirely true.

First, a pop quiz. Without actually running the code, what do you think this code snippet will produce?

using System;

namespace ByRef
{
    internal sealed class MyClass
    {
        public MyClass(int value)
        {
            Value = value;
        }

        public int Value { get; set; }
    }

    internal class Program
    {
        private static void _SwapByValue(MyClass myClass)
        {
            myClass = new MyClass(5);
        }
   
        private static void _SwapByRef(ref MyClass myClass)
        {
            myClass = new MyClass(5);
        }
        
        private static void Main(string[] args)
        {
            MyClass testclass = new MyClass(4);
            _SwapByValue(testclass);
            Console.WriteLine(testclass.Value);
           
            MyClass testclass2 = new MyClass(4);
            _SwapByRef(ref testclass2);
            Console.WriteLine(testclass2.Value);            

            Console.ReadLine();
        }
    }
}

We'll come back to that in a minute.

When you make a method call, all of the variables you pass are copied to the stack. This is the first place some people get confused. If I have an integer:

int x = 5;
CallMethod(x);

x is on my stack. A copy of the value of x ("5") is made and also placed on the stack. We now have two stack entries: "my x" and "the method's x."

But what about this?

MyClass myClass = new MyClass(5);
CallMethod(myClass);

The important thing to remember is that myClass is really a reference to the instance. So the first line gives us two allocations in memory: a block of heap that contains the class, and a local stack pointer to the class. So when we call the method, a copy is made just as in the value type example. In this case, it is a copy of the reference. So now I have my class in the heap, my local reference, and a copy of my local reference being passed to the method.

This is why the first case in the above example will print "4". All the method did was to change the reference on the method's stack to point to a new allocation in the heap - the new instance of MyClass. When the method returns, the copy is forgotten. The new instance (5) becomes orphaned, and is eventually garbage collected. The local reference still points to (4).

The second case is more interesting. As we mentioned, the ref keyword forces a reference, not a copy, to be passed. So this:

int x = 5;
MyMethod(ref x);

Skips making the copy. It simply gives the method access to the "5" value on the stack (some people mistakenly believe that the 5 is somehow boxed or unboxed, but that doesn't happen ... it simply isn't copied). If you change it, you change the same stack reference the calling program has, and therefore it will recognize the change.

What about with our object? This is where understanding the ref keyword is, well, key. Remember that when we new the (4) class, we have two important pieces: the class on the heap, and the pointer to the class (reference) on the stack. When we call the method with the ref keyword, it is not allowed to make a copy. Therefore, the method gains access to the same reference on the stack as the calling program. When it makes a new class (5), the stack reference is changed to point to this new instance. Now, the references point to (5) and (4) is orphaned and will be subject to garbage collection. This is why the second example shows "5" — the reference has been updated.

So, as you can see, objects are not "passed by reference" by default. Instead, when a reference type is passed in a parameter list, the reference is passed by value. This allows you to modify the object. If you want to change the reference, then you must ref it!

Jeremy Likness