Thursday, August 6, 2009

Pipeline and Yield in C#

The pipeline pattern (sometimes also referred to as pipes and filters) has many useful applications. C# makes implementation even easier with the yield keyword.

Download the Source (5.42 KB)

Pipeline is similar to chain of responsibility. The main difference is that in the chain, each "link" passes something to the next until one knows what to do with it, then the process stops. In pipeline, something is passed to every chain in the link and potentially modified with each pass. This is where "pipes and filters" come into play because one link may filter while other links in the pipe may add.

I was coding this pattern for a project where we had a set of complex rules to generate a path for a file. It was really a pipeline because we start with a root path, then keep adding subfolders and structure based on certain rules, and finally filter the path to have clean characters and then save the file. As I was designing this little module, I was presented with an interesting challenge.

You can think of the pipeline in really two ways. One way is that you have a pipeline manager that is aware of all of the links in the pipeline and manages their interaction. In this scenario, you would simply have a List<T> and then the pipeline manager would iterate each node, taking the result and passing it to the next.

The second is that your pipeline manager simply "feeds" the pipe, but the execution is really managed by each node. In other words, each node will do something, then pass the result to the next node and so on. This more of the traditional view and is in line with the concept as it originated years ago in Unix machines with literal input and output pipes.

The problem I had was when wiring up the nodes on the pipe, many nodes were conditional. The challenge is that I have to keep track of the root node, because that's the call that starts the whole pipeline processing, as well as the "current" node because that's what I add the next node to.

My code was becoming a little convuluted trying to keep track of the current node, next node, etc. That's when I decided to turn to my friend yield. With yield, you can conditionally insert elements into an IEnumerable collection. This was perfect for me: I could simply check my condition, then yield return the new node else fall through to the next. My pipeline controller would then use a generic algorithm and simply iterate the collection and register each subsequent node with the previous.

The generic algorithm looks like this:

INode root = null;
INode previous = null; 

foreach(INode node in _GetNodes())
{
    if (root == null)
    {
        root = node;
    }
    else
    {
        previous.Register(node);
    }
    previous = node;
}

Basically, I keep track of the root to call it later and the previous node so I can register the next one. The "magic" happens in _GetNodes because it has a return type of IEnumerable<INode>.

In order to demonstrate this, I made a very simply example. My pipeline is a phrase pipeline. Each node takes in a string, adds something to it, then passes it to the next node. I have four phrases: "Hello," "Hola," "Goodbye," and "Adios." To demonstrate the conditional chaining in the pipeline, my controller will accept two booleans, one to determine whether or not I include Spanish in the phrase, and another to determine whether or not I say, "Goodbye."

The interface IPhrase declares a simple Execute method that takes in a string and returns a string. It also declares Register to register the next node in the pipeline.

public interface IPhrase
{
    string Execute(string input);
    void Register(IPhrase nextPhrase);
}

In keeping with SOLID and DRY principles, developers shouldn't be concerned with how each node is registered or even with the fact that they need to call the subsequent node. In order to facilitate this, I created a base class. The base class takes care of the Register method. It also grabs the Execute method and wires in the call to the next node. Because I don't allow you to override this (because then you'd have to call base.Execute to get the base behavior) I provide an abstract method called _Execute that you can use to do your work. This means that the derived methods only have to focus on _Execute ... there is no need to worry about whether or not another node was registered or even how to call that node. The base class looks like this:

public abstract class BasePhrase : IPhrase
{
    private IPhrase _nextPhrase;

    protected abstract string _Execute(string input);

    public string Execute(string input)
    {
        string retVal = _Execute(input);

        if (_nextPhrase != null)
        {
            retVal = _nextPhrase.Execute(retVal);
        }

        return retVal; 
    }

    public void Register(IPhrase nextPhrase)
    {
        _nextPhrase = nextPhrase;
    }
}

Notice how we call the abstract method to modify the string, then pass it on to the next node if it exists. Now a "link" in the pipeline chain can look like this:

public class PhraseHello : BasePhrase
{
    protected override string _Execute(string input)
    {
        return string.Format("{0}Hello ", input);
    }
}

As you can see, the PhraseHello node simply appends "Hello " to whatever string is passed to it. The PhrasePipeline ties this all together. It has an Execute method that takes in the booleans (do we want Spanish, and do we want Goodbyes?), wires up the pipeline and then returns the call from the root. The following diagram illustrates the layout (click on the image to see a more detailed/expanded view):

Pipeline Example

The actual controller method looks like this:

public static class PhrasePipeline
{       
    public static string Execute(bool spanish, bool goodbye)
    {            
        IPhrase root = null;
        IPhrase previous = null; 

        foreach(IPhrase phrase in _GetPhrase(spanish, goodbye))
        {
            if (root == null)
            {
                root = phrase;
            }
            else
            {
                previous.Register(phrase);
            }
            previous = phrase;
        }

        return root == null ? string.Empty : root.Execute(string.Empty);
    }
    
    private static IEnumerable<IPhrase> _GetPhrase(bool spanish, bool goodbye)
    {
        yield return new PhraseHello();

        if (spanish)
        {
            yield return new PhraseHola();
        }

        if (goodbye)
        {
            yield return new PhraseGoodbye();

            if (spanish)
            {
                yield return new PhraseAdios();
            }
        }
    }
}

As you can see, the static class simply takes in the conditions. It then iterates the call to _GetPhrase and registers each subsequent phrase modifier with the previous. Finally, it simply calls the root node if it exists (we could easily pass in a "root" string here instead of string.empty).

The _GetPhrase method demonstrates the power of the yield command. In a nutshell, we are given an enumerable collection that we can loop through with a foreach as you see in the main method. However, the way yield works is that the element is inserted into the collection, then control is passed back to the caller. When the caller asks for the next element, code resumes after the previous yield statement.

In our example, every call guarantees me at least one element: PhraseHello. The subsequent calls are conditioned. If "Spanish" is false but "goodbye" is true, the code first evaluates the expression, skips it, goes into the "goodbye" section, and returns the PhraseGoodbye node. Now our "pointer" is ready to check whether or not we include Spanish.

Implementing pipelines this way provides incredible flexibility. You can have nodes make decisions about inserting child nodes, or even pass the chain to various controllers and factories that make decisions and add links as well. This, combined with a practical application of the yield keyword makes the code clean, readable, and easily extendable.

Homework ... want to take it further? You can easily abstract the pipeline itself by having INode<T> (hint: this example would become IPhrase : INode<string>) and make a Controller<T> to manage it. Also, don't limit yourself by thinking T has to be something like a string or a class ... it can easily be it's own IEnumerable<U> as well!

Jeremy Likness

6 comments:

  1. This strikes me as a little convoluted, surely to actually be a pipeline you ought to wire up everything at the start and each IPhrase ought to know whether it should modify the passed values or otherwise based on the conditions (spanglish, goodbye) being passed along the pipeline? It seems as though the logic in _GetPhrase would get very complicated very quickly in a real world example.

    ReplyDelete
  2. Definitely a little contrived for the example but the idea is you can chain multiples. I will admit with Managed Extensibility Framewokr (MEF) this pattern becomes far less convuluted and a bit more interesting!

    ReplyDelete
  3. Yeah, MEF (and possibly general IoC containers) creates some interesting new opportunities. I'm not sure it would be too easy to handle controlling the ordering of the filters using MEF though.

    ReplyDelete
  4. I've thought about that. I had a similar pattern in a project I worked on. I didn't have to order all of the filters, but there was a group that had priority over another group, etc, so I tagged them with metadata and sorted them on the import before executing down the chain.

    ReplyDelete
  5. I am in the process of implementing something like this.
    How can i extend to this to take different parameters?

    ReplyDelete
  6. "Many nodes were conditional" - if this is the case, you are much better off using a routing slip. That allows you to do everything that pipes and filters do, while conditionally modifying your route through the nodes, or break out all together

    ReplyDelete