SOLID and DRY

No, this isn't a commercial for deodorant.

It's funny how many developers become so entrenched in their company and projects that they seldom venture outside to see what is going in the bigger world of Information Technology. I see this often in my interviews: a senior developer has worked a company for several years, quickly rose to the top and is the "go to" guy and now is ready to be an "architect." The only problem is that they only know how to work on their VisualBasic 6.0 system and think the best architecture is pulling everything as ADO XML and using stylesheet transformations to drive the application (I say this tongue-in-cheek because this is a framework I worked with and implemented in my past).

I can think of two good examples for this. The first is test-driven development (TDD). You can ask a question about TDD in an interview, and while there are many people who have never heard of it, you'll get a few that say, "Yes, absolutely, I follow TDD." "How, exactly?" "Oh, we write unit tests." If you think that writing unit tests means you are practicing TDD, you might want to do a little more research. The second example is generics - most people are familiar with generics in C#. What are they for? They are for lists, of course! (Again, if you feel that generics are only good for typing lists to prevent boxing, you'll want to research a bit deeper). Oh, and I just thought of a third: delegates are just for events, right?

My point is that sometimes we get stuck in what works for our system and it works well, but it's not always the right or best solution and we need to constantly challenge ourselves as software engineers (or whatever we call ourselves today) to stay on top of what's out there.

I decided to post this article in my C#er : IMage blog because I feel SOLID and DRY are very sound principles for software development and yet I constantly see people tripping over these concepts.

Let's tackle DRY first:

Don't Repeat Yourself

It's a simple principle, really.

I am writing an application I need to validate that a field follows the specific pattern for an IP address. So I plug in a textbox, attach a custom regex validator control and I'm off to the races. One year later we have a huge application with IP addresses all over the place, and we discover that the regular expression we Googled to validate IP addresses has a bug! Now it's search and replace and a LOT of unit testing.

I should have caught myself the second time I was adding a regex validator. It's easy to think, "I'm writing nice code because I'm not hand-rolling a regular expression validation, I'm using the validator." But even something like that lent itself to create an "IP textbox control" with the validator embedded. Then, instead of repeating myself, I can drop in that control. If it changes, I change it once.

You can repeat yourself by decorated code with string literals instead of collapsing them into constants, by having little utility methods and not saying to yourself, "Wow, I just wrote that method twice ... time to move it into a class." It's a powerful principle that will help you write scalable, maintainable code. You don't need to give it a fancy name like "Single Point of Truth" to appreciate that if you find yourself repeating something, anything, it's a good candidate to break out into a separate, reusable class.

This takes us to SOLID, which is a study in and of itself, so I'm going to be brief here. The intent is just to introduce you to the principle so you can study it yourself and learn from it, because I see a lot of developers who would benefit from understanding it.

SOLID is an acronym for:

Single Responsibility Principle
Open/Closed Principle
Liskov Substitution Principle
Interface Segregation Principle
Dependency Inversion

Single Responsibility Principle

A class should have one, and only one reason to change.

This is fairly close to DRY, no? We say: keep it simple. A common pattern for this is Model-View-Controller (MVC). Instead of combining presentation concerns, data persistance concerns, and business logic all in the same class, you break it out so each class focuses on its own. Why is this so important? Because each thing the application does is not only a potential point of failure, but also a point of change. In other words, "doing something" might be "doing something different" down the road.

A common example of this is writing a drop down class. It calls the database, does an "order by" and then renders the control. Then you find out another page needs this same control, but as a textbox that has a "dynamic search" feature. So you copy the control, use some cool AJAX, and are looking good until you are informed that states will no longer be stored in the database, but must be managed via XML. Now you have TWO controls to change because of the ONE change of storing data.

If you followed the Single Responsibility Principle, you would have probably had four classes: one to fetch the data (persistance management), one to sort it (business logic), one to bind the data to a drop down and another for the fancy "look ahead" control (presentation). When the switch came, you'd only have to change it in one place.

This principle also makes it easier to scale development teams. Have you heard the saying, "9 women can't make a baby in 1 month?" With this principle, you can scale the speed of delivery for your application by breaking it into smaller, more maintainable units of work.

Open/Closed Principle

Open for extension, closed for modification.

This simply states that you should be able to extend what a class does without modifying its core behavior. It's open for extension, but closed for modification. Again, let's take an example. You might have a class that represents a user in the system, and you've determined that fundamentally a user consists of a login, a password, and an email address. Later, you find out some users can have contact information. Eventually you learn that some users are administrators and have additional information such as which parts of the application they can manage.

A class that violates these principles would require you to have a bloated class with lots of flags indicating what/who the user is. Every time you had to change or add something, you'd need to change that original class.

A more stable approach would be to put your fundamentals into a base class. Now you can have a ContactUser : BaseUser and extend the user to have contact information. Then you can have a AdminUser : BaseUser or perhaps an AdminUser : ContactUser. You are extending, not modifying the base class. I mentioned generics earlier: this is a perfect example where generics can be used to define some base functionality (for example, in your data access layer, opening and closing the connections) while your extensions strongly type the behavior for a specific class.

The Liskov Substitution Principle

Derived classes must be able to substitute for their base classes.

Why is this so important? Think about the example above with different contacts. If I have a function that validates a login, then I should be able to simply check the username and the password of the class. To do this, I don't need to know if the class is a base user, a contact user, or an admin user. All of these classes extend base user, so I will code my login against base user. The contact user behaves exactly as the base user.

What if I decide to write another class that displays user information? I make it UserInfo<T> but then break the principle by doing this:

if (typeof(T) == typeof(BaseUser) { return ((BaseUser)T).Email; }
else if (typeof(T) == typeof(ContactUser) { return ((ContactUser)T).Phone; }
...
else { throw new Exception("Could not determine the type."); }

What just happened? It might work on the surface, but suddenly I have a very difficult class. This class now needs to understand every possible derived type of T in order to do its job. How can the application possibly scale if I have to keep coming back to the same place to make changes and keep up to date? I might hire a new developer who goes off and builds a AccountingUser ... and then blows up the system because they did not update UserInfo.

A better way would be to override the ToString method. Each class can implement its own version of what "display information" is. Any other object can handle BaseUser (not caring what the derived type is) and display useful information simply by invoking ToString().

Now things are starting to get interesting because we're going to tackle one of the most popular "buzz words" going around today: dependency injection / inversion of control ... stay tuned, because we have two more principles to cover!

Saturday, May 23, 2009

SOLID and DRY