Thursday, September 27, 2012

Modern Web and Desktop App Development

Free stuff! Got your attention? Read on to learn more …

Last year I focused on completing my book about enterprise solutions using Silverlight. Since the release of Designing Silverlight Business Applications, much has changed. The book was well-received (it is currently well-rated at Amazon, thanks to my readers who take the time to leave feedback), but the market shifted as executives, concerned with the unclear future of Silverlight, began to focus more heavily on HTML5. Technology moves fast, and companies have now had enough time to recognize that much of the HTML5 hype failed to deliver on the promise of writing once and running everywhere. The uncertainty was compounded by the release of Windows 8 and the shift to WinRT.

I’ve been fortunate to have projects that have allowed me to focus on both sides of leading-edge technology. In addition to working “behind the scenes” with Wintellect founders Jeff Prosise and Jeffrey Richter on Windows 8 as it evolved over the past year, I’ve also been heavily involved in a large enterprise web application that is built on HTML5 technologies and follows the Single Page Application (SPA) paradigm, which means lots of client-side JavaScript code. The project integrates libraries like Backbone.js, RequireJS, and AmplifyJS, and leverages controls and MVVM features from Kendo UI. We are using the latest Visual Studio 2012 and TFS Server 2012 features to manage an agile project with a large development team located around the world.

Almost a decade ago, I learned the hard way that JavaScript is difficult to manage and scale. It was one of the reasons I shifted to Silverlight, and I was delighted to discover that teams could produce quality code about 4x faster than with the traditional web stack. The focus has now shifted back to JavaScript, but the problem of scaling JavaScript-based applications is not unique, so a number of amazing libraries and solutions have been created to make it easier to control the quality of applications with a heavy client-side JavaScript component.

At the other end of the spectrum, as I work with companies moving their existing technologies from Silverlight and WPF to Windows 8 and smartphones, the common question is how to leverage as much of the existing codebase as possible. I’ve had plenty of experience working with both the Portable Class Library (PCL) and the team at Microsoft driving that project, helping build portable assemblies that can provide 80% of the functionality required across multiple platforms without recompiling. I think the PCL is one of the least understood and most underused features of Visual Studio 2012.

The reason I’m sharing all of this is that I’m excited to present on both of these topics next week at Wintellect’s own Devscovery conference. This event is unique in many ways. First, the focus is less on presenting and more on training … it’s a subtle difference, but the sessions dive deep into technologies including Windows 8 and WinRT, HTML5, jQuery, JavaScript, ASP.NET MVC, .NET 4.5, C# 5, Visual Studio 2012, advanced debugging and testing, and more. You can view the full agenda here.

The second reason this event is unique is the expert presenters. It’s one of the few places where you’ll find Jeffrey Richter, Jeff Prosise, and John Robbins under the same roof, and where you can not only attend sessions but also meet with them one-on-one to get your questions answered. You can view the full list of presenters here. Add to that a powerful keynote, and what’s not to like? I’ll be there as well, presenting on Enterprise JavaScript and the Portable Class Library.

I mentioned free stuff at the beginning of this blog post. Registration is still open for this event, but I realize it may be short notice for some, so I’ve got a special offer. If you can swing the travel, we can swing the entrance fee. I have two free passes to give away on a first come, first served basis. These are for new participants who have not yet registered but are interested in attending the event. You have to be committed and able to attend, so if you think you can swing it and would like a pass, please tweet the hashtag #DevHouston12 with your pass request and the link to the agenda (http://bit.ly/Pvityp) … for example, “I would like a pass to #DevHouston12 – check out the agenda http://bit.ly/Pvityp”. You must include the hashtag and a link that resolves to the agenda page. Our marketing director will monitor the hashtag, and the first two people to tweet and verify they can attend will get the passes.

Thanks!

Tuesday, September 18, 2012

Entity Framework: Expressing the Missing LINQ

I have worked on quite a few projects that use the Entity Framework. It is a powerful ORM and does quite a lot out of the box. I've worked with code-first, database-first, and every other flavor in between. If I were to name the one reason I believe developers enjoy working with the Entity Framework the most, it would be its support for LINQ via LINQ to Entities.

Download Source Code for this post

What's not to like? You can query data in an easy, straightforward, and consistent manner. Unfortunately, the Entity Framework can wreak havoc on an otherwise stable web application if it is not handled with care. There are a number of "gotchas" you will run into (updateable materialized views, anyone?), ranging from improper use of the context used to access the database to "features" of LINQ that can become defects in production. In this post I will focus on two very subtle LINQ problems I see people run into quite often.

It Will Be Deferred

The first is probably the easiest to understand and the fastest to catch in testing. Consider a very contrived model called Thing:

public class Thing
{
    public int Id { get; set; }
    public string Value { get; set; }
}

The code-first definition for a database of things is simple enough:

public class SampleContext : DbContext 
{
    public DbSet<Thing> Things { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
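        // keep singular table names so the generated table is "Thing" rather than "Things"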
        modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
    }
}

I could use an initializer to seed the data, but I got lazy and created a controller that does it instead (not recommended for production code, but it works perfectly well to illustrate the points in this post; I'll sketch the initializer approach right after the controller code):

public class ThingsController : ApiController
{
    public ThingsController()
    {
        using (var context = new SampleContext())
        {
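            // seed only once: skip if data already exists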
            if (context.Things.Any()) return;

            for (var x = 0; x < 1000; x++)
            {
                var thing = new Thing
                                {
                                    Value = Guid.NewGuid().ToString()
                                };
                context.Things.Add(thing);
            }
            context.SaveChanges();
        }

    }
}
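
For reference, the initializer approach I mentioned would look something like the sketch below. It uses the built-in DropCreateDatabaseIfModelChanges initializer; the SampleInitializer name and the registration location are just for illustration:

public class SampleInitializer : DropCreateDatabaseIfModelChanges<SampleContext>
{
    protected override void Seed(SampleContext context)
    {
        // create 1000 things with random string values
        for (var x = 0; x < 1000; x++)
        {
            context.Things.Add(new Thing { Value = Guid.NewGuid().ToString() });
        }
        context.SaveChanges();
    }
}

// register the initializer once at application startup, e.g. in Global.asax:
Database.SetInitializer(new SampleInitializer());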

The controller constructor ensures we have at least 1000 things to deal with. You'll notice this is an ApiController, which makes it very easy to expose a REST service on top of the collection. In fact, I'm going to do just that - first with a method that returns the full list:

public IEnumerable<Thing> GetThings()
{
    using (var context = new SampleContext())
    {
        return (from thing in context.Things select thing);
    }
}

The Get convention will automatically map this to /api/Things, and then I can get a nice list of them by navigating to the local URL, correct? Not quite. This is the first and most common mistake made: forgetting about deferred execution. The query is passed back to the controller, which faithfully tries to serialize it by enumerating the results ... but by that time, you have left the using block for the context, and therefore the connection is closed. This will fail every time until you find a way to force execution before disposing the context ... the easiest way is by converting the query to a list, like this:

public IEnumerable<Thing> GetThings()
{
    using (var context = new SampleContext())
    {
        return (from thing in context.Things select thing).ToList();
    }
}

Converting the query to a list forces it to be enumerated (and thus executed) immediately, while the context is still alive, to populate the list.
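
Any operator that eagerly enumerates the query works just as well here. For example, ToArray() also materializes the results before the context is disposed:

return (from thing in context.Things select thing).ToArray();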

Expressive Functions

The second issue is far more subtle. It is very easy to get excited about composing your queries with LINQ, and it's not uncommon to have a strategy that involves passing filters and order clauses in to your repository. Consider a basic LINQ query that looks like this:

var query = from thing in context.Things
            orderby thing.Value
            select thing;

Now, what if you wanted to dynamically filter this based on various options? You might go down this path:

Func<Thing,bool> filter;

You can then assign the filter:

filter = thing => thing.Value.Contains("-e");

And attach it to the query, execute it, then return the result:

var modifiedQuery = query.Where(filter);
return modifiedQuery.ToList();

In fact, if you pull down the sample code from this link and navigate to the URL http://localhost:XXXX/api/Things/?useFunction=true (replace XXXX with your port number), you will see a result something like this:

[{"Id":229,"Value":"031662bd-14be-4562-9b34-e13ab193b112"},{"Id":330,"Value":"04a35727-9b64-4b5d-99fd-e421fe7340d7"}...]

If you compare this to the full result set, you'll see the filter worked fine. Many developers will be satisfied at this point and move on to other things; even integration tests for the filter will likely pass. But is this doing what you want? If you are like me, you never trust the ORM. You always want to know how it interprets what you send, so you will profile, trace, and verify results. If you run a trace on the above code (in the sample project, it will write to the console in case you don't have a profiler handy), you'll find the query that is passed to SQL looks like this:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Value] AS [Value]
FROM [dbo].[Thing] AS [Extent1]
ORDER BY [Extent1].[Value] ASC

At this point you probably see the problem: there is no WHERE clause. We're dealing with 1000 records in this example, but what would happen if we had 1000000? The query is loading every record from the database and then filtering the resulting list in memory. No matter how clever your filter is, you are always pulling the entire table and then using LINQ to Objects to filter it. That probably won't scale, will it?

The solution is very simple. What you created above was a function that is passed to the query. The LINQ to Entities provider doesn't know how to map a function to the database, so it handles the part it understands and then applies the function. (As a reader was kind enough to comment below, technically the filter never gets passed to the Entity Framework ... there is an extension method for expressions, but not functions, so the query is cast to an enumerable after the call to EF and then the filter is applied.) There is only one change you need to make for this to work:

Expression<Func<Thing,bool>> filter;

That's it! Change the definition. You can assign and execute it exactly the same way:

filter = thing => thing.Value.Contains("-e");
var modifiedQuery = query.Where(filter);
return modifiedQuery.ToList();

While the returned result is the same, the SQL is slightly different. Hit the URL http://localhost:XXXX/api/Things?useFunction=false instead, and what you'll find is:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Value] AS [Value]
FROM [dbo].[Thing] AS [Extent1]
WHERE [Extent1].[Value] LIKE N'%-e%'
ORDER BY [Extent1].[Value] ASC

Now the filter is being passed to SQL, so it can do what SQL is good at: filtering and ordering records. It's a subtle but important difference. In both cases, you designated the filter using a lambda expression, but the first forced it into a compiled function whereas the second loaded it into an expression tree. The LINQ to Entities provider does know how to parse an expression tree and map it to SQL, so you get the desired result through LINQ to Entities and SQL rather than falling back to LINQ to Objects.
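
The mechanics come down to overload resolution. A Func<Thing,bool> binds to the version of Where defined for IEnumerable<T>, while an Expression<Func<Thing,bool>> binds to the version defined for IQueryable<T> that providers can inspect and translate. The two signatures (paraphrased from the framework) make the difference clear:

// LINQ to Objects: takes a compiled delegate, so filtering happens in memory
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source, Func<TSource, bool> predicate);

// LINQ providers: takes an expression tree the provider can translate to SQL
public static IQueryable<TSource> Where<TSource>(
    this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate);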

Summary

The bottom line is that ORM tools make it very easy to interact with data, and even easier to introduce side effects that can lead to performance issues and production defects. It is very important to understand the technology you are working with and to dig into the nuances of how LINQ works under the covers, including the relationship between delegates, lambda expressions, and expression trees. I've heard it come up in interviews that I sometimes ask "academic" or "textbook" questions, but more often than not those questions relate to real-world scenarios, and knowing the answer is the difference between pulling 20 records and 1000000 records in a single database call.
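
One practical takeaway: if your repository accepts filters, type those parameters as expression trees so they remain translatable all the way to the database. Here is a minimal sketch (this GetThings overload is hypothetical, not part of the sample project):

public IEnumerable<Thing> GetThings(Expression<Func<Thing, bool>> filter)
{
    using (var context = new SampleContext())
    {
        // the expression tree becomes a SQL WHERE clause, and ToList()
        // forces execution before the context is disposed
        return context.Things.Where(filter).OrderBy(t => t.Value).ToList();
    }
}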

Download Source Code for this post

Jeremy Likness