Friday, July 2, 2010

Introducing Sterling, the Object-Oriented Database for Silverlight and Windows Phone 7

Today, I'm excited to share a project that I've been working on for some time now. The project, called Sterling (for Sterling Silverlight, of course!) is a very light weight approach to handling serialization and deserialization to and from isolated storage. I call it an object-oriented database because it provides LINQ to Object queries over keys and indexes.

I guess the timing works well ... I was recently honored with the 2010 Microsoft Most Valuable Professional (MVP) award for my work with Silverlight. Much of this award is about what we do for the community, and this is one of my first major efforts to put a utility and open source project out there. Lots of hours went into this but I hope there will be great benefits for those of you who find value in using it.

Sterling is not yet released and I don't anticipate we'll have an alpha ready until late July. I work on many projects for my company, so this effort has been primarily late nights or snatches during lunch and breaks and will probably continue to be so. I'm excited that some members of the community have offered to reach out and help push development forward.

First, let me emphasize that Sterling is not intended as a replacement for a robust, transactional database system. There are some commercial and non-commercial solutions that are optimized for performance and massive scale. I wrote Sterling to be a simple serialization engine to avoid having to deal with the goop of writing my own serializers all of the time. I also know that even if you have a simple list of contacts, querying the contact names is going to be a priority over deserializing all of them, so I built a key and index infrastructure to facilitate storing key items in memory so you can bind to lists and combo-boxes without de-serializing the entire object.

Having said all of this, while a lot of more powerful features may not exist, my goal was to hit 80% of the needs with 20% of the footprint. Not only is Sterling incredibly lightweight, but extensible and available on Windows Phone 7.

Here are the key goals I had in mind:

Non-intrusive

I don't like mangling my classes to persist them. I didn't want to force anyone to inherit from a base class or decorate the class with attributes to make it work. I believe I've achieved this goal. Defining a "table" to sterling is as simple as passing a type and a lambda expression that returns the key:

public override List<ITableDefinition> _RegisterTables()
{
 return new List<ITableDefinition>
                       {
                           CreateTableDefinition<Contact,string>(c => c.Email)
                       }
}

I can even do this for sealed classes or third-party classes - whatever your code can see, Sterling can, too. Sterling automatically serializes the same values that the BinaryWriter class supports. If you need something custom, no problem - just define a serializer of your own and register it with Sterling.

Lightweight

I wanted Sterling to fit on the Windows Phone 7 and also to facilitate building projects without bloating them. I purposefully kept it lightweight. Currently the DLL weighs in at only 70 kilobytes, which I believe is very trivial compared to some solutions I've seen.

Flexible and Portable

Again, these are both features that collapse into the previous items. By keeping it flexible, I can accommodate needs I didn't know at design time. There is a very loose interface for logging and for extending serialization. This allows you to do pretty much whatever you like, without having to think about the underpinnings of setting up tables and folders and checking if they exist, etc. Portability means it was easy to build on Windows Phone 7 and should be very easy to bring forward into future versions of Silverlight.

The Reference Application

The easiest way to get to know Sterling is by the reference application. I will warn you that it takes a long time to build because of the time it takes to initially serialize. I decided a good test case would be the USRDA nutrient database. It contains over 500,000 data elements. I had to write some parsers to take the source text-based database files and turn them into data objects that Sterling could deal with, but once the conversion is done, you can see the power of how Sterling operates.

The left column shows a custom logger I made that is spitting out Sterling log information. The data models were straightforward. A food group is a general "category" for foods to fall under. The food description is an actual food item, and contains a list of nutrient data elements (things like calories, protein, vitamin A, etc). These point to a nutrient definition.

public class FoodGroup
{
    public int Id { get; set; }
    public string GroupName { get; set; }
}

public class NutrientDefinition
{
    public int Id { get; set; }

    public string UnitOfMeasure { get; set; }

    public string Tag { get; set; }

    public string Description { get; set; }

    public int SortOrder { get; set; }
}

public struct NutrientDataElement
{
    public int NutrientDefinitionId { get; set; }
    public double AmountPerHundredGrams { get; set; }
}

public class FoodDescription
{
    public FoodDescription()
    {
        Nutrients = new List();
    }

    public int Id { get; set; }

    public int FoodGroupId { get; set; }

    public string Description { get; set; }

    public string Abbreviated { get; set; }

    public string CommonName { get; set; }

    public string Manufacturer { get; set; }

    public string InedibleParts { get; set; }

    public double PctRefuse { get; set; }

    public string ScientificName { get; set; }

    public double NitrogenFactor { get; set; }

    public double ProteinCalories { get; set; }

    public double FatCalories { get; set; }

    public double CarbohydrateCalories { get; set; }

    public List&t;NutrientDataElement> Nutrients { get; set; }
}

Because I have a struct for the nutrient information, I had to provide a custom serializer:

public class FoodSerializer : BaseSerializer  
{
    public override bool CanSerialize(Type targetType)
    {
        return targetType.Equals(typeof (NutrientDataElement));                
    }

    public override void Serialize(object target, BinaryWriter writer)
    {
        var data = (NutrientDataElement)target;
        writer.Write(data.NutrientDefinitionId);
        writer.Write(data.AmountPerHundredGrams);
    }

    public override object Deserialize(Type type, BinaryReader reader)
    {
        return new NutrientDataElement
                    {
                        NutrientDefinitionId = reader.ReadInt32(),
                        AmountPerHundredGrams = reader.ReadDouble()
                    };
    }
}

As you can see, fast and easy to do. Defining the tables took a little bit of thought. I wanted a "covered index" for food groups so I wouldn't have to de-serialize them at all. For the food descriptions, I needed an index on description and food group for fast filtering and sorting. Finally, the nutrient definitions provided an index for unit of measure and sort order (so they sort consistently in each food item).

public class FoodDatabase : BaseDatabaseInstance
{
    public override string Name
    {
        get { return "Type Database"; }
    }

    public const string FOOD_GROUP_NAME = "FoodGroup_GroupName";
    public const string FOOD_DESCRIPTION_DESC_GROUP = "FoodDescription_Description_Group";
    public const string NUTR_DEFINITION_UNITS_DESC = "NutrientDefinition_Units_Description";
    public const string NUTR_DEFINITION_SORT = "NutrientDefinition_Sort";
        
    protected override List<ITableDefinition> _RegisterTables()
    {
        return new List<ITableDefinition>
                    {
                        CreateTableDefinition<FoodGroup, int>(fg => fg.Id)
                            .WithIndex<FoodGroup, string, int>(FOOD_GROUP_NAME, fg => fg.GroupName),
                        CreateTableDefinition<FoodDescription, int>(fd => fd.Id)
                            .WithIndex<FoodDescription, string, int, int>(FOOD_DESCRIPTION_DESC_GROUP,
                                                                            fd =>
                                                                            Tuple.Create(fd.Description, fd.FoodGroupId)),
                        CreateTableDefinition<NutrientDefinition,int>(nd=>nd.Id)
                            .WithIndex<NutrientDefinition,string,string,int>(NUTR_DEFINITION_UNITS_DESC,
                            nd=>Tuple.Create(nd.UnitOfMeasure,nd.Description))
                            .WithIndex<NutrientDefinition,int,int>(NUTR_DEFINITION_SORT,
                            nd=>nd.SortOrder)
                    };
    }
}

The main view model allows you to select groups and enter search terms, displays the food items, and then will show a chart breaking out nutrition information when you click on the food item. The food group list is queried using the index like this:

public IEnumerable<FoodGroup> FoodGroups
{
    get
    {
        return DesignerProperties.IsInDesignTool
                    ? _samples.AsEnumerable()
                    : from fg in
                            SterlingService.Current.Database.Query<FoodGroup, string, int>(
                                FoodDatabase.FOOD_GROUP_NAME)
                        select new FoodGroup {Id = fg.Key, GroupName = fg.Index};
    }
}

Note that I have a default list for design-time, otherwise I run the actual query.

When you hover over a food group, I supply a tool-tip to show the number of food items in that group. I decided to go ahead and store these in a dictionary after querying the first time, but only because I know they won't change. The converter looks like this:

private readonly Dictionary<int,int> _foodCounts = new Dictionary<int,int>();

public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
{
    var count = DesignerProperties.IsInDesignTool ? 500 : 0;
            
    var foodGroup = value as FoodGroup;
    if (foodGroup != null && !DesignerProperties.IsInDesignTool)
    {
        if (_foodCounts.ContainsKey(foodGroup.Id))
        {
            count = _foodCounts[foodGroup.Id];
        }
        else
        {
            count =
                (from index in
                        SterlingService.Current.Database.Query<FoodDescription, string, int, int>(
                            FoodDatabase.FOOD_DESCRIPTION_DESC_GROUP)
                    where index.Index.Item2.Equals(foodGroup.Id)
                    select index).Count();
            _foodCounts.Add(foodGroup.Id,count);
        }
    }

    return string.Format("There are {0} food items in this food group.", count);
}

Here you can see I'm providing a design-time default. If I'm not in the designer, then I go ahead and calculate the amount and save it using the index I created. The index has two values (description and food group key) so I use a Tuple to access the value. None of this requires any de-serialization because I'm only touching the index.

Food group counts

The search view model uses the same index to pull food items. The query is a bit more complex. I'm not allowing you to search all 10,000 food items. You must narrow it to a category or enter at least three characters of a search (the search is a containing search, not a "starts with" or "ends with").

public IEnumerable<FoodDescriptionIndex> SearchResults
{
    get
    {
        if (DesignerProperties.IsInDesignTool)
            return _sampleDescriptions.AsEnumerable();

        if (_currentGroup != null)
        {
            if (string.IsNullOrEmpty(_searchText) || _searchText.Length < 3)
            {
                var query1 = from fg in
                                    SterlingService.Current.Database.Query
                                    <FoodDescription, string, int, int>(
                                        FoodDatabase.FOOD_DESCRIPTION_DESC_GROUP)
                                where
                                    fg.Index.Item2.Equals(_currentGroup.Id)
                                select
                                    new FoodDescriptionIndex {Id = fg.Key, Description = fg.Index.Item1};

                return query1.Count() == 0 ? _noResults.AsEnumerable() : query1;
            }

            // group and search text)
            var query2 = from fg in
                        SterlingService.Current.Database.Query
                        <FoodDescription, string, int, int>(
                            FoodDatabase.FOOD_DESCRIPTION_DESC_GROUP)
                    where
                        fg.Index.Item2.Equals(_currentGroup.Id) &&
                        fg.Index.Item1.ToUpperInvariant().Contains(_searchText.ToUpperInvariant())
                    select
                        new FoodDescriptionIndex {Id = fg.Key, Description = fg.Index.Item1};
                        
            return query2.Count() == 0 ? _noResults.AsEnumerable() : query2;
        }

        if (string.IsNullOrEmpty(_searchText) || _searchText.Length < 3)
            return _noResults.AsEnumerable();

        var query3 = from fg in
                    SterlingService.Current.Database.Query
                    <FoodDescription, string, int, int>(
                        FoodDatabase.FOOD_DESCRIPTION_DESC_GROUP)
                where
                    fg.Index.Item1.ToUpperInvariant().Contains(_searchText.ToUpperInvariant())
                select
                    new FoodDescriptionIndex { Id = fg.Key, Description = fg.Index.Item1 };
        return query3.Count() == 0 ? _noResults.AsEnumerable() : query3; 
    }
}

Food items

Notice how I'm using a "covered index" (covered for the items I need, which include the key and description) to query and return a list of types that are bound to the list box. You can see when you run the example this happens very fast. Finally, when you click on the item, I am always deserializing rather than trying to cache a ton of objects.

The food description context class synchronizes the current food item between view models. Notice when it is passed a value, it loads the new item:

public class FoodDescriptionContext : BaseNotify 
{
    public static FoodDescriptionContext Current = new FoodDescriptionContext();

    public FoodDescription CurrentFoodDescription { get; private set; }

    private int _foodDescriptionId; 

    public int FoodDescriptionId
    {
        get { return _foodDescriptionId; }
        set
        {
            _foodDescriptionId = value;
            CurrentFoodDescription = SterlingService.Current.Database.Load<FoodDescription>(value);
            RaisePropertyChanged(()=>CurrentFoodDescription);
            RaisePropertyChanged(()=>FoodDescriptionId);
        }
    }

}

This isn't thread-safe, but does it need to be? The user can only click on one item at a time.

Food detail

Finally, the whole engine is configured using an application service.

There's a lot more to it but I wanted to get this out there and have people start looking at it to provide me with feedback. While there is not yet a release, you can visit the Sterling Codeplex site to download the code (pre-alpha, so use at your own risk, right?) and build/test/integrate on your own. Let me know what you like and don't like and if you are interested in being a serious beta tester, and we'll see what we can do to release a solid version 1.

(PS, I took a simple list project for the Windows Phone 7 just to prove the engine works there - I simply save the list to the database then bind to the query - but I'm looking for a more comprehensive example there, so volunteers are welcome to work on that as well!)

Jeremy Likness