Sunday, June 5, 2011

The Sterling NoSQL Database in a Mango World

I was at the MIX 2011 event. There was an "Open Source Fest" before the event and I came to showcase the Sterling NoSQL Database project that I run (and code most of, although there have been numerous enhancements and patches now added by a growing team of fantastic supporters). I was happy to speak to several people who hadn't heard of the database before, not because it was unknown but because of the opportunity to introduce it to others.

There seems to be a common impression that Sterling is a database specifically designed to fill a hole on the Windows Phone 7 platform - namely, the lack of a database. That's actually not entirely true. The database was originally created to fill that hole for Silverlight in the web browser. It happened that the platform shared so much with the phone that it was an easy port (perhaps a few dozen lines of code to tweak, mostly to close the gap between the Silverlight 4 version and the Silverlight 3-ish that the phone presented at the time).

Once on the phone, it filled another gap I hadn't anticipated: the dreaded tombstone operation. People were struggling to find a consistent, easy way to serialize state and Sterling, able to handle nearly any type and depth of object graph and capable of serializing it without all of the ceremony of deriving from a custom class or splashing attributes through all of your code, was happy to oblige. It was definitely the phone that pushed its popularity.

Once people began downloading it, I started to receive a barrage of questions. "What about on the server? I want something super lightweight that is object-based, not relational." "What about the local file system instead of isolated storage for out of browser applications?"

I began working on these changes. In the meantime, I wrote an article for MSDN about Sterling on the phone, was interviewed by Jesse Liberty about it and even conducted an MSDN geekSpeak about Sterling. And then it was announced: "Windows Phone 7.1 dubbed Mango will have SQL CE."

You'd almost think I'd invested everything I owned in Sterling by the way people tiptoed around me at the conference. "Jeremy, did you ... um ... hear?" "What are you going to do?"

I've been asked that enough that I wanted to write a quick little post to share what's going on. The momentum for Sterling has been picking up. If you check out the release notes for the 1.5 version, you'll see that most of the changes had nothing to do with the phone at all. In fact, 1.5 was all about reach: I brought Sterling to the server by popular demand and refactored the serialization model so that persistence is completely separate and based on "drivers." I then wrote drivers for memory, isolated storage, the file system for elevated trust, isolated storage on the phone, and the local file system for the desktop (.NET 4.0) version. The serialization routine was optimized and extended to handle more cases than ever before, and some refactoring removed dependencies to make it even more flexible to work with your existing types (for example, you can now specify what you would like your "ignore this property" attribute to be, and proxy it to, say, an existing XmlIgnore attribute or something completely different).
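To make the driver idea concrete, here is a minimal sketch in Python rather than Sterling's own C#. Every name in it (`Driver`, `MemoryDriver`, `FileSystemDriver`, `Database`) is hypothetical and chosen for illustration, and JSON stands in for the real serializer. It is not Sterling's API. It only shows the shape of the split: the database turns objects into bytes, and a driver decides where those bytes live.

```python
import json
import os
import tempfile
from typing import Dict, Optional, Protocol


class Driver(Protocol):
    """Hypothetical storage back end: it only moves bytes and never sees objects."""

    def save(self, key: str, data: bytes) -> None: ...
    def load(self, key: str) -> Optional[bytes]: ...


class MemoryDriver:
    """Keeps everything in a dictionary; handy for tests."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def load(self, key: str) -> Optional[bytes]:
        return self._items.get(key)


class FileSystemDriver:
    """Writes one file per key under a base directory."""

    def __init__(self, base_dir: str) -> None:
        self._base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self._base_dir, key + ".bin")

    def save(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as handle:
            handle.write(data)

    def load(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as handle:
                return handle.read()
        except FileNotFoundError:
            return None


class Database:
    """Serializes to bytes, then hands the bytes to whichever driver it was given."""

    def __init__(self, driver: Driver) -> None:
        self._driver = driver

    def save(self, key: str, item: dict) -> None:
        self._driver.save(key, json.dumps(item).encode("utf-8"))

    def load(self, key: str) -> Optional[dict]:
        data = self._driver.load(key)
        return None if data is None else json.loads(data.decode("utf-8"))


# The calling code is identical for both back ends.
contact = {"name": "Ada", "phones": ["555-0100"]}
for driver in (MemoryDriver(), FileSystemDriver(tempfile.mkdtemp())):
    db = Database(driver)
    db.save("contact-1", contact)
    assert db.load("contact-1") == contact
```

Because the `Database` class never touches a file or a dictionary directly, adding another back end in this sketch means writing one more class with `save` and `load`, and nothing else changes.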

The truth is, I'm excited that SQL CE has come to the phone. I never doubted it would, and I never had a goal to compete with it. That is years of brilliant developers using experience across a vast array of applications in the field to tweak and improve and create a compact, blazingly fast means of persistence on a small footprint. Not my niche. People may be even more surprised to hear me say that if it works for them, seems simple and easy, and their applications use it without issue, then GO FOR IT. I'm not trying to "sell" you on Sterling (that would be rather tough for a free open source project, no?).

So what IS the story for Sterling on the phone in a Mango world? I think it's quite powerful, personally.

Is it a 2 million row story? No. 9 times out of 10 when you have that much data on your phone I think you're writing it the wrong way. What user could possibly process, need, or use that much data and absolutely HAVE to access it when offline? Most of the time a service model with a local cache holding the juicy tidbits should do just fine (and I bet you can guess what I think a good candidate for that cache will be). In the 1 out of 10 cases where it has to be that way, yes, please, find a better way to do it because I didn't write Sterling for that.

The story Sterling brings to the phone is this: it is extremely simple. If you already have classes that contain other classes, reference each other (even with circular references), are five levels deep, or contain custom enumerations based on certain value types, great. Pass the top level object to Sterling and I'm confident it will handle it. If not, write a custom serializer for the tough parts and you're good to go.
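Circular references are the part that usually trips up a naive serializer, so here is a small sketch of one common way to handle them. It is written in Python and is not how Sterling itself is implemented; `Node`, `flatten`, and `rebuild` are hypothetical names. The idea is to remember every object already visited and write an index to it instead of walking into it again.

```python
import json
from typing import Any, Dict, List


class Node:
    """A plain class with no base type or attributes required for serialization."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.links: List["Node"] = []


def flatten(root: Node) -> List[Dict[str, Any]]:
    """Turn a graph of Nodes into flat records that refer to each other by index."""
    index_of: Dict[int, int] = {}  # id(node) -> position in `records`
    records: List[Dict[str, Any]] = []

    def visit(node: Node) -> int:
        key = id(node)
        if key in index_of:
            # Already seen: return the reference and do not recurse again.
            return index_of[key]
        index_of[key] = len(records)
        record: Dict[str, Any] = {"name": node.name, "links": []}
        records.append(record)
        record["links"] = [visit(child) for child in node.links]
        return index_of[key]

    visit(root)
    return records


def rebuild(records: List[Dict[str, Any]]) -> Node:
    """Recreate every node first, then wire up the links by index."""
    nodes = [Node(record["name"]) for record in records]
    for node, record in zip(nodes, records):
        node.links = [nodes[i] for i in record["links"]]
    return nodes[0]


# A parent and child that point at each other.
parent, child = Node("parent"), Node("child")
parent.links.append(child)
child.links.append(parent)

payload = json.dumps(flatten(parent)).encode("utf-8")  # graph -> bytes
copy = rebuild(json.loads(payload.decode("utf-8")))    # bytes -> graph

assert copy.links[0].name == "child"
assert copy.links[0].links[0] is copy  # the cycle survives the round trip
```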

The story is a model/behavior-based one. Sterling is about turning an object graph into bytes and back, with in-memory insights into your data to use fast queries and navigate where you need to go. You can share models between the client and server, take the bytes from one database and plop it somewhere else. The classes can have tons of behavior and Sterling won't mind - it navigates the data and worries about bringing that back. Imagine tombstoning with SQL CE and defining a set of tables with name/value pairs that you must convert into strings and back every time ... now imagine having a Tombstone<T> with a SetObject and GetObject<T> method to put any type of class there and save it ... and it works. That's the goal.
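The tombstoning idea can be sketched in a few lines. This is Python, not Sterling's C# API: the `Tombstone` class and its `set_object` and `get_object` methods are hypothetical stand-ins modeled on the names mentioned above, the in-memory dictionary stands in for a store that survives the application being shut down, and `pickle` stands in for the serializer.

```python
import pickle
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Type, TypeVar

T = TypeVar("T")


class Tombstone:
    """Hypothetical helper: one serialized slot per type, with no tables to define."""

    def __init__(self) -> None:
        # Stand-in for persistent storage; a real version would write these bytes out.
        self._slots: Dict[str, bytes] = {}

    def set_object(self, item: object) -> None:
        self._slots[type(item).__qualname__] = pickle.dumps(item)

    def get_object(self, kind: Type[T]) -> Optional[T]:
        data = self._slots.get(kind.__qualname__)
        return None if data is None else pickle.loads(data)


@dataclass
class Filter:
    keyword: str
    tags: List[str] = field(default_factory=list)


@dataclass
class PageState:
    scroll_offset: int
    active_filter: Filter


tombstone = Tombstone()
tombstone.set_object(PageState(240, Filter("sterling", ["nosql", "phone"])))

# Later, when the application is reactivated:
restored = tombstone.get_object(PageState)
assert restored == PageState(240, Filter("sterling", ["nosql", "phone"]))
```

Compare that with a name/value table, where every field of `PageState` and `Filter` would need its own string conversion in both directions.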

Finally, the story is about lightweight and flexible. Take a look at the binaries. Sterling is tiny - less than a few hundred kilobytes of data and lines of code in the hundreds, not tens of thousands. It has numerous extension points to allow you to plug in encryption, compression, create a custom driver that works with Azure or define your own triggers and database constraints. It's all there in a compact framework.

I'm not saying these things to brag and by no means is the system perfect. There's a lot of work to do but the community has really stood behind the implementations of Sterling to date and helped drive features. One benefit for Sterling I hadn't even imagined is the ability to create shared libraries because the API on the client, server, phone, or browser can be 100% consistent even if the underlying method of persistence is actually memory, isolated storage or the file system.

So I do think Sterling will continue to fill a gap that SQL CE doesn't - and I see some hybrid applications that use SQL CE to index references to objects and Sterling to actually serialize their complex graphs. The best of both worlds - there's no reason why the two can't work together.
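Here is a rough sketch of what such a hybrid could look like, under loud assumptions: Python's built-in `sqlite3` plays the role of the relational index (it is not SQL CE), a dictionary of bytes plays the role of the object store (it is not Sterling), and `save_order` and `orders_for` are hypothetical helpers. The table holds only the columns worth querying, and the complex graph is stored whole under the same key.

```python
import json
import sqlite3
from typing import Dict, List

# Stand-ins: sqlite3 is the relational index, a dict of bytes is the object store.
index = sqlite3.connect(":memory:")
index.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
object_store: Dict[int, bytes] = {}


def save_order(order: dict) -> None:
    """Store the full graph as bytes and only the searchable columns in the table."""
    object_store[order["id"]] = json.dumps(order).encode("utf-8")
    index.execute(
        "INSERT OR REPLACE INTO orders (id, customer, total) VALUES (?, ?, ?)",
        (order["id"], order["customer"], order["total"]),
    )


def orders_for(customer: str, minimum_total: float) -> List[dict]:
    """Query the index with SQL, then load the matching graphs by key."""
    rows = index.execute(
        "SELECT id FROM orders WHERE customer = ? AND total >= ? ORDER BY id",
        (customer, minimum_total),
    ).fetchall()
    return [json.loads(object_store[row[0]].decode("utf-8")) for row in rows]


save_order({"id": 1, "customer": "ada", "total": 40.0,
            "lines": [{"sku": "A-1", "qty": 2, "options": {"gift": True}}]})
save_order({"id": 2, "customer": "ada", "total": 5.0, "lines": []})
save_order({"id": 3, "customer": "bob", "total": 99.0, "lines": []})

matches = orders_for("ada", 10.0)
assert [order["id"] for order in matches] == [1]
assert matches[0]["lines"][0]["options"]["gift"] is True
```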

What's on the road map for Sterling?

You can take a look and vote, add your own features and see the popularity of existing features here. Based on that and internal work, here are some ideas the team is considering for the 2.0 release (this will be influenced mostly by the votes on the features page, so make sure your voice is heard!):

  • Lazy loading of child objects - exploring a way to do this without using proxies or dynamic types
  • Read-only version - build a massive database, save it as a resource and then instead of saving to isolated storage, use a read-only driver to access it immediately (so no delay moving the resource into isolated storage, for example)
  • Azure drivers to use table storage
  • Internal optimizations to use a single file on disk rather than multiple files as the current version does
  • More hooks into the serialization engine to serialize types and get byte arrays without having to participate in saving
  • Dynamic table, key, and index definitions that will build these on the fly and persist them between sessions
  • Better concurrency for file-system based applications
  • Built-in synchronization (possibly through Microsoft's Sync Framework) for "sometimes connected" applications
  • Facades to expose Sterling as a cache engine
  • LINQPad support
  • MonoTouch support (same API, different platform)
  • Schema support - handling classes and types that change over time out of the box
  • ...and I haven't forgotten, I need to update the extensive 1.0 documentation to reflect the new 1.5

These are just a few items. Obviously I'm just one person with a lot of projects. The team is slowly growing, as is the audience using it, so I appreciate both your ideas and your assistance if you decide you'd like to be part of the team and can help out. I think there is a rich future, and right now is a great time as the team decides what 2.0 will look like.

Take care and check out the latest version - 1.5 - right here.

Jeremy Likness

6 comments:

  1. Jeremy

    Would Sterling be suitable for an embedded scenario in a desktop application? If so, is there any information about the storage size limits, performance with fairly large object graphs (let's say 10-50 MB or so). Also is there any information about how to handle changes to the serialised objects?

    Many thanks

    Sean

  2. Sterling was specifically designed to handle the scenario of using isolated storage and traveling with the client, so your scenario should be fine. What "50 megabytes" of data means varies a great deal depending on how complex your object graph is - 50 MB of images is far different than 50 MB of latitude/longitude coordinates in a tracking application, so it's tough to say either way. I'm not intimidated by 50 MB and know customers have pushed to 10k-50k rows in applications. Sterling is NOT designed currently to handle millions of rows. Storage size is limited based on the available quotas (isolated storage) or disk space (elevated trust driver for OOB applications). What do you mean about changes? Sterling handles updates fine, even deep on the object graph, and has a predicate you can send to mark things as dirty for avoiding redundant saves.

  3. The larger objects will contain blobs and not be a large graph of smaller items. When the "row count" grows we will be archiving off the less used data, so we don't want millions of rows, maybe a few thousand.

    What I mean by changes is how do I handle the scenario when there is data out there on the client's machine and I update the application with a new object structure. How do I go about converting the old data into the new structure?

  4. Here is more discussion around that:
    http://sterling.codeplex.com/discussions/239428

  5. I just ported my 7.0 app to Mango - I have tried the new SQL CE and it was OK - to a point. I have decided to retain Sterling as I already have a large number of classes containing lists of objects in place. Also, unit tests on speed were most definitely in the favour of Sterling.

    Keep up the fantastic work Jeremy!
