Sunday, November 07, 2010

SQL Azure Federation: Horizontal Scaling in the Cloud!!

One of the concerns I hear a lot about Azure is that users must select a DB size when signing up for a SQL Azure instance. The maximum DB size you can currently sign up for is 50 GB, which makes a lot of people worried about the scalability of SQL Azure. While 50 GB may be good enough for most small and medium-sized web applications, it's nowhere near what many large websites, LOB applications and data warehouses need.

Microsoft's solution to this problem was something called "sharding". Sharding is a technique that has been around for a long time and supports horizontal partitioning of databases. Essentially, it requires you to create a bunch of Azure DBs, treat each one as a separate partition, and programmatically direct each query to the correct partition ("shard"). If it's a complex query, you have to do the hard work of breaking it up based on your partition key, redirecting each piece to the right partition, and merging the results back.
Here is one article that explains how to scale out SQL Azure using horizontal partitioning. The partitioning logic is implemented in the Data Access Layer using LINQ. Painful!!
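
To give a feel for what that hand-rolled routing involves, here is a minimal sketch in Python. It is not the code from the article (that one uses LINQ), and everything in it is an assumption for illustration: three in-memory SQLite databases stand in for three SQL Azure DBs, the `orders` table and the modulo rule in `shard_for` are made up, and a real system would also need connection management, retries and rebalancing.

```python
import sqlite3

# Assumption: three in-memory SQLite databases stand in for three SQL Azure DBs.
SHARD_COUNT = 3
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")


def shard_for(customer_id):
    """Pick the shard that owns a customer (hypothetical rule: key modulo shard count)."""
    return shards[customer_id % SHARD_COUNT]


def insert_order(customer_id, amount):
    """Write path: route the row to the one shard that owns this partition key."""
    db = shard_for(customer_id)
    db.execute("INSERT INTO orders VALUES (?, ?)", (customer_id, amount))
    db.commit()


def orders_for_customer(customer_id):
    """Single-shard query: the partition key tells us exactly where to look."""
    cursor = shard_for(customer_id).execute(
        "SELECT amount FROM orders WHERE customer_id = ? ORDER BY amount",
        (customer_id,),
    )
    return [row[0] for row in cursor.fetchall()]


def total_sales():
    """Cross-shard query: fan out to every shard, then merge the partial sums."""
    total = 0.0
    for db in shards:
        (partial,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders"
        ).fetchone()
        total += partial
    return total
```

Even in this toy version, every query has to know about the partition key, and anything that spans customers (like `total_sales`) has to be split up and merged by hand. That is the part the application developer ends up owning.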

Although this solution works, expecting developers to write their own logic to manage partitions, redirect queries, etc. is a little over the edge. Since most on-premise DBs provide this feature out of the box, it was a big hurdle to SQL Azure adoption by large companies. Microsoft unveiled a much more elegant solution during PDC.
It's called "SQL Azure Federation". It is planned for release in early 2011.

SQL Azure will provide support for explicit horizontal partitioning, complete with new T-SQL keywords and commands such as CREATE/USE/ALTER FEDERATION and CREATE TABLE...FEDERATE ON. Once you have set up the right federation key, redirecting queries to the correct "shard" is taken care of by the SQL Azure engine.
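
Since the feature is not released yet, I can't show the real thing, but the idea of an engine-side map from federation key to shard can be sketched as a toy model in Python. To be clear, this is not SQL Azure's implementation or its T-SQL syntax: the `Federation` class, the `member_for` and `split_at` methods, the member naming and the choice of range-based partitioning are all my own assumptions for illustration.

```python
import bisect


class Federation:
    """Toy model (hypothetical): maps ranges of a federation key to member names."""

    def __init__(self, name):
        self.name = name
        self.boundaries = []           # sorted split points on the federation key
        self.members = [f"{name}_0"]   # members[i] owns keys below boundaries[i]
        self._next_id = 1

    def member_for(self, key):
        """Route a key to the member whose range contains it."""
        return self.members[bisect.bisect_right(self.boundaries, key)]

    def split_at(self, key):
        """Split the member owning `key`; keys >= `key` in that range move to a new member."""
        if key in self.boundaries:
            raise ValueError(f"already split at {key}")
        index = bisect.bisect_right(self.boundaries, key)
        self.boundaries.insert(index, key)
        self.members.insert(index + 1, f"{self.name}_{self._next_id}")
        self._next_id += 1
```

The point of the sketch is where the map lives: the application only hands over a key (`fed.member_for(customer_id)`), and growing the data set is a `split_at` call on the map rather than a rewrite of the data access layer.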

This new feature essentially makes the 50 GB size limitation almost irrelevant for most data storage requirements. Coupled with the elastic provisioning nature of the cloud, it will make SQL Azure a compelling alternative for many organizations out there that are dealing with large datasets and scalability issues. I, for one, cannot wait to try this out.

Here is the actual session by Lev Novik from PDC, titled "Building Scale-Out Database Solutions on SQL Azure":

Cheers!!
