Monday, March 30, 2009

Open Cloud Manifesto

The Register just put up details of the Open Cloud Manifesto. This document can be summarised as "clouds are great, clouds with open standards would be better". It calls for common standards for security, data interoperability and portability, and metering, monitoring and management. This is obviously a good idea ("let's vote for open standards"), but it is a tall order. Microsoft is miffed because (a) it wasn't invited to the manifesto-drafting tea party and (b) it thinks it is too early to be drafting standards. It may have a point on (b), certainly for the M3 part (metering, monitoring and management). However, two concerns are so critical to cloud users that they deserve special accelerated attention:
  • Data portability ("Your cloud is useful, but I want to be able to get my data back!") - there's a working group, dataportability.org, that has already garnered some support (Google, Microsoft, etc.), mostly driven by the social networks. MySpace, Google and Facebook are each attempting to be the central site that maintains the "record" of your social graph (who you know, what you like, what groups you join, how you communicate). They want your data to sell advertising, plain and simple. In return they provide some neat services, but let's be clear: your data is the crown jewels. They are happy enough to let other sites access your friend graph, but only in a hands-off, no-caching kind of way. This isn't really portability, so this concern is still very much valid.
  • Security - how do I control access between me and the cloud, and between the clouds themselves that manage my data? Luckily OAuth is now effectively a de facto standard here, so it is ready to be blessed as the official cloud standard.
However, the cloud manifesto goes further than open standards. In its section "Principles of an Open Cloud", point 2 is:
Cloud providers must not use their market position to lock customers into their particular platforms and limiting their choice of providers.
and pigs might fly.



Tuesday, March 24, 2009

SimpleDB

Started playing around with Amazon's SimpleDB. It's still in beta and there are a few niggly restrictions, but it is usable and "does what it says on the tin". It's a simple cloud-based storage mechanism for storing single rows of arbitrary data. The emphasis is on simple: single-table stuff. So this is not going to be a replacement for your well-structured database. However, one of the things you notice when your system starts getting a lot of traffic is that there is an awful lot of data simply flowing over your servers that you would really like to store, report on and build some cool new things with. If only you had somewhere to collate and store this information. Enter SimpleDB. Sure, I could invest time and energy in bringing a data warehouse online (if I could get it into the production setup), which would certainly address some of SimpleDB's usage restrictions, but this sort of task is exploratory. I want the freedom to start instrumenting and collecting data basically on a whim, without incurring significant setup costs. Some stats will definitely be useful, plugging holes in current decision-making; others are more of the "suck-it-n-see" variety. Using SimpleDB solves two basic problems for me over just using one of the (many) MySQL databases available to me:
  • There's an overhead to storing information in a production database. It needs to be properly set up for disaster recovery, and that setup needs to be tested. It needs a pruning and archiving schedule negotiated with the support people. Then there's the testing of the application itself. This is a lot of pro-forma setup which I really don't want to have to invest in upfront. Sure, if this thing takes off or outgrows SimpleDB's restrictions, then I can revisit this, but for now I don't want any roadblocks.
  • It sidesteps organisational boundaries. We have many environments where I would like to gather data, and if I have to negotiate with each environment for storage then I'm going nowhere fast. Moving the problem outside the network avoids that negotiation. It also gives me a natural place to centralise the information, something which, because of network partitioning, would otherwise be hard to get approved.
In my mind these are both agile issues (being able to start prototyping quickly, and not needing too much organisational buy-in to get started), and an external cloud-based mechanism neatly solves both.
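
To make the "collect on a whim" idea a bit more concrete, here is a rough Python sketch of the sort of instrumentation I have in mind. It deliberately does not call any real Amazon client: `InMemoryStore`, `record_event` and `count_by` are names I made up for illustration, and the in-memory class just stands in for the remote store so the example runs offline. The only things borrowed from SimpleDB are the shape of the data (items grouped in a domain, each item a bag of attributes with no fixed schema) and the fact that attribute values are stored as strings.

```python
import time
import uuid
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for a remote schema-less store (not a real client)."""

    def __init__(self) -> None:
        # domain name -> item name -> attribute name -> string value
        self.domains: Dict[str, Dict[str, Dict[str, str]]] = {}

    def put(self, domain: str, item_name: str, attributes: Dict[str, str]) -> None:
        self.domains.setdefault(domain, {})[item_name] = dict(attributes)


def record_event(store: InMemoryStore, domain: str, environment: str,
                 event: str, **fields: object) -> str:
    """Store one row of arbitrary data; every value is flattened to a string."""
    # Prefix the item name with the environment so rows gathered from
    # different environments can share one central domain without clashing.
    item_name = f"{environment}-{uuid.uuid4().hex}"
    attributes = {
        "environment": environment,
        "event": event,
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    attributes.update({key: str(value) for key, value in fields.items()})
    store.put(domain, item_name, attributes)
    return item_name


def count_by(store: InMemoryStore, domain: str, attribute: str) -> Dict[str, int]:
    """Tiny report: how many stored items carry each value of `attribute`."""
    counts: Dict[str, int] = {}
    for attrs in store.domains.get(domain, {}).values():
        key = attrs.get(attribute, "(missing)")
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    store = InMemoryStore()
    record_event(store, "site_stats", "staging", "login", latency_ms=120)
    record_event(store, "site_stats", "production", "login", latency_ms=95)
    record_event(store, "site_stats", "production", "search", terms="cloud")
    print(count_by(store, "site_stats", "event"))
```

The point of the design is that adding a new kind of stat is just a new keyword argument: no schema change, nothing to negotiate. Swapping the stand-in for a real remote client would mean replacing `put` (and rewriting `count_by` as a query), which I have not attempted here.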