Big Data – Who will find the patterns of value?

May 8, 2012

Big Data has great promise – no doubt that we are on the brink of discovering innovative ways to make a lot of money due to the ability to consume and analyze a greater volume, velocity, and variety of data than ever before.

The question, however, is who will make these innovative discoveries?

Can the innovation be made by a kid in a garage, a college kid in his dorm room, or a talented IT professional dreaming of a start-up in his free time?

On the one hand, the answer is NO – the college kid in the dorm will NOT be changing the world via big data the way Mark Zuckerberg did with social networking. Why? Quite simply, “Big Data” requires DATA: a lot of it, and potentially in many different forms.

Data is not free

In fact, there are billion-dollar businesses that are essentially data companies – they sell raw data (and analytics on it). I’ve even worked at one in my time. OK – some of that raw data is free, but much of it is not. For a college kid who just wants to start experimenting, acquiring data in that sheer volume is prohibitively expensive. Additionally, some big data use cases revolve around social data that sites like Facebook can use to make money (data is why Facebook has such a high valuation). Did you think Facebook gives away its data for free? If the college kid can’t even get his hands on the data, then the game is over before it even starts.

On the bright side, however, infrastructure-as-a-service cloud computing and pay-as-you-go pricing provide affordable infrastructure for the college kid. Powerful infrastructure is also a prerequisite for Big Data – but it’s not a barrier for the college kid.

It goes back – again – to access to data.

The majority of big data discoveries will come from big, established companies with the resources to acquire the data.

Agree or disagree? Actually, proving me wrong would make me happy.


4 Ways IaaS Cloud Computing Will Reduce Your Costs

February 28, 2012

1. Disaster Recovery – Cloud providers such as Amazon AWS that offer “pay as you go” pricing reduce the cost of disaster recovery. Essentially, one only pays when disaster happens and a recovery is needed. To be more accurate, the fixed cost is the 24/7 replication and storage of data from the production environment to the DR environment. The application and data servers, however, do not cost a single penny unless a disaster happens, in which case the servers are started up. Even if a disaster lasts for months (e.g. Katrina), this is still considerably less expensive than an in-house data center, for which all the application and data server hardware must be purchased upfront.

2. Batch Computing – Batch applications often follow a predictable “batch window” of high and low processing requirements. For example, a nightly batch process may require 1000 servers to complete its processing from 12am-8am, before the next business day starts. In a traditional data center, these 1000 servers must be purchased up front and may not be used (or used very little) during day-time business hours, resulting in a very low CPU utilization rate of 33% (8/24). With IaaS cloud computing and the ability to scale (or auto-scale) when needed, the CPU utilization rate is theoretically 100%, or realistically at least in the 90s. Major savings.

3. Short-term Web Site – For example, a marketing professional may create a dedicated web site for a product. If that web site is mentioned in a commercial during the Super Bowl with 100M+ viewers, there is a good chance that web site will get hammered, potentially with 100’s or 1000’s or more unique hits within a few minutes, potentially requiring 100’s or 1000’s of servers. A few days after the Super Bowl, the marketing web site requires 2 servers for the rest of the year. Again, with pay-as-you-go pricing and auto-scaling capability, the cost savings in comparison to traditionally purchasing all the equipment up front are through the roof.

4. Test & Dev – Cloud computing is also cost effective for test and development environments that may not need to be running 24/7. Again, pay only for the time the system is running.
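
To make the arithmetic behind the batch computing example (item 2) concrete, here is a minimal sketch in Python. It assumes the purchased servers sit completely idle outside the batch window and that on-demand capacity is paid for by the server-hour; the function names are made up for illustration, and no real prices are used.

```python
HOURS_PER_DAY = 24


def utilization(servers_owned, servers_needed, busy_hours):
    """Fraction of the purchased server-hours that are actually used per day.

    Assumes the servers are fully busy during the batch window and
    fully idle for the rest of the day.
    """
    used = servers_needed * busy_hours
    available = servers_owned * HOURS_PER_DAY
    return used / available


def on_demand_server_hours(servers_needed, busy_hours):
    """Server-hours paid for per day when servers run only during the window."""
    return servers_needed * busy_hours


# Numbers from the batch example: 1000 servers, 12am-8am window (8 hours).
owned_util = utilization(1000, 1000, 8)
owned_hours = 1000 * HOURS_PER_DAY
on_demand_hours = on_demand_server_hours(1000, 8)

print(f"Owned hardware utilization: {owned_util:.0%}")
print(f"Server-hours per day, owned: {owned_hours}, on demand: {on_demand_hours}")
```

Under these assumptions, the owned hardware provides 24,000 server-hours a day, of which only 8,000 are used, which is where the 33% comes from. Whether paying for only those 8,000 server-hours is actually cheaper depends on the per-hour price compared with the purchase cost of the hardware.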

IaaS Cloud Computing will not always reduce costs

It’s important to call out that, for many use cases, IaaS Cloud computing may not be cost effective for large-business steady-state workloads and may even be more expensive. I predict, however, that this will change as improved automation capabilities enable IT operations teams to work more efficiently. Today’s automation technology is still lacking and not yet mature enough to provide real steady-state savings. Examples of automation capabilities include automated patches, backups, database replication (e.g. Amazon AWS RDS), and the ability to quickly deploy and configure a complex, integrated environment of web, application, data and network components in an automated fashion. Again, the tools exist but, in my opinion, are still several years from mainstream adoption. Put another way:

Sufficiently mature and integrated automation capabilities will be the tipping point for mainstream enterprise adoption of IaaS Cloud Computing. We are still several years away from this reality. Do you agree or disagree? Your thoughts are welcome.