With so much being published on big data every day, it’s easy to get swept up in visions of big data success without considering the challenges of implementation. Cloud computing is big data’s natural playground, so we thought it would be useful to put together a list of must-dos for making your big data strategy a success with the cloud.
1. Understand the scope: As Doug Henschen notes, many of the problems with getting value out of a big data strategy come from not fully understanding the scope of your goals and what those goals really require. A “boil the ocean” approach might seem to suit “big” data applications, but matching resources to benchmarks is the smarter way to ensure that investments in both the underlying infrastructure and the big data use cases yield the kind of results that merit an increased footprint.
2. Build your foundation for efficiency: Big data is only going to get bigger, so provisioning the correct amount of the right cloud resources is key to ensuring that any big data project can deliver the value its investments require. An Infochimps survey has found that less than half of big data projects are completed, with almost 60% running into difficulties because the scope of the project was not appropriately planned for. It’s important to match resources to need, especially when different levels of a Hadoop cluster demand different tools to work as efficiently as possible.
3. Understand your need for speed: How fast do you need to turn your data into results? This question might seem simple on the surface, but it points to a larger issue that cloud computing, in particular, is well suited to handle. Big data grows exponentially, so the compute resources brought to bear in organizing and processing the data have to scale just as fast. The only platform that provides that level of scalability is the cloud.
4. Get a big bucket: Big data means petabytes upon petabytes of data. And that points to what is probably the biggest benefit cloud can provide for big data: its ability to handle all that information. It may be difficult to quantify your data-handling needs right off the bat. You may not have much data on Day 1, or you might have petabytes that must be stored. It’s important to consider what you are bringing to the table compared to what you expect to eventually have, particularly as you strategize how your infrastructure will store those volumes of data in the future. When you have an open-ended metric such as volume, you’re either in the business of being a storage provider for yourself (which can become exponentially more expensive over time) or you’re going to a cloud tool that lets you consume storage as you grow at a more predictable cost (all the while letting somebody else worry about how to handle the rate of growth).
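The build-it-yourself vs. consume-as-you-grow trade-off above can be sketched with some simple compound-growth arithmetic. This is a hypothetical illustration only: the function names, prices, and growth rates below are made-up assumptions, not real vendor figures.

```python
# Hypothetical comparison: provisioning self-hosted storage up front
# for projected peak volume (capex) vs. paying monthly only for the
# data actually stored (opex). All numbers are illustrative.

def projected_data_tb(initial_tb, annual_growth, years):
    """Data volume after `years` of compound annual growth."""
    return initial_tb * (1 + annual_growth) ** years

def self_hosted_cost(peak_tb, cost_per_tb=50.0):
    """Buy capacity on Day 1 for the peak volume you expect to reach."""
    return peak_tb * cost_per_tb

def cloud_cost(initial_tb, annual_growth, years, monthly_per_tb=2.0):
    """Pay each month only for the volume stored that month."""
    total = 0.0
    for month in range(years * 12):
        tb_now = initial_tb * (1 + annual_growth) ** (month / 12)
        total += tb_now * monthly_per_tb
    return total

# Example: start at 100 TB with 60% annual growth over a 3-year horizon.
peak = projected_data_tb(100, 0.60, 3)   # ~409.6 TB by year 3
upfront = self_hosted_cost(peak)         # pay for peak capacity immediately
pay_as_you_grow = cloud_cost(100, 0.60, 3)
```

The point of the sketch is not the specific totals but the shape of the spend: the self-hosted model forces you to pay for your projected peak before you reach it, while the consumption model tracks actual volume month by month.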
5. Plan ahead: Can your current infrastructure handle the amount of data your organization is collecting today? What about next year? The year after that? The reality is that big data is only going to get bigger. Data volumes have grown year over year, and that upward climb is going to continue. Is your infrastructure configured to handle the parallel growth that must occur to keep up with the data’s requirements? Often, working with a managed service provider that understands big data requirements is your best bet for planning and implementing an infrastructure that lets your company match growth with strategy.
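The “what about next year?” question above can be turned into a back-of-the-envelope runway calculation: given today’s volume, a fixed infrastructure ceiling, and an assumed growth rate, how long until existing capacity runs out? The numbers below are illustrative assumptions, not benchmarks.

```python
# Hypothetical capacity-runway check: years until compound data growth
# fills a fixed infrastructure ceiling. All figures are made up.
import math

def years_until_full(current_tb, capacity_tb, annual_growth):
    """Years until current_tb * (1 + g)^t reaches capacity_tb."""
    if current_tb >= capacity_tb:
        return 0.0
    return math.log(capacity_tb / current_tb) / math.log(1 + annual_growth)

# Example: 80 TB stored today, a 200 TB ceiling, 50% annual growth.
runway = years_until_full(80, 200, 0.50)  # a little over two years of headroom
```

If the runway comes out shorter than your procurement and build-out cycle, that is the signal to start planning the next stage of infrastructure (or to move to elastic cloud capacity) now rather than when the disks fill up.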
Overall, as Ben Kepes has put it: “Big data is going to be an important industry shift over the next few years, but users need to have products that allow it to harness big data insights rapidly and outside of their existing technology kit.”
By Jake Gardner