Things are changing as the concept of Big Data matures during 2014. Here are a few imaginary headlines I want to comment on:
- Big Data likes cloud?
- The difficulty of choosing the right Hadoop distribution puts projects on hold?
- Analytic applications are driving what we can do with Big Data?
- ETL processes and MDM will disappear?
Notice the question marks after each topic.
Big Data likes cloud?
Yes, I think so. I can hardly imagine the effort wasted on setting up a dedicated physical environment for Big Data. Implementation is also much faster in the cloud, and so is expanding the environment. Many cloud vendors even offer predefined virtual setups, which make implementation really convenient.
The difficulty of choosing the right Hadoop distribution puts projects on hold?
Choosing the right one is tricky. No doubt some consolidation will happen in the market, but that is not a good enough excuse for waiting. I don’t really see this as a problem if you can set up a new environment with a different distribution and move your activities there fairly smoothly. So start fast and learn by doing. If you chose wrong, choose again; you will still have moved forward the whole time, with richer data and a better understanding of it.
Analytic applications are driving what we can do with Big Data?
Not really. If you cannot figure out what you want to do with Big Data, I doubt that any tool will create that insight for you. I believe that as more information becomes available, business needs will drive the quest to find the right connections between different data entities. This will enable companies to enjoy the real value of their data assets. So innovation comes from the need to learn and from trying things out, not from any set of tools.
ETL processes and MDM will disappear?
Well, firstly, you need to get the data in somehow. The number of different sources just keeps increasing. I believe we need to introduce new ways to speed up access to new data sources such as log files, but many older, smaller-volume sources remain. Quality assurance and data cleaning remain important for keeping core data up to date and usable. Master data management remains important too. You need something reliable to map the information collected from different unstructured sources onto. This mapping provides the context that makes new information useful.
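To make the mapping idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the customer records, the `key=value` log format, and the names `MASTER_CUSTOMERS`, `parse_log_line` and `enrich` are assumptions of mine, not part of any real MDM product. It only shows the principle of looking up a key from a raw log line in a trusted master record to give the event some context.

```python
# Hypothetical master data: one clean, trusted record per customer.
MASTER_CUSTOMERS = {
    "C001": {"name": "Acme Oy", "segment": "enterprise"},
    "C002": {"name": "Nordic Retail", "segment": "smb"},
}


def parse_log_line(line):
    """Split an assumed 'key=value key=value' log line into a dict."""
    return dict(pair.split("=", 1) for pair in line.split() if "=" in pair)


def enrich(line, master=MASTER_CUSTOMERS):
    """Attach master data context to a log event when its customer_id is known."""
    event = parse_log_line(line)
    customer = master.get(event.get("customer_id"))
    event["matched"] = customer is not None
    if customer:
        event.update(customer)
    return event


# Made-up sample lines: one with a known customer, one without.
raw_lines = [
    "ts=2014-03-01T10:00:00 customer_id=C001 action=login",
    "ts=2014-03-01T10:05:00 customer_id=C999 action=search",
]
events = [enrich(line) for line in raw_lines]
```

The first event picks up the customer name and segment from the master record, while the second is flagged as unmatched. Those unmatched events are exactly where quality assurance and data cleaning come back into the picture.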
Most companies already have plenty of unstructured data. It does not have to be called Big. The important thing is that so far it has not been recognized as useful, or organizations have lacked the tools to cope with it. So it is like a hidden treasure waiting to be discovered. Now the tools are there, and the challenge is simply to be innovative enough to find a good use for this hidden treasure.