I have been working in data-heavy industries for quite a while. Thinking back, I noticed a pattern in how the focus has shifted over time. Perhaps a similar development applies to other developing areas too.
Starting with “which”
Around the millennium, the discussion focused heavily on what to use for handling the data. Companies with large amounts of data (actually ridiculously small compared to today’s volumes) were comparing database products and hardware. They wanted to understand whether the tools under evaluation were capable of handling the volume. Different purposes demanded different solutions to reach the optimal end result. Licenses were expensive.
Moving to “how”
After the problems with volume were solved, the focus moved to how to handle data. During this phase the discussion was about the tools to be used. Data was moving from unstructured to structured and back. The main interest was still in selecting suitable applications for the needed analysis. Free tools also started to appear.
Now we are finally moving to a situation where we can actually focus on the end result: what do you want to know from the data? Things like machine learning, open data and the ever-increasing availability of APIs for different analytical functions are making this possible. It is no longer necessary to be a “jack of all trades”; it is enough to understand the big picture and utilize the capabilities made available by area-specific experts. This should boost innovation in data-related areas.
The dominant players in the field have changed at every step. The way companies using these services need to equip themselves has also developed a lot. The speed of change is ever increasing, and as the assets needed in this game are also changing, vendors need a continuous capability to renew themselves. End-user companies, I believe, will be equipped with better capabilities to get more benefit out of the data.
Two interesting areas to watch in the future are the earning models and the centralization of data assets. Earlier you paid for the tools, then for a solution or service. Now you can get it for free, or can you? “Shared data is more valuable” is a nice statement, but if all data is shared, then someone with better capabilities to utilize it will have the upper hand in the game. Yes, shared data is valuable, but the question is who can harvest the value. I suppose you have noticed that many machine learning environments are now open for you to use. You are only required to deliver your data. The end result is that the environment gets better with each new data set users deliver. So basically you are paying for the free use by making another company’s asset (the machine learning environment) better. Hopefully competition in the market remains balanced enough.