I wrote an article for CIO Applications; here is an archived copy of it:
Self-organizing mesh networking and communication come with a permanent flow of information: massive IoT data streams that even classic Big Data frameworks like Hadoop can no longer process in time. The need for data processing changes with the kind of data and with the way it is created and ingested. Most analyses will be done at the edge and within the ingestion stream, as the data comes to rest. The data lake should be the central place to store data, but the data needs to be categorized and catalogued, with a proper, well-defined schema and data description. The gravity such data pools generate should be used deliberately as the motor of data-driven innovation.
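To make the idea of a catalogued, schema-described dataset concrete, here is a minimal sketch in plain Python. The `CatalogEntry` class, its fields and the example dataset are hypothetical illustrations, not part of any particular data lake product; a real catalogue would usually live in a dedicated metadata service.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalogue record describing one dataset in the lake."""
    name: str
    category: str
    description: str
    schema: dict  # field name -> expected Python type (or tuple of types)

    def validate(self, record: dict) -> list:
        """Return a list of problems; an empty list means the record fits the schema."""
        problems = []
        for field_name, expected_type in self.schema.items():
            if field_name not in record:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(record[field_name], expected_type):
                problems.append(f"wrong type for {field_name}")
        return problems


# Assumed example dataset: temperature readings from IoT devices.
entry = CatalogEntry(
    name="device_temperature",
    category="telemetry",
    description="Temperature readings per device, in degrees Celsius",
    schema={"device_id": str, "temperature_c": (int, float)},
)
```

Records that do not match the description can then be rejected or flagged at ingestion time instead of being dumped into the lake unchecked.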
Why? Batch processing helps to predict and to extract value from stored data, even by analyzing multiple other data points and storage facilities, but it does not help to react in time. Timely information in IoT is only valuable to business processes at the moment the events occur; for that job, stream processing frameworks like Spark or Kafka are more suitable. Combining both techniques brings unmatched value and impact to the business, driven by the right use of data. Stream processing during data transport closes the gap between rapid data and data at rest. Mostly with regard to the more costly IoT edge computing, MQTT-enabled stream processing engines deliver high throughput across all kinds of compute instances, be it in a local data center, in hybrid clouds or in public clouds.
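The difference between reacting in time and analyzing later can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `detect_spikes`, the window size and the threshold are made up for this example, and in a real deployment the readings would arrive through a streaming engine fed over MQTT rather than from a list.

```python
from collections import deque
from statistics import mean


def detect_spikes(readings, window=3, threshold=10.0):
    """Yield (index, value) as soon as a reading exceeds the rolling mean
    of the previous `window` readings by more than `threshold`."""
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        # Decide while the event is in flight, before it is stored anywhere.
        if len(recent) == window and value - mean(recent) > threshold:
            yield index, value
        recent.append(value)


# Assumed sample stream of temperature readings.
stream = [20.0, 21.0, 20.0, 35.0, 21.0]
alerts = list(detect_spikes(stream))
```

A batch job would find the same spike hours later in the stored data; the generator flags it at the moment the reading passes through, while every reading can still be written to the data lake for later analysis.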
The same holds for available cloud technology. Every cloud provider has its own zoo of IoT solutions with its own lock-ins, but often they do not fit scaling plans, whether because of complexity, missing or poorly implemented parts, or simply because the price model does not match the margin an IoT-based product earns. A combined approach of scalable cloud technology (which fits most needs) and in-house development brings the most benefit at an affordable price, not to mention the intellectual property a business gains and keeps instead of handing it to providers and therefore to competitors. Independent organisations like “Linux Foundation Edge” provide the most useful insight into Open Source projects and initiatives.
Just dumping data somewhere without a vision behind it does not help to solve the problems companies face on their digital journey, especially when it comes to questions of revenue from IoT projects. Big Data needs nearly perfect data management, data rights and data retention processes behind it. Only this makes it possible to take full advantage of any kind of data, to open new revenue and sales streams, and finally to see all data-driven activity not as a cost-saving project (as most agencies and vendors promise) but as a revenue-creating project. Using modern cloud technologies moves organizations into the data-centric world, focusing on business and not on operations.
Analyzing the data is the trickier part here: on the one hand every data point brings valuable input, but on the other hand an unlimited data store also makes customer insights vulnerable. I am a bit concerned about 360-degree approaches. First, the value of data collections needs to be questioned: which data is system-relevant for support, maintenance or emergencies, and which is important for generating sustainable revenue? Streaming analysis gives valuable input at the point in time the information is needed to make decisions, and it also makes it possible to route data into different data stores. It is beyond question that the value of customers is higher than that of the data gathered; implementing a state-of-the-art data ethics catalogue is one of the main tasks analytics needs to cover.
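Routing data into different stores while it streams through can be sketched as follows. The store names, the `purpose` field and the fallback to a quarantine store are assumptions made for this illustration; the point is only that each event is classified by its purpose before it is stored, rather than landing in one unlimited pool.

```python
from collections import defaultdict

# Hypothetical purposes that count as system-relevant.
OPERATIONAL_PURPOSES = {"support", "maintenance", "emergency"}


def route(event: dict) -> str:
    """Pick a target store name from the event's declared purpose."""
    purpose = event.get("purpose")
    if purpose in OPERATIONAL_PURPOSES:
        return "operational_store"
    if purpose == "revenue":
        return "analytics_store"
    # Data without a declared purpose is held back instead of being kept blindly.
    return "quarantine"


def dispatch(events) -> dict:
    """Group incoming events by target store (in-memory stand-in for real sinks)."""
    stores = defaultdict(list)
    for event in events:
        stores[route(event)].append(event)
    return dict(stores)


# Assumed sample events.
routed = dispatch([
    {"id": 1, "purpose": "emergency"},
    {"id": 2, "purpose": "revenue"},
    {"id": 3},
])
```

Keeping an explicit quarantine path is one way to let a data ethics catalogue decide what happens to data whose purpose has not been stated.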
We are moving quickly into a so-called interconnected world. Always-connected systems will dominate our future lives, introducing new business models by combining business areas that were never considered together before. The future CIO needs to know what implications the data has and what immeasurable value this data can generate, but also needs to weigh the threats uncontrolled data collection can cause. Building new data-driven businesses will be the most exciting job in the future; things never done before are now possible. Embrace this.