The challenges emerging from digital business, and the demand for greater agility, are forcing changes to integration approaches. Integration leaders can assess technology providers by understanding the five key phases of hybrid integration projects and their own organization’s needs. In our last installment, we looked at the first two phases of hybrid integration, which are by now widespread in most organizations. In this piece, we’ll discuss the more advanced hybrid integration patterns: data warehousing in the cloud, real-time analytics, and machine learning.
Phase 3: Hybrid Data Warehousing with the Cloud
As the volume and variety of data grow, you need a strategy for moving your data from on-premises data warehouses to newer Big Data resources. But with so many different Big Data processing frameworks out there, how do you choose the one that’s right for you?
While you take the time to decide which Big Data frameworks best serve the variety of integration use cases within your enterprise, start by creating a Data Lake in the cloud with a cloud-based storage service such as Amazon S3 or Azure Blob Storage. These services can relieve the cost pressures imposed by on-premises relational databases and can serve as your "staging area" while you decide which Big Data framework to use moving forward, whether it's MapReduce, Spark, or something else. The primary goal of this staging area is to let you process all of your raw data, structured or unstructured, with your framework of choice and then transfer it into a cloud-based data warehouse such as Amazon Redshift or Microsoft Azure SQL Data Warehouse. Once your enterprise data has been aggregated, you can equip your line-of-business analysts with data preparation tools to organize and cleanse this data prior to analysis with a cloud analytics tool such as Amazon QuickSight, Tableau, or Salesforce Wave Analytics.
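To make the staging flow above concrete, here is a minimal sketch of the "land raw data, process it, load it into a warehouse" pattern. It makes several assumptions that are not part of the original discussion: a plain Python dict stands in for the cloud object store, an in-memory SQLite database stands in for the cloud data warehouse, and the file names, the `orders` table, and the cleansing rules are purely hypothetical.

```python
import csv
import io
import json
import sqlite3


def parse_raw_object(name, body):
    """Turn one raw staged object (JSON Lines or CSV) into a list of dict records."""
    if name.endswith(".jsonl"):
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(body)))
    raise ValueError(f"unsupported staged object: {name}")


def load_into_warehouse(lake, conn):
    """Process every staged object and load the cleansed rows into one table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    loaded = 0
    for name, body in sorted(lake.items()):
        for record in parse_raw_object(name, body):
            # Hypothetical cleansing rule: normalize customer names.
            customer = str(record.get("customer", "")).strip().lower()
            if not customer:
                continue  # drop rows that cannot be attributed to a customer
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)",
                (customer, float(record["amount"])),
            )
            loaded += 1
    conn.commit()
    return loaded


# Assumption: a dict stands in for the object store; keys mimic object names.
lake = {
    "crm/orders.csv": "customer,amount\nAcme ,120.50\n,15.00\n",
    "web/orders.jsonl": (
        '{"customer": "acme", "amount": 30}\n'
        '{"customer": "Globex", "amount": 75.25}\n'
    ),
}

# Assumption: in-memory SQLite stands in for the cloud data warehouse.
conn = sqlite3.connect(":memory:")
rows_loaded = load_into_warehouse(lake, conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

In a real deployment the parsing and cleansing step would likely run on whichever Big Data framework you settle on; the point of the sketch is only that the raw objects can sit in the staging area unchanged until that decision is made.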
Phase 4: Real-time Analytics on Streaming Data
In today’s highly competitive marketplace, companies can no longer afford to work with information that is weeks or even days old; they need insight at their fingertips in real time. To reap the benefits of real-time analytics, you need a hybrid integration infrastructure that supports it. Your infrastructure needs may change depending on your use case, whether that is weblogs, clickstream data, sensor data, database logs, or social media sentiment.
There are many real-time Big Data streaming and messaging technologies, such as Kafka, Storm, AWS Kinesis, and Flume, and to keep latency as low as possible, some organizations choose to keep their infrastructure on-premises. However, while latency is definitely a concern with streaming use cases, organizations should not let it become a barrier to moving to the cloud. The best course of action is to first assess all your data sources and judge which ones truly need to remain on-premises and which can be moved to the cloud. For example, most IoT use cases involving sensors on industrial equipment are on-premises, so it’s best to keep the streaming analytics infrastructure for them on-premises too. The same goes for high-availability databases whose logs you want to collect. For use cases where you're collecting streaming data about systems that are already in the cloud, it’s probably best to keep your infrastructure in the cloud as well and use existing services within those ecosystems, such as AWS Kinesis or DynamoDB, to set up your streaming infrastructure. That way you are already well along in your journey toward eventually moving everything to the cloud.
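The assessment described above can be sketched as a simple rule applied to an inventory of data sources. This is only an illustration: the `DataSource` fields, the source names, and the rule that an on-premises source which tolerates latency is a candidate for the cloud are all assumptions, and a real assessment would weigh more factors (cost, compliance, bandwidth).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    location: str            # "on_premises" or "cloud": where the data is produced
    latency_sensitive: bool  # True if extra network hops would hurt the use case


def streaming_placement(source):
    """Suggest where the streaming infrastructure for one source should run."""
    if source.location == "cloud":
        return "cloud"        # data is already in the cloud, so keep it there
    if source.latency_sensitive:
        return "on_premises"  # e.g. industrial sensors, high-availability DB logs
    return "cloud"            # assumption: latency-tolerant sources can move


# Hypothetical inventory, loosely based on the use cases mentioned in the text.
sources = [
    DataSource("factory_sensors", "on_premises", True),
    DataSource("ha_database_logs", "on_premises", True),
    DataSource("web_clickstream", "cloud", False),
    DataSource("nightly_weblogs", "on_premises", False),
]

plan = {source.name: streaming_placement(source) for source in sources}
```

Keeping the rule explicit like this makes it easy to revisit individual sources as latency requirements or cloud connectivity change.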
Phase 5: Machine Learning for Optimized App Experiences
Machine learning can bring tremendous value to the applications you build. In the future, every experience will be delivered as an app through mobile devices, whether it's a consumer mobile app or an enterprise mobile app. The right hybrid integration infrastructure must make it possible to discover patterns buried deep within data through machine learning, so that these applications can be more responsive to users’ needs. Well-tuned algorithms extract value from immense and disparate data sources that are beyond the limits of human analysis. For developers, machine learning offers the promise of applying business-critical analytics to any application in order to accomplish everything from improving customer experience to providing product recommendations to serving up hyper-personalized content.
To make this happen, developers need to:
- Be "all-in" with the use of Big Data technologies and the latest streaming protocols
- Have large enough datasets for the machine learning algorithm to be able to recognize patterns
- Create segment-specific datasets using machine-learning algorithms to target diverse customer segments
- Ensure that whatever mobile app they build has a robust API to draw upon those datasets and provide the end user with whatever information they are looking for in the correct context
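As a rough illustration of the last two points in the list above, the sketch below clusters users into segments and exposes a lookup that a mobile app's API could call. Everything here is an assumption made for the example: a tiny one-dimensional k-means stands in for whatever machine-learning algorithm you would actually use, and the user names, spend figures, offers, and the `offer_for` function are hypothetical. The dataset is also far smaller than anything a real model would need.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Cluster numbers into k groups and return the sorted centroids."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for value in values:
            groups[nearest_segment(value, centroids)].append(value)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            sum(group) / len(group) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return sorted(centroids)


def nearest_segment(value, centroids):
    """Index of the centroid closest to the value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


# Hypothetical data: monthly spend per user.
monthly_spend = {"ana": 10, "bo": 12, "cy": 14, "di": 90, "ed": 110}
centroids = kmeans_1d(list(monthly_spend.values()))

# Segment-specific datasets: users grouped by their nearest centroid.
segments = {}
for user, spend in monthly_spend.items():
    segments.setdefault(nearest_segment(spend, centroids), []).append(user)

# Hypothetical content served per segment.
OFFERS = {0: "starter bundle discount", 1: "premium loyalty upgrade"}


def offer_for(user):
    """What a hypothetical API endpoint might return for one user."""
    segment = nearest_segment(monthly_spend[user], centroids)
    return {"user": user, "segment": segment, "offer": OFFERS[segment]}
```

The app itself only ever calls something like `offer_for`; the segmentation can be retrained and swapped out behind that API without changing the client.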
For companies to reach this level of "application nirvana", they will first need to have implemented each of the four previous phases of hybrid integration. The right iPaaS solution can guide a company through the various phases of hybrid integration so that it can successfully reach phase five. When machine learning becomes prevalent within an organization, we can expect to see a lot more data-driven decisions taking place, but to get there, it all starts with the right hybrid integration strategy.