
Takeaways from the new Magic Quadrant for Data Quality Tools


Gartner has just released its annual “Magic Quadrant for Data Quality Tools.”[1]

While everyone’s first priority might be to check out the various recognitions, I would also recommend taking the time to review the market overview section. I found the views shared by analysts Saul Judah and Ted Friedman on the overall data quality market and major trends both interesting and inspiring.

Hence this blog post to share my takeaways.

In every enterprise software submarket, reaching the $1 billion threshold is a significant milestone. According to Gartner estimates, the market for Data Quality tools reached it a couple of months ago and “will accelerate during the next few years, to almost 16% by 2017, bringing the total to $2 billion”.

Although Data Quality already represents a significant market, its growth pace indicates that it has yet to reach the mainstream. Other signs that point to this include continued consolidation on the vendor side and, from the demand side, growing calls for democratization (in particular, lower entry costs and shorter implementation times).

Data quality is gaining popularity across data domains and use cases. In particular, “party data” (data related to customers, prospects, citizens, patients, suppliers, employees, etc.) is highlighted as the most frequent category. I believe demand for data quality is growing in this area because customer-facing lines of business are increasingly realizing that poor data quality jeopardizes their customer-relationship capabilities. To further illustrate this point, see the proliferation of press articles citing data quality as a key success factor for data-driven marketing activities (such as this one titled Data quality, the secret assassin of CRM). In addition, media coverage appears to reinforce that data quality, together with MDM of Customer Data, is a “must have” within CRM and digital marketing initiatives (see the example in this survey from eMarketer).

The Gartner survey referenced in the Data Quality Magic Quadrant also reveals that data quality is gaining ground in domains beyond party data. Three other domains are now considered a priority: financial/quantitative data, transaction data and product data (this wasn’t the case in last year’s survey).

In my view, this finding also indicates that Data Quality is gaining ground as a function that needs to be delivered across Lines of Business. Some organizations are looking to establish a shared service for managing data assets across the enterprise, rather than trying to solve the problem case by case for each activity, domain or use case. However, this appears to be an emerging practice found only in the most mature organizations (and we at Talend would advise considering it only once you have already demonstrated the value of data quality for some well-targeted use cases). Typically, these organizations are also the ones that have appointed a Chief Data Officer to orchestrate information management across the enterprise.

In terms of roles, Gartner sees an increasing number of people involved with data quality, especially within the lines of business, and states: “This shift in balance toward data quality roles in the business is likely to increase demand for self-service capabilities for data quality in the future.”

This is in sync with other research: for example, at a recent MDM and data governance event in Paris, Henri Peyret from Forrester Research elaborated on the idea of Data Citizenship.

Our take at Talend is that data quality should be applied where the data resides or is exchanged. So, in our opinion, the deployment model depends on the use case: data quality should be able to move to the cloud together with the business applications or the integration platforms that process or store the data. Data quality should not, however, mandate moving data from on-premises to the cloud, or the other way around, for its own purposes.

Last, the Gartner survey sees some interest in big data quality and data quality for the Internet of Things, but these are not yet key considerations for buyers.

“Inquiries from Gartner clients about data quality in the context of big data and the Internet of Things remain few, but they have increased since 2013. A recent Gartner study of data quality ("The State of Data Quality: Current Practices and Evolving Trends") showed that support for big data issues was rarely a consideration for buyers of data quality tools.”

This is a surprising, yet very interesting finding in my opinion, knowing that at the same time other surveys show that data governance and quality are becoming one of the biggest challenges in big data projects. See as an example this article from Mark Smith of Ventana Research, showing that most of the time spent in big data projects relates to data quality and data preparation. The topic is also discussed in a must-watch webinar on Big Data and Hadoop trends (requires registration) by Gartner analysts Merv Adrian and Nick Heudecker. An alternative to the highly promoted data lake approach is gaining ground, referred to as the “data reservoir” approach. The difference: while the data lake aims to gather data in a big data environment without further preparation and cleansing work, a reservoir focuses on making the data consumption-ready for a wider audience, not only for a limited number of highly skilled data scientists. Under that vision, data quality becomes a building block of big data initiatives, rather than a separate discipline.

I cannot end this post without personally thanking our customers for their support in developing our analyst relations program.  

Jean-Michel

[1] Gartner, Inc., “Magic Quadrant for Data Quality Tools,” Saul Judah and Ted Friedman, November 26, 2014


Turning a Page


At the end of October, I will be leaving Talend, after more than 7 years leading its marketing charge. It has been quite a ride – thrilling, high octane, wearing at times, but how rewarding.

And indeed, how rewarding it is to have witnessed both the drastic change of open source over the years, and the rise of a true alternative response to integration challenges.

Everyone in the open source world knows this quote from the Mahatma Gandhi:

"First they ignore you, then they laugh at you, then they fight you, then you win."

And boy, do I recall our initial discussions with industry pundits and experts; not all of them were believers. I also remember the first struggles to convince IT execs of the value of our technology (even though their developers were users). And the criticism from “open source purists” about the “evil” open core model.

It would be preposterous to say that Talend has won the battle. But it is clearly fighting for (and winning) its fair share of business. And anyway, what does “winning the battle” mean in this context? We never aimed at putting the incumbents out of business (ok, maybe after a couple drinks, we might have boasted about it), but our goal has always been to offer alternatives, to make it easier and more affordable to adopt and leverage enterprise-grade integration technology.

Over these years, it has been a true honor to work with the founding team, with the world-class marketing team we have assembled, and of course with all the people who have made Talend what it is today. We can all be proud of what we have built, and the future is bright for Talend. The company is extremely well positioned, at the forefront of innovation, and with a solid team to take it forward, to the next step (world domination – not really, just kidding).

This is a small world, and I won’t be going very far, I am sure. But in the meantime, since I won’t be contributing to the Talend blog anymore, I will start blogging about digitalization – of the enterprise, of society, of anything I can think about, really – and I might even rant about air travel or French strikes every now and then. I hope you will find it interesting.

Digitally yours,

Yves
@ydemontcheuil
Connect on LinkedIn
 

More Action, Less Talk - Big Data Success Stories


The term ‘big data’ is at risk of premature over-exposure. I’m sure there are already many who turn off when they hear it – thinking there’s too much talk and very little action. In fact, observing that ‘many companies don’t know where to start with big data projects’ has become the default opinion within the IT industry.

I, however, stand by the view that the integration and analysis of big data stands to transform today’s business world as we know it. And while it’s true that many firms are still unsure how and where to begin when it comes to drawing value from their data, there is a growing pool of companies to observe. Their applications might all be different, and they may tend to be larger corporations rather than mid-range businesses, but there is no reason why companies of any size can’t still look and learn.

I was thinking this when several successful examples of how large volumes of data can be integrated and analysed came my way this week. The businesses involved were all from different industry sectors, from frozen foods to France’s top travel group.

What they have in common is that consumer demand, combined with the strength of competition in their own particular industry, is driving the need to gain some kind of deeper understanding of their business. For the former, Findus, this involves improving intelligence around its cold supply chain and gaining complete transparency and traceability.

For Karavel Promovacances, one of the largest French independent travel companies, it is more a question of integrating thousands upon thousands of travel options, including flights and hotel beds – and doing it at the speed that today’s internet users have come to expect. A third company, Groupe Flo, is creating business intelligence on the preferences of the 25 million annual visitors to the firm’s more than 300 restaurants.

Interestingly, the fourth and final case study involves a company which is dedicated to data. OrderDynamics analyses terabytes of data every day from its big-name retailer customers, such as Neiman Marcus, Brooks Brothers and Speedo, to provide real-time intelligence and recommendations on everything from price adjustments and inventory re-orders to content alterations.

As I said, these are four completely different applications from four companies at the top of their own particular games. But these applications are born from the knife-edge competitive spirit they need in order to maintain their positions. A need that drives innovation and inventiveness and turns the chatter about new technologies into achievement.

This drive or need won’t remain in the upper echelons of the corporate world forever. An increasing number of mid-range and smaller companies are discovering that there are open source solutions now on the market that effectively address the challenge of large-scale volumes. And, importantly, that they can tackle these projects cost-effectively.

This is bound to turn up the heat across the mainstream business world. In a recent survey by the Economist Intelligence Unit, 47% of executives said that they don’t expect to increase investments in big data over the next three years (with 37% citing financial constraints as the barrier). However, I believe this caution will soon give way as more firms learn of the relatively low cost of entry and, perhaps more significantly, as they see competitors inch ahead using big data-fueled business intelligence.

In other words, I expect to hear less talk and rather read more success stories in the months to come. Follow the links below to learn more about real world data success stories in high volume:

Karavel Promovacances Group (Travel and Tourism)

OrderDynamics (Retail/e-Tail)

Findus (Food)

Groupe Flo (Restaurant)

What Is a Container? (Container Architecture Series Part 1)


This is the first in a series of posts on container-centric integration architecture.  This first post covers common approaches to applying containers for application integration in an enterprise context.  It begins with a basic definition and discussion of the Container design patterns.  Subsequent posts will explore the role of Containers in the context of Enterprise Integration concerns.  This will continue with how SOA and Cloud solutions drive the need for enterprise management delivered via service containerization and the need for OSGI modularity.  Finally, we will apply these principles to explore two alternative solution architectures using OSGI service containers.

Containers are referenced everywhere in the Java literature but seldom clearly defined.  Traditional Java containers include web containers for JSP pages, Servlet containers such as Tomcat, EJB containers, and lightweight containers such as Spring.  Fundamentally, a container is just a framework pattern that provides encapsulation and separation of concerns for the components that use it.  Typically the container will provide mechanisms to address cross-cutting concerns like security or transaction management.  In contrast to a simple library, a container wraps the component and typically also addresses aspects of classloading and thread control.

Spring is the archetypal container and arguably the most widely used container today.  Originally, servlet and EJB containers had a programmatic API.  Most containers today follow Spring’s example in supporting Dependency Injection patterns.  Dependency Injection provides a declarative API for beans to obtain the resources needed to execute a method.  Declarative Dependency Injection is usually implemented using XML configuration or annotations, and most frameworks support both.  This provides a cleaner separation of concerns so that the bean code can be completely independent of the container API.
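
To make this concrete, here is a minimal sketch of annotation-driven Dependency Injection using plain Spring (the class names are illustrative, not taken from any particular product):

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.stereotype.Component;

    // A hypothetical collaborator managed by the container.
    @Component
    class GreetingService {
        String greet(String name) {
            return "Hello, " + name;
        }
    }

    // The client declares what it needs; it never constructs or looks up the dependency itself.
    @Component
    class GreetingClient {
        private final GreetingService service;

        @Autowired
        GreetingClient(GreetingService service) {   // constructor injection
            this.service = service;
        }

        void run() {
            System.out.println(service.greet("container"));
        }
    }

    public class DiDemo {
        public static void main(String[] args) {
            // The container creates and wires the beans declared above.
            AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(GreetingService.class, GreetingClient.class);
            ctx.getBean(GreetingClient.class).run();
            ctx.close();
        }
    }

The client never calls new or performs a lookup; the container resolves and supplies the GreetingService, which is exactly the separation of concerns described above.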

Some containers are characterized as lightweight.  Spring is an example of a lightweight container in the sense that it can run inside other containers such as a Servlet or EJB container.  “Lightweight” in this context refers to the resources required to run the container.  Ideally a container can address specific cross-cutting concerns and be composed with other containers that address different concerns.

Of course, lightweight is relative, and how lightweight a particular container instance is depends on the modularity of the container design as well as how many modules are actually instantiated.  Even a simple Spring container running in a bare JVM can be fairly heavyweight if a full set of transaction management and security modules is installed.  But in general a modular container like Spring will allow configuration of just those elements that are needed.

Open Source Containers typically complement Modularity with Extensibility.  New modules can be added to address other cross-cutting concerns.  If this is done in a consistent manner, an elegant framework is provided for addressing the full spectrum of design concerns facing an application developer.  Because containers decouple the client bean code from the extensible container modules, the cross-cutting features become pluggable.  In this manner, open source containers provide an open architecture foundation for application development.

Patterns are a popular way of approaching design concerns, and they provide an interesting perspective on containers.  The Gang of Four Design Patterns[1] book categorized patterns as addressing Creation, Structure, or Behavior.  Dependency Injection can be viewed as a mechanism for transforming procedural Creation code into Structure.  Containers such as Spring also have elements of Aspect-Oriented Programming, which essentially allows Dependency Injection of Behavior.  This allows transformation of Behavioral patterns into Structure as well.  This simplifies the enterprise ecosystem because configuration of structure is much more easily managed than procedural code.
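
To illustrate the “injection of Behavior” point, here is a small, hypothetical Spring AOP sketch: a timing concern is declared as structure (an aspect plus a pointcut expression) and woven around existing beans without touching their code. The package and aspect names are assumptions for the example.

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.EnableAspectJAutoProxy;
    import org.springframework.stereotype.Component;

    // A behavioral concern (timing) expressed as declarative structure.
    @Aspect
    @Component
    class TimingAspect {
        // Applies to every method of every class in the hypothetical com.example.service package.
        @Around("execution(* com.example.service..*.*(..))")
        public Object time(ProceedingJoinPoint pjp) throws Throwable {
            long start = System.nanoTime();
            try {
                return pjp.proceed();   // invoke the wrapped method
            } finally {
                System.out.printf("%s took %d us%n",
                    pjp.getSignature(), (System.nanoTime() - start) / 1_000);
            }
        }
    }

    // Enabling the weaving is itself just configuration, not procedural code.
    @Configuration
    @EnableAspectJAutoProxy
    class AopConfig { }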

Talend provides an open source container using Apache Karaf.  Karaf implements the OSGI standard, which provides modularity and dependency management features that are missing from the Java specification.  The Talend ESB also provides a virtual service container based on enterprise integration patterns (EIP) via Apache Camel.  Together these provide a framework for flexible and open solution architectures that can respond to the technical challenges of Cloud and SOA ecosystems.
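
As a hedged illustration of the EIP style mentioned here, the following Camel route (Java DSL, with placeholder endpoints) declares a content-based router that a service container can then deploy and manage; inside Talend ESB or Karaf it would typically be packaged as an OSGI bundle rather than run standalone as below.

    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.impl.DefaultCamelContext;

    public class OrderRouteDemo {
        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    // Content-based router (a classic EIP): pick up files from a
                    // placeholder directory and branch on the file name.
                    from("file:data/orders?noop=true")
                        .choice()
                            .when(header("CamelFileName").contains(".xml"))
                                .to("file:data/xml-orders")
                            .otherwise()
                                .to("file:data/rejected");
                }
            });
            context.start();
            Thread.sleep(10_000);   // let the route run briefly in this standalone sketch
            context.stop();
        }
    }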

[1] Gamma, Erich; Helm, Richard; Johnson, Ralph; Vlissides, John (November 10, 1994). Design Patterns: Elements of Reusable Object-Oriented Software

 

Key Capabilities of MDM for Regulated Products (MDM Summer Series Part 9)


In this “summer series” of posts dedicated to Master Data Management for Product Data, we’ve gone through what we identified as the five most frequent use cases of MDM for product data. Now we are looking at the key capabilities needed in an MDM platform to address each of these use cases. In this post, we address MDM for Regulated Products, which is about using MDM to support compliance with regulations related to products, or to facilitate product data exchange between business partners. MDM for regulated products deals with standard codification. One of the most well-established standards bodies for products is GS1, known not only for providing the Global Trade Item Number (GTIN) as the universal identifier for consumer goods and healthcare products, but more generally for standards enabling the capture, identification, classification, sharing and traceability of product data between business partners. According to Wikipedia, “GS1 has over a million member companies across the world, executing more than six billion transactions daily using GS1 standards”.

Complying with such standards requires your MDM platform to be comfortable with – and make it easy to work with – the relatively complex semi-structured data those standards mandate, such as EDI or XML data formats (for example, GS1 lets you choose between the “traditional” EDI format, EANCOM, and GS1 XML). The MDM platform should also allow data mappings between those well-defined standards and internal data structures, and let you interactively view your products according to both your internal, “proprietary” views and the standardized one. Modeling capabilities such as hierarchy management are important too, as is inheritance, to make sure that standards are embedded into your specific data models and that changes are automatically applied to all the data structures that aspire to conform to a standard.
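
To give a feel for this kind of mapping, here is a deliberately simplified, hypothetical sketch that reads a GS1-style XML fragment (the element names are illustrative only, not the real GS1 schema) and maps the GTIN and description onto internal attributes using standard JDK XML APIs:

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    public class GtinMappingSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical, simplified GS1-style payload; real GS1 XML schemas are far richer.
            String xml = "<tradeItem>"
                       + "<gtin>00614141123452</gtin>"
                       + "<tradeItemDescription>Frozen peas 1kg</tradeItemDescription>"
                       + "</tradeItem>";

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            String gtin = xpath.evaluate("/tradeItem/gtin", doc);
            String label = xpath.evaluate("/tradeItem/tradeItemDescription", doc);

            // Map the standardized view onto an internal, "proprietary" structure
            // (printed here; an MDM hub would persist and govern it).
            System.out.println("internal.product.id    = " + gtin);
            System.out.println("internal.product.label = " + label);
        }
    }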

This use case also demands strong capabilities from your data quality components, especially in terms of parsing, standardization, entity resolution and reconciliation. This may be the starting point for getting standardized classifications out of your legacy product data, as product categories may initially have been coded as long freeform text in legacy systems rather than as well-defined, structured attributes.

Interfaces may be as “basic” as import and export, but they can be much more sophisticated when the goal is to connect business partners and regulatory institutions in real time. Security and access control, and other high-end capabilities found in an MDM platform that fully integrates an Enterprise Service Bus and provides capabilities such as fault tolerance or audit trails, then become critical. Workflow capabilities for data authoring and compliance checking might be important in that respect as well.

Continued on Part 10.

Jean-Michel

Key Capabilities of MDM for Product Information Management (MDM Summer Series Part 10)


In this “summer series” of posts dedicated to Master Data Management for Product Data, we’ve gone through what we identified as the five most frequent use cases of MDM for product data. Now we are looking at the key capabilities needed in an MDM platform to address each of these use cases. In this post, we address MDM for Product Information Management, which is about managing the customer-facing side of build-to-stock products.

MDM for Product Information Management needs both a strong front end and a strong back end. The front end supports the process of referencing products and making sure the platform holds the most complete and accurate data for efficient distribution. Some consider this the role of a Commercial Off-The-Shelf product that comes with pre-configured data models, business processes and workflows; others prefer to work with a platform that they can customize to their own needs and connect to separate front-end platforms like e-commerce applications. Indeed, the choice depends on the gap between the business need and what the packaged software offers out of the box.

In the platform case, the solution must provide strong search capabilities and easily integrate with third-party solutions that provide them, such as online catalogs, digital asset management solutions and search-based applications. Collaborative authoring capabilities, workflows and Business Process Management are important capabilities to consider, too, as is the ability to categorize products according to classification standards (such as the GS1 Global Product Classification or eCl@ss) or custom hierarchies.

No matter what the choice is for the front-end side of PIM, the back-end side will be very important too. Its goal is to integrate the incoming data and make sure it is quality-proofed before the product data is delivered to customer-facing activities. Strong rule-based data quality capabilities should be provided for that, in order to assess the incoming data, measure and analyze its quality, delegate its stewardship to the right stakeholders in case of inaccuracy, and control the overall process through a rule-based approach.
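
As a minimal, hypothetical sketch of what such rule-based checks might look like before product records are released to customer-facing channels (the field names and rules below are invented for illustration):

    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;

    public class ProductQualityRules {

        // A named rule over a product record represented as a simple attribute map.
        record Rule(String name, Predicate<Map<String, String>> check) { }

        public static void main(String[] args) {
            List<Rule> rules = List.of(
                new Rule("GTIN is 14 digits",
                    p -> p.getOrDefault("gtin", "").matches("\\d{14}")),
                new Rule("Description is present",
                    p -> !p.getOrDefault("description", "").isBlank()),
                new Rule("Price is a positive number",
                    p -> p.getOrDefault("price", "").matches("\\d+(\\.\\d{1,2})?"))
            );

            Map<String, String> incoming = Map.of(
                "gtin", "00614141123452",
                "description", "Frozen peas 1kg",
                "price", "-3.99");

            // Every failed rule becomes a stewardship task instead of silently passing through.
            rules.stream()
                 .filter(rule -> !rule.check().test(incoming))
                 .forEach(rule -> System.out.println("Route to steward: " + rule.name()));
        }
    }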

Then the solution must connect to the business partners. This can be done through a supplier portal, through open APIs, or by connecting to a Global Data Synchronization Network. The ability to map to standard formats and connect to those external exchanges, where applicable, is then needed.

Jean-Michel

 


Customer Data Platform: Toward the personalization of customer experiences in real time


Big data has monopolized media coverage in the past few years.  While many articles have covered the benefits of big data to organizations, in terms of customer knowledge, process optimization or improvements in predictive capabilities, few have detailed methods for how these benefits can be realized.

Yet, the technology is now mature and proven. Pioneers include Mint in the financial sector, Amazon in retail and Netflix in media. These companies showcase that it is possible today to put in place a centralized platform for the management of customer data that is able to integrate and deliver information in real time, regardless of the interaction channel being used.

This platform, known as a Customer Data Platform (CDP), allows organizations to reconstruct the entire customer journey by centralizing and cross-referencing interactional or internal data (such as purchase history, preferences, satisfaction, and loyalty) with social or external data that can uncover customer intentions as well as broader habits and tastes. Thanks to the power and cost-effectiveness of a new generation of analytical technologies, in particular Hadoop and its ecosystem, the consolidation of these enormous volumes of customer data is not only very fast, but also enables immediate analysis.

As well as helping improve overall customer knowledge upstream, this data, importantly, also helps organizations understand and act upon individual customer needs in real time. In fact, it enables companies to predict a customer’s intentions and influence their journey through the delivery of the right message, at the right time, through the correct channel.

The Pillars of CDP

To achieve this, the Customer Data Platform must be based on four main pillars. The first pillar is about core data management functions around retrieving, integrating and centralizing all sources of useful data. In an ideal implementation, this system incorporates modules for data quality to ensure the relevance of the information, as well as Master Data Management (MDM) to uniquely identify a customer across touch points and govern the association rules between the various data sets.

The second pillar establishes a list of the offers and “conditions of eligibility”, taking into account, for instance, the specifics of the business such as premium pricing, loyalty cards, etc. The third pillar aims to analyze the data and its relationships in order to establish clear customer segments. Finally, the last pillar is concerned with predictability and enabling, through machine learning, the ability to automatically push an offer (or “recommendation”) that is most likely to be accepted by the customer.
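
To make the last two pillars slightly more concrete, here is a deliberately naive, hypothetical sketch: eligibility conditions (the second pillar) filter a catalog of offers, and a stand-in scoring function (in a real Customer Data Platform this would be a trained machine-learning model) picks the offer most likely to be accepted.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Predicate;

    public class NextBestOfferSketch {

        record Offer(String name, Predicate<Map<String, Object>> eligible) { }

        // Stand-in for a trained propensity model: in a real CDP this score would come
        // from machine learning over the consolidated customer history.
        static double score(Offer offer, Map<String, Object> customer) {
            return (offer.name().contains("loyalty") && (boolean) customer.get("loyaltyCard"))
                    ? 0.8 : 0.4;
        }

        public static void main(String[] args) {
            Map<String, Object> customer = Map.of(
                "segment", "frequent-traveler",
                "loyaltyCard", true);

            List<Offer> catalog = List.of(
                new Offer("premium-upgrade", c -> "frequent-traveler".equals(c.get("segment"))),
                new Offer("loyalty-double-points", c -> (boolean) c.get("loyaltyCard")),
                new Offer("first-purchase-discount", c -> !(boolean) c.get("loyaltyCard")));

            Optional<Offer> nextBest = catalog.stream()
                .filter(o -> o.eligible().test(customer))                   // conditions of eligibility
                .max(Comparator.comparingDouble(o -> score(o, customer)));  // best predicted offer

            nextBest.ifPresent(o -> System.out.println("Recommend: " + o.name()));
        }
    }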

These are the four steps that I believe are essential to achieving the Holy Grail or the ultimate in one-to-one marketing. Before companies tackle these types of projects, it is of course absolutely essential they first define the business case. What are the goals? Is it to increase the rate of business transformation, drive customer loyalty, or to launch a new product or service?  What is the desired return on investment? The pioneers in the market are advising companies to develop a storyboard that describes the ideal customer journey by uncovering "moments of truth” or the interactions that have the most value and impact in the eyes of the customer.

The Path to Real Time Success

Once companies have created their Customer Data Platform, they may begin to test various real time implementation scenarios. This would involve importing and integrating a data set to better understand the information it includes and then executing test recommendation models. By tracking the results of various models, companies can begin to refine their customer engagement programs.

The ability to modify customer engagement in real time may at first seem daunting, especially given that until now information systems have taken great care to decouple transactional functions from analytics. However, the technologies are now in place to make real-time engagement a reality. Companies no longer have to do analysis on a small subset of their customer base, wait weeks for the findings, and then far longer before they take action. Today, companies have all the power they need to connect systems containing transactional information (web site, mobile applications, point-of-sale systems, CRM, etc.) with analytical information, in real time.

In general, the transition to real time can be made gradually. For example, companies could start with the addition of personalized navigation on a mobile application, or individualized exchanges between a client and a call center, for a subset of customers. This has the advantage of quickly delivering measurable results that can grow over time as the project expands to more customers. These early ventures can be used as a stepping stone to building a Customer Data Platform that integrates the points of contact more deeply - web, call center, point of sale, etc. - in order to enrich the customer profile and personalize a broader set of interactions.

Once all the points of contact have been integrated, the company has the information necessary to make personalized recommendations. This is the famous Holy Grail of one-to-one marketing in real time, with four main benefits: total visibility of the customer journey (in other words, the alignment of marketing and sales); complete client satisfaction (no need to authenticate) and therefore loyalty; clear visibility into marketing effectiveness; and, in the end, increased revenue due to higher conversion rates. Given that we already know companies employing this type of analytical technique are more successful than competitors that don’t,[1] moving to real time becomes a necessity.

What about Privacy?

Respect for privacy should indeed be a key consideration for anyone interested in the personalization of the customer experience. While regulatory concerns might be top of mind, companies must first and foremost consider how their actions may impact their customer relationships. This is a matter of trust and respect, not simply compliance. Without a doubt, there is a lot we can learn here from what has been implemented in the health sector. Beyond getting an accurate diagnosis, people are generally comfortable being open with their doctors because they clearly understand the information will be held in confidence. In fact, physicians have a well-known code of conduct, the Hippocratic Oath. For companies, a similar understanding must be reached with their customers. They need to be upfront and clear about what information is being collected, how it will be used and how it will benefit the customer.

 

7 Reasons to “Unify” Your Application and Data Integration Teams and Tools


I recently attended a Gartner presentation on the convergence of Application and Data Integration at their Application Architecture, Development and Integration conference.  During the talk they stressed that “chasms exist between application- and data-oriented people and tools” and that digital businesses have to break down these barriers in order to succeed.  Gartner research shows that more and more companies are recognizing this problem – in fact, 47% of respondents to a recent survey indicated they plan to create integrated teams in the next 2-3 years. 

And yet, very few integration platforms, other than Talend’s, provide a single solution that supports both application integration (AI) and data integration (DI).  It seems that although many people intuitively recognize the value of breaking down integration barriers, many still have a hard time pointing to the specific benefits that will result.  This post outlines my top reasons organizations should take a unified approach.

  1. Stop reinventing the wheel

Separate AI and DI teams can spend as much as 30% of their time reinventing the wheel, re-creating similar integration jobs and metadata.  With a unified integration tool, you can create your metadata once and use it over and over again.  You can also often avoid re-creating the same integration job.  In many situations, the requirements of an integration job can be met with either style of integration, but with separate teams, you are forced to recreate the same jobs for different projects.

  2. Learn from Toyota: Rapid Changeover Kills Mass Production

History is full of examples where management has opted for specialization to increase throughput.  This works really well in a predictable environment.  A great example of this is Ford’s approach to the Model T, where you could have any color you wanted as long as it was black.  They could crank out cars for less than anyone else with their assembly lines and mass-production approach.  Unfortunately, I have yet to see an IT organization that could successfully predict what its business owners need.  That’s why Toyota’s “one-piece flow” and flexible assembly lines have so dramatically outperformed U.S. automakers’ dedicated production lines.

  3. Pay only for what you need

    If you’re building two separate integration teams, you’re probably paying your implementation and administration “tax” twice.  You’re buying two sets of hardware and you’re paying people to set up and maintain two separate systems.  This tax is especially big if you need a high availability environment with live backup.  With a unified integration tool, you’re only doing all of this once and the savings can be huge.  One large Talend customer had a team of 10 admins across AI and DI and was able to cut that to 5 with Talend.
     
  4. Train once, integrate everything

    If you use two separate integration tools, it means you have to have specialists that understand each, or you have to train your people on two completely different and often highly complex tools.  With a unified solution, your developers can move back and forth across integration styles with very little incremental training.  This makes it much easier for your integration developers to stay current with both tools and styles of integration, even if they spend the majority of their time on a single style.  This reduces training costs and employee ramp time while increasing flexibility.
     
  5. Win with speed

    With new types of data (web and social) and new cloud applications, data volumes are exploding in every company, large and small.  This is making the ability to be data-driven a strategic differentiator that separates winners from losers.  A critical part of being data-driven is allowing the business to put the right data in the right places as quickly as possible.  With a unified tool, you can start out with one style of integration and be ready to add other integration styles at a moment’s notice.  A great example of the need for this is when a data warehousing project starts out with batch data movement and transformation requirements, and later business teams realize they can use this same data to make real-time recommendations to sales.  Without a unified integration solution, this would require two separate integration teams and two separate projects.
     
  6. Do more

Application integration and data integration tools are often stronger at some things and weaker at others.  For instance, data integration tools can include strong data quality and cleansing capabilities that application integration tools lack.  With a unified solution, each style of integration can benefit from the best that the other has to offer.

  7. Stay aligned

It happens in almost every business.  Executives show up at a meeting, each with their own reports and data that give very different views of the business, and it’s almost impossible to reconcile the differences.  The same thing happens with separate integration teams.  Each defines separate rules around prices, revenue and product lines and, as a result, it’s very hard to get a consistent view of the business and key performance indicators.  A unified tool allows you to build those rules once and then apply them consistently across every integration job.  This kills many data discrepancies at the root cause.

What do you think? I’m interested to hear from folks that are considering a unified approach but believe the challenges may be too great – equally happy to engage those with opposing viewpoints.

Open Source ETL Tools – Open for Business!


With all the hype and interest in Big Data lately, open source ETL tools seem to have taken a back seat. MapReduce, Yarn, Spark, and Storm are gaining significant attention, but it also should be noted that Talend’s ETL business and our thousands of ETL customers are thriving. In fact, the data integration market has a healthy growth rate with Gartner recently reporting that this market is forecasted to grow 10.3% in 2014 to $3.6 billion!

Open source ETL tools appear to be going through their own technology adoption lifecycle and are running mission-critical operations for Global 2000 corporations, which would suggest they are at least in the “early majority” adoption stage. Also, based on their strong community, open standards and more affordable pricing model, open source ETL tools are a viable solution for small to midsize companies.

I would think the SMB data integration market, which has been underserved for many years, is growing the fastest. Teams of two or three developers can get up to speed very quickly and get a fast ROI over hand-coding ETL processes. Many Talend customers report huge savings on their data integration projects over hand-coding; e.g. Allianz Global Investors states that Talend is “proving to be 3 times faster than developing the same ETLs by hand and the ability to reuse Jobs, instead of rewriting them each time, is extremely valuable.”

A key component with open source is its vibrant community and the benefits it provides including sharing information, experiences, best practices and code. Companies can innovate faster through this model. For example, RTBF, one of over 100,000 Talend community users, states, “A major consideration was that Talend is open source and its community of active users ensures that the tools are rapidly updated and that user concerns are taken into account. Such forums make information easily accessible. As the community grows, more and more topics are covered which, of course, saves users a lot of time.”

And the good news is that the open source ETL tools category has matured to meet changing demands. What started as basic ETL and ELT capabilities has transformed into an open source integration platform. As firms break down their internal silos, data integration developers are being asked to integrate big data, improve data quality and master data, move from batch to real-time processing, and create reusable services.

With increasing data integration requests, companies are looking for more and more pre-built components and connectors – from databases (traditional and NoSQL) and data warehouses, to applications like SAP and Oracle, to big data platforms like Cloudera and Hortonworks, to Cloud/SaaS applications like Salesforce and Marketo. Finally, you need not only to connect to the Cloud, but also to run in the Cloud.

Almerys is an example that started with data integration and batch processing then moved to real-time data services, “Early on, significant real-time integration needs convinced us to adopt Talend Data Services, the only platform on the market offering the combination of a data integration solution and an ESB (Enterprise Service Bus).”

Big data may be getting all the attention and open source ETL tools may not be in the spotlight, but looking across the industry and at what Talend customers are doing, they have certainly matured into an indispensable part of IT’s toolbox.

 

(Gartner: The State of Data Integration: Current Practices and Evolving Trends, April 3, 2014)

Big, Bad and Ugly - Challenges of Maintaining Quality in the Big Data Era – Part 1


More than a decade ago, we entered an era of data deluge. Data continues to explode - it has been estimated that for each day of 2012, more than 2.5 exabytes (or 2.5 million terabytes) of data were created. Today, the same amount of data is produced every few minutes!

One reason for this big data deluge is the steady decrease in the cost per gigabyte, which has made it possible to store more and more data for the same price. In 2004, the price of 1 GB of hard disk storage passed below the symbolic threshold of $1. It's now down to three cents (view declining costs chart). Another reason is the expansion of the Web, which has allowed everyone to create content and companies like Google, Yahoo, Facebook and others to collect increasing amounts of data.

Big data systems require fundamentally different approaches to data governance than traditional databases. In this post, I'd like to explore some of the paradigm shifts caused by the data deluge and its impact on data quality.

The Birth of a Distributed Operating System

With the advent of the Hadoop Distributed File System (HDFS) and the resource manager called YARN, a distributed data platform was born. With HDFS, very large amounts of data can now be placed in a single virtual place, similar to how you would store a regular file on your computer. And, with YARN, the processing of this data can be done by several engines, such as interactive SQL engines, batch engines or real-time streaming engines.

Having the ability to store and process data in one location is an ideal framework to manage big data. Consulting firm, Booz Allen Hamilton, explored how this might work for organizations with its concept of a “data lake”, a place where all raw or unmodified data could be stored and easily accessed.

While a tremendous step forward in helping companies leverage big data, data lakes have the potential to introduce several quality issues, as outlined in an article by Barry Devlin. In summary, as the old adage goes, "garbage in, garbage out".

Being able to store petabytes of data does not guarantee that all the information will be useful and can be used. Indeed, as a recent New York Times article noted: “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”

Another similar concept to data lakes that the industry is discussing is the idea of a data reservoir. The premise is to perform quality checks and data cleansing prior to inserting the data into the distributed system. Therefore, rather than being raw, the data is ready-to-use.

The accessibility of data is a data quality dimension that benefits from the concepts of a data lake or data reservoir. Indeed, Hadoop makes data, and even legacy data, accessible. All data can be stored in the data lake, and tapes or other dedicated storage systems are no longer required; the accessibility dimension was a known issue with those systems.

But distributed systems also have an intrinsic drawback, captured by the CAP theorem. The theorem states that a partition-tolerant system can't provide data consistency and data availability simultaneously. Therefore, with the Hadoop Distributed File System - a partitioned system that guarantees consistency - the availability dimension of data quality can't be guaranteed. This means that data can't be accessed until all data copies on different nodes are synchronized (consistent). Clearly, this is a major stumbling block for organizations that need to scale and want to use insights derived from their data immediately. As Marissa Mayer from Google says, “speed matters”. A few hundred milliseconds of delay in the reply to a query and the organization will lose customers. Finding the right compromise between data latency and consistency is therefore a major challenge in big data, although it tends to matter only in the most extreme situations, as innovative technologies keep appearing to tackle it.

Co-location of Data and Processing

Before Hadoop, when organizations wanted to analyze data stored in a database, they would get it out of the database and put it into another tool or another database to conduct analysis or other tasks. Reporting and analysis are usually done on a data mart which contains aggregated data from operational databases; as the system scales, they can't be conducted on the operational databases which contain the raw data.

With Hadoop, the data remains in Hadoop. The processing algorithm to be applied to the data can be sent to the Hadoop MapReduce framework, and the raw data can still be accessed by the algorithm. This is a major change in the way the industry manages data: the data is no longer moved out of the system in order to be processed by some algorithm or software. Instead, the algorithm is sent into the system, near the data to be processed. The prerequisite to reap this benefit is that applications can run natively in Hadoop.

For data quality, this is a significant improvement, as you no longer need to extract data in order to profile it. You can work with the whole data set rather than with samples or selections. In-place profiling combined with big data systems opens new doors for data quality. It's even possible to envision data cleansing processes that take place inside the big data framework rather than outside.
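
As a hedged sketch of what “sending the algorithm to the data” can look like for profiling, here is a minimal MapReduce job (the delimiter, column layout and input path are assumptions) that counts empty values per field of a delimited file, directly where the data lives:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EmptyFieldProfiler {

        // Emits (columnIndex, 1) for every empty field in a semicolon-delimited record.
        public static class ProfileMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final IntWritable column = new IntWritable();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(";", -1);
                for (int i = 0; i < fields.length; i++) {
                    if (fields[i].trim().isEmpty()) {
                        column.set(i);
                        context.write(column, ONE);
                    }
                }
            }
        }

        // Sums the empty-value counts per column.
        public static class SumReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            @Override
            protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "empty-field-profiler");
            job.setJarByClass(EmptyFieldProfiler.class);
            job.setMapperClass(ProfileMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. a raw data directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }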

Schema-on-read

With traditional databases, the schema of the tables is predefined and fixed. This means that data that does not fit into the schema constraints will be rejected and will not enter the system. For example, a long text string may be rejected if the column size is smaller than the input text size. Ensuring constraints with this kind of "schema-on-write" approach surely helps to improve the data quality, as the system is safeguarded against data that doesn’t conform to the constraints. Of course, very often, constraints are relaxed for one reason or another and bad data can still enter the system. Most often, integrity constraints such as the no null value constraint are relaxed so that some records can still enter the system even though some of their fields are empty.

However, at least some constraints dictated by a data schema may mandate a level of preparation before the data goes into the database. For instance, a program may automatically truncate text that is too long, or add a default value when the data cannot be null, in order to still get the record into the system.

Big data systems such as HDFS have a different strategy. They use a "schema-on-read" approach. This means that there is no constraint on the data going into the system. The schema of the data is defined as the data is being read. It's like a “view” in a database. We may define several views on the same raw data, which makes the schema-on-read approach very flexible.

However, in terms of data quality, it's probably not a viable solution to let any kind of data enter the system. Letting a variety of data formats enter the system requires a processing algorithm that defines an appropriate schema-on-read to serve the data. For instance, such an algorithm would unify two different date formats like 01-01-2015 and 01/01/15 in order to display a single date format in the view, and it could become much more complex with more realistic data. Moreover, when the input data evolves and is absorbed into the system, the change must be managed by the algorithm that produces the view. As time passes, the algorithm will become more and more complex. The more complex the input data becomes, the more complex the algorithm that parses, extracts and fixes it becomes - to the point where it becomes impossible to maintain.
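
As a small, hypothetical illustration of such a normalization step in a schema-on-read view, the snippet below unifies the two date formats mentioned above at read time, using only the standard Java date/time API:

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeParseException;
    import java.util.List;

    public class DateNormalizer {

        // Candidate input formats; a real schema-on-read view would handle many more.
        private static final List<DateTimeFormatter> INPUT_FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd-MM-yyyy"),   // e.g. 01-01-2015
            DateTimeFormatter.ofPattern("dd/MM/yy"));    // e.g. 01/01/15

        // Single canonical format exposed by the "view".
        private static final DateTimeFormatter OUTPUT_FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

        static String normalize(String raw) {
            for (DateTimeFormatter format : INPUT_FORMATS) {
                try {
                    return LocalDate.parse(raw, format).format(OUTPUT_FORMAT);
                } catch (DateTimeParseException ignored) {
                    // try the next candidate format
                }
            }
            return null;   // unparseable: flag for data stewardship instead of guessing
        }

        public static void main(String[] args) {
            System.out.println(normalize("01-01-2015"));  // 2015-01-01
            System.out.println(normalize("01/01/15"));    // 2015-01-01
            System.out.println(normalize("garbage"));     // null
        }
    }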

Pushing this reasoning to its limits, some of the transformations executed by the algorithm can be seen as data quality transformations (unifying the date format, capitalizing names, etc.). Data quality then becomes a cornerstone of any big data management process, while the data governance team may have to manage "data quality services" and not only focus on the data itself.

On the other hand, the data that is read through these "views" still needs to obey most of the standard data quality dimensions, and a data governance team would also define data quality rules on the data retrieved from the views. This raises the question of the data lake versus the data reservoir. The schema-on-read approach brings huge flexibility to data management, but controlling the quality and accuracy of the data can then become extremely complex and difficult. There is a clear need to find the right compromise.

We see here that data quality is pervasive at all stages in Hadoop systems and not only involves the raw data, but also the transformations done in Hadoop on this data. This shows the importance of well-defined data governance programs when working with big data frameworks.

In my next post, I'll explore the impacts of the architecture of big data systems on the data quality dimensions. We'll see how traditional data quality dimensions apply and how new data quality dimensions are likely to emerge, or gain importance. 

Defining Your “One-Click”


People talk about the impact of the “digital transformation” and how companies are moving to becoming “data-driven,” but what does it mean in practice? It may help to provide a couple of examples of data-driven companies. Netflix is often cited as a great example of a data-driven company. The entertainment subscription service is well known for using data to tune content recommendations to the tastes and preferences of their individual subscribers and even for developing new programming (NY Times: Giving Viewers What They Want). At Netflix, big data analysis is not something that only certain teams have access to, but rather a core asset that is used across the organization. Insights gained from data allow Netflix to make quick and highly informed decisions that improve all aspects of the business and most importantly, positively impact the customer experience.

Although perhaps not as well known outside the IT industry, GE is another fantastic example of a data-driven success story. GE has a vision of the "industrial internet," a third wave of major innovation following the industrial and internet revolutions, which is fueled by the integration of complex physical machinery with networked sensors and software.

One example is in GE’s $30B Renewable Energy division. This team has begun to execute on a vision of using smart sensors and the cloud to connect over 22,000 wind turbines globally in real-time. The ultimate goal is to predict downtime before it happens and to be able to tune the pitch and orientation of turbines to generate billions in additional energy production. Talend is helping them achieve this vision. Our work with GE has helped cut the time it takes to gather and analyze sensor data from 30 days to one.  And, we believe that we can cut this down to minutes in the very near future.

Amazon is another stunning example of what it means to be a data-driven company. While data plays a significant role in all aspects of Amazon’s business, I view the company’s one-click ordering system (a button that, once clicked, automatically processes your payment and ships the selected item to your door) as a particularly compelling and pure illustration of being data-driven. This single button proves just how adept Amazon is at turning massive volumes of shopper, supplier and product data into a customer convenience and competitive advantage. Of course, Amazon didn’t become this data-driven overnight. Similar to its evolution from an online bookstore to the leading online retailer, becoming data-driven was a process that took time.

As an integration company deeply rooted in big data and Hadoop, our mission is to help companies through the process of becoming data-driven and, ultimately, define their own “one-click”. Regardless of the industry, companies’ one-click is often associated with customer-facing initiatives – which could be anything from protecting banking clients from fraud to enabling preemptive maintenance of turbines on a wind farm as discussed in the GE example.

Some organizations mistakenly believe being data-driven is all about being better at analytics. While analytics is certainly an important facet, companies must first become highly proficient at stitching together disparate data and application silos. Next, companies must manage and streamline the flow of data throughout their entire organization and ensure that the data they are analyzing is accurate and accessible in an instant.

This is certainly something that our recently released Talend 5.6 aims to help companies achieve. For those of you not familiar with our Integration Platform, it combines data integration, application integration, data quality and master data management (MDM) capabilities into a single, unified workflow and interface. We believe this approach, coupled with now more than 800 components and connectors for easing the integration of new applications and data sources with big data platforms, helps simplify data management and significantly reduce the otherwise steep learning curve associated with big data and Hadoop.

While 5.6 is a great solution for companies initiating their data journey, it’s also ideal for helping companies become data-driven and define their “one-click,” especially given some of the latest features we’ve introduced. As noted in our announcement, version 5.6 adds new efficiency controls for MDM. In our view, MDM is a key component for empowering our clients to uniquely identify and track their customers across various touch points, as well as govern the association rules between various data sets. Notably, Talend 5.6 also initiates support for the latest Hadoop extensions, Apache Spark and Apache Storm. While perhaps not achievable for all companies immediately, the ability to operate in real time should be on every organization’s roadmap, and is, in part, what these technologies will help facilitate.

As some of you may have heard, later this year we will launch Talend Integration Cloud, an Integration Platform-as-a-Service (iPaaS). The solution will enable the connection of all data sources – cloud-to-cloud, cloud-to-ground – and let IT teams design and deploy projects that can run wherever they are needed. Also, for the first time, with Talend Integration Cloud, we will be enabling line-of-business users to access data integration tools and build jobs without having to rely on IT. Expect to hear far more about Talend Integration Cloud over the coming months; we are very excited to provide our customers with this new tool in their arsenal and allow them to extend data access and intelligence throughout their enterprise.

I’m looking forward to the year ahead and being part of a fantastic team that will be helping more companies become data-driven and define their “one-click”. What about you? Is this the year your company, like Amazon, will be able to use data to make smarter business decisions at any given moment across your entire organization? If so, I hope to hear from you soon.

Use Big Data to Secure the Love of Your Customers

What is a Container? Cloud and SOA Converge in API Management (Container Architecture Series Part 2)


This is the second in a series of posts on container-centric integration architecture. The first post provided a brief background and definition of the Container architecture pattern. This post explores how Service Oriented Architecture (SOA) and Cloud initiatives converge in API Management, and how Platforms provide the Containerization infrastructure necessary for API Management.

Today we are seeing an increasing emphasis on Services and Composite Applications as the unit of product management in the Cloud. Indeed, API Management can be seen as the logical convergence of Cloud and SOA paradigms. Where SOA traditionally emphasized agility within the enterprise, API Management focuses on agility across the ecosystem of information, suppliers and consumers. With SOA, the emphasis was on re-use of the business domain contract. API Management addresses extensibility at all layers of the solution stack. Platforms take responsibility for non-business interfaces and deliver them via Containerization to business layer Service developers who can focus on the domain logic.

Most enterprise applications have historically had a single enterprise owner responsible for design, configuration and business operation of the Application. Each application team has its own release cycle and controls more or less all of its own dependencies, subject to Enterprise Architecture policies and central IT operations provisioning. In this model the unit of design and delivery is the Application. Larger enterprises may have an independent IT operations group to operate and manage the production environment, but the operations contract has always been between individual application development teams, as the suppliers of business logic, and the central IT organization, rather than a peer-to-peer ecosystem of business APIs shared between organizations.

As long as the number of interdependencies between projects remains low and the universe of contributors and stakeholders stays narrow, the Application delivery model works fine.

The relevant principle here is that organizational structure impacts solution architecture. This is natural because the vectors of interest of stakeholders define the decision making context within the Software Development Life Cycle (SDLC). Applications built in isolation will reflect the assumptions and design limitations implicit in the environment in which they were created. A narrow organizational scope will therefore influence requirements decisions. Non-functional requirements for extensibility are understandably hard to justify in a small and shallow marketplace. 

SOA and Cloud change all of this.

As the pace of development accelerates, enterprises place an increasing emphasis on agility. SOA is about creating composite solutions through the assembly of reusable Services. Each line of business provides its own portfolio of services to other teams, and likewise each organization consumes many services. Each service has its own product lifecycle, and because Services are smaller than Applications, that lifecycle is much faster. By encapsulating re-usable business functionality as a more modular Service rather than a monolithic Application, the enterprise increases agility by decreasing cycle time for the minimal unit of business capability.

In response, development and operations teams have crafted DevOps strategies to rapidly deploy new capability. It is not just that DevOps and Continuous Delivery have revolutionized the SDLC; the unit of release has changed as well. In these agile environments, the unit of business delivery is increasingly the Service rather than the Application. In large organizations the Application is still the unit of product management and funding lifecycles, but the Service module is a much more agile model for Cloud and SOA ecosystems. As a result, evolution takes place more rapidly.

This has implications beyond the enterprise boundary. More modular composite applications can be quickly constructed from modules that are contributed by a broader ecosystem of stakeholders, not just those within a single enterprise. From this perspective, SOA design patterns enable more rapid innovation by establishing social conventions for design that allow a faster innovation cycle from a broader population. As a result, evolution takes place more broadly.

SOA initiatives often emphasize standards-based interoperability. Interoperability is necessary but not sufficient for successful adoption of SOA. Standardization, modularity, flexibility, and extensibility are necessary in practice. These can be provided via Containerized Platforms. 

SOA drives containerization because the resulting composite business solutions cross organizational and domain boundaries. Coarse-grained Service APIs are shared between organizations via Service Contracts. But these come at a cost: the same granularity that provides the increased agility also increases the interactions between organizations and the complexity of the enterprise as a whole. The added complexity is not a negative, since it is essential complexity for the modern enterprise; but it must be managed via SOA Governance and API Management.

While Service Orientation can be incorporated into technology-oriented services such as Security, the primary focus of SOA contracts is on the business domain. Increased integration both within and across the enterprise boundary has implicit dependencies on these cross-cutting technology services. Data must be secured both in transit and at rest, audit and non-repudiation must be provided in a non-invasive manner, and capacity must be provisioned elastically to respond to changing demand. All of these cross-cutting concerns require collaboration across the enterprise boundary; they impact service consumers as well as service providers. In order to scale, these non-functional requirements can be delegated to Platforms that realize them through Containers.

What distinguishes a Platform from an Application is the open architecture exposed by deeper APIs that are consumed by the Services running on the platform. In order for the Services to be composed into solutions, the Platform API must be loosely coupled to the Services. This is the function of the Container. In addition to loose coupling, the Container provides Separation of Concerns between the Service development team and the Operations team. Lightweight containers make the delivery of Services much more modular, enabling, among other things, the operations team to deliver the fine-grained control necessary for elasticity. In an open ecosystem, a Service Container can also encapsulate the collaborative contract for the community of service providers and consumers, increasing flexibility and re-use. API contracts also support infrastructure features such as authentication and authorization, which become pluggable parts of the Platform. An extensible and open infrastructure allows further innovation while insulating business domain developers from the IT concerns.
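To illustrate the principle (and not any particular product), here is a minimal sketch in Python of a container that wires a cross-cutting concern such as authentication into a business service. All class and function names are invented for the example; the point is only that the business code depends on a contract, while the platform decides which implementation is plugged in.

```python
# Minimal service-container sketch: the business service declares what it
# needs by name; the platform registers and resolves the implementations.
class Container:
    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def resolve(self, name):
        return self._factories[name]()

# Cross-cutting, platform-provided concern; swappable without touching
# the business code below.
def basic_authenticator():
    return lambda token: token == "secret"

# Business-domain service: knows nothing about how authentication works.
class OrderService:
    def __init__(self, authenticate):
        self._authenticate = authenticate

    def place_order(self, token, item):
        if not self._authenticate(token):
            raise PermissionError("not authorized")
        return "order placed for %s" % item

container = Container()
container.register("authenticator", basic_authenticator)
container.register("orders",
                   lambda: OrderService(container.resolve("authenticator")))
print(container.resolve("orders").place_order("secret", "widget"))
```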

Platform architectures can be applied within the enterprise or across enterprises. As Cloud delivery models have become popular, Platforms have emerged as a means of accelerating adoption of Cloud offerings.  Google, Facebook, Microsoft, Amazon, and SalesForce are all examples of Platforms.  But there are other, older examples of Platform architectures. In many ways B2B portals are the archetype Platform, offering an ecosystem of extensible, layered API’s upon which higher level services can be delivered across enterprise boundaries.

Of course Cloud and SOA are intimately related. After all, Cloud comes in three flavors: Infrastructure, Platform, and Software as a Service. So Service orientation is implicit in the Cloud paradigm. What Cloud adds to Service Orientation is a maturity benchmark for the delivery and operations of services. Whether they are top-level business services or the supporting infrastructure, Cloud requires self-service on demand, measurability, visibility, and elasticity of resources. Whereas an enterprise can deliver SOA design patterns, the agility achieved in practice will be constrained if operations cannot step up to the Cloud delivery model. Containerized Platforms provide both the efficiency necessary for Cloud maturity and the extensible modularity required for successful SOA adoption in the larger ecosystem.

The next post will explore how Talend applies Apache OSGi technology to provide a Service Container for enterprise integration as part of the Talend Unified Platform.


Retail: Personalised Services to Generate Customer Confidence


In recent years, the Internet and e-commerce have revolutionised the retail industry, in the same way, for example, as the appearance of supermarkets. Beyond ease of purchase and the ability to consult the opinion of other consumers, e-commerce has overwhelmingly changed the way in which information about a customer's journey to purchase is captured. Today, it is also captured on a far more individual basis. For example, e-commerce enables you to know, with relative ease, what a particular customer is looking for, how they reached the site, what they buy, the associated products they have purchased previously, and even what purchases they abandoned. Reconstructing the customer’s journey was extremely difficult to achieve when the sole purchasing channel was the physical store and the only traceable element was the purchase itself. At best, the customer was only identified at the checkout, which, for example, ruled out the prospect of providing them personalised recommendations.  

Thanks to a better understanding of the customer's journey to purchase, e-commerce has opened up the possibility of not only gaining a better understanding of customer behaviour, but also the ability to react in real time based on this knowledge. Due to the success of such programmes, distributors have considered applying these concepts across all sales channels – stores, call centres, etc. This is the dual challenge facing most retailers today: to fully understand the customer's journey across different sales channels (multi- or omni-channel), while benefitting from greater accuracy, including in the case of physical retail outlets.

This is not as easy as it seems. Depending on the channel chosen by the customer, the knowledge obtained by the seller is not the same: as we know, whilst at the checkout, the customer will only be recognised if they own a loyalty card or have previously visited the store. But, in the latter case, it will be extremely complex to make the link to past purchases. Similarly, a website may enable the collection of data on the intention to buy, but it is extremely difficult to correlate these events with the purchasing transactions if they are not made online and in the same session. The stakes are high, given that 78% of consumers now do their research online prior to making a purchase[1] (the famous Web-to-store or ROPO).

One solution is to integrate sensors into the various elements that make up a customer’s purchasing journey, then analyse and cross-reference this big data to extract concrete information from it. For example, we have noticed that Internet users often visit commercial websites during the week in order to prepare for making a purchase on a Saturday. If the store offers, for example, self-service Wi-Fi linked to a mobile app that personalises the customer’s journey, it can follow this journey right up to the actual purchase, or even influence it by proposing a good deal at just the right moment.

Some of our customers are already largely engaged in this process, which proceeds in a gradual manner. It usually begins with a very detailed analysis of the customer’s online journey to collect information on intention, which is then cross-referenced at an aggregated level with actual purchases (at the catchment-area level, for example) to determine correlations and refine the segmentations. Then, this information is cross-referenced a second time with the transactional data from the physical stores and the website, which enables us to map the customer’s journey from the intention to buy to the purchase, or even beyond, and across different channels. Thirdly, it's a matter of developing a real-time recommendation system that runs throughout the customer's journey and yields a dual benefit: increased sales and greater loyalty.

The main challenge facing distributors in the future actually lies in the value-added services that they may or may not be able to provide to their customers, to accompany their products or services. Consumers have learned to be wary of digital technology. For example, they create specific email addresses to get the offer they need without revealing their true identity in order to prevent further contact. More than ever, they will only be inclined to share information on their intentions and their profiles if their trust has been gained and they perceive some benefit.

How do you create this trust? Via value-added services: when consumers see that their interests are being considered, they do not feel constrained or trapped by a commercial logic that is beyond them. Let us imagine that, on the basis of till receipts or a basket that is in the process of being filled, a retailer can guide the choice of products based on personal criteria, excluding, for example, those that contain peanut oil, which I must avoid as my son has just been declared highly allergic to it. I am aware that my journey is being tracked by the retailer, but I understand its uses and I derive some benefit from it. Amazon, with its “1-Click” ordering, has shown the way. In other sectors, such as the taxi industry, newcomers have gone even further, revolutionising the customer’s journey by utilising digital technology, from searching for a service to payment through a range of innovative services that make the customer’s life easier, such as the automated capture of expense forms.

In a world in which advertising and tracking are increasingly present, data analysis that is carried out with the sole aim of commercial transformation is ultimately doomed to failure, as it is based on an imbalance between the benefits offered to the customer and those gained by the supplier[2]. Until now, personalisation in retail has had a tendency to limit itself to marketing and measurement based on conversion rates, except for distributors, which have increasingly relied on customer loyalty. Multichannel is not the invention of the distributors but a reaction to consumers’ wishes. Think about it, even Amazon, Internet pure player par excellence, is going to start opening physical stores. Why? Because it has fully understood that a key element was missing in its bid to become better acquainted with its customers’ journey, while responding more effectively to their wishes.

 

Avoiding the Potholes on the Way to Cost Effective MDM


Master data management is one of those practices that everyone in business applauds. But anyone who has been exposed to the process realizes that MDM often comes with a price. Too often what began as a seemingly well thought out and adequately funded project begins accumulating unexpected costs and missing important milestones.

First we need to know what we’re talking about. One of the best definitions of an MDM project I’ve heard is from Jim Walker, a former Talend director of Global Marketing and the man responsible for the Talend MDM Enterprise Edition launch. Jim describes MDM as, “The practice of cleansing, rationalizing and integrating data across systems into a ‘system of record’ for core business activities.”

While working with other organizations, I have personally observed many MDM projects go off the rails. Some of the challenges are vendor driven. For example, customers often face huge initial costs to begin requirements definition and project development. And they can spend millions of upfront dollars on MDM licenses and services – but even before the system is live, upgrades and license renewals add more millions to the program cost without any value being returned to the customer. Other upfront costs may be incurred when vendors add various tools to the mix. For example, the addition of data quality, data integration and SOA tools can triple or quadruple the price.

Because typically it is so expensive to get an MDM project underway, customer project teams are under extreme pressure to realize as much value as they can as quickly as possible.  But they soon realize that the relevant data is either stored in hard to access silos or is of poor quality – inaccurate, out of date, and riddled with duplication. This means revised schedules and, once again, higher costs. 

Starting with Consolidation

To get around some of these problems, some experts advise starting small using the MDM Consolidation method. This approach consists of pulling data into the MDM Hub (the system’s repository) and performing cleansing and rationalization there; because the work stays within the hub, Consolidation has little impact on other systems.
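As a purely illustrative sketch of what the Consolidation step does, the Python snippet below pulls customer records from two hypothetical sources into a hub and collapses obvious duplicates. The naive matching rule (normalized name plus email) and the record layout are assumptions for the example; real MDM matching relies on far richer rule-based or probabilistic engines.

```python
# Consolidation sketch: load records from two sources into a hub,
# collapse duplicates, and keep the source lineage for auditing.
crm_records = [
    {"source": "CRM", "name": "Ada Lovelace ", "email": "ada@example.com"},
]
billing_records = [
    {"source": "BILLING", "name": "ADA LOVELACE", "email": "ada@example.com"},
]

def match_key(record):
    # Deliberately naive matching rule for the illustration.
    return (record["name"].strip().lower(), record["email"].strip().lower())

hub = {}
for record in crm_records + billing_records:
    key = match_key(record)
    entry = hub.setdefault(key, {"golden": dict(record), "sources": []})
    entry["sources"].append(record["source"])

for key, entry in hub.items():
    print(key, entry["sources"])  # both sources collapse onto one hub entry
```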

While Consolidation is also a good way to begin learning critical information about your data, including data quality issues and duplication levels, the downside is that these learnings can trigger several months of refactoring and rebuilding the MDM Hub. This is a highly expensive proposition, involving a team of systems integrators and multiple software vendors.

In order to realize a rapid return on MDM investment, project teams often skip the Consolidation phase and go directly to a Co-existence style of MDM. This approach includes Consolidation and adds synchronization with external systems to the mix. Typically, data creation and maintenance co-exist in both the MDM system and the various data sources. Unfortunately, this solution introduces difficult governance issues regarding data ownership, as well as data integration challenges such as implementing a service-oriented architecture (SOA) or data services.

There are other types of MDM, each with its own set of problems. The upshot is that the company implementing an MDM system winds up buying additional software and undertaking supplementary development and testing, incurring more expense.

An Alternative Approach

Rather than become entangled in the cost and time crunches described above, you should be looking for vendors that provide a solution that lets you get underway gradually and with a minimum of upfront cost.

In fact, part of the solution can include Open Source tools that allow you to build data models, extract data, and conduct match analysis, while building business requirements and the preliminary MDM design. All at a fraction of the resource costs associated with more traditional approaches.

Then, with the preliminary work in place, this alternative solution provides you with the tools needed to scale your users.  It is efficient enough to allow you to do the heavy development work necessary to create a production version of your MDM system without breaking the bank. 

Once in an operational state, you can scale up or down depending on your changing MDM requirements.  And, when the major development phase is over, you can ramp down to a core administrative group, significantly reducing the cost of the application over time.

You should look for vendors offering pricing for this model based on the number of developers – a far more economical and predictable approach when compared to other systems that use a pricing algorithm based on the size of data or the number of nodes involved. 

This approach to MDM deployment is particularly effective when combined with other open source tools that form the foundation of a comprehensive big data management solution. These include big data integration, quality, manipulation, and governance and administration.

By following this path to affordable, effective MDM that works within a larger big data management framework, you will have implemented a flexible architecture that grows along with your organization’s needs.

Announcing the Talend Passport for MDM Success


The need for a 360° view of customers, products, or any business objects needed in your daily work is not new. Wasn’t it supposed to be addressed with ERP, CRM, Enterprise Data Warehouses, by the way?

But the fact is that there is still a gap between the data expectations of the Lines of Business and what IT delivers; now, with the advent of Big Data, the gap is widening at an alarming pace. Does that mean we should forget about MDM because it is too challenging to achieve? Well, can you give up being customer centric? Can you leave the door open to new data-driven competitors in your industry? And can you afford to ignore industry regulations and privacy mandates?

Of course you can’t.

MDM is a must, not an option, so there isn’t a choice but to overcome these challenges. To accomplish this -- together with selected Consulting and System Integration partners with a proven track record in MDM consulting -- we designed the Talend Passport for MDM Success.

In a must-read blog post titled “MDM: Highly recommended, still misunderstood”, Michele Goetz from Forrester Research provides evidence that MDM is a hot topic. At the same time, she warns that MDM is much more than “loading data into the hub, standardizing the view, and then pushing the data”; it is, rather, a data strategy. This explains why many surveys have shown that organizations often struggle to design a sound MDM strategy, not to mention a clear ROI.

This contrasts with the most recent success stories that Talend sees for MDM. Those achieving success appear to do so by closely linking their MDM back-end with business outcomes on the front-end. This drove them to fully engage their Lines of Business, not only to get the necessary funding, but also to collaboratively implement a sustainable support organization as well as best practices for Data Governance. It also allowed them to deliver MDM incrementally, by starting small with a well-defined sweet spot in mind and then expanding fast through a series of initiatives aligned with well-defined business objectives.

The lesson learned is that planning is crucial to Master Data success; however, it also appears to be the most challenging step of the project. As a result, most organizations need guidance to succeed at this phase. At Talend, we believe that our role goes beyond equipping MDM projects with the right toolset: we want to contribute as much as we can to the overall success, and this is what guided us to address the issue. We aspire to help our customers back up their MDM initiatives with a solid business case, build a clear project plan, and address the prerequisites before engaging their projects.

This drove us to the design of the Talend Passport for MDM Success. We designed it as a collaborative effort with our partners: we selected MDM Consulting and System Integration firms across regions, those with a proven track record both in MDM consulting and in delivering Talend MDM projects to expectations, on time and on budget. Once we had gathered the community, we worked on the deliverables of the offer.

The Talend Passport for MDM Success is a packaged consulting service that can be delivered in a short period (four to six weeks depending on the project type and scope). It provides guidance to ensure that the MDM project is on the right track and establishes a solid foundation for an MDM roll-out. Concretely, the goals are to:

- Assess an organization’s maturity for engaging in an MDM program and set up a plan to meet the prerequisites;

- Define/refine the MDM business case(s) and be ready to promote them to the Lines of Business;

- Draw a project roadmap and get ready to start the execution.

The feedback from the selected partners with regard to this initiative has outpaced our expectations. The initial objective was to have one partner on board for each of our core regions before the end of the first quarter of 2015. As of today, nine companies have joined the alliance, from global providers to deeply specialized boutiques. And those names resonate: Bearing Point, Cap Gemini, CGI, CSC, McKnight Consulting Group, IPL, Micropole, Sopra Steria, Virtusa. And they are all ready to deliver.

Take McKnight Consulting Group for example. They are fully focused on strategizing, designing and deploying in the disciplines of Master Data Management, Big Data, Data Warehousing and Business Intelligence. Their CEO, William McKnight is a well-known thought leader in the area of information management as a strategist, information architect and program manager for complex, high-volume, full-lifecycle implementations worldwide. Here is his feedback on the program: “McKnight Consulting Group is delighted to be a partner in the Passport to Success program.  MDM is an imperative today.  Master data must be formed and distributed as soon as the data is available.  Often the data needs workflow and quality processes to be effective.  We have been helping clients realize these benefits for many years and are extremely focused in building solid, workable plans, built contextually to the organization’s current and future desired state.  All of our plans have formed the master data strategies for many years.  We look forward to continuing to get information under control and to raising the maturity of the important asset of information."

Because we know the power of a community, thanks to our open source DNA, we didn’t want to reinvent the wheel by creating a new offer. Rather, we took a very pragmatic approach in order to leverage the best practices and approaches that each of our partners has already successfully delivered. So, with each of those partners, we collaboratively designed a Passport for MDM Success that can be delivered today; it was simply a matter of aligning our objectives and assets. From a more personal perspective, this was a great exercise to connect with the best MDM experts from around the world and come together to create an offer that fully meets our initial objectives.

Now that we launched the offer, the results are very promising. Not only are some of our prospects opting for these services, but they are already running our approach as a way to accelerate and secure their planning efforts and make sure that they have their Lines of Business on board. We also see interest from customers that are already engaged in delivering MDM, so that they can augment the impact of their MDM implementation and expand their projects to other domains and use cases.

Now that we have delivered this initiative, the story is not over. First, we welcome other Consulting and System Integration firms that have experience in providing MDM guidance as well as delivering Talend MDM projects to join the community. Also, together with our partners, we will begin to add a deeper industry flavor to this program, so that we bundle specific industry best practices into our standard services.

What is “The Data Vault” and why do we need it?


For anything you might want to do, understanding the problem and using the right tools is essential. The resulting methodologies and best practices that inevitably arise become the catalyst for innovation and superior accomplishments. Database systems, and particularly data warehouse systems, are no exception; yet do the best data modeling methodologies of the past offer the best solution today?

Big Data, arguably a very hot topic, will clearly play a significant part in the future of business intelligence solutions. Frankly, the reliance upon Inmon’s Relational 3NF and Kimball’s STAR schema strategies simply no longer applies. Using and knowing how to use the best data modeling methodology is a key design priority and has become critical to successful implementations. Persisting with outdated data modeling methodologies is like putting wagon wheels on a Ferrari.

Today, virtually all businesses make money using the Internet. Harvesting the data they create in an efficient way and making sense of it has become a considerable IT challenge. One can easily debate the pros and cons of the data modeling methodologies of the past, but that will not be the focus of this blog. Instead, let’s talk about something relatively new that offers a way to easily craft adaptable, sensible data models that energize your data warehouse: the Data Vault!

Enterprise Data Warehouse (EDW) systems aim to provide true Business Intelligence (BI) for the data-driven enterprise. Companies must address critical metrics ingrained in this vital, vibrant data. Providing an essential data integration process that eventually supports a variety of reporting requirements is a key goal for these Enterprise Data Warehouse systems. Building them involves significant design, development, administration, and operational effort. When upstream business systems, structures, or rules change, fail to provide consistent data, or require new systems integration solutions, the reengineering that follows presents us with problem #1: the one constant is change; so how well can an EDW/BI solution adapt?

Consumption and analysis of business data by diverse user communities has become a critical reality for maintaining a competitive edge, yet technological realities today often require highly trained end-users. Capturing, processing, transforming, cleansing, and reporting on this data may be understandable, but in most cases the sheer volume of data can be overwhelming. Yup, problem #2: Really Big Data, often characterized as Volume, Velocity, Variety, Variability, Veracity, Visualization, & Value!

Crafting effective and efficient EDW/BI systems, simplified for usability and reporting on this data, quickly becomes a daunting and often difficult technical ordeal even for veteran engineering teams. Several integrated technologies are required, from database systems, data processing (ETL) tools like Talend, various programming languages, administration, reporting, and interactive graphics software to high-performance networks and powerful computers with very large storage capacities. The design, creation, delivery, and support of robust, effortless EDW/BI systems for simplified, intelligent use is, you guessed it, problem #3: Complexity!

Often we see comprehensive and elegant solutions delivered to business users that fail to address the true needs of the business. We’re told that’s just the way it is due to technical requirements (limitations; wink, wink) and/or design parameters (lack of features; nudge, nudge). Hence, problem #4: the Business Domain; fit the data to meet the needs of the business, not the other way around!

Furthermore, as upstream systems change (and they will), as EDW/BI technology plows ahead (and it must), and as the dynamic complexities involved prevail (relentlessly), every so often new data sources need to be added to the mix. These are usually unpredicted and unplanned for. The integration impact can be enormous, often requiring complete regeneration of the aggregated data; hence, problem #5: Flexibility, or the lack thereof!

So how do we solve these problems?  Well …

Bill Inmon, widely regarded as the father of data warehousing, defines a data warehouse as:

A subject oriented, nonvolatile, time-variant collection of data in support of management’s decisions
(http://en.wikipedia.org/wiki/Bill_Inmon)


Ralph Kimball (http://en.wikipedia.org/wiki/Ralph_Kimball), a pioneering data warehousing architect, developed the “dimensional modeling” methodology now regarded as the de-facto standard in the area of decision support. The Dimensional Model (called a “star schema”) is different from Inmon’s “normalized modeling” (sometimes called a “snowflake schema”) methodology. In Kimball’s Star Schema, transactional data is partitioned into aggregated “facts” with referential “dimensions” surrounding and providing descriptors that define the facts. The Normalized Model (3NF or “third normal form”) stores data in related “tables” following relational database design rules established by E. F. Codd and Raymond F. Boyce in the early 1970s that eliminate data redundancy. While fostering vigorous debate amongst EDW/BI Architects as to which methodology is best, both have weaknesses when dealing with the inevitable changes in the systems feeding the data warehouse and in cleansing data to conform to strict methodology requirements.

Further, the OLAP cube (for “online analytical processing”) is a data structure that allows fast analysis of data from multiple perspectives. The cube structure is created from either a Star or Snowflake Schema stored as metadata, from which one can view or “pivot” the data in various ways. Generally cubes have one time-based dimension that supports a historical representation of the data. Creating OLAP cubes can be very expensive, and they often contain a significant amount of data that is of little or no use. The 80/20 rule appears in many cases to hold true (only 20% of the OLAP cube data proves useful), which begs the question: built upon a traditional architecture, does an OLAP cube truly deliver sufficient ROI? Often, the answer is a resounding NO! Durable EDW/BI systems must deliver real value.

 

A Fresh Approach

The Data Vault is a hybrid data modeling methodology providing historical data representation from multiple sources, designed to be resilient to environmental changes. It was originally conceived in 1990 and released in 2000 as a public-domain modeling methodology; its creator, Dan Linstedt, describes a resulting Data Vault database as:

A detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.  It is a hybrid approach encompassing the best of breed between 3NF and Star Schemas.  The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.
(http://en.wikipedia.org/wiki/Data_Vault_Modeling)

Focused on the business process, the Data Vault, as a data integration architecture, has robust standards and definitional methods which unite information in order to make sense of it. The Data Vault model is comprised of three basic table types (a simplified sketch follows the list):

HUB (blue): containing a list of unique business keys, each with its own surrogate key. Metadata describing the origin of the business key, or record ‘source’, is also stored to track where and when the data originated.

LNK (red): establishing relationships between business keys (typically hubs, but links can link to other links); essentially describing a many-to-many relationship.  Links are often used to deal with changes in data granularity reducing the impact of adding a new business key to a linked Hub.

SAT (yellow): holding descriptive attributes that can change over time (similar to a Kimball Type II slowly changing dimension).  Where Hubs and Links form the structure of the data model, Satellites contain temporal and descriptive attributes including metadata linking them to their parent Hub or Link tables.  Metadata attributes within a Satellite table containing a date the record became valid and a date it expired provide powerful historical capabilities enabling queries that can go ‘back-in-time’.
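Here is a simplified sketch, in Python, of what these three table types might look like for a customer/order example. The field names are assumptions made for the illustration; an actual Data Vault would define them as database tables and follow the naming standards of the methodology.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative Hub, Link and Satellite structures for a customer/order model.
@dataclass
class HubCustomer:
    customer_hkey: str             # surrogate key
    customer_number: str           # unique business key
    record_source: str             # where the key was first seen
    load_date: date

@dataclass
class LinkCustomerOrder:
    link_hkey: str
    customer_hkey: str             # reference to HubCustomer
    order_hkey: str                # reference to a HubOrder (not shown)
    record_source: str
    load_date: date

@dataclass
class SatCustomerDetails:
    customer_hkey: str             # parent hub
    name: str                      # descriptive attributes that change over time
    city: str
    load_date: date                # date the record became valid
    load_end_date: Optional[date]  # date it expired; None means current
    record_source: str
```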

There are several key advantages to the Data Vault approach:

- Simplifies the data ingestion process

- Removes the cleansing requirement of a Star Schema

- Instantly provides auditability for HIPAA and other regulations

- Puts the focus on the real problem instead of programming around it

- Easily allows for the addition of new data sources without disruption to existing schema

Simply put, the Data Vault is both a data modeling technique and methodology which accommodates historical data, auditing, and tracking of data.

The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework
Bill Inmon

 

Adaptable

Through the separation of business keys (which are generally static) and the associations between them from their descriptive attributes, a Data Vault confronts the problem of change in the environment. Using these keys as the structural backbone of a data warehouse, all related data can be organized around them. These Hubs (business keys), Links (associations), and Satellites (descriptive attributes) support a highly adaptable data structure while maintaining a high degree of data integrity. Dan Linstedt often compares the Data Vault to a simplistic view of the brain, where neurons are associated with Hubs and Satellites and dendrites are Links (vectors of information). Some Links are like synapses (vectors in the opposite direction). They can be created or dropped on the fly as business relationships change, automatically morphing the data model as needed without impact to the existing data structures. Problem #1 Solved!

 

Big Data

Data Vault 2.0 arrived on the scene in 2013 and incorporates seamless integration of Big Data technologies along with methodology, architecture, and best-practice implementations. Through this adoption, very large amounts of data can easily be incorporated into a Data Vault designed to be stored using products like Hadoop, Infobright, MongoDB and many other NoSQL options. By eliminating the cleansing requirements of a Star Schema design, the Data Vault excels when dealing with huge data sets, decreasing ingestion times and enabling parallel insertions that leverage the power of Big Data systems. Problem #2 Solved!

 

Simplification

Crafting an effective and efficient Data Vault model can be done quickly once you understand the basics of the three table types: Hub, Satellite, and Link. Identifying the business keys first and defining the Hubs is always the best place to start. From there, Hub-Satellites represent source table columns that can change, and finally Links tie it all together. Remember that it is also possible to have Link-Satellite tables. Once you’ve got these concepts, it’s easy. After you’ve completed your Data Vault model, the next common step is to build the ETL data integration process to populate it. While a Data Vault data model is not limited to EDW/BI solutions, anytime you need to get data out of some data source and into some target, a data integration process is generally required. Talend’s mission is to connect the data-driven enterprise.

With its suite of integration software, Talend simplifies the development process, reduces the learning curve, and decreases total cost of ownership with a unified, open, and predictable ETL platform.  A proven ETL technology, Talend can certainly be used to populate and maintain a robust EDW/BI system built upon a Data Vault data model.  Problem #3 Solved!

 

Your Business

The Data Vault essentially defines the Ontology of an Enterprise in that it describes the business domain and relationships within it.  Processing business rules must occur before populating a Star Schema.  With a Data Vault you can push them downstream, post EDW ingestion.  An additional Data Vault philosophy is that all data is relevant, even if it is wrong.  Dan Linstedt suggests that data being wrong is a business problem, not a technical one.  I agree!  An EDW is really not the right place to fix (cleanse) bad data.  The simple premise of the Data Vault is to ingest 100% of the source data 100% of the time; good, bad, or ugly.  Relevant in today’s world, auditability and traceability of all the data in the data warehouse thus become a standard requirement.  This data model is architected specifically to meet the needs of today’s EDW/BI systems.  Problem #4 Solved!
 

To understand the Data Vault is to understand the business

(http://danlinstedt.com)

 

Flexible

The Data Vault methodology is based on SEI/CMMI Level 5 best practices and includes many of its components, combining them with best practices from Six Sigma, TQM, and the SDLC (Agile). Data Vault projects have short, controlled release cycles and can consist of a production release every two or three weeks, automatically adopting the repeatable, consistent, and measurable project practices expected at CMMI Level 5. When new data sources need to be added, similar business keys are likely; new Hubs, Satellites, and Links can be added and then linked to existing Data Vault structures without any change to the existing data model. Problem #5 Solved!

 

Conclusion

In conclusion, the Data Vault modeling and methodology addresses the elements of the problems we identified above:

- It adapts to a changing business environment

- It supports very large data sets

- It simplifies the EDW/BI design complexities

- It increases usability by business users because it is modeled after the business domain

- It allows for new data sources to be added without impacting the existing design

This technological advancement is already proving to be highly effective and efficient.  Easy to design, build, populate, and change, the Data Vault is a clear winner.  Very Cool!  Do you want one?

Visit http://learndatavault.com or http://www.keyldv.com/lms for much more on Data Vault modeling and methodology.

Big Data - a Relatively Short Trip for the Travel Industry


If there is one sector that has been particularly affected by the digital revolution, it is travel. According to research company PhoCusWright, the share of bookings derived from online channels will increase to 43 percent this year[1]. The move of more consumers to online sources for researching and booking travel, with sites like Airbnb, Booking.com or TripAdvisor counting visitors in the tens of millions, is a further boost to a sector that has historically always been a strong collector of detailed consumer information. At the same time, companies like Uber and BlaBlaCar are already showcasing the power of being data driven by fundamentally disrupting traditional taxi and rail travel services.

What could be more natural in these circumstances than travel companies being among the most committed to their digital transformation? A Forbes Insights report released earlier this year reinforces this point, placing travel at the top of industries in which companies are using data-driven marketing to find a competitive edge[2]. According to the report, 67 percent of travel executives say they have used data-driven marketing to find a competitive advantage in customer engagement and loyalty, and 56 percent have done so for new customer acquisition.

More Miles to Go

While the digital transformation of the travel industry is certainly underway, there is still a way to go – especially for the other 33 and 44 percent of travel executives who have yet to use data to drive a competitive advantage! Moreover, while the travel industry may be advanced in terms of marketing engagement, when it comes to relationships and the management of customer data, it lags further behind. For instance, how often are you asked by reception at check-in if this is the first time at the hotel? And, even though during your stay you must constantly prove your identity, why is this only for hotel billing and security purposes rather than to enjoy personalized services? In addition, some consumers suspect that being identified by the travel industry turns out to be a detriment rather than a benefit (for example the case of “IP Tracking” - the more one visits a booking site, the higher the ticket price might climb).

The reality is that companies in the travel industry are confined mostly to handling transactions, when there are technologies and practices linked to customer knowledge available that actually enable them to better manage and personalize the entire customer journey. The challenge is to reinvent the notion of the travel agency, which was formerly essential to linking customers to service providers. The Internet allows people to do a lot of the work themselves, such as finding a provider, making a reservation and responding to an event. The role of advisor remains, designed to provide, according to the customer’s profile, the right service at the right time. 

How do you differentiate?

Each ticket reservation (plane, train, coach, etc.), each hotel stay and each car rental leaves a “digital trail”, which can be consolidated and analyzed within a Customer Data Platform. This enables travel companies to better understand the needs and desires of an individual customer. Thus, a large amount of data can be collected both before (while booking a trip or a flight: destination preferences, language, climate, activities, etc.), during (food, excursions, sports, etc.) and after the trip (customer reviews and social commentary, recommendations, next trip, etc.). During the trip or the journey, it is also possible to be permanently connected to the customer, for example through the provision of Wi-Fi access (as is already the case in the majority of hotels and airports, and increasingly on trains and planes). Globally, there are many more points of interaction today than there have ever been.

We therefore see travel companies launching services based on the Internet of Things and offering real-time recommendations to deliver new offers[3] (for example: the tennis court is open; would you like to use it?). Here, we are talking about managing the entire customer journey, not just the initial act of purchase.

A rough methodology

The first technological brick in this model is a customer database that covers all of the proposed services (online or offline reservations, points of sale, after sales, customer service, call centers). This should include basic information about the customer, and constitutes what we call the golden record (typically maintained through Master Data Management). To get a unique view of the customer that is kept up to date across all channels, it must also reflect the transactions and events that took place during the customer’s journey, including interactions. Big data plays a key role in this platform, as it can also integrate data from the web and social networks. Additionally, it allows for the extrapolation of analytical information from raw data, such as segmentations or scores that enable companies to predict the customer’s affinity for a certain service.
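As a simplified sketch of what such a platform does with the golden record, the Python snippet below attaches events collected on several channels to one customer and derives a naive affinity score. The channels, event types and scoring weights are assumptions made for the example.

```python
# Customer Data Platform sketch: enrich a golden record with cross-channel
# events and compute a crude affinity score used for recommendations.
golden_record = {"customer_id": "C042", "name": "Ada Lovelace", "events": []}

events = [
    {"customer_id": "C042", "channel": "web",         "type": "viewed_spa_page"},
    {"customer_id": "C042", "channel": "call_center", "type": "asked_about_spa"},
    {"customer_id": "C042", "channel": "hotel_wifi",  "type": "connected"},
]

WEIGHTS = {"viewed_spa_page": 2, "asked_about_spa": 3}  # illustrative scoring rule

for event in events:
    if event["customer_id"] == golden_record["customer_id"]:
        golden_record["events"].append(event)

spa_affinity = sum(WEIGHTS.get(e["type"], 0) for e in golden_record["events"])
print("Spa affinity score for %s: %d" % (golden_record["name"], spa_affinity))
```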

This platform can also connect in real time to all points of contact, for example a call center (which helps to increase efficiency and relevance through an immediate understanding of the customer context), websites, points of sale, reception desks, etc. The greater the number of points of contact, the more precise the picture of the individual customer will be, and the greater the opportunity for companies to provide the right service at the right time. To the extent that this type of system directly affects the processes and the key actors in the customer relationship, it is essential to support the project with change management.

In summary, a Customer Data Platform (some call it a Data Management Platform or DMP, but this term is ambiguous in my opinion since it is more often used in reference to a tool for managing online traffic and the purchase of display ads rather than a cross-channel platform intended to supply value-added services to target customers) enables, on the one hand, the creation of a sustainable and up-to-date customer information base and, on the other hand, a means to offer online services to the connected customer throughout their journey/stay/travel, thus creating a personalized relationship. And, finally, it allows for the recommendation of personalized offers in real time at the most opportune moments.

Though it may be difficult to maintain a one-to-one relationship with customers in some sectors, this is not the case in the travel sector; trips are often tailored, the context is personal and interactions with customers are frequent. The development of a Customer Data Platform is therefore essential for professionals in these sectors. Developing a real understanding of the customer journey is their last hope in a world where digital technology giants are beginning to take over their turf and mobility will only make it easier for them to collect more data.  

If you are interested in learning more about the impact of technology on the travel industry, you may wish to view this related on-demand webinar. The webinar details how TUI, the world’s number one integrated tourism business, with over 1,800 travel agencies and leading online portals, as well as airlines, hotels, and cruise lines, is using Talend Master Data Management (MDM) to build a single customer view and deliver a more seamless user experience across multiple channels.

Jean-Michel Franco is the Director, Product Marketing for Data Governance products, Talend, a global leader in big data integration and related markets.



[1] “Competitive Landscape Of The U.S. Online Travel Market Is Transforming”, Forbes, April 2014, http://www.forbes.com/sites/greatspeculations/2014/04/08/competitive-landscape-of-the-u-s-online-travel-market-is-transforming/

[2] “Data Driven and Digitally Savvy: The Rise of the New Marketing Organization”, Forbes, January 2015, http://www.forbes.com/forbesinsights/data_driven_and_digitally_savvy/

 
