Big-Data-Benchmarking-Test: Talend und Informatica im Vergleich

Wenn Sie je ein Gespräch mit einem unserer Berater geführt oder Marketingunterlagen meines Teams gelesen haben, wissen Sie bereits, dass wir damit werben, dass Talend bei Big Data erhebliche Geschwindigkeitsvorteile gegenüber der Konkurrenz bietet.

Hier sehen Sie zum Beispiel eine Folie, die Teil unseres Mediensets für Talend 6 war.

Das lässt sich leicht als bloßes Marketing abtun, daher war es mir wichtig, die Aussage mit handfesten Daten zu untermauern.Dazu haben wir bei MCG Global Services, einem führenden Anbieter von Informationsmanagementdiensten, Benchmark-Tests in Auftrag gegeben, bei denen die Talend Big Data Integration mit Informatica Big Data Edition verglichen wird.

MCG hat sowohl bei den Benchmark-Tests als auch beim Ausarbeiten von Anwendungsfällen und Fragen, die für viele Organisationen hochgradig relevant wären, hervorragende Arbeit geleistet.

Unter anderem wurden die folgenden Fragen gestellt:

- Welche Auswirkungen haben Seiten- und Produktaufrufe auf unserer Website auf den Umsatz? Wie viele Seitenaufrufe erfolgen vor der Kaufentscheidung (online oder im Geschäft)? (Anwendungsfall 1)

- Wie wirken sich unsere Gutscheinkampagnen auf Produktumsatz und Servicenutzung aus? Besuchen Kunden nach Ansicht oder Erhalt unserer Gutscheinwerbung unsere Website? Kaufen sie mehr Produkte oder zusätzliche Produkte, die sie ohne Gutschein nicht kaufen würden? (Anwendungsfall 2)

- Wie stark beeinflusst bzw. fördert unsere Empfehlungs-Engine den Produktabsatz? Neigen Kunden dazu, aufgrund dieser Empfehlungen zusätzliche Produkte zu kaufen? (Anwendungsfall 3)

Wie Sie unten sehen werden, bestätigt der Benchmark-Test den Geschwindigkeitsvorteil, mit dem wir werben.Wenn Sie einen detaillierteren Überblick über die Bedingungen und Ergebnisse des Benchmark-Tests erhalten möchten, können Sie den vollständigen Bericht hier herunterladen.

Hier sehen Sie einen Überblick über den Gesamtvorteil mit Talend und erkennen, wie dieser mit zunehmendem Datenvolumen ansteigt.

Beim Vergleich von Talend und Informatica lässt sich relativ einfach erklären, weshalb der Abstand so enorm ist. Dank der In-Memory-Kapazitäten von Apache Spark können Talend-Nutzer Datasets einfach schneller einbinden. Spark verwendet schnelle Remote Procedure Calls für eine effiziente Verteilung und Planung der Aufgaben. Darüber hinaus nutzt es die Vorteile eines Thread Pools für die Ausführung von Aufgaben anstelle eines Pools von Java Virtual Machine-Prozessen. So kann Spark Aufgaben in Millisekunden planen und ausführen, während die MapReduce-Planung mehrere Sekunden und – bei stark belegten Clustern – sogar Minuten dauern kann.

Informatica Big Data Edition bietet keine direkte Spark-Unterstützung. Hier hängt es von der Hadoop-Engine und deren Konfiguration ab, wie sich Hive-on-Spark verhält und was es leistet.

Zur Erinnerung: Wenn Sie Näheres über die Benchmark-Tests erfahren möchten, können Sie den vollständigen Bericht hier herunterladen.

Kaum zu glauben, dass 2016 (das Jahr, in dem Talend seinen 10. Geburtstag feiert) direkt vor der Tür steht. Wenn unsere Gesellschaft und unsere Unternehmen den Prognosen der Filmindustrie entsprächen, würden wir alle in fliegenden Autos herumschwirren und Roboter würden mit unseren Hunden Gassi gehen... Natürlich haben wir in Bezug auf Fahrzeuge mit geringem Benzinverbrauch oder Elektroantrieb immense Fortschritte gemacht, doch fliegen tun sie noch nicht. Unmittelbarer hingegen gilt: Es gibt definitiv einige neue Technologien, die sich im Jahr 2016 erheblich auf Unternehmen und die Gesellschaft allgemein auswirken werden. Hier einige unserer „Wetten“:

Echtzeit-Analytik wird in den Mittelpunkt rücken

Unter all den neuen technologischen Innovationen wird Big-Data-Analytik im Jahr 2016 auf jeden Fall zur wichtigsten revolutionären Kraft werden. Diese Art von sofort aktionsfähigen Informationen (im Vergleich zur rückblickenden Datenanalyse) ist nicht nur eine Option, sondern eine Notwendigkeit, und zwar insbesondere aufgrund des Tempos, mit dem sich sowohl Verbraucher als auch Unternehmen bewegen. Wir verlangen relevante, personalisierte Informationen – und zwar jetzt! Zum Glück steht diese Art von Datenintegration und Verarbeitungsleistung nicht nur riesigen Cloud-Anbietern wie Netflix, Google oder Amazon zur Verfügung, sondern wird Teil des Mainstream. 2016 werden Unternehmen jeder Größe und in allen Branchen in der Lage sein, Gelegenheiten zu nutzen, die früher unvorstellbar waren. So z. B. um die Patientenversorgung zu verbessern, Ernteerträge zu steigern, um die Welt zu ernähren und allgemein besser informierte Geschäftsentscheidungen zu treffen.

Neue, unvorhergesehene Bedrohungen werden entstehen, wodurch sich die Notwendigkeit eines erhöhten Kundenfokus verstärkt

Im Zeitalter von Big Data in Echtzeit werden früher nicht erreichbare Ziele endlich zur Realität. Allerdings werden dadurch auch neue geschäftliche Herausforderungen entstehen. Es werden erhebliche Wettbewerbsbedrohungen auftreten. Die größten Bedrohungen können jetzt von außerhalb der Kernbranche eines Unternehmens stammen, d. h. von Organisationen, die bisher nicht einmal tangenziell mit Ihrem Geschäft zu tun hatten und solche, von denen Sie es sich nie hätten vorstellen können, dass sie mit Ihnen konkurrieren könnten. Dies kann Ihren Marktanteil bedrohen. Unternehmen müssen also in der Lage sein, Daten zu analysieren, die neuen Bedrohungen zu antizipieren und Möglichkeiten zu entwickeln, um diese Bedrohungen nicht nur zu bekämpfen, sondern ihren Kundenbindungsprozess neu zu bewerten und zu beleben, um treue Kunden zu behalten.

Seit Jahren arbeiten Unternehmen daran, kundenorientierter zu werden. Meist erfahren die Kunden jedoch keinerlei Verbesserungen durch diese Investitionen, und im heutigen Zeitalter von Big Data in Echtzeit reicht „gut“ hinsichtlich der Kundenerfahrung einfach nicht mehr aus. Mit der Einführung neuer Echtzeit-Big-Data-Technologien im Jahr 2016 werden mehr Unternehmen in der Lage sein, die Kundenerfahrung zu beeinflussen, wo es am meisten darauf ankommt: nämlich im aktuellen Moment. Unternehmen werden Technologie nutzen können, um personalisierte Informationen, Anreize und Service zu bieten, die insgesamt für eine bessere Kundenerfahrung sorgen. Den Alltagsverbraucher wie eine wichtige Persönlichkeit zu behandeln, ist etwas, das jeder Unternehmer anstreben sollte. Durch die Anwendung von Big Data in Echtzeit werden Kunden jetzt zum ersten Mal den Unterschied feststellen können.

Der Wechsel von CIOs wird sich beschleunigen

Der Abstand zwischen erfolgreichen und erfolglosen CIOs wird sich im Jahr 2016 erheblich vergrößern. CIOs, die beim Umstieg auf die Cloud und auf Big Data Pionierleistungen erbrachten, werden Pilotprojekte hinter sich lassen und in die Produktion gehen und damit Einblicke für das Geschäft erzeugen, die alles verändern können. CIOs, die diesen Schritt versäumt haben, werden bloßgestellt und abgedrängt, während ihre Unternehmen hinter der Konkurrenz zurückfallen. Unternehmen, die bereits ihre Big-Data-Plattform entwickelt haben, profitieren beim Big-Data-Sprint 2016 von einem beträchtlichen Vorsprung. Mit der Einführung von Spark und Spark-Streaming können sie das wahre Potenzial ihrer Investitionen in die Entwicklung von Data Lakes und Data Warehouses auf Hadoop freilegen. Für Big-Data-Pioniere werden sich ihre Investitionen 2016 bezahlt machen, und der Abstand zwischen erfolgreichen und erfolglosen CIOs wird erheblich zunehmen.

Mit dieser Zunahme wird auch die Nachfrage nach talentierten CIOs stärker, wodurch ein CIO-Talentkrieg ausgelöst werden wird, bei dem die Schwachen bloßgestellt und die Starken heiß umworben werden. Auf der Talend Connect haben wir beispielhafte Unternehmen ausgezeichnet, die 2016 in puncto innovativer Datenintegration an vorderster Front stehen werden. Diese Branchenführer nutzten neue Möglichkeiten, um stetig wachsende Datenvolumen in aktionsfähige Informationen umzuwandeln, die nicht nur ihre Unternehmen verbessern, sondern in vielen Fällen auch der breiteren Bevölkerung zugute kommen. Zum Glück gibt es auch für Unternehmen, die ins Hintertreffen geraten sind, neue Datenintegrationstechnologien, die die schnelle Bereitstellung von Spark-Kapazitäten zum Kinderspiel machen. Das bedeutet, dass auch sie noch die Chance haben, aufzuholen.

Unternehmen werden neue Tools anschaffen und nutzen

Da wir nun Big Data in Echtzeit als eine Technologie mit hohem Revolutionspotenzial identifiziert haben, die sich 2016 erheblich auswirken wird, und die Konsequenzen erläutert haben, die eintreten werden, wenn Unternehmen nicht mithalten, wollen wir jetzt ansprechen, wie Unternehmen gewährleisten können, dass sie der Konkurrenz einen Schritt voraus bleiben.

Das Zeitalter von Big Data bringt Firmen dazu, die eigenen Organisationsstrukturen zu überdenken. Big Data in Echtzeit durchbricht die Barrieren herkömmlicher Best Practices und Strukturen von Unternehmen. Die Dynamik des „Geschäft gegen IT“ wird einer neuen Dynamik weichen: „Geschäft + IT = Innovationsleistung“. Die Unternehmen, die dabei Erfolg haben werden, sind jene, die ermitteln, wie Geschäft und IT als Partner erfolgreich zusammenarbeiten können. Es müssen funktionsübergreifende Innovationszentren entstehen, unter Führung von CEOs, CIOs, CDOs und neuer Chief Marketing Technology Officers (CMTOs), die alle zusammenarbeiten, um ihr erworbenes Wissen gemeinsam optimal zu nutzen. Diese schnellen Spezialeinheiten in Sachen Information werden in der Lage sein, Informationen in Umsatz zu verwandeln und auf neue Märkte vorzustoßen – was zuvor als unmöglich galt – während sie gleichzeitig alle Sicherheits- und Datenschutzauflagen erfüllen. Silos innerhalb des Unternehmens müssen aufgelöst werden, um Echtzeit-Big-Data im Jahr 2016 auf die nächste Stufe zu heben, damit das neue Jahr für IHR Unternehmen zum Erfolgsjahr wird.

Wir freuen uns auf das kommende Jahr und die damit einhergehenden Innovationen!

Talend vs Informatica – Benchmark Big Data

Si vous avez déjà dialogué avec un commercial Talend ou lu certains documents marketing proposés par mon équipe, vous connaissez notre message : dans le domaine des Big Data, les performances des solutions Talend sont largement supérieures à celles des produits de la concurrence.

Voyez par exemple la diapositive suivante (extraite de notre présentation de Talend 6):

Je réalise que certaines personnes pourraient considérer cette affirmation comme une simple allégation marketing. J'ai donc pensé qu'il serait judicieux de l'étayer par quelques preuves concrètes. Nous avons fait appel aux compétences de MCG Global Services, leader en gestion stratégique de l'information, pour définir des tests de benchmark et comparer les caractéristiques des solutions Talend Big Data Integration et Informatica Big Data Edition.

MCG a fait un excellent travail, à la fois pour ce benchmark et pour définir un ensemble de cas d'usage et de questions qui seront très utiles à la plupart des entreprises.

Exemples de questions :

- Nos clients et les visiteurs de notre site Internet déclenchent un certain nombre de vues. Quel est l'impact de ces vues sur les ventes ? En moyenne, combien de pages sont vues avant une décision d'achat (en ligne ou en magasin) ? (Cas d'usage 1)

- Quel est l'impact de nos campagnes promotionnelles à base de coupons sur nos ventes de produits ou l'utilisation de nos services ? Les clients et prospects qui consultent ou reçoivent nos promotions à base de coupons sont-ils incités à visiter notre site Web et à acheter plus de produits ou/et d'autres produits ? (Cas d'usage 2)

- Notre moteur de recommandation est-il efficace pour influencer ou stimuler les ventes ? Les clients ont-ils tendance à acheter plus de produits ou/et d'autres produits grâce à ces recommandations personnalisées ? (Cas d'usage 3)

Comme vous pouvez le constater dans la figure ci-dessous, le benchmark confirme les performances des solutions Talend. Pour plus de détails sur les conditions d'exécution et les résultats de ce benchmark, vous pouvez télécharger le rapport complet en cliquant sur ce lien.

Les figures ci-dessous présentent un aperçu des gains réalisés avec les solutions Talend et montrent que ces gains augmentent avec le volume de données.

Dans le match Talend-Informatica, il est très facile d'expliquer pourquoi la différence est si extraordinaire. De toute évidence, le traitement en mémoire (in-memory) d'Apache Spark permet aux utilisateurs Talend d'intégrer beaucoup plus rapidement leurs données. Spark s'appuie sur un mécanisme ultra-rapide d'appels vers des systèmes distants (RPC) qui optimise la distribution et planification des tâches. En outre, le traitement Spark repose sur un pool de threads et non sur un pool de processus exécutés par des machines virtuelles Java (JVM), ce qui lui permet de planifier et exécuter les tâches en quelques millisecondes, alors que la planification Informatica MapReduce exige parfois plusieurs secondes (ou minutes !) dans les clusters très actifs.

Avec Informatica Big Data Edition (qui ne supporte pas Spark directement), le comportement et les performances de Hive sur Spark dépendent de la configuration et puissance du moteur Hadoop.

Pour plus de détails sur ces tests de benchmark, n'hésitez pas à télécharger le rapport complet en cliquant sur ce lien.

Quel avenir pour l'Internet des objets ?

Il ne fait aucun doute que les « objets connectés » ne cessent de se multiplier, accompagnés par une augmentation exponentielle des données. Cette tendance n'affecte pas seulement les consommateurs mais aussi de nombreux secteurs d'activité : santé, industries pétrolière et gazière, transport, aviation, etc. Pour les entreprises qui cherchent à exploiter ces technologies, la prolifération des capteurs et terminaux intelligents crée de nouvelles formes de valeur tout en faisant apparaître des défis spécifiques. Toutefois, même si ce tsunami de données a le potentiel de transformer le paysage industriel comme jamais auparavant, bien peu d'entreprises disposent à ce jour des technologies nécessaires ou même de business models capables d'exploiter réellement la puissance de l'Internet des objets (IoT).

Comment les entreprises doivent-elles s'informer face à cette invasion de capteurs et terminaux connectés, et comment peuvent-elles s'y préparer dès maintenant – sous peine de se laisser distancer par des concurrents plus réactifs ? Nous vous présentons ci-dessous quatre aspects à surveiller :

Des compteurs intelligents pour une planète plus intelligente

Prenez par exemple les compteurs intelligents/connectés (déjà déployés par m2ocity, lauréat des Talend Data Masters et par Springg) et considérez leur contribution à l'augmentation du volume des données. Ce type d'appareil génère environ 400 Mo de données par an. Pas très spectaculaire... mais les chiffres font rapidement boule de neige ! Un récent rapport de Bloomberg prédit que 680 millions de compteurs intelligents seront installés dans le monde d'ici à 2017, soit (estimation) 280 péta-octets de données générées par ces compteurs chaque année !

Mais vous vous demandez sans doute comment de simples compteurs peuvent contribuer à une planète plus intelligente ? Prenez l'exemple de Springg, une SSII internationale spécialisée dans la gestion des données agricoles, le développement de logiciels et les technologies à base de capteurs. Grâce aux logiciels Talend, Springg peut évaluer les données collectées par les capteurs utilisés dans ses laboratoires mobiles répartis dans le monde entier pour mesurer la composition des sols. À partir de cette connaissance, Springg formule des conseils aux agriculteurs au sujet de leurs terrains et des engrais, ce qui leur permet d'augmenter considérablement leurs cibles de rendement – et de contribuer à l'alimentation du monde.

Données volantes

Vous aimeriez connaître une autre zone de croissance des données de machine à machine ? Il vous suffit de lever le nez ! Un rapport Wikibon signale que l'industrie aéronautique est prête pour l'innovation sous forme d'analytique des Big Data. Un vol commercial (selon la durée) peut générer jusqu'à 40 To de données qui serviront à analyser le trajet, régler le fonctionnement des réacteurs, identifier des routes plus économiques et réduire le temps passé au sol. (Ce type de prise de décision piloté par les données est également en train de révolutionner l'e-commerce.) Les compagnies aériennes qui seront capables de gérer leurs Big Data en temps réel et les transformer en connaissance bénéficieront un avantage concurrentiel décisif.

Prenez par exemple la compagnie aérienne Air France/KLM. Chacun de ses Airbus A380 est équipé d'environ 24.000 capteurs qui génèrent 1,6 Go de données par vol. La compagnie utilise ces données pour détecter les incidents avant même qu'ils se produisent. En analysant cette masse de données, Air France peut détecter les réparations 10 à 20 jours avant qu'elles deviennent nécessaires. Cette nouvelle approche évite les immobilisations intempestives des avions, une situation qui a un impact sur les coûts de la société, mais aussi son niveau de service à la clientèle et son chiffre d'affaires.

Apprentissage automatique et science des données

Les données sont le carburant de votre entreprise. Vous devez non seulement en intégrer autant que vous pouvez, mais vous devez aussi les analyser en vue de détecter une connaissance insoupçonnée. Les techniques de science des données (data science) et d'apprentissage automatique (machine learning) ont tendance à se diffuser dans les entreprises, dans la mesure où celles-ci sont de plus en plus souvent pilotées par les données. L'apprentissage automatique est une forme d'intelligence artificielle (IA) qui permet aux ordinateurs d'enrichir leurs connaissances et de prendre des décisions – par exemple pour automatiser certaines tâches portant sur des données scientifiques. Et avec le volume de données provenant des milliards de sources de l'Internet des objets (IoT), ce type d'automatisation est certainement une bonne nouvelle ! La librairie de machine learning Spark MLlib est en train de gagner en popularité avec ses nombreux algorithmes d'apprentissage automatique spécialisés dans la segmentation des clients, les prévisions, la classification, les analyses de régression, etc. L'intégration de mécanismes d'apprentissage automatique dans les charges de travail les plus exigeantes va être une étape importante pour les entreprises qui souhaitent exploiter plus efficacement le tsunami de données généré par les objets connectés – et qui sont à la recherche de la proverbiale aiguille [de connaissance] dans la botte de foin [de l'IoT].

Opérationnalisation de l’analytique

Même si nous sommes déjà entrés dans « l'âge d'or » des Big Data, un fait reste surprenant : la plupart des entreprises n'utilisent PAS tout le potentiel de leurs données. Une récente étude McKinsey a même montré que moins de 1 % des données générées par l'IoT sont utilisées pour les prises de décision ! QUOIIIII ?!

Pourquoi cette situation ? Dans la plupart des cas, les données générées par l'IoT restent confinées à des rôles d'alarme ou de contrôle en temps réel et non à des initiatives d'optimisation ou d'analytique prédictif/prescriptif. Par ailleurs, il y a encore de nombreux défis à relever pour faire de l'apprentissage automatique une réalité universelle. Les données doivent d'abord être organisées et nettoyées. Passer du modèle à la production peut prendre un certain temps (parfois plusieurs mois !) : les modèles analytiques changent constamment, exigent de multiples mises à jour, et le codage à la main est loin d'avoir disparu... Toutes ces contraintes ne vont pas dans le sens l'entreprise pilotée par les données. Et pourtant, il existe une solution ! Les entreprises qui appliquent des technologies natives et ouvertes comme celles de Talend pour réaliser l'intégration en temps réel de leurs Big Data en évitant tout codage manuel et en tirant parti des qualités des produits Spark/Spark Streaming et de l'apprentissage automatique Spark, sont prêtes à dégager une meilleure connaissance de leurs données brutes.

Contactez-nous

Vous souhaitez découvrir les nouvelles fonctionnalités de Talend 6.1 en matière d'apprentissage automatique et d'intégration des Big Data en temps réel ? Suivez notre webinar à la demande.

Big Data is undergoing a major change. Have you, like us, been seeing the adoption of Apache Spark as the new engine for Hadoop? Companies are very interested in its blazing fast speed and scalability. Many organizations are looking to use real-time big data to improve customer engagement and generate more business to increase their overall revenue.

Here’s our video that shows how Talend enables you to combine batch and streaming jobs for real-time analytics. It’s one of our how-to video series we are currently posting here, so check back often for new ones.

Remember, if you don’t already have Talend installed, you can still play along with the video by downloading a free trial.

Click here if you want to find out about more of the features available in Talend 6: http://www.talend.com/products/talend-6

Guest Blog by Bernard Marr, Founder and CEO of The Advanced Performance Institute

I could start this article by saying that Big Data is taking over the world, spreading into every industry and that if you aren’t “doing it”, you’re going to be a dinosaur destined for extinction. That’s how hundreds of other articles that have been written about Big Data in business over the last few years have started. And many of them are fine articles filled with useful information.

But it’s really a lazy way to start an article – I’m going to assume most people reading this will know this already. In fact, I’m willing to bet that that it’s the main reason you’re reading this article in the first place.

But as true (and clichéd) as those sentiments may be, they oversimplify the situation. The truth is that even if you are “doing it”, there’s no guarantee of success. What’s important is that you do it well – and although that might seem obvious, it’s an obstacle that I’ve seen many companies fail to navigate.

So with this post I’m going to start from the beginning, and explain a few steps that need to be carefully considered before you spend a penny on collecting data, hiring analysts or setting up your very own distributed cloud storage infrastructure. It will hopefully help you to avoid the most common pitfalls – pitfalls which I’ve seen many businesses of all sizes tumble into, usually because they got carried away on the Big Data hype train

The very first thing to do is quickly mention what Big Data is – and just as importantly, what it isn’t – because this is a common misconception. Apologies if you think I am over-simplifying here, but this is still something I constantly see people getting wrong! Big Data is not simply “a lot of data”. And understanding that is key to avoiding some of the most common errors.

Recap – so what’s Big Data

Big Data is the term used to describe our ability to make sense of the ever-increasing volumes of data in the world. It not only refers to the data but also the methodologies and technologies used to store and analyze the data. Increasingly, it is particularly concerned with understanding and drawing insights from unstructured data (data which does not fit nicely into the rows and columns of a spreadsheet, such as video, audio or social media data) as well as machine-generated data (data which comes from smart devices and internet-connected things). Because this is the sort of data that is growing at an enormous rate in today’s digital, mobile, always-on world. For more on what big data is have a look here.

This, among other reasons, is why I often prefer to use the term Smart Data rather than Big Data. It’s really not the size that counts, it’s what you do with it.

Key most important step

The next and most important point you need to consider is why you want to use Big Data in the first place. And the answer really shouldn’t be “because everyone else is”.

Instead, what you need is a clear business case. You need to know how, and why, Big Data is useful to your company. Can it help you to solve a particular problem? Almost certainly it can – but you need to be certain that you’ve identified the right problems. A retailer client of mine once called me in to help with some Big Data projects which they had under way. During our first meeting I asked them to explain their projects to me, only to be told that would take some time as they had over 250 different data-driven projects on the go! Many of these were not even directed at solving a particular problem, or were focused on altering metrics which they could not demonstrate the significance of (for example, predicting which days of the year staff were most likely to call in sick – when there was no evidence that staff absenteeism was impacting their business performance). With just a relatively little amount of work we were able to cull most of those projects and allocate resources to the ones which were likely to drive positive change. The bottom line is that if you can’t say immediately how a project is going to answer important questions about your business, improve your service to your customers, or create efficiencies in your operations, you’re probably wasting your time.

Finding the right data

This leads nicely into the next pitfall I want to highlight: Make sure that the data you’re collecting is the right data. Too many companies launch into a Big Data initiative with the idea that “We’ll just collect all the data that we can, and work out what to do with it later”. This is an extremely wrong-headed way to go about things and all too often leads to disaster. It brings with it two potentially project-crippling dangers. The first is that with too much data you “won’t be able to see the woods for the trees”. Rather than focusing on the data that’s likely to drive the insights or change you’re looking for, you will become distracted by patterns and maybe even insights which have little potential to teach you anything useful.

The second problem with the “gotta collect it all” attitude is that any data collection and storage brings with it expense, as well as legal obligations and compliance – and with Big Data projects often involving personal data, these expenses and obligations can be immense.

The need for data-skills

As well as a financial cost, there’s obviously also a cost in human resources and time. If you have data scientists bumbling their way through hundreds of projects with no clear aim, or decoding terabytes of data you have no clear, immediate use for, they’re likely to be unavailable, or distracted, when something of real value comes along. Having the right people with the right skills in the right place is essential. Good data scientists don’t come cheap – generally commanding salaries of $100k or more, and the best are constantly in demand and rarely short of work. The more you know about precisely what you want to achieve, the more likely you are to find the right people for the job. This won’t necessarily mean hiring externally – one client of mine in the financial industry realized there was a heavy skillset crossover between some money market analysts already employed at the company, and the work they were looking to hire data scientists for. By offering on-the-job training to financial analysts interested in working in Big Data science, they were able to far more efficiently fill the roles.

Good project-management

Another point, which I can’t over stress the importance of, is the importance of good communication throughout the project. This involves both ensuring that there is “buy in” for your project across the team carrying out the work and the wider organization – from c-level executives to the nuts and bolts techies who will be carrying out the analytics and the customer facing or workforce staff whose work will be affected by your results. Everyone needs to clearly understand what it is you are trying to achieve and, crucially, why. Plenty of Big Data projects fail because the frontline staff responsible for putting the insights you’ve gleaned into play don’t understand why they are suddenly being told to do things differently than how they’ve done them for years. This isn’t their fault – it’s almost always because no one has taken the time to explain things properly to them. They don’t need to understand the ins and outs of the machine learning algorithms which are running across the distributed, unstructured data you’re analyzing in real time. But there should always be a logical, common-sense reason for what you’re asking them to do. The only real difference is that you now (hopefully) have stats and analytical evidence that backs up your decisions regarding both overall strategy and day-to-day business procedures.

So, back near the start of the article I mentioned that I prefer to talk about Smart Data, rather than Big Data. This isn’t just because I think it is a more accurate description for what the term really entails. It’s also because it works as a handy breakdown of the steps you need to take to make sure your data analytics activity proves fruitful. Those steps are:

S –Start with strategy (ensure you have a clear business case for what you are doing)

M – Measure metrics and data – And make sure it’s the right data!

A– Apply analytics – being certain you have the right skills and technology in place for the job you need to get done.

R– Report results – Ensure you have clear lines of communication, from top to bottom of your organization.

T – Transform your business in a positive way, based on the insights you’ve discovered

Following that basic template would be a good start to making sure your Big Data project doesn’t become one of the many which fail to deliver any real benefits, but one that delivers real business value and performance improvements to your organization.

If you have worked with any type of new software you know those first few moments are always challenging. There’s always that learning curve you have to go through before being able to fully leverage the product’s capabilities.

Therefore we understand it’s equally challenging to learn a new development framework, such as Apache Spark. With Talend 6 however we’ve made it easy for you to convert an existing Map Reduce job to a Talend Spark job.

Enjoy our video that shows how Talend 6 can help any user with Map Reduce and Talend Spark jobs. It’s one of our how-to video series we are currently posting here, so check back often.

Remember, if you don’t already have Talend installed, you can still play along with the video by downloading a free trial.

Click here if you want to find out about more of the features available in Talend 6: http://www.talend.com/products/talend-6

We recently published a benchmark comparing Talend Big Data Platform to Informatica Big Data Edition, showing the performance benefits of our native Apache Spark approach over Informatica’s solution. Informatica responded with a rebuttal that combines some good points along with some claims that are either misleading or completely false. (Privately, their lawyers also sent a letter to the group that performed the benchmark demanding that they retract the benchmark.) I’d like to set the record straight. Let’s start with their more valid points:

· The benchmark used a “two year old version of Informatica”. This is mostly true. When we started the benchmark, we used the most recent version of Informatica (which they released in June 2014, so it was then 16 months old). Almost simultaneous with the benchmark publishing, Informatica released their new version 10 which we haven’t benchmarked yet. In general, Informatica releases their products every 2-3 years, while we release twice per year, so it’s not surprising to see their product out of date relative to ours and the rest of the big data ecosystem – this is the normal state of things, except right around one of their release windows.

I’d like to also point out that since the benchmark was done, we also released a newer version of our platform, and according to our internal benchmark, with Spark 1.5, we’ve already seen a 16% speed improvement.

· The benchmark compares Informatica using MapReduce to Talend using Spark. True. Informatica’s latest available version at the time only supported Hive (which runs on top of MapReduce), so we used that.

· Our benchmark didn’t use TPC-DS. True. We compared the products using several common real-world digital marketing and e-commerce scenarios such as product recommendations and coupon-influence analysis. Interestingly, even though Informatica apparently used an industry standard benchmark suite they didn’t actually publish their full results and configurations, which is actually required by the TPC consortium to publish a benchmark.

We actually think our scenarios are a better real-world integration scenario than what was tested with TPC-DS. The TPC-DS benchmark is primarily focused on analytics use cases with a smaller focus on data integration. The authors of the benchmark even wrote: “Although the emphasis is on information analysis, the benchmark recognizes the need to periodically refresh its data.”

· We only used 12M records. Somewhat true. The total benchmark actually processed 75 million records, but it’s true that many real-world scenarios will process more. That said, our performance differential actually improved dramatically as the data volumes tested increased.

Informatica’s post then went on to make a number of technical claims, many of which were misleading or simply false. I’ll cover a few that are particularly worth while discussing. In Informatica’s blog post, they talked about three key issues in comparing the products:

1. Performance. We agree this is critical and maintain we are faster. Nothing in their unpublished benchmark tells us otherwise.

2. Layer of abstraction. Informatica pointed out that this is key to provide future protection in the fast-changing big data landscape, which we heartily agree with. In fact, we provided exactly that when we launched Talend 6, allowing our customers to upgrade any of their existing MapReduce jobs to Spark to gain the 5x performance benefit with just one click. If you’re able to find anyone running the Informatica Big Data Edition, ask them what the upgrade experience to the brand new version 10 is like (this is likely to be a challenge as there are so few of them in production). Unlike Talend’s fully compatible approach, Informatica actually doesn’t provide a clean abstraction layer, and requires a lengthy and awkward upgrade/conversion approach to go from their own version 9.x to version 10. I can only say that they must have cut the upgrade feature because there wasn’t enough customer demand for it…

Here’s our UI to upgrade a MapReduce job to Spark to get the 5x performance improvement:

And by the way, if it makes sense for you to run the job in real-time rather than batch, that’s just two clicks away using either Storm or Spark Streaming:

3. Breadth of functionality. We agree. If you can find one of those elusive Informatica Big Data Edition customers, ask them how compatible it is with the classic PowerCenter. You may be surprised to find out that their Big Data Edition is actually a completely separate product from their classic Powercenter with a different job designer, different server, different management, different metadata – basically different everything – and that’s not compatible with PowerCenter. It actually supports a very limited subset of the full PowerCenter functionality, and so you need to figure out how to partition your jobs between full PowerCenter and their Big Data Edition, shuffling data back and forth along the way. This doc (helpfully written by an Informatica Sales Engineer) describes the missing functionality and where you’ll need to fall back on PowerCenter. Warning, it’s 15 feature-packed pages. At Talend, of course, we support everything on our Big Data Edition since it’s a pure superset of our standard Data Integration with additional Big Data functionality. It’s the most popular version of our product so it shouldn’t be surprising that it’s fully functional.

In addition to those three evaluation criteria proposed by Informatica, we’d suggest a few more:

1. Ease of deployment and management. How easy is it to deploy, manage, monitor, and upgrade your integration solution? Does it require putting something separate on each node in the Hadoop cluster, or does it natively leverage the full power of Hadoop and Spark without any additional management overhead? Think about upgrade scenarios as well – do you have to worry about specific version compatibilities of some legacy component that you had to install on every node? Check out this post, especially the last exchange: “No, you only need to install on the data nodes, but you do need to install on ALL the data nodes.” I don’t think it has changed in the version 10 edition but I might be wrong…

2. Cost. How much does it cost, and how do costs scale as you use more data and thus more Hadoop nodes? Are you paying for each Hadoop node twice? Are you paying both your Hadoop distributor and your data integration vendor for each node or are you only paying your data integration vendor the developers using the system?

3. Cloud compatibility. If you’re interested in moving to the cloud now or in the future, is your vendor’s approach compatible with that desired direction? Talend’s solution is 100% symmetrical between cloud and premise, so anything that you build on premise can run without changes in the cloud. And as part of that our native Spark solution can run in your own Spark cluster or you can use an on-demand Amazon EMR cluster in Amazon Web Services, which we will spin up before the job and then spin down after the job completes. I have no idea how you’d use Blaze in an AWS/EMR scenario, since it requires a turd dropped on every node. If it’s even possible to use there, it certainly won’t be something you can dynamically spin up and spin down.

4. Future trajectory. Are you locked into one vendor’s proprietary runtime and upgrade trajectory, or are you leveraging the amazing amount of innovation and progress going into the open source Hadoop and Spark ecosystem? Which technology do you expect to progress faster? When I joined Talend over a year ago, we made a decision to go all in on Apache Spark. This has turned out to be a terrific strategic decision. The Spark project is the most active Apache project in the world, and Hadoop overall is progressing at a rate faster than anything I’ve ever seen in my professional career.

If you pause to think about it for a moment, it might seem like a surprising technology strategy choice for Informatica to create something like Blaze rather than leveraging Spark. But if you step away from their spurious claims for a moment and look at the problem from their point of view, you realize that it solves a very real problem. For them that is, not for you.

Informatica’s problem is that they’ve always charged for their proprietary runtime, first with PowerCenter CPUs and now with Blaze Hadoop nodes. From a business model perspective, this is critical since most of their $1B revenue is tied to these runtime licenses. So the idea of leveraging someone else’s runtime – even an incredibly powerful and flexible one like Spark – is not just foreign to them but actively dangerous to their business model. They’ll do everything they can to keep you paying for runtime licenses as long as they can. This is a data tax, or the modern version of the old mainframe MIPS pricing model. Again, this is Informatica’s problem, not yours.

In summary, if you’re so committed to the Informatica stack that you are willing to put a legacy runtime on every Hadoop node, suffer the performance hit, toggle back and forth between their incompatible traditional and big data ETL products, and rule out a simple migration to the cloud then Informatica has a good solution for you. If on the other hand you want a product that takes full advantage of native Spark and Hadoop performance/scale, is fully functional, improves at the breakneck speed of the Hadoop ecosystem, works seamlessly in the cloud when you’re ready to do that, and doesn’t require a data tax, then I humbly suggest that you take a look at Talend. There’s a reason why we’re #1 in big data integration.

You may have read some of the back and forth between Talend and Informatica regarding Talend’s big data speed benchmark against Informatica.

While there are several claims and concerns made by each side, Informatica’s chief complaint is that they launched a brand new product in November, after we ran the benchmark. They claim it’s faster than ours. We disagree, so here’s my challenge: Let’s set up a joint benchmark using a mutually agreeable third party. Let’s use a certified configuration agreeable to both of us, a mutually agreed benchmark, and allow both companies to give technical guidance to the company running the benchmark. And let the best product win. I’m betting its Talend. If you choose to hide behind your lawyers and publish misleading marketing fluff instead, then we know you secretly agree with me too. If you really believe in your product, then let’s have some fun together.

At Talend we feel the need for a single view of the customer has reached a whole new level. And Talend 6 is introducing a very powerful tool that will empower all MDM users to create a golden record that will make every customer interaction personal.

We all understand that everyone in the company may not need access to the full MDM web user interface to consult records. That is why Talend 6 introduces for those users a customer scorecard application that can be built on top of Talend MDM.

This video will show how to create a customized application based on the MDM REST API for users that don’t need to access the full Talend MDM interface. If you don’t already have Talend installed, you can still play along with the video by downloading a free trial.

Click here if you want to find out about more of the features available in Talend 6: http://www.talend.com/products/talend-6

Hadoop and the broader Big Data ecosystem continue to innovate at an incredible rate. By harnessing the power of the community and creating a survival-of-the-fittest competitive landscape, the open-source development approach helps not only fuel the pace of innovation but also drive buyer confidence and market adoption.

Open source is also important to a growing number of developers who are moving away from proprietary software looking for greater efficiency as well as transparency. In our experience, an open source approach makes for better software and happier customers, so we are all for it!

Among the various Big Data open source projects, the Data Processing space is probably the most active and promising. There are many Data Processing Engines/Frameworks out there, some are fully open source like Apache Spark, Apache Flink, Apache Apex while others are packaged and available as a service such as Google Dataflow. Most Apache open source projects combine streaming and batch data processing, and provide various levels of APIs to help programmatically develop pipelines or data flows. Google is helping to lead this charge with an abstraction layer that allows Dataflow SDK-defined pipelines to run on different runtime environments.

Google Leading the Open Source Charge with Dataflow SDK

A little over a year ago, Google open sourced its Dataflow SDK, which provides a programming model used to express Data processing pipelines (Input/source -> Transformation/Enrichment -> output/target) very easily. What is great about this SDK is the level of abstraction it provides so you can think of your pipeline as a simple flow without worrying too much about the underlying complexity of the distributed and parallel data processing steps required to execute your flow.

Talend has a long history with the Apache Software Foundation (and already has committers on key Enterprise Apache projects such as Apache ActiveMQ, Camel, CXF, Karaf, Syncope or Falcon) and has been focusing a lot on developer productivity. Given this, as Google announced its proposal for Dataflow to become an Apache Software Foundation incubator project, it became very natural for Talend to join with them to help accelerate development along with a few other companies that share similar interests and core values.

A Series of Firsts for the Apache Software Foundation

Upon acceptance, Dataflow will be the first Apache Software Foundation project offering a set of SDKs allowing the abstraction of the definition and execution of Data Processing/Pipes workflows, supporting complex Data Ingestion and Integration enterprise patterns including routing as well as data and message transformations.

Open Source, Future-Proof

Developers leveraging the Dataflow framework won’t be “locked-in” with a specific data processing runtime and will be able to leverage new data processing framework as they emerge without having to rewrite their Dataflow pipelines, making it Future-proof.

Moving forward, Talend will commit developers to the Dataflow framework, specifically on the Ingestion and Integration front as well as work with the community on future runners. We look forward to contributing to this project and the broader Big Data community.

A Talend Community Coders post brought to you by: Sergey Beryozkin

Who could've thought that Swagger and WADL can be real friends ?
Both Swagger and WADL are about describing REST APIs and while the former has a definite momentum, the latter has proved to be very capable and helpful to JAX-RS users.

The important thing is that we have users who submit WADL documents to the runtime or build time code generators which is all working fine. We also have Swagger users who use cool Swagger features and being happy with a nice UI being generated. And WADL users, while being happy with WADL (which IMHO is indeed a very capable language for describing schema rich XML but with some extensions - even JSON - services) would like to use Swagger to introspect the code generated by WADL processors and have a nice API UI.

So my colleague Andrei and Francesco, Apache Syncope maestro, have driven the work about enhancing a WADL generator to set WADL documentation fragments as Java Docs in the generated sources and then having CXF Swagger features being very smart about enhancing Swagger JSON payloads with these Java Docs, with Francesco doing some magic there. I should also mention Andriy Redko doing some work earlier on directly with Swagger for it to better support JAX-RS annotations and initiating the CXF Swagger project and Aki Yoshida doing a lot of Swagger2 work next.

So here you go, WADL and Swagger United in Apache CXF.

IMHO this project has been a perfect example of the power of the Open Source collaboration with the contributors from different teams working effectively on this project.

More about Sergey Beryozkin: a committer of Apache CXF and Apache Aries.

Sergey has focused on working with web services and XML technologies for over twenty years. He is currently the leader of the Apache CXF JAX-RS and OAuth2 implementation projects. As a software architect in the Application Integration Division of Talend, he focuses on Talend Service Factory and works with Talend colleagues on creating better support for REST.

With the rapid changes to SaaS applications and cloud platforms taking place today, the area of cloud integration is now in constant flux. Years ago, cloud integration used to be seen as a tool that accomplished a simple use case, such as replicating SaaS data to an on-premise database for analytics. However, with the innovations taking place in the cloud with regards to analytics, big data, and application development, the very nature of cloud integration is changing. Here are three major changes in cloud for 2016 that will impact the way in which companies will need to think about their cloud integration strategies.

Central IT Takes Over Cloud Analytics Initiatives

Ever since the announcement of salesforce.com’s Wave Analytics in October 2014, with the tagline “Analytics for the Rest of Us”, the cloud analytics race began. In the very same month, Birst announced its partnership with SAP HANA, touting an architecture that provided more instant analytics to the end user. Barely a year later, in October 2015, Amazon Web Services announced its QuickSight cloud BI service, targeted towards data analysts.

A few other occurrences in the cloud analytics world happened in 2015 that are interesting to note:

It was revealed in January 2015 that a basic Salesforce Wave license costs about $40,000, in addition to other site licenses and per user costs
In April 2015, Domo announced their latest funding round along with an interesting tidbit –customers would have to contact Domo directly if they wanted any data integrations set up so they could maintain the secrecy of their cloud analytics stack
A few days later, Tableau and Birst, once bitter rivals, formed a partnership that allowed Birst users to directly connect to Tableau. Birst would provide a central enterprise data repository from which to create new datamarts for business users, while Tableau brought its data discovery and visualization capabilities to the table
Tata Consulting Services (TCS) and Tableau announced an alliance where TCS would focus on developing “large scale delivery capabilities” for Tableau’s data visualization functionality

These occurrences prove that although cloud analytics was originally targeted towards the self-service needs of business users within LOBs, many of these deployments are more complex than originally thought. The need to contact vendors directly to set up data integrations, enterprise IT-style pricing, the realization that data visualization only forms one layer of the cloud analytics stack, and the involvement of global System Integrators all points towards signs that Central IT will be the major driver of cloud analytics projects in 2016, rather than the individual LOBs.

Your 2016 Cloud Takeaway

Organizations looking at cloud integration solutions to integrate the variety of data sources required for an enterprise-class initiative should choose a cloud integration platform that has a unified data fabric across data sources, transformations, and integration patterns.

Big Data Processing Moves to the Cloud

With the growth in big data, Spark is replacing MapReduce as the data processing standard. The industry is moving more towards real-time streaming data and experimenting with machine learning use cases. To successfully accomplish Big Data use cases in production at scale, all associated infrastructure for provisioning, deployment, logging, and monitoring, as well as ingestion technologies for streaming data and workflow management tools need to be on a single cloud platform. Cloud leaders such as Amazon Web Services have perfected the art of minimizing latency and optimizing clusters for the processing of large datasets. The recent January 2016 announcement by Chinese e-commerce giant Alibaba launching 20 new online services within its AliCloud offering related to big data only underscores the importance of having a cloud-centric big data strategy. This trend will only accelerate in 2016.

Your 2016 Cloud Takeaway

Using cloud integration as a centerpiece for big data processing in the cloud is a strategy that enterprise architects should consider. Since the future of big data is real-time, organizations should realize that there are different real-time streaming technologies out there, such as Kafka, Storm, and Spark Streaming to name a few. Each streaming technology serves a different use case, and a cloud integration platform needs to connect to a variety of these streaming technologies.

“Hybrid Integration” Gets Redefined Again

The “hybrid integration” term has been one of the most confusing terms out there. In recent years, middleware and integration providers have used the term to mean many things. While it is quite clear that a hybrid integration scenario is one involving cloud and on-premises data sources, the specificapproach taken towards solving the challenges involved with integrating these apps will define how successful enterprises are in deriving value out of their investments.

Your 2016 Cloud Takeaway

The ever-changing definition of hybrid integration will continue to evolve due to the explosion of varieties of data within the internet of things (IoT), new open source big data technologies, and the emerging area of microservices. At the end of the day, it is important that companies realize that it is in their best interests that all their apps, databases, and infrastructure move to the cloud. A hybrid integration strategy that supports this end goal will ensure that innovation moves at a much more rapid pace.

What are your thoughts about what's ahead for cloud this year? Share them with me @Ash__V on Twitter or comment below!

Related Resources

Is Cloud Integration Best for Your Organization?

Products Mentioned

Talend Data Fabric

Talend Integration Cloud

The last quarter of 2015 was marked by our 7^th annual user conference, Talend Connect. After day one, which focused exclusively on retail partners and integrators, the user conference continued with some 300 participants heeding Talend's call, including customers, sponsors, exhibitors, journalists and analysts, to name a few.

Video>> Talend Connect 2015 Recap

Rethinking How We Consume Data

The success of the event and the quality of our customers' and partners' presentations testify once again to Talend's premier position in the areas of innovation, agility and flexibility. Users worldwide are well aware of the capability of the Talend approach to industrialize processes and contribute to data integration, sharing, quality and governance. We also made great strides this fall by offering the Hadoop integration suite—the most powerful solution of its kind on the market—made possible with the native support of Spark and Spark streaming. Talend Connect gave visitors a front-row seat as customers such as Air France, m2ocity and Orange gave testimonials on the innovations they were able to achieve with our products.

In 2016, we will go even further as we help companies meet the challenges of digital transformation. With Talend Integration Cloud and our Data Preparation solutions available in the first quarter of 2016, we will be speeding up efforts to unite business workers and IT teams on the one hand, and on the other, we’ll be solidifying the idea of "self-service" throughout organizations. Self-service will be a priority for IT in the coming decade—even if some may not recognize it yet. Business Intelligence and Data Discovery tools have allowed users to freely navigate through data and to create reports and analyses themselves. Talend Data Preparation gives them the independence they need to supply these analyses to the business with informed insight. This is particularly pertinent with Big Data, because when it comes to achieving a comprehensive picture, the industry is increasingly favoring data, which can often originate from outside the company, such as stock market conditions, buyer confidence ratings, etc. Line-of-business workers can gather useful data and turn it into quality data, shape them, enrich them and finally, integrate them, all on their own. This type of solution is a win-win for everyone involved:

- The business analyst is frustrated by spending most of their time researching useful data instead of analyzing it for helpful insights. With the right data preparation tool they can become exponentially more productive by using interactive data manipulation features;

- Line-of-business workers no longer need to depend on IT, business analysts or any other middlemen to access the data they need, in the form and when they need it;

- The IT director has a channel through which open up access to corporate data lakes for line of business managers, while maintaining control over business information.

In 2016, Talend will continue to innovate and revolutionize the way data is consumed within companies.

Big Data: the Driver of the Next Industrial Revolution

There’s no denying that data is everywhere: corporate information systems, Clouds, customers, suppliers and connected objects. In fact, IDC expects the digital universe to reach 44 zettabytes (44,000 billion gigabytes) by 2020.[1] Winning companies are those that can cash in on the growing amount of data available and make it suitable for use by becoming data-driven. Data has the power to transform business processes, business models and the way in which companies make decisions.

The entire economy is transforming, from the design of an object to its consumption. We are in the middle of the next industrial revolution. Real-time data, predictive analytics and the Internet of Things (IoT) are all part of this revolution and are reshaping entire industries. Agriculture, manufacturing, energy and health are all industries being profoundly impacted by the use of data to solve tremendous challenges: how do we feed the growing world population? How do we optimize energy resources and develop greener alternatives? How do we detect a disease outbreak in its earliest stage? How do we evaluate the likelihood of developing a disease or not?

There are already numerous initiatives underway and several of Talend's customers—such as GE Healthcare, Springg and m2ocity—have already embraced this data revolution.

For example, Springg, a DutchSprouts software provider, helps farmers maximize their crop production in order to feed more people and help impoverished farmers make enough money to send their kids to school. Another very telling example is seen in m2ocity, France's leading provider of remote meter reading solutions. m2ocity collects and processes data from its network of more than 1.6 million meters distributed throughout 2,000 cities in order to help major metropolises operate more efficiently and conserve precious resources.

This year Talend will celebrate its tenth anniversary. I was the 21^st person to join the team and I have been fortunate enough to watch our early dream become a reality. The start-up that would revolutionize the integration market, and which now leads the market for Big Data integration solutions, paving a new path and pioneering a revolution. But the best part of all is that the adventure is just beginning.

Join us for what’s to come in 2016!

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Easier Data Integration: 5 Steps to Success

Products Mentioned

Talend Big Data

Talend Integration Cloud

[1] Source: 7th Annual IDC Digital Universe Study

Le dernier trimestre 2015 a été marqué par la 7^èmeédition de notre conférence utilisateurs, le Talend Connect. Après une journée dédiée aux partenaires revendeurs et intégrateurs, la conférence utilisateurs a pris place le lendemain et c’est quelques 300 participants qui ont répondu à l’appel de Talend : utilisateurs, clients, sponsors, exposants, journalistes, analystes…

Vidéo>> Aperçu du Talend Connect 2015

Consommer les données autrement

Le succès de cet événement et la qualité des présentations de nos clients et partenaires présents témoigne une fois de plus du leadership de Talend en matière d’innovation, d’agilité et de flexibilité. Les capacités d’industrialisation des processus contribuant à l’intégration, l’échange, la qualité, la gouvernance des données sont bien connues de nos utilisateurs dans le monde. Nous avions d’ailleurs récemment pris plusieurs longueurs d’avance cet automne en rendant disponible la suite d’intégration pour Hadoop la plus performante du marché avec le support natif de Spark et Spark streaming. Talend Connect a permis à nos visiteurs d’entendre les témoignages de clients tels que Air France, m2ocity, Orange et des innovations qu’ils ont pu réaliser grâce à nos produits.

En 2016, nous allons encore aller plus loin pour aider les entreprises face aux enjeux de la transformation numérique. Avec Talend Integration Cloud et notre solution de Data Preparation disponible au premier trimestre 2016, nous allons accélérer le rapprochement des utilisateurs métiers et des équipes informatiques d’une part et concrétiser l’idée de « Self-Service » au sein des organisations d’autre part. Le self-service devient une priorité IT pour la prochaine décennie. Les outils de Business Intelligence et de Data Discovery ont permis aux utilisateurs de naviguer librement dans les données et de créer leurs rapports et analyses par eux-mêmes. Talend Data Preparation leur donne l’autonomie nécessaire pour alimenter ces analyses avec les données pertinentes - cela se confirme d’autant plus avec les Big Data, car pour obtenir une vue 360, le métier est de plus en plus friand de données (souvent extérieures à l’entreprise), comme la météo (pour comprendre les variations de ventes en fonction du temps), ou bien la géolocalisation pour suivre les articles en magasin - ils peuvent par eux-mêmes rassembler les données utiles, puis les mettre en qualité, en forme, les enrichir et enfin les connecter entre-elles ; tout le monde est gagnant dans cette dynamique :

- Le Business Analyst, jusqu’à présent frustré de passer le plus clair de son temps à rechercher les données utiles plutôt qu’à les analyser, gagne en productivité grâce à des fonctions interactives de manipulation de données ;

- L’utilisateur métier ne dépend plus de l’informatique, du business analyste, ou d’un quelconque autre intermédiaire pour accéder aux données dont il a besoin quand il en besoin;

- La DSI dispose d’un canal qui lui permet à la fois de déléguer certaines tâches aux utilisateurs métiers, désormais rendus autonomes, tout en gardant le contrôle, puisque l’outil lui permet de superviser les accès et de mettre en place un cercle vertueux de données et une gouvernance à grande échelle.

En 2016, Talend va continuer à réinventer les usages et révolutionner la manière dont les données sont consommées au sein des entreprises.

Big Data : moteur de la prochaine révolution industrielle

Les volumes de données qui transitent au sein d’une entreprise augmentent considérablement. IDC prévoit pour 2020 un univers numérique de 44 zettaoctets (44 mille milliards de gigaoctets).[1] Les données sont partout : système d’information de l’entreprise, Cloud, clients, fournisseurs, objets connectés. Les entreprises gagnantes sont celles qui exploitent à leur profit la masse grandissante des données disponibles et exploitables en devenant des entreprises orientées données.

Aujourd’hui, les données ont le pouvoir de transformer les processus métiers, les modèles de business, la manière de prendre des décisions.

C’est toute l’économie qui se transforme, de la conception d’un objet à sa consommation – nous sommes au cœur de la prochaine révolution industrielle. Le temps réel, l’analyse prédictive, l’Internet des Objets font partie des composantes de cette révolution et sont en train de transformer des secteurs d’activité tout entier. L’agriculture, l’industrie, l’énergie et la santé sont les secteurs qui sont en train de connaître les plus gros bouleversements et les enjeux sont majeurs : comment pourra-t-on nourrir tous les êtres humains ? Comment optimiser nos ressources énergétiques et développer des solutions plus écologiques ? Comment détecter une épidémie à son stade le plus précoce ? Comment évaluer la probabilité de développer ou non une maladie ? Comment optimiser nos méthodes de production ?

Les initiatives sont déjà nombreuses et certains clients de Talend, comme GE Healthcare, Springg et m2ocity, ont déjà pris la révolution en marche.

Prenons par exemple le cas de Springg, un éditeur de logiciels du groupe DutchSprouts dont le but est d’aider les agriculteurs à optimiser leur production. Les données des sols sont récoltées via différents capteurs sont traitées et analysées en temps réel ; les agriculteurs peuvent alors prendre les mesures nécessaires en fonction des résultats.

Un autre exemple très parlant est celui de la société m2ocity, premier opérateur de télérelevé en France. Acteur majeur dans le développement de villes intelligentes, m2ocity collecte et traite des données issues des compteurs raccordés à son réseau de plus d’1,6 millions de compteurs répartis dans 2000 communes. La société peut ensuite les distribuer rapidement dans un format convivial, mais également développer de nouveaux services à valeur ajoutée.

Talend va fêter ses dix ans cette année. J’étais le 21^èmeà rejoindre l ‘équipe et j’ai la chance d’avoir vu notre vision du départ devenir réalité. La start-up qui voulait révolutionner le marché de l’intégration et qui est maintenant leader sur le marché des solutions d’intégration des Big Data a su trouver une nouvelle voie et montrer le chemin à suivre … et le plus excitant en définitive, est que l’aventure ne fait que débuter.

Au plaisir de vous retrouver en 2016.

Ressources associées

Avec Talend, accélerez vos projets d'intégration des Big Data

Réussir vos projets d'intégration de données

Produits mentionnés

Talend Big Data

Talend Integration Cloud

[1] Source: 7th Annual IDC Digital Universe Study

February is here and it seems that everyone has made their predictions (even me!) for what’s ahead for technology in 2016. As I read through the various predictions lists, I started discovering a common theme: the inclusion of real-time data initiatives as a main priority for IT. Real-time streaming data and analytics is rapidly becoming the lifeblood of today’s data-driven economy. Why? Because your customers live in an age of information immediacy.

The need to move at the speed of real-time is officially a reality for every business. Here are three trends I believe are driving the real-time big data movement:

1. The Ascendance of Apache Spark

Apache Spark is no longer just a talking point for IT. A recent survey of IT managers, developers, business intelligence/data analysts and data scientists found that 70 percent of respondents listed Apache Spark as the compute framework they’re most interested in deploying. The main reasons behind this interest include:

- Ease of deployment

- Speed

- Support of a wide range of programming languages.

At it’s core, Apache Spark brings the productivity of function programming and the speed of in-memory data processing to Hadoop. Data integration platforms, like Talend Real-Time Big Data, that are built on Spark enable your big data projects to deliver real-time results and value.

Data driven companies like Uber, Yahoo! and Air France are using the big data processing power of Spark to deliver new customer services, react to data and feedback in real-time, and get a leg up on the competition. Think back to the last time you called a taxi. It’s been awhile hasn’t it? There hundreds of compelling scenarios where Spark can be used – from real-time product recommendations in eCommerce, to optimizing IoT-equipped sensors on airplanes to predict and analyze maintenance issues. When business success is dependent on acting fast, Spark is simply the answer.

Download>> Take Talend Real-Time Big Data for a Test Drive

2. IoT is Evolving and Businesses Are Looking to Take Advantage

Speaking of concepts moving from talking points to true development, IoT is no longer an IT buzzword. According to a recent report, a staggering 55 percent of organizations reported that they were either using IoT systems to collect marketplace operations or data, or planned to do so within a year.

We are living in an age of data abundance due to billions of connected things. However a recent study by McKinsey showed that less than 1 percent of all IoT data is actually being used for decision making today! Capturing the value of data in real-time will shine a light on the true value of the IoT movement.

Companies applying open, native technologies like Talend for real-time big data integration, which requires zero hand-coding, utilizes Spark, Spark Streaming and Spark machine learning, can start to get more insight from their data.

3. Real-Time Data Yields Real Business Results, And That’s the Tipping Point

While technology improvements can start the conversation on being data-driven, businesses need to extract bottom line benefits in order for it to really catch on. The good news is that it’s happening.

Today’s business must recognize that the Age of the Customer has descended upon us. Customers can gain immediate information on lower costs or better services with the click of a button. To meet this challenge, businesses must leverage real-time information to understand customer needs and deliver them.

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Advanced Analytics with Apache Spark

Products Mentioned

Talend Big Data

It’s almost time for the big game–the one and only, Super Bowl—and this year promises to be even bigger and better than ever. Super Bowl 50 takes place Feb. 7, 2016, at Levi's Stadium in Santa Clara, California, where 75,000 people are expected to cram in for the game. But even more than that number are expected to be watching from home or elsewhere via broadcast and online.

Last year’s epic game between the Patriots and the Seahawks was the most watched broadcast in U.S. TV history with an average audience of 114.4 million viewers. Even the halftime show by Katy Perry during the 2015 Super Bowl earned the ‘most-watched’ performance in Bowl history. And while most fans may not notice the difference, big data has the potential to change the Super Bowl experience in several ways.

Retailers Looking to Data as Product Demand Scales

As millions of Americans pick up their favorite snacks for Super Bowl parties, sports experts are spending hours on end pontificating on the minutiae of the tiniest details happening both on and off the field in an effort to predict the winner of the big game. So as living rooms pack out with casual and diehard fans alike, it’s estimated that over a BILLION chicken wings, 11.2 million pounds of potato chips, 8.2 million pounds of tortilla chips, 3.8 million pounds of popcorn and three million pounds of nuts will be consumed during Super Bowl 50. With numbers like those, retailers better be sure to stock up on their supplies in time for kick-off—a key area where big data can help.

Download>> Test Drive Talend Real-Time Big Data for Free

Using Data to Predict the Outcomes

What remains to be seen is whether or not this year’s matchup between the Denver Broncos and Carolina Panthers or the halftime show performed by Coldplay, Beyonce and Bruno Mars will be able to top 2015 records. This is the third straight season wherein the top seeds from each conference are facing off in the Super Bowl. It’s happened only three times since 1990 prior to this stretch. A victory in Super Bowl 50 would give Peyton Manning his 200th career win, including playoffs. Manning would be the only QB in NFL history with 200+ wins, surpassing Brett Favre (199 wins, including playoffs) for the greatest number of wins overall.

One of the favorite pastimes in all sports, not just the NFL, is to predict the outcome of each contest. Predicting a Super Bowl winner is far from an easy task, especially when both teams are so evenly matched, but many experts have turned to big data in the hopes it can provide added insights on game outcomes.

In the average football game, there are numerous data points that sports analysts track—from individual player performance to overall team stats. This is an area where big data can take insights to the next level—measuring things like total distance each team will travel to get to the game, the impact of weather conditions on individual plays, and comparisons between different player matchups. Another data set used in today’s NFL: equipping each player with sensors in shoulder pads so that coaches can access detailed location data on each player, and from that data, analyze things like player acceleration and speed.

From all these different stats and figures, big data algorithms can be created to come up with an eventual winner in any game. The challenge to create the most accurate algorithm is one that a handful of businesses and institutions have already looked at in the past. For example, one company, Varick Media Management, created their own Prediction Machine that boasted a 69 percent accuracy rating during the 2013-2014 NFL regular season as well as an impressive record for other championship games. Facebook also tries to predict a winner from an analysis of social media data. Speaking of, in 2015 over 28.4 million tweets related to the game and halftime show were sent during the live telecast—making it the most tweeted Super Bowl ever.

Even though these algorithms take into account a lot of data, the results are far from being 100 percent accurate. After all, while Varick Media Management accurately predicted the Seahawks would win last year’s Super Bowl and Facebook predicted a Denver Broncos victory in 2014, which ended up being a blowout loss to the Seahawks.

Advertising Technology

Going beyond sports analysis and the big game, big data may have a big impact on the thing many fans anticipate most: the commercials. Super Bowl ads cost millions of dollars; and research seems to show that only about 20 percent of those ads lead to more products sold. Additionally, with big data collected through social media listening tools, companies can potentially get a picture of what people talk about most before, during, and after the game. Hence, using big data analysis tools, companies can potentially create more targeted advertising campaigns to drive more engagement, making their Super Bowl ad a better return on investment.

The Super Bowl remains an exciting game that tens of millions of people around the world will enjoy, but many aspects of the game are likely to change as we continue to thrive in the era of big data. Whether it be in terms of predicting the most likely winner of the game or how advertising is handled, big data stands to have a significant impact. In the meantime, fans can still watch some of America’s most skilled athletes perform at their best!

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Products Mentioned

Talend Big Data

Self-service data preparation, which we define as empowering business users and analysts to prepare data for themselves prior to analysis, is often cited as the next big thing. In fact, Gartner predicts that “by 2017 most business users and analysts in organizations will have access to self-service tools to prepare data for analysis”. To me, this sounds like a wake-up call for IT professionals.

The most unique part of the self-service data preparation trend is that goes beyond the office of the CIO. It is a huge step forward for data-driven organizations as a whole. If done well, it will enable anyone to put data at work in their working context whether it be IT, finance, marketing or anything else. In fact, a recent Mckinsey survey found that the use of self-service digital and mobile aids help workers do their jobs more efficiently. This is what we call “the last mile of Business Intelligence and Data Integration”. Self-service enables an organization to not only dramatically extend the reach of data, but also its applicability across any operational tasks.

Introducing Talend Data Preparation

Talend Data Preparation is our proposition to enable business and IT teams to be at the forefront of this major trend. It will empower any decision-maker in your organization to quickly prepare data so they can spend more time using it for analytics and operations.

Look at any organization today, and you’ll find that data is everywhere and it’s relatively useless without being put it in context for consumption. Business analysts spend way too much time making data ready for analysis: find the data source, discover its content, cleansing it, standardizing so that it aggregates correctly for analysis and connects with other data points. 80% of the time for data analysis is spent repetitive tasks, which leaves only 20% of time in getting and sharing the insights. Talend Data Preparation brings efficiency and productivity to those analysts for creating clean and valuable data in minutes, not hours.

Download>>Try Talend Data Preparation for Free

But we designed Talend Data Preparation to reach a larger audience than the business analyst. Digital transformation is not only about empowering a happy few. Every operational task in any business group is a candidate to becoming more data driven. During the roll-out of this product we spoke with business users across all departments in Talend, from marketing campaign managers to financial controller, from digital marketing managers to sales admins. We found that operational workers are spending hours in doing these repetitive tasks with their data. They use Excel rather than Tableau or another similar BI tool. Data Preparation can drastically reduce the amount of time spent on these tasks so Marketers and other business users can get to doing what they do best, rather than struggling with Excel!

This differentiates Talend Data Preparation from most of its competition, since most of them are designed only for a specific audience, such as Business Analyst or Data Scientist. Targeting a specific audience leads to creating another data silo, which might add to the data access pain rather that curing it. In contrast, not only Talend Data Preparation is designed to address the needs of different personas, but it fosters collaboration between them. Even more importantly, it provides a sound way to reconcile IT and Lines of Business so that they can unlock data collaboratively. Self-service is not “Do It Yourself”, because data democratization is not anarchy, it needs control, rules, and governance, otherwise it will fail.

In fact, Gartner predicts that many organizations are actually on the way for self-service failure: "through 2016, less than 10 percent of self-service BI initiatives will be governed sufficiently to prevent inconsistencies that adversely affect the business".

Clean Data, 1 Click Away

The good news is that today, you are just a click away to start delivering on the promises of self-service data governance. Data Preparation Free Desktop has been released. It’s the first Open source tool in this market, so you can download it for free. As its name suggests, it runs on a desktop so that you can install it in a few clicks and get hands on in a matter of clicks. We designed getting started videos and training guide to turn you into a data hero in a matter of minutes.

In addition to what you’ll experience once you downloaded the Free Desktop, the commercial version (that will be released in Q2) will add enterprise class capabilities: multi-user, role-based access; collaboration and data governance; shared inventory of published certified data and other enterprise data; support for hundreds of data sources and targets; high performance server-based data processing. And because it will be delivered as core components within Talend Data Fabric, it will bring self-services across every Talend integration scenarios, from Data Integration to Big Data, from real-time integration to Master Data Management.

Believe me: Talend Data Preparation is an easy catch, it will bring immediate value to your daily job, allow you to reach quickly new milestones in your data journey… and it is fun to use. We’d love to have your on-board and get your feedback (and we even designed a nice feedback form into the product just for that)!

Leave your comments on what you think about Talend Data Preparation below or tweet me @jmichel_franco!

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Products Mentioned

Talend Data Preparation

How many hours have you slaved away in MS Excel changing fields like customer name or address because you need to blend the two and your CRM or SFA doesn’t format data fields in the same way? I cannot tell you how many hours have I spent in MS Excel trying to clean up data or shape data to look how I need it to before sending to my ODS or CDW. As a former IT manager, I was always embarrassed to tell my business owners that their simple request to reformat or split fields in our ODS would take 3 weeks to develop, test and deploy to production. It just seemed crazy to me. I once worked with a gentleman, let's say his name is Bob, that got so good with Excel and VBA scripting that his Excel sheets became applications that his business owners loved so much they used his Excel sheets to replace an application that was meant to cleanse, shape, blend and aggregate our target physician data for the Sales team.

It's truly crazy how good people get with Excel macros and VBA scripting or creating full on visualizations of Mario. But, I never wanted to be an Excel wizard, and I am betting you don't either. Last November, I got early access to the pre-beta version of the new Talend Data Preparation tool and I must tell you this is going to save both IT and Business people HOURS of time.

What Can I Do With Data Prep?

With Talend Data preparation I can do and see so many things with my data in a matter of minutes that otherwise would take me forever and about 3 other additional tools to get done.

Data Discovery

In the Data Prep tool, I can quickly do some impressive data discovery on the datasets I’m trying to work with. It uses these crazy cool semantic libraries to tell me if a field is of a certain type, and I don't mean just text or integers. I am talking about identifying if the data is a country field, US State, or date field (doesn't matter if it is different formats either!). So, when cleansing data, right off the bat I see if my data conforms exactly how it should as well as how much data within my datasets doesn’t match the expected data types using the cool data quality bars.

Cleaning Data

To top off the sweet data discovery and profiling, I can also fix most of my data issues in the Talend Data Preparation tool by using a few of the kick ass functions it provides. For example, if one of my fields, like first name, has leading or trailing spaces I can clean those spaces out with one click.

Think about doing this same job Excel; I would have to write a function to trim left and right, about 5-6 clicks and some typing. I can do a lot of other great functions like split a single field into multiple fields based on some delimiter like an underscore or a dash. How many systems put those concatenated fields into one field and it's up to you to split them out? The answer is too many to count. Well, with this simple function it’s no longer an issue. There are hundreds of functions that you can choose from that help you fix your data, and they give great suggestions of possible functions based on the type of data you are looking at in the field.

Download>>Try Talend Data Preparation for Free

Data Visualization

Did I mention the visualization in the tool? It is packed with helpful charts and graphs of the field's data to help you zero in on data that needs to be resolved.

Go Hands on With Data Prep

In our first Data Prep intro video we show you how one field, which is a marketing lead score, can be cleaned in an instant. The field we’re looking at should be an absolute value between 0 and 100, but the graph in the tool is extremely skewed too the far right indicating an issue with the number. Once we click on the graph to filter my top scores and we can see that some of my data came with 999 as a score.

This is clearly a default in a system, which always wreak s havoc on other systems. With a quick double click and typing in 0 then checking a box to apply to all data with the same value I fix the high end issue. But, when the graph refreshes we see a new problem. We now have negative marketing scores (this data is killing me)!! But no fear, a simple (suggested) function changes those negatives to zero for me! Now, think about how you can apply similar logic to things like inventory levels, system ratings gone wrong, you name it, you can clean it up here.

The Recipe for Clean Data

Here’s the cool part, throughout the process I am taking action on the data, the tool is keeping track of my cleansing in a simple list we call “the recipe”. Yeah, the recipe to fix my data is just listed for my convenience on the left side of the tool and if I want to remove a step because I don't need it any more I can just click on a little trash can and there it goes out of the recipe!

I can also save the recipe so that I can share it with friends and colleagues or reuse it on a new dataset. I can export my clean and shaped data out to a new dataset or into other systems like Tableau to see the results of my work. The great thing about those recipes is that my original data is untouched. The preparation recipe is just showing me what the data would look like if I took those actions. To realize the end result of the recipe's preparation I just need to export the results. (Just in case you can't change the original dataset, just a little compliance issue many people deal with!)

There are endless possibilities in what you can do with the Talend Data Preparation tool and your data. If you spend hours fixing and cleaning or reshaping your data files in spreadsheets or other tools on your desktop then you NEED to check out Talend Data preparation out today! I started out talking about data from a CRM or SFA being mismatch and a mess, well just think about the Data Lakes and Reservoirs sitting out in the Big Data Hadoop systems! Typically, this is raw untransformed data which is going to cause an even bigger headache to cleanse and utilize. The Talend Data Preparation tool is able to connect to all your data repositories and help you speed up that cycle of discovering, cleansing, resolving all your troublesome data.

I invite you to watch this introduction video, and even better try out the free Desktop version for yourself, import your worst data and let us know how you fixed your data with Talend! I dare you.

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Products Mentioned

Talend Data Preparation

Picture this: You just got back from a week in Vegas staffing yet another industry tradeshow. You’re exhausted, completely backlogged on email and other projects, and you have boatloads of new leads that need to be validated and entered into the CRM system….

You stare at that gigantic excel spreadsheet of names and think, “This is going to take me all day!” Sound familiar? Well don’t pull your hair out just yet—help has arrived.

Manually entering and cleansing your data is like watching paint dry—mind-numbingly boring and an enormous time killer. It’s right up there with balancing your checkbook, doing dishes and cleaning out the garage…

But it’s a highly important part of the process because like the old saying goes, "garbage-in = garbage-out”—i.e. analyzing unprepared data can lead to highly misleading results. Well, what if there was a ridiculously easy tool that reduced the time it took to clean and format your data from hours to minutes?

Thankfully the brilliant data minds at Talend feel your pain and have responded with a new solution called Talend Data Preparation. This new, free desktop application enables you to easily evaluate, cleanse and combine data from different sources in minutes with super easy, guided tools that walk you through the process. It’s like taking a walk in the park…

That’s right, you read it here first kids! So start planning that afternoon basketball game or manicure/pedicure because you’re about to get HOURS back in your day. Ok, maybe I’m being a tad too optimistic on that front, but in all seriousness, Talend Data Preparation, allows you to stop spending valuable time “cleaning and crunching data” and focus your efforts on more strategic tasks like analyzing the ROI of the event overall and be able to share that insight back with your team (earning you major brownie points—Bonus!!)

Download>>Try Talend Data Preparation for Free

Here’s a look at how Talend Data Preparation can make your job easier. First, let’s consider some of the steps you take when readying data for analysis in order to determine if that recent tradeshow had significant ROI. Starting off, your messy data may look a little something like this:

Now you have to check the data for accuracy; ensure the values used are valid—i.e. Mr. vs. Mrs. or Ms.; ensure the information is complete in all fields; eliminate duplicate entries; transform and standardize the data; load the data into your Marketing Automation of CRM system and finally develop and document a database structure that integrates the various measures.

Are you tired yet? I am…and my eyes are glazing over from looking at this same spreadsheet for the past three hours….

Now, with Talend Data Preparation, you can access, cleanse and prepare data in a simple, intuitive way. First, when you enter the tool, it prompts you with a pop-up guide to import your Excel file or other types of data sources. Then you select that same spreadsheet and in just a few clicks you can cleanse, standardize, complete fields and filter the data by region, job title, etc. in order to gain better insight into the quality of your new leads. Your once messy spreadsheet now looks like this:

Now you can easily import the cleansed data into Tableau in order to make the next event perform even better. Additionally, as you’ve been selecting and cleaning up various fields in your spreadsheet, Talend Data Preparation has been saving a record of those selections. The choices you make then constitute a ‘recipe’ that you can keep on file for future reference to avoid rework. So the first time you work with a dataset, you save time. But when the data is refreshed (i.e. at next year’s show), you not only reduce time, but you essentially eliminate the work needed entirely! (Now you definitely have some free time in your day for at least a coffee break or walk through the park during lunch).

But the BEST part is of Talend Data Preparation is: It’s not just for marketers. Talend Data Preparation is also perfectly suited for human resource and benefits administrators, business and financial analysts, or really any role that spends a bunch of time working with data or ripping their hair out over spreadsheets. So whether you’re cleansing tradeshow leads, calculating the upside of a potential acquisition, analyzing transactions or sizing up the best dental, vision, life insurance and flexible spending plans for your company—now access to insight is just a click away!

You’re bright and good at what you do—so why be spending more time wrangling data vs. sharing your insights with your team? So now, really the only question you should be asking yourself is: where can I get it?!!! Download your free version of Talend Data Preparation today by simply clicking here and start your journey to an easier day at the office.

Related Resources

With Talend, Speed Up Your Big Data Integration Projects

Products Mentioned

Talend Data Preparation

Bei Big Data ist Geschwindigkeit das A und O

Prognosen für 2016 – Vier Arten, wie sich Big Data und Analytik auf jedes Unternehmen auswirken werden

Quand on parle de Big Data, les performances sont essentielles !

L'avenir de l'Internet des Objets – Quatre points à prendre en compte

Improve Customer Engagement and Generate More Business with Apache Spark

How To Turn Any Big Data Project Into a Success (And Key Pitfalls To Avoid)

Start Easily Using Apache Spark With Talend 6!

Talend’s Benchmark Against Informatica – Setting the Record Straight

My Challenge to Informatica: Let’s Play

All Talend MDM Users Can Now Help Create a Golden Record

Talend Joins Google to Propose Dataflow as an ASF Incubator Project

Google Leading the Open Source Charge with Dataflow SDK

A Series of Firsts for the Apache Software Foundation

Open Source, Future-Proof

WADL and Swagger United in Apache CXF

3 Cloud Trends to Prepare for in 2016

Is Cloud Integration Best for Your Organization?

Talend Connect 2015: Rethinking Data

Rethinking How We Consume Data

Big Data: the Driver of the Next Industrial Revolution

Talend Connect 2015 : Repenser les données

Consommer les données autrement

Big Data : moteur de la prochaine révolution industrielle

3 Trends Behind the Movement to Real-Time Data

Big Data and the Big Game: Super Bowl 50

Clean and Actionable Data 1 Click away

Data Prep 101: Getting Started with Talend Data Preparation

Good News Marketeers! Your Day Job Just Got a WHOLE lot Easier