Apache Beam in 2017: Use Cases, Progress and Continued Innovation

This blog was originally published by Anand Iyer & Jean-Baptiste Onofré [@jbonofre] on the Apache Beam blog.

On January 10, 2017, Apache Beam (Beam) got promoted as a Top-Level Apache Software Foundation project. It was an important milestone that validated the value of the project, legitimacy of its community, and heralded its growing adoption. In the past year, Apache Beam has experienced tremendous momentum, with significant growth in both its community and feature set. Let us walk you through some of the notable achievements.

Use cases

First, let’s take a deeper look at how Apache Beam was used in 2017. Beam, being a unified framework for batch and stream processing, enables a very wide spectrum of diverse use cases. Here are some use cases that exemplify the versatility of Beam:

Community growth

In 2017, Apache Beam had 174 contributors worldwide, from many different organizations. The Apache community was proud to count 18 PMC members and 31 committers among that mix. The community had seven releases in 2017, each bringing a rich set of new features and fixes.

The most obvious and encouraging sign of the growth of the Beam community, and validation of its core value proposition of portability, is the addition of significant new runners (i.e. execution engines). We entered 2017 with Apache Flink, Apache Spark 1.x, Google Cloud Dataflow, Apache Apex, and Apache Gearpump. In 2017, the following new and updated runners were developed:

Apache Spark 2.x update
IBM Streams runner
MapReduce runner
JStorm runner

In addition to runners, Beam added new IO connectors, some notable ones being the Cassandra, MQTT, AMQP, HBase/HCatalog, JDBC, Solr, Tika, Redis, and ElasticSearch connectors. Beam’s IO connectors make it possible to read from or write to data sources/sinks even when they are not natively supported by the underlying execution engine. Beam also provides fully pluggable filesystem support, allowing us to support and extend our coverage to HDFS, Amazon S3, Microsoft Azure Storage, and Google Storage. We continue to add new IO connectors and filesystems to extend Beam use cases.

A particularly telling sign of the maturity of an open source community is when it is able to join forces with multiple other open source communities, and mutually improve the state of each collaborative society. Over the past few months, the Beam, Calcite, and Flink communities have come together to define a robust spec for Streaming SQL, with contributing engineers from over four organizations. If you are excited by the prospect of improving the state of streaming SQL, please join us!

In addition to SQL, new specs for XML and JSON based declarative DSLs are also in the Proof of Concept stage (PoC) and in need of additional contributors.

Continued innovation

Innovation is important to the success on any open source project, and Apache Beam has a rich history of bringing ground-breaking ideas to the open source community. For example, Apache Beam was the first to introduce some seminal concepts in the world of big-data processing:

Unified batch and streaming SDKs that enable users to author big-data jobs without having to learn multiple, disparate SDKs/APIs;
Cross-Engine Portability: Giving enterprises the confidence that workloads authored today will not have to be re-written when open source engines become outdated and are supplanted by newer ones; and
Semantics that are essential for reasoning about unbounded unordered data, and achieving consistent and correct output from a streaming job.

In 2017, the pace of innovation continued with the addition of the following capabilities:

A cross-Language Portability framework accompanied by a Go SDK;
Dynamically Shardable IO (SplittableDoFn);
Support for schemas in PCollection, allowing the community to extend Beam’s runner capabilities; and
Extensions addressing new use cases such as machine learning, and new data formats.

Areas of improvement

Any retrospective view of a project is incomplete without an honest assessment of areas for improvement. In terms of parts of the Beam project still requiring innovation, two aspects stand out:

Helping runners showcase their individual strengths. After all, portability does not imply homogeneity. Different runners have different areas in which they excel, and we need to do a better job of highlighting their individual strengths.
Based on the previous point, helping customers make a more informed decision when they select a runner or migrate from one to another.

In 2018, we aim to take proactive steps to improve the above aspects.

Ethos of the project and its community

The world of batch and stream big-data processing today is reminiscent of the Tower of Babel parable: a slowdown of progress because different communities spoke different languages. Similarly, today there are multiple disparate big-data SDKs/APIs, each with their own distinct terminology to describe similar concepts. The side effect is user confusion and slower adoption.

For this reason, in order to help inspire and accelerate adoption, the Apache Beam project aims to provide an industry standard portable SDK that will:

Benefit users by providing innovation with stability: The separation of SDKs and engines enables healthy competition between runners, without requiring users to constantly learn new SDKs/APIs and rewrite their workloads to benefit from each innovation; and
Benefit big-data engines by growing the pie for everyone: Making it easier for users to author, maintain, upgrade and migrate their big-data workloads will lead to significant growth in the number of big-data deployments in production.

The Apache Beam community is both proud of the tremendous achievements we’ve been able to attain thus far, and we hope you will join us in continuing the project’s advancements in the years to come.

The post Apache Beam in 2017: Use Cases, Progress and Continued Innovation appeared first on Talend Real-Time Open Source Data Integration Software.

Apache Beam in 2017: Use Cases, Progress and Continued Innovation

Use cases

Community growth

Continued innovation

Areas of improvement

Ethos of the project and its community

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112