Data science programs aren’t just for students anymore. Now, data scientists can turn to open online courses and other resources to boost their skill sets. We sat down with Isabelle Nuage, Director of Product Marketing, Big Data at Talend, to get insight into what resources are out there:
Q: How would you characterize the differences between data science research processes and machine learning deployment processes?
Isabelle: In the grand scheme of things, Data Science is Science. Data Scientists typically work on sample data and go through many iterations of trial and error before finding the model or algorithm that fits their needs. When IT needs to deploy machine learning at scale, they take the work from the data scientists and try to reproduce it at scale for the enterprise. Unfortunately, it doesn’t always work right away, because real-life data differs from sample data: it has inconsistencies, often missing values, and other data quality issues.
Q: Why is putting machine learning (ML) models into production hard?
Isabelle: Data Scientists work in a lab mode, meaning they often operate like lone rangers. They take the time to explore data and try out various models, and it can sometimes take weeks or even months to deploy their models into production. By that time, the models have already become obsolete for the business, forcing them to go back to the drawing board. Another challenge for Data Scientists is data governance; without it, data becomes a liability. A good example is clinical trial data, where sensitive patient information has to be masked so that it is not accessible to everyone in the organization.
Q: What are the stumbling blocks?
Isabelle: There is a lack of collaboration between the Data Science team and IT, where each tends to speak its own language and has its own set of skills that the other might not understand. Data Science is often considered a pure technology discipline, disconnected from business needs, even though the asks are often tied to the need for fast decision-making in order to innovate and outsmart the competition. Existing landscapes, such as enterprise data warehouses, are not flexible enough to give Data Science teams access to all the historical and granular information, as some data is stored on tapes. IT is needed to create a Data Lake that stores all that historical data for training the models and adds the real-time data that enables real-time decisions.
Q: How are enterprises overcoming them?
Isabelle: Enterprises are creating Cloud data lakes (better suited for big data volumes and processing) and leveraging new services and tools, such as serverless processing, to optimize the cost of machine learning processing on big data volumes. They are also creating centers of excellence to foster collaboration across teams and hiring a Chief Data Officer (CDO) to really elevate data science to a business discipline.
Q: What advice might you offer enterprises looking to streamline the ML deployment process?
Isabelle: Use tooling that automates manual tasks such as hand-coding and fosters collaboration between the Data Science and IT teams. Let the Data Science team explore and do their research, but let IT govern and deploy data so it’s no longer a liability for the organization. Doing this in a continuous iteration and delivery fashion will enable continuous smart decision-making throughout the organization.
Q: What new programs for learning data science skills have caught your attention and in what ways do they build on traditional learning programs?
Isabelle: I’m most interested in new tools that democratize data science: tools that provide a graphical, easy-to-use UI and suggest the best algorithms for the dataset, rather than making users go through lengthy rounds of trial and error. These tools make data science accessible to more people, like business analysts, so more people within the enterprise can benefit from sophisticated advanced analytics for decision-making. They help people get hands-on experience without needing a PhD.
Q: What are some of your favorite courses and certifications?
Isabelle: I’d say Coursera, as it offers online courses where people can learn at their own pace. They even offer some free data science and machine learning courses. Another great option is MIT eLearning, which also offers courses on Data Science and Big Data.
Check out Talend Big Data and Machine Learning Sandbox to get started.
The post Data Scientists Never Stop Learning: Q&A Spotlight with Isabelle Nuage of Talend appeared first on Talend Real-Time Open Source Data Integration Software.