How to Implement a Job Metadata Framework using Talend

Today, data integration projects are not just about moving data from point A to point B; there is much more to it. The ever-growing volume of data and the speed at which it changes present many challenges in managing the end-to-end data integration process. To address these challenges, it is paramount to track the data journey from source to target in terms of start and end timestamps, job status, business area, subject area, and the individuals responsible for a specific job. In other words, metadata is becoming a major player in data workflows. In this blog, I want to review how to implement a job metadata framework using Talend. Let’s get started!

Metadata Framework: What You Need to Know

The centralized management and monitoring of this job metadata is crucial to data management teams. An efficient and flexible job metadata framework architecture requires two things: a metadata-driven model and job metadata.

A typical Talend Data Integration job performs the following tasks to extract data from source systems and load it into target systems:

- Extracting data from source systems
- Transforming the data, which involves:
  - Cleansing source attributes
  - Applying business rules
  - Data quality
  - Filtering, sorting, and deduplication
  - Data aggregations
- Loading the data into target systems
- Monitoring, logging, and tracking the ETL process

Figure 1: ETL process

Over the past few years, job metadata has evolved to become an essential component of any data integration project. What happens when you don’t have job metadata in your data integration jobs? It may lead to incorrect ETL statistics and logging, and it makes errors that occur during the data integration process difficult to handle. A successful Talend Data Integration project depends on how well the job metadata framework is integrated with the enterprise data management process.

Job Metadata Framework

The job metadata framework is a metadata-driven model that integrates well with the Talend product suite. Talend provides a set of components for capturing statistics and logging information while the data integration process is running.

Remember, the primary objective of this blog is to provide an efficient way to manage the ETL operations with a customizable framework. The framework includes the Job management data model and the Talend components that support the framework.

Figure 2: Job metadata model

Primarily, the Job Metadata Framework model includes:

Job Master
Job Run Details
Job Run Log
File Tracker
Database High Water Mark Tracker for extracting the incremental changes

This framework is designed to allow the production support team to monitor the job cycle refresh and look for issues relating to job failures and any discrepancies while processing the data loads. Let’s go through each piece of the framework step by step.

Talend Jobs

Talend_Jobs is a Job Master Repository table that manages the inventory of all the jobs in the Data Integration domain.

Attribute	Description
JobID	Unique Identifier to identify a specific job
JobName	Job Name is the name of the job as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
BusinessAreaName	Business Unit / Department or Application Area
JobAuthorDomainID	Job author Information
Notes	Additional Information related to the job
LastUpdateDate	The last updated date
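
As a rough illustration of the Job Master table, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the attribute table above; the SQLite column types, the helper functions build_job_name and register_job, and the example values are assumptions made for illustration, not part of the original framework.

```python
import sqlite3
from datetime import datetime, timezone

# Column names follow the attribute table above; the SQLite types are assumptions.
TALEND_JOBS_DDL = """
CREATE TABLE IF NOT EXISTS Talend_Jobs (
    JobID             INTEGER PRIMARY KEY,
    JobName           TEXT NOT NULL UNIQUE,
    BusinessAreaName  TEXT,
    JobAuthorDomainID TEXT,
    Notes             TEXT,
    LastUpdateDate    TEXT NOT NULL
)
"""


def build_job_name(job_type, subject_area, table_name, target_destination):
    """Join the four parts of the naming convention with underscores."""
    return "_".join([job_type, subject_area, table_name, target_destination])


def register_job(conn, job_name, business_area, author, notes=""):
    """Insert one job into the master table and return its generated JobID."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cur = conn.execute(
        "INSERT INTO Talend_Jobs "
        "(JobName, BusinessAreaName, JobAuthorDomainID, Notes, LastUpdateDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (job_name, business_area, author, notes, now),
    )
    conn.commit()
    return cur.lastrowid


# Hypothetical example values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(TALEND_JOBS_DDL)
job_name = build_job_name("stg", "sales", "orders", "dwh")
job_id = register_job(conn, job_name, "Finance", "jdoe")
```

In a real deployment this table would live in the database that hosts the rest of the framework, and each Talend job would look up its JobID by name at startup.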

Talend Job Run Details

Talend_Job_Run_Details registers every run of a job and its sub jobs with statistics and run details such as job status, start time, end time, and total duration of main job and sub jobs.

Attribute	Description
ID	Unique Identifier to identify a specific job run
BusinessAreaName	Business Unit / Department or Application Area
JobAuthorDomainID	Job author Information
JobID	Unique Identifier to identify a specific job
JobName	Job Name is the name of the job as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
SubJobID	Unique Identifier to identify a specific sub job
SubJobName	Sub Job Name is the name of the sub job as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
JobStartDate	Main Job Start Timestamp
JobEndDate	Main Job End Timestamp
JobRunTimeMinutes	Total execution duration of the main job, in minutes
SubJobStartDate	Sub Job Start Timestamp
SubJobEndDate	Sub Job End Timestamp
SubJobRunTimeMinutes	Total execution duration of the sub job, in minutes
SubJobStatus	Sub Job Status (Pending / Complete)
JobStatus	Main Job Status (Pending / Complete)
LastUpdateDate	The last updated date
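
The run-registration pattern described above can be sketched as follows: a row is written with status Pending when the job starts and updated to Complete, with the computed duration, when it ends. This sketch covers only the main-job columns (the sub-job columns would follow the same pattern). The SQLite types and the helper names start_run and complete_run are assumptions for illustration.

```python
import sqlite3
from datetime import datetime

# Main-job subset of the attribute table above; the SQLite types are assumptions.
RUN_DETAILS_DDL = """
CREATE TABLE IF NOT EXISTS Talend_Job_Run_Details (
    ID                INTEGER PRIMARY KEY,
    JobID             INTEGER NOT NULL,
    JobName           TEXT NOT NULL,
    JobStartDate      TEXT NOT NULL,
    JobEndDate        TEXT,
    JobRunTimeMinutes REAL,
    JobStatus         TEXT NOT NULL,
    LastUpdateDate    TEXT NOT NULL
)
"""


def start_run(conn, job_id, job_name, started_at):
    """Register a new run with status 'Pending' and return its run ID."""
    stamp = started_at.isoformat()
    cur = conn.execute(
        "INSERT INTO Talend_Job_Run_Details "
        "(JobID, JobName, JobStartDate, JobStatus, LastUpdateDate) "
        "VALUES (?, ?, ?, 'Pending', ?)",
        (job_id, job_name, stamp, stamp),
    )
    conn.commit()
    return cur.lastrowid


def complete_run(conn, run_id, ended_at):
    """Mark a run 'Complete', store its duration, and return the minutes."""
    (started,) = conn.execute(
        "SELECT JobStartDate FROM Talend_Job_Run_Details WHERE ID = ?", (run_id,)
    ).fetchone()
    minutes = (ended_at - datetime.fromisoformat(started)).total_seconds() / 60.0
    stamp = ended_at.isoformat()
    conn.execute(
        "UPDATE Talend_Job_Run_Details "
        "SET JobEndDate = ?, JobRunTimeMinutes = ?, JobStatus = 'Complete', "
        "LastUpdateDate = ? WHERE ID = ?",
        (stamp, minutes, stamp, run_id),
    )
    conn.commit()
    return minutes


# Hypothetical run: starts at 02:00 and ends at 02:45 on the same day.
conn = sqlite3.connect(":memory:")
conn.execute(RUN_DETAILS_DDL)
run_id = start_run(conn, 1, "stg_sales_orders_dwh", datetime(2024, 1, 1, 2, 0))
minutes = complete_run(conn, run_id, datetime(2024, 1, 1, 2, 45))
```

A run that is still Pending long after its usual duration is a useful signal for production support that a job has hung or failed without logging its end.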

Talend Job Run Log

Talend_Job_Run_Log logs all the errors that occur during a particular job execution. It extracts the details from the Talend components specially designed for catching logs (tLogCatcher) and statistics (tStatCatcher).

Figure 3: Error logging and Statistics

The tLogCatcher component in Talend operates as a log function triggered during the process by one of these sources: Java exceptions, tDie, or tWarn. In order to catch exceptions coming from the job, the tCatch function needs to be enabled on all the components.

The tStatCatcher component gathers the job processing metadata at the job level.

Attribute	Description
runID	Unique Identifier to identify a specific job run
JobID	Unique Identifier to identify a specific job
Moment	The time when the message is caught
Pid	The Process ID of the Job
parent_pid	The Parent process ID
root_pid	The root process ID
system_pid	The system process ID
project	The name of the project
Job	The name of the Job
job_repository_id	The ID of the Job file stored in the repository
job_version	The version of the current Job
context	The Name of the current context
priority	The priority sequence
Origin	The name of the component if any
message_type	Begin or End
message	The error message generated by the component when an error occurs. This is an After variable. This variable functions only if the Die on error checkbox is cleared.
Code
duration	Time for the execution of a Job or a component with the tStatCatcher Statistics check box selected
Count	Record counts
Reference	Job references
Thresholds	Log thresholds for managing error handling workflows
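
To show how rows captured by the log components might land in Talend_Job_Run_Log, here is a minimal sketch that stores a subset of the attributes above and then pulls back the messages at or above a priority threshold. The SQLite types, the helper names, the example rows, and the numeric priority values are all assumptions; check the priorities your own tWarn and tDie components emit before choosing a threshold.

```python
import sqlite3

# Subset of the attribute table above; the SQLite types are assumptions.
RUN_LOG_DDL = """
CREATE TABLE IF NOT EXISTS Talend_Job_Run_Log (
    runID        INTEGER NOT NULL,
    JobID        INTEGER NOT NULL,
    moment       TEXT,
    job          TEXT,
    context      TEXT,
    priority     INTEGER,
    origin       TEXT,
    message_type TEXT,
    message      TEXT,
    code         INTEGER
)
"""

LOG_COLUMNS = ("moment", "job", "context", "priority", "origin",
               "message_type", "message", "code")


def save_log_row(conn, run_id, job_id, row):
    """Store one captured log row (a dict); missing keys are saved as NULL."""
    values = [row.get(col) for col in LOG_COLUMNS]
    placeholders = ", ".join("?" for _ in range(len(LOG_COLUMNS) + 2))
    conn.execute(
        f"INSERT INTO Talend_Job_Run_Log (runID, JobID, {', '.join(LOG_COLUMNS)}) "
        f"VALUES ({placeholders})",
        [run_id, job_id, *values],
    )
    conn.commit()


def messages_at_or_above(conn, run_id, min_priority):
    """Return (origin, message) pairs for a run, filtered by priority."""
    return conn.execute(
        "SELECT origin, message FROM Talend_Job_Run_Log "
        "WHERE runID = ? AND priority >= ? ORDER BY moment",
        (run_id, min_priority),
    ).fetchall()


# Hypothetical log rows; the priority numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(RUN_LOG_DDL)
save_log_row(conn, 1, 1, {"moment": "2024-01-01 02:10:00", "priority": 3,
                          "origin": "tWarn_1", "message": "Row count is low"})
save_log_row(conn, 1, 1, {"moment": "2024-01-01 02:20:00", "priority": 5,
                          "origin": "tDie_1", "message": "Target table missing"})
serious = messages_at_or_above(conn, 1, 5)
```

Keeping the run ID on every log row is what lets production support move from a failed entry in Talend_Job_Run_Details to the exact messages behind it.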

Talend High Water Mark Tracker

Talend_HWM_Tracker helps in processing delta and incremental changes of a particular table. The High Water Mark Tracker is helpful when “Change Data Capture” is not enabled and the changes are extracted based on specific conditions such as “last_updated_date_time” or “revision_date_time.” In some cases, the High Water Mark is the highest sequence number processed so far, when records are processed based on a sequence number.

Attribute	Description
Id	Unique Identifier to identify a specific source table
jobID	Unique Identifier to identify a specific job
job_name	The name of the Job
table_name	The name of the source table
environment	The source table environment
database_type	The source table database type
hwm_datetime	High Water Field (Datetime)
hwm_integer	High Water Field (Number)
hwm_Sql	High Water SQL Statement

Talend File Tracker

Talend_File_Tracker registers all the transactions related to file processing. The transaction details include source file location, destination location, file name pattern, file name suffix, and the name of the last file processed.

Attribute	Description
Id	Unique Identifier to identify a specific source file
jobID	Unique Identifier to identify a specific job
job_name	The name of the Job
environment	The file server environment
file_name_pattern	The file name pattern
file_input_location	The source file location
file_destination_location	The target file location
file_suffix	The file suffix
latest_file_name	The name of the last file processed for a specific file name pattern
override_flag	The override flag to re-process the file with the same name
update_datetime	The last updated date
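
One way the file tracker attributes might drive file selection is sketched below: given the files currently in the input location, keep those that match file_name_pattern and come after latest_file_name, and include the last processed file again when override_flag is set. The function name files_to_process and the example file names are hypothetical, and the sketch assumes file names sort chronologically (for example, because they carry a date stamp).

```python
from fnmatch import fnmatch


def files_to_process(available, file_name_pattern, latest_file_name,
                     override_flag=False):
    """Pick the files a job should load next.

    available         -- names of files found in the input location
    file_name_pattern -- shell-style pattern, e.g. 'orders_*.csv'
    latest_file_name  -- last file already processed, or None on first run
    override_flag     -- when True, re-process the last file as well
    """
    matching = sorted(n for n in available if fnmatch(n, file_name_pattern))
    if latest_file_name is None:
        return matching
    if override_flag:
        return [n for n in matching if n >= latest_file_name]
    return [n for n in matching if n > latest_file_name]


# Hypothetical directory listing, for illustration only.
available = ["orders_20240101.csv", "orders_20240102.csv",
             "orders_20240103.csv", "readme.txt"]
new_files = files_to_process(available, "orders_*.csv", "orders_20240102.csv")
rerun_files = files_to_process(available, "orders_*.csv", "orders_20240102.csv",
                               override_flag=True)
```

After a successful load, the job would write the last name in the returned list back to latest_file_name and refresh update_datetime.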

Conclusion

This brings us to the end of implementing a job metadata framework using Talend. The following are the key takeaways from this blog:

The need for and importance of a job metadata framework
The data model to support the framework
The customizable data model to support different types of job patterns

As always – let me know if you have any questions below and happy connecting!

The post How to Implement a Job Metadata Framework using Talend appeared first on Talend Real-Time Open Source Data Integration Software.
