Today, data integration projects are not just about moving data from point A to point B; there is much more to them. Ever-growing data volumes and the speed at which data changes present real challenges in managing the end-to-end data integration process. To address these challenges, it is paramount to track the data's journey from source to target: start and end timestamps, job status, business area, subject area, and the individuals responsible for a specific job. In other words, metadata is becoming a major player in data workflows. In this blog, I want to review how to implement a job metadata framework using Talend. Let's get started!
Metadata Framework: What You Need to Know
The centralized management and monitoring of this job metadata is crucial to data management teams. An efficient and flexible job metadata framework architecture rests on two things: a metadata-driven model and the job metadata itself.
A typical Talend Data Integration job performs the following tasks to extract data from source systems and load it into target systems:
- Extracting data from source systems
- Transforming the data, which involves:
  - Cleansing source attributes
  - Applying business rules
  - Data quality checks
  - Filtering, sorting, and deduplication
  - Data aggregations
- Loading the data into target systems
- Monitoring, logging, and tracking the ETL process

Figure 1: ETL process
Over the past few years, job metadata has evolved to become an essential component of any data integration project. What happens when you don't have job metadata in your data integration jobs? You can end up with incorrect ETL statistics and logging, and errors that occur during the data integration process become difficult to handle. A successful Talend Data Integration project depends on how well the job metadata framework is integrated with the enterprise data management process.
Job Metadata Framework
The job metadata framework is a metadata-driven model that integrates well with the Talend product suite. Talend provides a set of components for capturing statistics and logging information while the data integration process is in flight.
Remember, the primary objective of this blog is to provide an efficient way to manage ETL operations with a customizable framework. The framework includes the job management data model and the Talend components that support it.

Figure 2: Job metadata model
Primarily, the Job Metadata Framework model includes:
- Job Master
- Job Run Details
- Job Run Log
- File Tracker
- Database High Water Mark Tracker for extracting incremental changes
This framework is designed to allow the production support team to monitor job refresh cycles and investigate job failures and any discrepancies in processing the data loads. Let's go through each piece of the framework step by step.
Talend Jobs
Talend_Jobs is a Job Master Repository table that manages the inventory of all the jobs in the Data Integration domain.
| Attribute | Description |
| --- | --- |
| JobID | Unique identifier for a specific job |
| JobName | Name of the job, as per the naming convention (`<type>_<subject area>_<table_name>_<target_destination>`) |
| BusinessAreaName | Business unit/department or application area |
| JobAuthorDomainID | Job author information |
| Notes | Additional information related to the job |
| LastUpdateDate | The last updated date |
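To make the model concrete, here is a minimal sketch of how the Talend_Jobs table could be bootstrapped from plain Java (for example, as a one-off setup class or a Talend routine). The JDBC URL, credentials, and column types are illustrative assumptions, not part of the framework; a MySQL-compatible metadata database is assumed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateJobMaster {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; requires a MySQL JDBC driver on the classpath.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/etl_metadata", "etl_user", "etl_password");
             Statement stmt = conn.createStatement()) {
            // Column names follow the attribute list above; the types are illustrative.
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS Talend_Jobs ("
              + "  JobID             INT PRIMARY KEY,"
              + "  JobName           VARCHAR(255) NOT NULL,"
              + "  BusinessAreaName  VARCHAR(100),"
              + "  JobAuthorDomainID VARCHAR(100),"
              + "  Notes             VARCHAR(1000),"
              + "  LastUpdateDate    TIMESTAMP DEFAULT CURRENT_TIMESTAMP"
              + ")");
        }
    }
}
```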
Talend Job Run Details
Talend_Job_Run_Details registers every run of a job and its sub-jobs, with statistics and run details such as job status, start time, end time, and the total duration of the main job and its sub-jobs.
| Attribute | Description |
| --- | --- |
| ID | Unique identifier for a specific job run |
| BusinessAreaName | Business unit/department or application area |
| JobAuthorDomainID | Job author information |
| JobID | Unique identifier for a specific job |
| JobName | Name of the job, as per the naming convention (`<type>_<subject area>_<table_name>_<target_destination>`) |
| SubJobID | Unique identifier for a specific sub-job |
| SubJobName | Name of the sub-job, as per the naming convention (`<type>_<subject area>_<table_name>_<target_destination>`) |
| JobStartDate | Main job start timestamp |
| JobEndDate | Main job end timestamp |
| JobRunTimeMinutes | Main job total execution duration |
| SubJobStartDate | Sub-job start timestamp |
| SubJobEndDate | Sub-job end timestamp |
| SubJobRunTimeMinutes | Sub-job total execution duration |
| SubJobStatus | Sub-job status (Pending/Complete) |
| JobStatus | Main job status (Pending/Complete) |
| LastUpdateDate | The last updated date |
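Something has to populate this table at run time. Below is a minimal sketch of a Talend routine that records the main-job rows; the helper name JobRunRecorder is hypothetical, and the TIMESTAMPDIFF call is MySQL-specific. In a job you might call logJobStart from a tJava step at the start and logJobEnd from a tPostjob step.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class JobRunRecorder {
    // Inserts a "Pending" row when the main job starts.
    public static void logJobStart(Connection conn, long runId, int jobId,
                                   String jobName, String businessArea) throws Exception {
        String sql = "INSERT INTO Talend_Job_Run_Details "
                   + "(ID, JobID, JobName, BusinessAreaName, JobStartDate, JobStatus, LastUpdateDate) "
                   + "VALUES (?, ?, ?, ?, ?, 'Pending', CURRENT_TIMESTAMP)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, runId);
            ps.setInt(2, jobId);
            ps.setString(3, jobName);
            ps.setString(4, businessArea);
            ps.setTimestamp(5, new Timestamp(System.currentTimeMillis()));
            ps.executeUpdate();
        }
    }

    // Marks the run "Complete" and records the end time and duration.
    public static void logJobEnd(Connection conn, long runId) throws Exception {
        String sql = "UPDATE Talend_Job_Run_Details "
                   + "SET JobEndDate = CURRENT_TIMESTAMP, "
                   // TIMESTAMPDIFF is MySQL-specific; swap for your database's equivalent.
                   + "    JobRunTimeMinutes = TIMESTAMPDIFF(MINUTE, JobStartDate, CURRENT_TIMESTAMP), "
                   + "    JobStatus = 'Complete', LastUpdateDate = CURRENT_TIMESTAMP "
                   + "WHERE ID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, runId);
            ps.executeUpdate();
        }
    }
}
```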
Talend Job Run Log
Talend_Job_Run_Log logs all the errors that occur during a particular job execution. It extracts the details from the Talend components specially designed for catching logs (tLogCatcher) and statistics (tStatCatcher).

Figure 3: Error logging and Statistics
The tLogCatcher component in Talend operates as a log function triggered during the process by Java exceptions or by the tDie or tWarn components. In order to catch exceptions coming from the job, the tCatch function needs to be enabled on all the components.
The tStatCatcher component gathers the job processing metadata at the job level.
| Attribute | Description |
| --- | --- |
| runID | Unique identifier for a specific job run |
| JobID | Unique identifier for a specific job |
| Moment | The time when the message is caught |
| Pid | The process ID of the Job |
| parent_pid | The parent process ID |
| root_pid | The root process ID |
| system_pid | The system process ID |
| project | The name of the project |
| Job | The name of the Job |
| job_repository_id | The ID of the Job file stored in the repository |
| job_version | The version of the current Job |
| context | The name of the current context |
| priority | The priority sequence |
| Origin | The name of the component, if any |
| message_type | Begin or End |
| message | The error message generated by the component when an error occurs. This is an After variable. This variable functions only if the Die on error check box is cleared. |
| Code | |
| duration | Time for the execution of a Job or a component with the tStatCatcher Statistics check box selected |
| Count | Record counts |
| Reference | Job references |
| Thresholds | Log thresholds for managing error handling workflows |
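In Talend, the tLogCatcher flow is usually wired straight into a database output component that writes to this table. If you would rather persist rows from a tJavaRow, a sketch of a hypothetical helper is below; note that runID and JobID come from the framework's own context variables, not from tLogCatcher itself.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.Date;

public class RunLogWriter {
    // Persists one row caught by tLogCatcher into Talend_Job_Run_Log.
    // Column names mirror the attribute list above.
    public static void write(Connection conn, long runId, int jobId, Date moment,
                             String origin, String messageType, String message,
                             Integer code) throws Exception {
        String sql = "INSERT INTO Talend_Job_Run_Log "
                   + "(runID, JobID, Moment, Origin, message_type, message, Code) "
                   + "VALUES (?, ?, ?, ?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, runId);
            ps.setInt(2, jobId);
            ps.setTimestamp(3, new Timestamp(moment.getTime()));
            ps.setString(4, origin);
            ps.setString(5, messageType);
            ps.setString(6, message);
            if (code != null) {
                ps.setInt(7, code);
            } else {
                ps.setNull(7, java.sql.Types.INTEGER);
            }
            ps.executeUpdate();
        }
    }
}
```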
Talend High Water Mark Tracker
Talend_HWM_Tracker helps in processing delta and incremental changes for a particular table. The High Water Mark Tracker is helpful when Change Data Capture is not enabled and changes are extracted based on specific conditions such as "last_updated_date_time" or "revision_date_time". In some cases, the high water mark relates to the highest sequence number, when records are processed based on a sequence number.
| Attribute | Description |
| --- | --- |
| Id | Unique identifier for a specific source table |
| jobID | Unique identifier for a specific job |
| job_name | The name of the Job |
| table_name | The name of the source table |
| environment | The source table environment |
| database_type | The source table database type |
| hwm_datetime | High water field (datetime) |
| hwm_integer | High water field (number) |
| hwm_Sql | High water SQL statement |
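As a rough sketch of how a job could use the tracker, the hypothetical helper below reads the stored datetime mark before the extract and advances it after a successful load. In a Talend job you would typically load the mark into a context variable, filter the extract with a condition such as last_updated_date_time greater than the mark, and push the new mark in a post-job step.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class HighWaterMark {
    // Reads the stored high-water timestamp for a source table; null if none yet.
    public static Timestamp read(Connection conn, int jobId, String tableName) throws Exception {
        String sql = "SELECT hwm_datetime FROM Talend_HWM_Tracker "
                   + "WHERE jobID = ? AND table_name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, jobId);
            ps.setString(2, tableName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getTimestamp(1) : null;
            }
        }
    }

    // Advances the mark after a successful load so the next run extracts only newer rows.
    public static void advance(Connection conn, int jobId, String tableName,
                               Timestamp newHwm) throws Exception {
        String sql = "UPDATE Talend_HWM_Tracker SET hwm_datetime = ? "
                   + "WHERE jobID = ? AND table_name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, newHwm);
            ps.setInt(2, jobId);
            ps.setString(3, tableName);
            ps.executeUpdate();
        }
    }
}
```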
Talend File Tracker
Talend_File_Tracker registers all the transactions related to file processing. The transaction details include source file location, destination location, file name pattern, file name suffix, and the name of the last file processed.
| Attribute | Description |
| --- | --- |
| Id | Unique identifier for a specific source file |
| jobID | Unique identifier for a specific job |
| job_name | The name of the Job |
| environment | The file server environment |
| file_name_pattern | The file name pattern |
| file_input_location | The source file location |
| file_destination_location | The target file location |
| file_suffix | The file suffix |
| latest_file_name | The name of the last file processed for a specific file pattern |
| override_flag | The override flag, to re-process a file with the same name |
| update_datetime | The last updated date |
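A sketch of how a job might consult the tracker before picking up a file is shown below. The helper is hypothetical and assumes file names sort lexicographically by age (for example, a datetime suffix), which is what makes the compareTo check valid; the 'Y' convention for override_flag is also an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FileTracker {
    // Returns true if the file should be processed: it sorts after the last
    // processed file name, or the override flag forces reprocessing.
    public static boolean shouldProcess(Connection conn, int jobId, String fileName) throws Exception {
        String sql = "SELECT latest_file_name, override_flag FROM Talend_File_Tracker WHERE jobID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return true; // no tracker row yet: treat the file as new
                String latest = rs.getString("latest_file_name");
                boolean override = "Y".equals(rs.getString("override_flag")); // assumed convention
                return override || latest == null || fileName.compareTo(latest) > 0;
            }
        }
    }

    // Records the file just processed so the next run skips it.
    public static void markProcessed(Connection conn, int jobId, String fileName) throws Exception {
        String sql = "UPDATE Talend_File_Tracker "
                   + "SET latest_file_name = ?, update_datetime = CURRENT_TIMESTAMP WHERE jobID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, fileName);
            ps.setInt(2, jobId);
            ps.executeUpdate();
        }
    }
}
```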
Conclusion
This brings us to the end of implementing a job metadata framework using Talend. The key takeaways from this blog:
- The need for, and importance of, a job metadata framework
- The data model that supports the framework
- The customizability of that data model to support different types of job patterns
As always – let me know if you have any questions below and happy connecting!