ETL

ETL with Alteryx: Converting XML to a relational dataset


Every day, companies accumulate more and more documentation, due to the increase in the number of processes that can be managed from an IT system. Thus, there is an even greater decentralisation of data-driven decision-making.

According to studies, 92% of companies invest in storage, analytics and artificial intelligence to leverage their information. However, only 19% believe that they make the right strategic, data-driven decisions (Data Driven companies).

The problem, in many cases, lies in the diverse provenance of the data. Data is collected from different sources, be it from a production and warehousing system, from different sales departments, or even from different finance and human resources departments.

In addition, not all data are in the same format (XML, JSON, XLS...), which makes it even more difficult to compare them. Thus, the need to create ETLs arises.

WHAT IS AN ETL?

An ETL(Extract, Transform, Load) is a tool capable of centralising information, extracting it from different sources, transforming all the desired criteria for better understanding or eliminating errors and then loading it into a destination data warehouse. 

These data movements can be programmed on a regular basis or activated to occur periodically. The types of projects for which ETL tools are used vary widely, as these tools are designed to be very flexible. 

Some common projects would be:

  • Migration of application data from old to new systems
  • Data integration of recent corporate mergers and acquisitions
  • Integration of data from external suppliers and partners
  • Collection of transactional data for business people to work with, commonly referred to as Data Marts or Data Warehouses.

Once an organisation opts for ETL, the company is "married" to that specific tool, as it will design a lot of logic into the data movement processes and become a vital part of managing its business. Therefore, the purchase and use of an ETL tool is a very strategic move, even if it is intended for a small tactical project to begin with.

WHAT ARE THE PHASES OF AN ETL?

Extraction Phase:

Most companies have numerous data sources and use a myriad of analytical tools to produce business intelligence. Extraction consists of bringing "raw" data from different sources and transferring it to a single repository..

Examples of data sources include:

  • Legacy systems
  • Relational databases
  • XML files
  • CRM systems
  • XLS files
  • Business applications
  • Analytics tools
  • JSON

Having different data sources represents a major challenge for organisations, as storing different types of information significantly increases data governance and storage costs.

Additionally, there are ecosystems that are not natively compatible with other solutions, which makes it even more difficult to weave information from different sources and in different formats together.

Transformation Phase:

The cleaning and organisation stage is the stage where all data from multiple source systems are standardised and converted into a convenient format, which improves their quality, governance and exploitation. During this stage, data can undergo various operations: cleansing, filtering, decoupling, merging, reorganisation, duplication.....

Some of the main advantages of automatic transformation over manual transformation is to ensure that global changes are made to datasets in a systematic way, e.g. special characters can be removed and the layout of data can be completely changed once the data transformation process is completed, duplicate items can be removed, field formatting can be changed, blank or redundant rows and columns can be removed, and only data that is interesting to store for later exploitation can be kept. 

Charging phase:

After having this new view, with a new perspective, the newly selected data must be loaded into a data warehouse.Loading a large amount of data into a data warehouse, for example, makes it easier to access and use the information, regardless of how many different types of data have gone through the ETL process.

The result is a clean set of data, structured and correctly classified, that is ready to be used, which will help organisations make critical business decisions and perform further analysis effectively.

WHAT ARE THE ADVANTAGES OF AN ETL?

  1. Enables a common data repository
  2. Improves decision making
  3. Enables comparison of sample data between source and target system 
  4. Helps improve productivity by coding and reusing without the need for additional technical expertise. ETL provides deep historical context for the business
  5. Helps identify meaningful patterns and insights
  6. Converts disparate data into a consistent format
  7. Helps derive business intelligence from the data 
  8. Contains easily usable components 
  9. Effortlessly manages complex transformation 
  10. Delivers maximum return on investment, i.e. roi

Although these tools can be programmed with any language (Java, C, C++, PHP...) they are mainly programmed in Python due to the large number of existing libraries, and are usually run with Apache Spark and PySpark. 

ALTERYX tool

However, there is an important gap between programmers and data analysts, since the former have the technical knowledge to carry it out, and the latter are the ones who must "play" with the data to reach valid conclusions for business development. This means a coordination and efficiency loss worth millions of € per year that you can calculate thanks to the Alteryx APA Platform™..

This is why in recent years, low-code tools have emerged that have revolutionised ETL processes, giving end users (data analysts) the power to perform their own extractions, transformations and data loading for subsequent analysis. 

These tools go further, giving the possibility to apply different modules that allow the analysis through Artificial Intelligence (machine and Deep learning) and obtain the conclusions.

In this video we show how in less than 5 minutes we create an ETL for the extraction of an XML dataset and subsequent transformation and loading to a relational database with our ALTERYX tool.

Alteryx facilitates disparate data access, analytics and data science through a code-free, code-friendly platform It empowers business analysts to automate their entire analytics and process flows in the context of the desired business outcome.

The platform has an intuitive drag-and-drop interface that allows for connectivity with different sources of information and its subsequent transformation and analysis. 

Alteryx is present in all areas of the company:

HRFINANCEMARKETING & SALESOPERATIONSSUPPLY CHAIN
Employee retentionTax automationSegmentationIncident managementDemand forecasting
EGSAudit and complianceForecastResponse time analysisInventory optimisation
RecruitmentDaily reconciliation360 ClientsCall centre analysisPredictive maintenance
PerformanceFinancial planning and analysisRotationCustomer sentimentCapacity planning
Management

DO YOU WANT TO KNOW MORE?

IMPORTANT: Read our Privacy Policy before proceeding. The information you provide may contain personal information.

RELATED NEWS

More news...