Data project design that reflects the ETL approach

The Azure approach that reflects ETL:

  • Source: Identify the source systems to extract from.

In Azure, data sources include Azure Cosmos DB, Azure Data Lake, files, and Azure Blob storage.
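
As a concrete illustration of the Source step, the sketch below reads a raw file from Azure Blob storage with the azure-storage-blob Python package. The connection string, container, and blob names are placeholders, not values from any real pipeline.

```python
# A minimal sketch of pulling raw data from Blob storage (placeholder names).
from azure.storage.blob import BlobServiceClient

# Authenticate against the storage account (connection string is a placeholder).
service = BlobServiceClient.from_connection_string("<storage-connection-string>")

# Point at the container and blob that hold the source data.
blob_client = service.get_blob_client(container="source-data", blob="sales/orders.json")

# Download the blob's contents as bytes and decode to text for inspection.
raw_json = blob_client.download_blob().readall().decode("utf-8")
print(raw_json[:200])
```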

  • Ingest: Identify the technology and method to load the data.

During a load, many Azure destinations can accept data formatted as JavaScript Object Notation (JSON), a file, or a blob. You might need to write code to interact with application APIs, as sketched below.

Azure Data Factory offers built-in support for Azure Functions. You’ll also find support for many programming languages, including Node.js, .NET, Python, and Java. Although Extensible Markup Language (XML) was common in the past, most systems have migrated to JSON because of its flexibility as a semistructured data type.
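
Here is a hypothetical sketch of that ingest pattern in Python: call an application API, take the JSON response, and land it in Blob storage as a staging file for the pipeline to pick up. The API URL, container, and blob names are assumptions made for illustration.

```python
# Hypothetical ingest step: pull JSON from an application API and stage it in Blob storage.
import json
import requests
from azure.storage.blob import BlobServiceClient

# Pull records from the source application's REST API (placeholder URL).
response = requests.get("https://example.com/api/orders", timeout=30)
response.raise_for_status()
orders = response.json()

# Land the raw JSON in a staging container for the downstream pipeline to ingest.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
staging_blob = service.get_blob_client(container="staging", blob="orders/orders.json")
staging_blob.upload_blob(json.dumps(orders), overwrite=True)
```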

  • Prepare: Identify the technology and method to transform or prepare the data.

The most common tool is Azure Data Factory, which provides robust resources and nearly 100 enterprise connectors. Data Factory also allows you to transform data by using a wide variety of languages.
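
Data Factory itself is configured through the portal or its SDKs, so as a stand-in the sketch below shows the kind of preparation logic you might run in Python, for example from a custom activity or a notebook: drop incomplete rows, normalize types, and derive columns. The file paths and column names are assumptions for illustration.

```python
# A minimal sketch of a prepare/transform step in Python with pandas (assumed schema).
import pandas as pd

# Load the staged JSON produced by the ingest step (placeholder path).
orders = pd.read_json("staging/orders.json")

# Typical preparation work: drop incomplete rows, normalize types, derive columns.
orders = orders.dropna(subset=["order_id", "amount"])
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["amount"] = orders["amount"].astype(float)
orders["order_month"] = orders["order_date"].dt.to_period("M").astype(str)

# Write the prepared data out for the analyze step (Parquet keeps the schema).
orders.to_parquet("prepared/orders.parquet", index=False)
```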

  • Analyze: Identify the technology and method to analyze the data.
  • Consume: Identify the technology and method to consume and present the data.

In traditional descriptive analytics projects, we might have transformed data in Azure Analysis Services and then used Power BI to consume the analyzed data. Newer AI technologies such as Azure Machine Learning and Azure Notebooks provide a wider range of options to automate some of the required analysis.
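
The sketch below is one assumed shape for that analyze step in Python: summarize the prepared data into a file that Power BI could consume, then fit a simple trend model with scikit-learn. The paths and column names continue the earlier illustrative assumptions, and the model merely stands in for whatever analysis a project actually requires.

```python
# Assumed analyze step: aggregate for consumption, then fit a simple trend model.
import pandas as pd
from sklearn.linear_model import LinearRegression

orders = pd.read_parquet("prepared/orders.parquet")

# Descriptive analytics: monthly revenue, ready for Power BI to consume.
monthly = orders.groupby("order_month", as_index=False)["amount"].sum()
monthly.to_csv("analyzed/monthly_revenue.csv", index=False)

# A very simple predictive step: fit a trend line over the month index.
X = monthly.index.to_numpy().reshape(-1, 1)
y = monthly["amount"].to_numpy()
trend = LinearRegression().fit(X, y)
print("Projected next-month revenue:", trend.predict([[len(monthly)]])[0])
```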

You might find that you also need a repository to maintain information about your organization’s data sources and dictionaries. Azure Data Catalog can store this information centrally.


Author: Shahzad Khan

Software developer / Architect