ETL Interview Questions
I've always been interested in the Business Intelligence side of development, since my first real job was as an application support analyst.

- What is ETL?
ETL is an acronym for Extraction, Transformation and Loading. It is the process of converting and loading data from a given source into a target.

- What are the steps that define ETL?
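Before breaking the steps down, the extract-transform-load flow can be pictured with a minimal Python sketch; the toy schema and function names here are purely illustrative assumptions, not any particular tool's API.

```python
# A minimal, purely illustrative sketch of the extract-transform-load
# flow; the toy schema and function names are assumptions.
def extract(source):
    """Extraction: read raw records from the defined source."""
    return list(source)

def transform(rows):
    """Transformation: apply the business logic defined in the mapping."""
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, target):
    """Loading: write the transformed records to the defined target."""
    target.extend(rows)

# The "session"/"workflow" equivalent: run the mapping end to end.
source = [{"name": " ada lovelace ", "amount": "10.5"}]
warehouse = []
load(transform(extract(source)), warehouse)
print(warehouse)  # [{'name': 'Ada Lovelace', 'amount': 10.5}]
```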
Defining the source (setting the connection to the source)
Defining the target (setting the connection to the target)
Creating the mapping (defining and setting the business logic)
Creating the session (defining the mapping instructions)
Creating the workflow (setting the instruction sets)

- What is a 3-tier data warehouse?
Here, the data warehouse is thought of as a three-tier system. The first tier is the source layer.
The second tier is the integration layer, where transformed data from the source layer is stored.
The third tier is the dimension layer, which the presentation layer accesses.

- What is a snapshot?
A snapshot is a read-only copy of the master data, captured at a given point in time.

- What is a materialized view?
A materialized view is a database view whose result set is physically stored rather than computed at query time; it is often used as a pre-computed aggregate table.

- What is round-robin processing?
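Both meanings of round robin can be shown concretely; in this minimal Python sketch, the function names and sample data are assumptions for illustration.

```python
# Illustrative sketch: the source data is divided into n units and we
# pull one element from each unit in turn (function names are assumptions).
def round_robin(units):
    iterators = [iter(u) for u in units]
    result = []
    while iterators:
        alive = []
        for it in iterators:
            try:
                result.append(next(it))
                alive.append(it)
            except StopIteration:
                pass  # this unit is exhausted; drop it
        iterators = alive
    return result

# The second meaning: distribute rows evenly across n partitions.
def distribute(rows, n):
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions

print(round_robin([[1, 4, 7], [2, 5], [3, 6, 8]]))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(distribute(list(range(7)), 3))                # [[0, 3, 6], [1, 4], [2, 5]]
```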
Round robin means that we process more than one source or thread at once. Instead of reading or processing linearly, we divide the source data into n units and, in turn, pull one element from each.
Round robin also means that data is distributed evenly across n partitions.

- What is hash partitioning?
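Hash partitioning can be sketched in a few lines of Python; the function and field names are assumptions, and `md5` is used here only because it gives a stable hash across runs, not for security.

```python
import hashlib

# Illustrative sketch of hash partitioning; function and field names are
# assumptions. md5 is used only as a stable (deterministic) hash, not
# for security.
def hash_partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = hashlib.md5(str(row[key]).encode()).hexdigest()
        parts[int(digest, 16) % n].append(row)
    return parts

rows = [{"id": i} for i in range(6)]
parts = hash_partition(rows, "id", 3)

# Every row lands in exactly one partition, and the assignment is
# deterministic: re-partitioning the same keys gives the same layout.
assert sum(len(p) for p in parts) == 6
```

Because the partition is derived from the row's own key rather than arrival order, all rows sharing a key always land in the same partition, which is what makes hash partitioning useful for joins and aggregations.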
Hash partitioning means that we compute a hash key for each row in a source data set and route each row to a partition based on that key.
Hash partitioning also means that data is distributed evenly across partitions via the hash key generated for each data element.
A hash key is a small, fixed-size value that represents a larger piece of data.

- What is an Operational Data Store?
An Operational Data Store (ODS) is a repository that sits between the staging area and the data warehouse.

- What is the difference between a Full Load, an Incremental Load and an Initial Load?
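The three load strategies can be contrasted with a minimal Python sketch; the in-memory dict standing in for a warehouse table is an assumption for illustration.

```python
# Illustrative sketch of the three load strategies; the in-memory dict
# "target" is a stand-in for a warehouse table.
def full_load(target, source):
    """Erase all existing data, then reload from scratch."""
    target.clear()
    target.update(source)

def incremental_load(target, changes):
    """Apply only the rows that changed since the last load."""
    target.update(changes)

target = {}
full_load(target, {1: "a", 2: "b"})         # the initial load is simply the first load
incremental_load(target, {2: "B", 3: "c"})  # only changed and new rows
print(target)  # {1: 'a', 2: 'B', 3: 'c'}
```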
Initial Load - Data is loaded into the target for the first time.
Full Load - The equivalent of a fresh load: all existing data is erased and the target is reloaded from scratch.
Incremental Load - Only the changes since the last load are applied to the target. This is generally a scheduled process.

- What is a factless fact schema?
A factless fact schema is a fact table without measures. The rows record events or relationships, and any facts are derived by computation (counts or other statistics, for example), making the table more akin to a report table.
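A factless fact table can be sketched in Python to make the idea concrete; the student-attendance schema below is a hypothetical example, not from the original notes.

```python
from collections import Counter

# Illustrative sketch of a factless fact table: rows hold only foreign
# keys (no measures); the attendance schema is a hypothetical example.
attendance = [
    {"student_id": 1, "class_id": 10},
    {"student_id": 2, "class_id": 10},
    {"student_id": 1, "class_id": 11},
]

# Facts are computed, not stored: e.g. attendance count per class.
per_class = Counter(row["class_id"] for row in attendance)
print(per_class[10])  # 2
```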