Data quality rules using dimensions
INTRODUCTION TO DATA QUALITY
Chrissy Bloom
Head of Enterprise Data Strategy &
Governance
INTRODUCTION TO DATA QUALITY
Data quality rules
Data quality rule: a type of business rule that
validates whether data meets business
requirements
Can be defined at:
Dataset level
Data element level
Completeness data quality rule
Dataset data quality rule: All expected
records from the source must be loaded in the
target table.
Data element data quality rule: All records
must have a Customer ID, Customer Name,
and Customer Account Type populated.
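A minimal sketch of the two completeness rules above, assuming records are plain Python dictionaries. The field names and sample rows are hypothetical, and comparing record counts is a simplification of "all expected records were loaded".

```python
# Hypothetical field names for the three required data elements.
REQUIRED_FIELDS = ["customer_id", "customer_name", "customer_account_type"]

# Hypothetical sample data: the third source record has a blank name.
source = [
    {"customer_id": "1", "customer_name": "Ana", "customer_account_type": "Loan"},
    {"customer_id": "2", "customer_name": "Ben", "customer_account_type": "Deposit"},
    {"customer_id": "3", "customer_name": "", "customer_account_type": "Loan"},
]
target = source[:2]  # simulate a load that dropped one record


def dataset_complete(source_rows, target_rows):
    """Dataset-level rule: the target holds as many records as the source."""
    return len(source_rows) == len(target_rows)


def incomplete_records(rows, required=REQUIRED_FIELDS):
    """Element-level rule: return rows with a required field missing or blank."""
    return [r for r in rows if any(not r.get(f) for f in required)]
```

Calling dataset_complete(source, target) flags the dropped record, and incomplete_records(source) returns the rows that need a value filled in.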
Timeliness data quality rule
Dataset data quality rule: All records in the
customer dataset must be loaded by 9:00 am.
Data element data quality rule: All records
must have a tax ID populated by the first time
the customer's account status is "Open".
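A minimal sketch of the timeliness rules above. The field names are hypothetical, and the element-level check is approximated as "any record whose status is Open must already have a tax ID", which assumes the check runs each time the data is loaded.

```python
from datetime import datetime, time

LOAD_DEADLINE = time(9, 0)  # the 9:00 am deadline from the dataset-level rule


def loaded_on_time(load_finished_at: datetime) -> bool:
    """Dataset-level rule: the load finished at or before the deadline."""
    return load_finished_at.time() <= LOAD_DEADLINE


def missing_tax_id(rows):
    """Element-level rule: return open accounts that have no tax ID yet."""
    return [r for r in rows if r["account_status"] == "Open" and not r.get("tax_id")]


# Hypothetical sample records.
rows = [
    {"customer_id": "1", "account_status": "Open", "tax_id": "TX-001"},
    {"customer_id": "2", "account_status": "Open", "tax_id": None},
    {"customer_id": "3", "account_status": "Pending", "tax_id": None},
]
late_records = missing_tax_id(rows)
```

A pending account without a tax ID does not break the rule; only the open account without one is returned.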
Validity data quality rule
Data element data quality rule:
All records must have a Birth Date value in
the format MM/DD/YYYY and the value
must be in the past.
All records must have an Account Status of
Open, Closed, or Pending.
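A minimal sketch of the two validity rules above. The function names are hypothetical, and "today" is passed in as a parameter so the check is repeatable.

```python
import re
from datetime import date, datetime

ALLOWED_STATUSES = {"Open", "Closed", "Pending"}


def valid_birth_date(value: str, today: date) -> bool:
    """True if value is a real date written as MM/DD/YYYY and earlier than today."""
    # Enforce two-digit month and day and a four-digit year before parsing.
    if re.fullmatch(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", value) is None:
        return False
    try:
        parsed = datetime.strptime(value, "%m/%d/%Y").date()
    except ValueError:
        return False  # right shape, but not a calendar date (for example month 13)
    return parsed < today


def valid_status(value: str) -> bool:
    """True if the Account Status is one of the allowed values."""
    return value in ALLOWED_STATUSES
```

The format check and the "in the past" check are kept in one function because the rule states them together; they could equally be reported as separate failures.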
Consistency data quality rule
Dataset data quality rule: The count of
records loaded today must be within +/- 5%
of the count of records loaded yesterday.
Data element data quality rule: All Customer
ID values in the AccountTable must also be
present in the CustomerTable.
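A minimal sketch of the consistency rules above, assuming each table is available as a list of dictionaries with a hypothetical customer_id key.

```python
def count_within_tolerance(today_count, yesterday_count, tolerance=0.05):
    """Dataset-level rule: today's count is within +/- 5% of yesterday's."""
    return abs(today_count - yesterday_count) <= tolerance * yesterday_count


def orphaned_customer_ids(account_rows, customer_rows):
    """Element-level rule: Customer IDs in AccountTable missing from CustomerTable."""
    known = {r["customer_id"] for r in customer_rows}
    return sorted({r["customer_id"] for r in account_rows} - known)


# Hypothetical sample tables.
account_table = [{"customer_id": "C1"}, {"customer_id": "C3"}]
customer_table = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orphans = orphaned_customer_ids(account_table, customer_table)
```

Here C3 has an account but no customer record, so it is reported as an orphan.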
Accuracy data quality rule
Data element data quality rule: All records in
the CustomerTable must have accurate
Customer Name, Customer Birthdate, and
Customer Address fields when compared to
the Tax Form.
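A minimal sketch of the accuracy rule above, assuming the Tax Form values are available as a lookup keyed by Customer ID. The field names and sample values are hypothetical, and an exact string comparison is used for simplicity.

```python
COMPARED_FIELDS = ["customer_name", "customer_birthdate", "customer_address"]


def accuracy_mismatches(customer_rows, tax_forms_by_id, fields=COMPARED_FIELDS):
    """Return (customer_id, field) pairs where the table disagrees with the tax form.

    A customer with no tax form on file is reported for every populated field.
    """
    mismatches = []
    for row in customer_rows:
        form = tax_forms_by_id.get(row["customer_id"], {})
        for field in fields:
            if row.get(field) != form.get(field):
                mismatches.append((row["customer_id"], field))
    return mismatches


# Hypothetical sample data: C2's address differs from the tax form.
customer_table = [
    {"customer_id": "C1", "customer_name": "Ana Diaz",
     "customer_birthdate": "01/02/1980", "customer_address": "1 Main St"},
    {"customer_id": "C2", "customer_name": "Ben Okoro",
     "customer_birthdate": "03/04/1975", "customer_address": "9 Oak Ave"},
]
tax_forms = {
    "C1": {"customer_name": "Ana Diaz",
           "customer_birthdate": "01/02/1980", "customer_address": "1 Main St"},
    "C2": {"customer_name": "Ben Okoro",
           "customer_birthdate": "03/04/1975", "customer_address": "9 Oak Avenue"},
}
issues = accuracy_mismatches(customer_table, tax_forms)
```

Unlike the other dimensions, accuracy needs a trusted reference to compare against; here that role is played by the tax_forms lookup.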
Uniqueness data quality rule
Data element data quality rule:
All records must have a unique Customer
ID.
All records must have a unique
combination of customer name, customer
birth date, and customer address fields.
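A minimal sketch of both uniqueness rules above using one helper that accepts either a single field or a combination of fields. The field names and sample rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that appear on more than one record."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample data: C1 and C3 describe the same person.
rows = [
    {"customer_id": "C1", "customer_name": "Ana Diaz",
     "customer_birth_date": "01/02/1980", "customer_address": "1 Main St"},
    {"customer_id": "C2", "customer_name": "Ben Okoro",
     "customer_birth_date": "03/04/1975", "customer_address": "9 Oak Ave"},
    {"customer_id": "C3", "customer_name": "Ana Diaz",
     "customer_birth_date": "01/02/1980", "customer_address": "1 Main St"},
]
duplicate_ids = duplicate_keys(rows, ["customer_id"])
duplicate_people = duplicate_keys(
    rows, ["customer_name", "customer_birth_date", "customer_address"]
)
```

The sample passes the Customer ID rule but fails the combination rule, which is why both rules are worth running.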
Let's practice!
Data profiles
What is data profiling?
Data profiling: The activity of running
statistics on a data set to better understand
the data and field dependencies
Examples:
How many records are in the data set?
What are the min and max values for a
particular data element?
How many records have a particular data
element populated?
When column A is populated, what other
columns are also populated?
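The example questions above can be sketched as two small profiling helpers. This is an illustration only, assuming records are dictionaries and treating None and the empty string as "not populated"; the column names are hypothetical.

```python
EMPTY = (None, "")


def profile_column(rows, column):
    """Return basic statistics for one data element."""
    values = [r.get(column) for r in rows]
    populated = [v for v in values if v not in EMPTY]
    return {
        "record_count": len(values),
        "populated_count": len(populated),
        "distinct_count": len(set(populated)),
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
    }


def co_populated(rows, column_a):
    """When column_a is populated, count how often each other column is populated."""
    counts = {}
    for r in rows:
        if r.get(column_a) in EMPTY:
            continue
        for col, v in r.items():
            if col != column_a and v not in EMPTY:
                counts[col] = counts.get(col, 0) + 1
    return counts


# Hypothetical sample records.
rows = [
    {"customer_id": "1", "balance": 100, "tax_id": "A"},
    {"customer_id": "2", "balance": 250, "tax_id": ""},
    {"customer_id": "3", "balance": None, "tax_id": None},
]
balance_profile = profile_column(rows, "balance")
dependencies = co_populated(rows, "balance")
```

The first helper answers the count, min/max, and populated questions; the second addresses the field dependency question.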
Importance of data profiling
Data profiling:
Confirms what you already know
Reveals what you don't know
Identifies data quality issues
Aids in writing better data quality rules
What does a data profile look like?
Customer ID data profile
Customer Name data profile
Customer Birth Date data profile
Customer Account Type data profile
Using a data profile in data quality
All CustomerID values must be 11 numeric
characters.
All CustomerFirstName values must be a text
string of 1 to 20 characters.
All CustomerLastName values must be a text
string of 1 to 30 characters.
All CustomerBirthDate values must be in
the MM/DD/YYYY format and between
01/01/1900 and 99/99/9999.
All CustomerAccountType values must be
Loan, Deposit, Loan and Deposit, or Credit
Card.
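Rules like these translate fairly directly into per-field checks. The sketch below is one possible encoding, not a definitive implementation; the RULES table and failed_rules helper are hypothetical names, and the birth date rule is omitted to keep the example short.

```python
import re

ACCOUNT_TYPES = {"Loan", "Deposit", "Loan and Deposit", "Credit Card"}

# One check per field, each returning True when the value passes its rule.
RULES = {
    "CustomerID": lambda v: re.fullmatch(r"[0-9]{11}", v) is not None,
    "CustomerFirstName": lambda v: 1 <= len(v) <= 20,
    "CustomerLastName": lambda v: 1 <= len(v) <= 30,
    "CustomerAccountType": lambda v: v in ACCOUNT_TYPES,
}


def failed_rules(record):
    """Return the names of the fields whose value breaks its rule."""
    return [field for field, check in RULES.items() if not check(record.get(field, ""))]


# Hypothetical sample records.
good = {"CustomerID": "12345678901", "CustomerFirstName": "Ana",
        "CustomerLastName": "Diaz", "CustomerAccountType": "Loan"}
bad = {"CustomerID": "12345", "CustomerFirstName": "Ben",
       "CustomerLastName": "Okoro", "CustomerAccountType": "Mortgage"}
```

Keeping the rules in a table makes it easy to add a field once its profile has been reviewed.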
Let's practice!
Metadata and data quality
What is metadata?
Metadata: data about data, or attributes that
describe data
Used to organize and understand datasets
and data elements
Used in the data quality process to
determine the:
definition of a field
owner of a field
field's last update date
Metadata examples
Metadata can be found in a data dictionary.
Examples:
Business field name
Business definition
Data owner
Technical physical field name
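A data dictionary entry holding the metadata listed above could be modeled as a small record type. This is a sketch under assumptions: the class, the sample entry, and its values (including the owner and the update date) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataDictionaryEntry:
    business_field_name: str
    business_definition: str
    data_owner: str
    physical_field_name: str
    last_updated: date


# Hypothetical dictionary keyed by the technical physical field name.
DATA_DICTIONARY = {
    "cust_id": DataDictionaryEntry(
        business_field_name="Customer ID",
        business_definition="Identifier assigned to a customer at onboarding.",
        data_owner="Retail Banking Operations",
        physical_field_name="cust_id",
        last_updated=date(2024, 1, 15),
    ),
}


def owner_of(physical_field_name: str) -> str:
    """Look up who to contact when a rule on this field fails."""
    return DATA_DICTIONARY[physical_field_name].data_owner
```

With the owner recorded next to the definition, a failed rule can be routed to the right person without searching for who is responsible.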
What is data lineage?
Data lineage: A representation of how data
moves in a pipeline, from where the data is
entered in the source through each step in the
data pipeline, until it is consumed.
Each layer has its own metadata
Used in the data quality process to
determine where to implement a data
quality rule
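Lineage with per-layer metadata can be sketched as an ordered list of pipeline steps. The layer names and fields below are hypothetical, and picking the earliest layer that carries a field is only one possible way to choose where a rule might run.

```python
# Hypothetical pipeline, ordered from data entry to consumption.
LINEAGE = [
    {"layer": "source_system", "fields": {"cust_id", "cust_nm"}},
    {"layer": "staging", "fields": {"cust_id", "cust_nm", "load_ts"}},
    {"layer": "warehouse", "fields": {"customer_id", "customer_name", "load_ts"}},
    {"layer": "report", "fields": {"customer_name"}},
]


def earliest_layer_with(field):
    """Return the first layer in the pipeline whose metadata lists the field."""
    for step in LINEAGE:
        if field in step["fields"]:
            return step["layer"]
    return None


def layers_with(field):
    """Return every layer that carries the field, in pipeline order."""
    return [step["layer"] for step in LINEAGE if field in step["fields"]]
```

Because field names can change between layers (cust_id becomes customer_id here), each layer's own metadata is needed to follow a data element through the pipeline.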
Data lineage example
Metadata and data lineage example
Metadata and data lineage example: bad practice
Metadata and data lineage example: best practice
Let's practice!
Data quality issues triage
Using data lineage and metadata
Step 1: Review the data profile
Step 2: Identify where the rule is running
Step 4: Correct the issue
Let's practice!