Data Analysis in Industry: Lesson Plan
6 Hours
Contents
| Section |
|---|
| Overview |
| Session Materials |
| Prerequisites |
| Learning Objectives |
| Assignment |
| Standards Covered |
| KM2 Data Anlaysis Concepts |
| Recordings |
| Session Outline |
| Additional Resources |
| Coach Notes |
Overview
This module introduces the concepts of relational databases and how they should be designed and maintained. There is also discussion around other common database types, chiefly NoSQL. In the second half we will focus on data integration and data warehouses to explore how they can be used in our analysis.
Prerequisites
Learning Objectives
Assignment
Task
Use a work-related dataset to design your own relational database. You should describe the dataset, follow the normalisation steps and create an Entity Relationship Diagram (ERD).
Things to Consider
Standards Covered
Session Outline
| Section | Approx. Timing |
|---|---|
| Class Introduction | 10 Minutes |
| Introduction to Databases | 10 Minutes |
| Relational Databases | 15 Minutes |
| Keys | 20 Minutes |
| Break | 15 Minutes |
| Normalisation | 60 Minutes |
| NoSQL | 10 Minutes |
| Other Types of Database | 20 Minutes |
| Session 1 Recap | 10 Minutes |
| Class Introduction | 10 Minutes |
| Data Modelling | 15 Minutes |
| Entity Relationship Diagrams | 20 Minutes |
| DIY Database Design | 25 Minutes |
| Break | 10 Minutes |
| Data Warehouse | 15 Minutes |
| Data Integration | 10 Minutes |
| Types of Data Integration | 5 Minutes |
| Data Integration Life Cycle | 10 Minutes |
| Rules and Policies | 5 Minutes |
| Extract, Transform, Load | 25 Minutes |
| Combining Data | 10 Minutes |
| Session 2 Recap | 20 Minutes |
Coach Notes
| Topic | Class Introduction | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
| Coach welcome apprentices to lesson and run through the session outline and learning objectives. Coach can run an ice breaker from here . |
|||
| Topic | Introduction to Databases | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
In a module about databases it is important apprentices understand the basic concepts. In this section you will be guiding apprentices through the basic definition of a database management system, the key concepts and key people involved in building and maintaining them. Ask apprentices what experience they have with databases to assess baseline understanding. In this module they will be learning primarily about relational databases but we will also have a short amount of time for NoSQL databases too. |
|||
| Topic | Relational Databases | Duration | 15 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Most of the data that apprentices will use will likely come from a RDBMS, it is structured, efficient and allows for improved analytics by providing opportunities for querying. Often SQL will be used to access the information, but that is not always true. Later on in the session we will discuss how to normalise a database, but in this section it is important apprentices are aware of key terms such as keys, normalisation, relationships and data redundancy. Use the 'bad example' to assess apprentices ability to spot pain points which will be discussed later. |
|||
| Topic | Keys | Duration | 20 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Keys are how relationships are defined between tables in a RDBMS and are important for apprentices to identify when thinking about joins. The three types of key we will be looking at are primary (unique identifiers for each record), foreign (a primary key in another table, establishes the relationship) and composite (when two or more fields are combined to make a primary key). In the exercise apprentices will look at a table and try and identify the logical keys, use this to reinforce that primary keys must be unique and sometimes you may need to add a new field (e/g/ rowID) to create this key. Issues around normalisation can be explored at this point, for example, are the apprentices introducing redundant data? |
|||
| Topic | Normalisation | Duration | 50 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
A RDBMS is efficient because it optimises storage through creating several smaller tables with defined relationships, rather than one large table where information is repeated or missing (which takes up storage space). Make sure apprentices are aware of what happens at each stage and why it is important in the context of optimising the database- lower storage leads to quicker querying which leads to faster insights. Go through the example slowly, asking apprentices to spot non compliance with NF and what they think can be done to fix it. Examples are interspersed of redudant data and how they can be sorted. Apprentices often get confused with the difference between transitive and partial dependency, there is a slide later in the section to address this. Partial occurs only when there is a composite key and transitive indicates redundant data. Give a good 15 minutes at least for the exercise so apprentices can discuss and experiment, understanding what constitutes normal form. |
|||
| Topic | NoSQL | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
The RDBMS may be the most common database type, but NoSQL is becoming more and more popular. In this section you will help apprentices understand four common types of database, how they are used and why they are useful. They key takeaway is that NoSQL databases can store unstructured data and also (sometimes) remove the need for joins which speeds up data retrieval. It is possible some organisations we work with have started to implement NoSQL databases, so encourage apprentices to share their experience or research them more. |
|||
| Topic | Other Types of Database | Duration | 20 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Other than RDBMS and NoSQL, there are other ways data may be stored in an organisation that apprentices need to be aware of. If their organisation does use them, they need to be able to describe what they are used for and why they are useful in their portfolio. First you will go through historical types of database (network, hierarchical, multi-dimensional and object orientated) to explain key features and why they may be useful. You will then follow this up with a brief introdution to data warehouses. Do not spend too much time on this, as it is covered again in module 4. The key take away is that data warehouses allow for advanced analytics to be carried out on a unified data source within an organisation. |
|||
| Topic | Recap | Duration | 15 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
| Go through the learning objectives again and show apprentices the assignments and check for their understanding. You may want to create a kahoot quiz, but if you do please make sure you have submitted a create resource request first. Remind apprentices to complete session attendance log and update their OTJT. |
|||
| Topic | Class Introduction | Duration | 10 Minutes |
|---|---|---|---|
|
|||
| Coach welcome apprentices to lesson and run through the session outline and learning objectives. Coach can run an ice breaker from here . |
|||
| Topic | Data Modelling | Duration | 15 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Session 2 focusses more on the design and maintenance of a DBMS after the previous introduction to them. In this section you will be guiding apprentices through the basic design process, from concept to fleshing them out logically and finally considering what the final DBMS will look like. Each stage has its own requirements and considerations which you should go through. There is a brief introduction to common data types, keep this low level as it will be explored in more detail during the SQL bootcamp. Apprentices just need to know what we mean by 'string', not anything more detailed like how they used or analysed. |
|||
| Topic | Entity Relationship Diagrams | Duration | 20 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
The final form of any data model is the ERD, as it shows within the DBMS the different entities (tables) and how they are related. Go through the key features of an ERD and discuss with apprentices examples. The crow foot notation can be confusing, so make sure you are clear on the relationships in the examples and what they mean. The examples are built in Lucid Chart and embedded, they may be slow to load, so refresh the page if this happens. Give apprentices time to explain each ERD example so they are clear on the steps involved and how to define relationship.s |
|||
| Topic | DIY Database Design | Duration | 25 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Don't spend too much time on the Spotify example, but use it to reinforce the process of building an ERD. Give apprentices 15 minutes to design the ERD from the example and time for feedback and discussion. |
|||
| Topic | Data Warehouse | Duration | 15 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
This module has been about using databases to help with our analysis. Apprentices also need to be aware of data warehouses and how they are used in analysis. In this section, apprentices will be introduced to the concept of warehouses, their benefits and how they differ from a database. Terms such as OLAP and OLTP will be defined. If apprentices use warehouses currently, find out what they use them for and what they think the advantages are. |
|||
| Topic | Data Integration | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
In this section you will be defining what data integration means and leading a discussion around why it is important in data management. These concepts will be appearing in the KM1 exam as well. You will also be showing apprentices examples of data sources as well as discussing the benefits of data integration. |
|||
| Topic | Types of Data Integration | Duration | 5 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
In this short section you will be showing the difference between batch and real time integration so apprentices can appreciate the processes and imagine benefits and challenges of each. |
|||
| Topic | Data Integration Life Cycle | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
In this section you will be explaining the end to end process of how someone might carry out data integration, from selecting and profiling the data, understanding various requirements and designing the system to testing to make sure it works. This section incorporates vertical slides so you can lead discussions around each stage. |
|||
| Topic | Rules and Policies | Duration | 5 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
When using a data integration process it is important to consider rules and policies to ensure it works and is safe. Guide apprentices to consider what sort of rules around access, ensuring clean data and that the process has worked should be put into place. Link this back to GDPR and why data security is essential. |
|||
| Topic | ETL | Duration | 25 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
Extract, Transform, Load is an important process which is followed when unifying data into a warehouse. In this section you will guide apprentices through the process by first aksing them to consider how to count up all the words in a library. This is to get them to think about the difficulties in doing analysis when your data is spread across multiple locations. You will then explain the concept of ETL and how it works. Important things to point out- data is never removed or changed in ETL, the extracted data is always a copy. Also, apprentices may get confused about verification and validation. Verification is an internal process where you are checking the accuracy of your dataset (i.e. does your warehouse contain what it should?). Validation is an external process where you check against requirements of other systems (i.e. does the source files conform to what the target warehouse needs?). Validation happens during te extraction process and verification happens post loading. |
|||
| Topic | Combining Data | Duration | 10 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
|
This section is aimed to help apprentices think about their portfolio. If they ever combined data (through integration, joins, ETL, etc) they need to explain what they did, why they did it and any problems they encountered. You could run a pop quiz, and ask apprentices what they would include if they are asked to describe a particular process. |
|||
| Topic | Recap | Duration | 20 Minutes |
|---|---|---|---|
| Objectives | |||
|
|||
| Notes | |||
| Go through the learning objectives and assignments and clarify anything apprentices may still not understand. Go to Kahoot! and run the module 1 Kahoot quiz (apprentices can download the app on their phone, or just use their browser). Make sure apprentices update tehir OTJ and complete the SAL. |
|||