Data Analysis in Industry: Lesson Plan


6 Hours


Contents


Section
Overview
Session Materials
Prerequisites
Learning Objectives
Assignment
Standards Covered
KM2 Data Anlaysis Concepts
Recordings
Session Outline
Additional Resources
Coach Notes

Overview


This module introduces the concepts of relational databases and how they should be designed and maintained. There is also discussion around other common database types, chiefly NoSQL. In the second half we will focus on data integration and data warehouses to explore how they can be used in our analysis.



Materials


Session 1 Slide Deck

Session 2 Slide Deck


Prerequisites




Learning Objectives




Assignment


Task


Use a work-related dataset to design your own relational database. You should describe the dataset, follow the normalisation steps and create an Entity Relationship Diagram (ERD).


Things to Consider




Standards Covered




Recordings (coach use only)


Link

Session Outline


Section Approx. Timing
Class Introduction 10 Minutes
Introduction to Databases 10 Minutes
Relational Databases 15 Minutes
Keys 20 Minutes
Break 15 Minutes
Normalisation 60 Minutes
NoSQL 10 Minutes
Other Types of Database 20 Minutes
Session 1 Recap 10 Minutes
Class Introduction 10 Minutes
Data Modelling 15 Minutes
Entity Relationship Diagrams 20 Minutes
DIY Database Design 25 Minutes
Break 10 Minutes
Data Warehouse 15 Minutes
Data Integration 10 Minutes
Types of Data Integration 5 Minutes
Data Integration Life Cycle 10 Minutes
Rules and Policies 5 Minutes
Extract, Transform, Load 25 Minutes
Combining Data 10 Minutes
Session 2 Recap 20 Minutes


Additional Resources




Coach Notes


Topic Class Introduction Duration 10 Minutes
Objectives
  • To provide an overview of the class agenda and the expected learning objectives
Notes

Coach welcome apprentices to lesson and run through the session outline and learning objectives. Coach can run an ice breaker from here .

Contents

Topic Introduction to Databases Duration 10 Minutes
Objectives
  • Understand basic concepts of a database
Notes

In a module about databases it is important apprentices understand the basic concepts. In this section you will be guiding apprentices through the basic definition of a database management system, the key concepts and key people involved in building and maintaining them. Ask apprentices what experience they have with databases to assess baseline understanding. In this module they will be learning primarily about relational databases but we will also have a short amount of time for NoSQL databases too.

Contents

Topic Relational Databases Duration 15 Minutes
Objectives
  • Understand the basic concepts of a RDBMS
Notes

Most of the data that apprentices will use will likely come from a RDBMS, it is structured, efficient and allows for improved analytics by providing opportunities for querying. Often SQL will be used to access the information, but that is not always true. Later on in the session we will discuss how to normalise a database, but in this section it is important apprentices are aware of key terms such as keys, normalisation, relationships and data redundancy. Use the 'bad example' to assess apprentices ability to spot pain points which will be discussed later.

Contents

Topic Keys Duration 20 Minutes
Objectives
  • Describe the different types of key in a RDBMS
Notes

Keys are how relationships are defined between tables in a RDBMS and are important for apprentices to identify when thinking about joins. The three types of key we will be looking at are primary (unique identifiers for each record), foreign (a primary key in another table, establishes the relationship) and composite (when two or more fields are combined to make a primary key). In the exercise apprentices will look at a table and try and identify the logical keys, use this to reinforce that primary keys must be unique and sometimes you may need to add a new field (e/g/ rowID) to create this key. Issues around normalisation can be explored at this point, for example, are the apprentices introducing redundant data?

Contents

Topic Normalisation Duration 50 Minutes
Objectives
  • Understand the principles behind normalisation
  • Modify a table so it is in 3NF
Notes

A RDBMS is efficient because it optimises storage through creating several smaller tables with defined relationships, rather than one large table where information is repeated or missing (which takes up storage space). Make sure apprentices are aware of what happens at each stage and why it is important in the context of optimising the database- lower storage leads to quicker querying which leads to faster insights. Go through the example slowly, asking apprentices to spot non compliance with NF and what they think can be done to fix it. Examples are interspersed of redudant data and how they can be sorted. Apprentices often get confused with the difference between transitive and partial dependency, there is a slide later in the section to address this. Partial occurs only when there is a composite key and transitive indicates redundant data. Give a good 15 minutes at least for the exercise so apprentices can discuss and experiment, understanding what constitutes normal form.

Contents

Topic NoSQL Duration 10 Minutes
Objectives
  • Understand the different types of NoSQL database
Notes

The RDBMS may be the most common database type, but NoSQL is becoming more and more popular. In this section you will help apprentices understand four common types of database, how they are used and why they are useful. They key takeaway is that NoSQL databases can store unstructured data and also (sometimes) remove the need for joins which speeds up data retrieval. It is possible some organisations we work with have started to implement NoSQL databases, so encourage apprentices to share their experience or research them more.

Contents

Topic Other Types of Database Duration 20 Minutes
Objectives
  • Justify the use of Big Data in analysis by considering the benefits and drawbacks
Notes

Other than RDBMS and NoSQL, there are other ways data may be stored in an organisation that apprentices need to be aware of. If their organisation does use them, they need to be able to describe what they are used for and why they are useful in their portfolio. First you will go through historical types of database (network, hierarchical, multi-dimensional and object orientated) to explain key features and why they may be useful. You will then follow this up with a brief introdution to data warehouses. Do not spend too much time on this, as it is covered again in module 4. The key take away is that data warehouses allow for advanced analytics to be carried out on a unified data source within an organisation.

Contents

Topic Recap Duration 15 Minutes
Objectives
  • To recap the day
Notes

Go through the learning objectives again and show apprentices the assignments and check for their understanding. You may want to create a kahoot quiz, but if you do please make sure you have submitted a create resource request first. Remind apprentices to complete session attendance log and update their OTJT.

Contents

Topic Class Introduction Duration 10 Minutes
  • To provide an overview of the class agenda and the expected learning objectives

Coach welcome apprentices to lesson and run through the session outline and learning objectives. Coach can run an ice breaker from here .

Contents

Topic Data Modelling Duration 15 Minutes
Objectives
  • Explain the difference between a conceptual, logical and physical model
Notes

Session 2 focusses more on the design and maintenance of a DBMS after the previous introduction to them. In this section you will be guiding apprentices through the basic design process, from concept to fleshing them out logically and finally considering what the final DBMS will look like. Each stage has its own requirements and considerations which you should go through. There is a brief introduction to common data types, keep this low level as it will be explored in more detail during the SQL bootcamp. Apprentices just need to know what we mean by 'string', not anything more detailed like how they used or analysed.

Contents

Topic Entity Relationship Diagrams Duration 20 Minutes
Objectives
  • Explain why ERDs are useful
Notes

The final form of any data model is the ERD, as it shows within the DBMS the different entities (tables) and how they are related. Go through the key features of an ERD and discuss with apprentices examples. The crow foot notation can be confusing, so make sure you are clear on the relationships in the examples and what they mean. The examples are built in Lucid Chart and embedded, they may be slow to load, so refresh the page if this happens. Give apprentices time to explain each ERD example so they are clear on the steps involved and how to define relationship.s

Contents

Topic DIY Database Design Duration 25 Minutes
Objectives
  • Design an ERD for a database
Notes

Don't spend too much time on the Spotify example, but use it to reinforce the process of building an ERD. Give apprentices 15 minutes to design the ERD from the example and time for feedback and discussion.

Contents


Topic Data Warehouse Duration 15 Minutes
Objectives
  • Introduce Data Warehouses
  • Understand the difference between a Data Warehouse and a Database
Notes

This module has been about using databases to help with our analysis. Apprentices also need to be aware of data warehouses and how they are used in analysis. In this section, apprentices will be introduced to the concept of warehouses, their benefits and how they differ from a database. Terms such as OLAP and OLTP will be defined. If apprentices use warehouses currently, find out what they use them for and what they think the advantages are.

Contents

Topic Data Integration Duration 10 Minutes
Objectives
  • Define Data Integration
Notes

In this section you will be defining what data integration means and leading a discussion around why it is important in data management. These concepts will be appearing in the KM1 exam as well. You will also be showing apprentices examples of data sources as well as discussing the benefits of data integration.

Contents

Topic Types of Data Integration Duration 5 Minutes
Objectives
  • Understand the difference between batch and real time integration
Notes

In this short section you will be showing the difference between batch and real time integration so apprentices can appreciate the processes and imagine benefits and challenges of each.

Contents

Topic Data Integration Life Cycle Duration 10 Minutes
Objectives
  • Understand and explain the data integration life cycle
Notes

In this section you will be explaining the end to end process of how someone might carry out data integration, from selecting and profiling the data, understanding various requirements and designing the system to testing to make sure it works. This section incorporates vertical slides so you can lead discussions around each stage.

Contents


Topic Rules and Policies Duration 5 Minutes
Objectives
  • Consider methods to follow for keeping data integration accurate and safe
Notes

When using a data integration process it is important to consider rules and policies to ensure it works and is safe. Guide apprentices to consider what sort of rules around access, ensuring clean data and that the process has worked should be put into place. Link this back to GDPR and why data security is essential.

Contents

Topic ETL Duration 25 Minutes
Objectives
  • Understand the ETL process
Notes

Extract, Transform, Load is an important process which is followed when unifying data into a warehouse. In this section you will guide apprentices through the process by first aksing them to consider how to count up all the words in a library. This is to get them to think about the difficulties in doing analysis when your data is spread across multiple locations. You will then explain the concept of ETL and how it works. Important things to point out- data is never removed or changed in ETL, the extracted data is always a copy. Also, apprentices may get confused about verification and validation. Verification is an internal process where you are checking the accuracy of your dataset (i.e. does your warehouse contain what it should?). Validation is an external process where you check against requirements of other systems (i.e. does the source files conform to what the target warehouse needs?). Validation happens during te extraction process and verification happens post loading.

Contents

Topic Combining Data Duration 10 Minutes
Objectives
  • Explain how to combine data in your portfolio
Notes

This section is aimed to help apprentices think about their portfolio. If they ever combined data (through integration, joins, ETL, etc) they need to explain what they did, why they did it and any problems they encountered. You could run a pop quiz, and ask apprentices what they would include if they are asked to describe a particular process.

Contents

Topic Recap Duration 20 Minutes
Objectives
  • Recap the day
Notes

Go through the learning objectives and assignments and clarify anything apprentices may still not understand. Go to Kahoot! and run the module 1 Kahoot quiz (apprentices can download the app on their phone, or just use their browser). Make sure apprentices update tehir OTJ and complete the SAL.

Contents