Database Fundamentals


Session 2

Learning Objectives
  • Design and visualise an entity relationship diagram for a database
  • Identify starting points for overcoming project and technical problems
  • Implement quality control measures and know how to assure the quality of data uploads and query output
Data Modelling
Why use a Data Model?

All data objects required by the database are accurately represented

A data model helps design the database at the conceptual, physical and logical levels

Data model structure helps define relational tables, primary and foreign keys as well as stored procedures

It provides a clear picture of the base data

It can help identify missing or redundant data

Types of Data Models


Conceptual

Logical

Physical

Types of Data Models


Conceptual

Logical

Physical

An organised view of database concepts and their relationships. The purpose is to establish:

Entities

Attributes

Relationships

customer
Customer Name Customer Number

Sale

product
Product Name Product Price

Characteristics of a Conceptual Data Model


Offers organisation wide coverage of business concepts

Designed and developed for business audiences

Developed independently of hardware specifications like storage capacity, or software specifications like DBMS technology. The focus is to represent data as a user would see it in the 'real world'

Types of Data Models


Conceptual

Logical

Physical

Used to define the structure of data elements and set relationships between them. This type of model:

Adds further information to the conceptual data model

Provides a foundation for a physical model (yet retains a generic structure)

Does not require keys, you just need to verify the connector details set for earlier relationships

customer
Customer Name Customer Number string
 
integer

Sale

product
Product Name Product Price string
 
decimal

Common Data Types

string/character/varchar

integer

number/float/decimal

datetime

boolean

Characteristics of a Logical Data Model


Describes data needs for a single project but could integrate with other logical data models based on the scope of the project

Designed and developed independently from the DBMS

Data attributes will have data types with exact precisions and length

Normalisation processes to the model is applied typically till 3NF

Types of Data Models


Conceptual

Logical

Physical

Describes a database specific implementation of the data model. This type of model:

Offers database abstraction and helps generate schema through the rich meta-data

Helps visualise database structure by replicating column keys, constraints and other RDBMS features

customer
Customer Name Customer Number string
integer
Primary Key Customer Number

Sale

product
Product Name
Product Price
string
decimal
Primary Key Product Name

Characteristics of a Physical Data Model


Contains relationships between tables

Developed for a specific version of a DBMS, location, data storage or technology to be used in a project

Columns should have exact data types, lengths and default values assigned

Primary and foreign keys, views, indices, access profiles and authorisations are defined

Entity Relationship Diagrams
An Entity Relationship Diagram (ERD) lets you see how different objects (e.g. customers, products) relate to each other in a database.

Components

Entity

Relationship

Attribute

Components

Entity

Relationship

Attribute

A defined object within a database (e.g. customers, products, contractors, etc)


Entity
Attribute1
Attribute2
Attribute3

Strong Entity


Has a primary key and does not depend on another entity to exist

E.g. student information

Weak Entity


Depends on another entity to exist

I.e. the primary key is a foreign key in another table

E.g. student enrolment information

Components

Entity

Relationship

Attribute

Defines how two entities are related to each other

Represented as lines with a "cardinality" that explains the number of instances between two entities

One to One Relationship


One record of an entity is directly related to another record of an entitiy

One to Many Relationship


One record of an entity is related to one or more records of another entity

Many to Many Relationship


Many records of one entity can be related to many records of another entity

One or Zero to to Many Relationship


One record of an entity is related to zero, one or more records of another entity

Components

Entity

Relationship

Attribute

A property of an entity or something that can be used to describe an entity

Attribute Type Description Example
Simple Cannot be split into other attributes First name, surname
Composite Can be split into other attributes Name (can be split into forename, middle name and surname)
Derived Calculated or determined from another attribute Age of record calculated from creation date

Barker Notation

DIY Database Design

Spotify

What is its purpose?

Stream music based on artists, albumns, playlists or genres

What is the user input?

Track title, album title, artist name or playlist title

What will the user receive?

Music content

Spotify

What database tables are required?

Artists

Albums

Playlists

Tracks

Users

Activity


The state of Iowa wants to design a database that records alcohol sales from all stores in the state.

Using the information on the following slide, design an ERD to show how how the database should be structured.

Entity Transaction Product Store County
Attributes ID
Date
item_no
store_id
item_no
description
case_cost
proof
store_id
store_name
address
county
county
population
Overcoming Problems

Debugging!

Technical things not doing what you expect or need them to do can be a very frustrating experience

What experience do you have with debugging?

Read

Rubberduck

Ask

blank

Check the error message, what does the software say the problem is?

Read

Rubberduck

Ask

blank

Go through the code line by line and explain to a 'rubber duck' what is going on and try and spot the error

Read

Rubberduck

Ask

blank

Talk to an expert in your organisation, or search for the solution on StackOverflow

To improve accuracy, quality and usefulness any errors should be corrected, imputed or rejected

Activity


In groups choose a couple of the scenarios on the next slide and discuss how you would mitigate or solve them

Data isn't available as planned
A team member has not delivered what you need
Technical failure resulting in work lost
Project is taking longer than forecast
Critical team member leaves the company suddenly
Key stakeholder changes the scope mid project
Client wants delivery deadline brought forward
Becomes clear you cannot deliver what was promised to client

Form an Action Plan


1. Confirm and identify the problem

2. Quantify and describe- what is the impact?

3. What are the potential next steps?

4. Communicate to stakeholders and confirm desired next steps

5. Adjust plan and circulate

6. Continue to new plan, repeat if necessary

Can this problem be solved?

Do one thing at a time

Focus on what you can control

Sometimes doing nothing is best

Quality Control

What issues might we have creating a table with these field names?

Subscribers
subID
firstName
surname
email
birthday

Truncation

Corruption

Missing Data

Data Types

Translation

blank

Precision is lost when the data is stored somewhere too short to hold its entire length

Truncation

Corruption

Missing Data

Data Types

Translation

blank

Commas, apostrophes and other delimeters are moved/lost maing the table unreadable

Truncation

Corruption

Missing Data

Data Types

Translation

blank

Rows go missing when only a portion of the data is uploaded

Truncation

Corruption

Missing Data

Data Types

Translation

blank

Data mismatch from wrong data types being entered into fields

Truncation

Corruption

Missing Data

Data Types

Translation

blank

Encoding is wrong or symbols or rich characters are lost

Common tools to help us be confident in our data accuracy

Count the data

Spot checks

Calculate aggregates

Export comparison

Activity


In groups discuss:

  • Why is quality control important?
  • Whose responsibility is quality control?
  • When should quality control occur?
  • How should quality control be recorded?
  • What should you if somehting fails quality control?
6 Core Data Quality Dimensions
Completeness No missing values
Uniqueness Data only stored in one place
Timeliness Data should be reasonably up to date
Validity Conforms to the correct syntax
Accuracy Value stored is correct for its occurrence
Consistency Form and content of field is always the same

In summary...

Quality control gives us confidence in results

Simple techniques can be used to check uploads, simple queries can be used to check results

Checking the quality as you go is easier and more efficient than doing it at the end

Code can be designed to be more robust

Performance Techniques

Performance issues can arise due to...

Too much data

Inefficient queries

Over complex queries (too many joins)

Database is not normalised properly

What is the quickest way to find a topic in an encyclopedia?

Indexing


Every record in a database is assigned an index

Data is retrieved by referencing the index

Index
1
2
3
4
...
25
26
Index Letter Phonetic
1AAlpha
2BBravo
3CCharlie
4DDelta
.........
25EYankee
26FZulu

Denormalisation


A database optimisation technique where redundant data is added to one or more tables to help avoid costly joins in a relational database.

Denormalisation

Pros

  • Data retrieval is faster through fewer joins
  • Queries are simpler through looking at fewer tables

Cons

  • Updates and inserts are more expensive and harder to write
  • Data may be inconsistent
  • Data redundancy requires more storage
date month sales country code continent
08/06/2020 6 791 France FR Europe
08/06/2020 6 582 Canada CA North America
08/06/2020 6 915 Egypt EG Africa
08/06/2020 6 787 Norway NO Europe
country wins draws losses win %
France 6 1 0 85.7%
Croatia 6 0 1 85.7%
Belgium 6 0 1 85.7%
England 4 0 3 57.1%
Database Maintenance

Maintaining a database is critical to ensuring a database environment performs reliably and efficiently. We need these tasks to:


Increase performance

Free up disk space

Check for data errors

Check for hardware faults

Log File Maintenance


Log files are invaluable for diagnosing problems in your database and should therefore be saved.

They can however be quite large (especially at high debug levels) and so should be discarded after a reasonable period of time.

Read more here.

Fragmentation


Indexes can become fragmented as new data is introduced which can affect performance

id name
1 Jane
2 Peter
3 Kim
4 Olivia
id name
1 Jane
   
3 Kim
4 Olivia
5 Suresh
id name
1 Jane
   
3 Kim
   
5 Suresh

Other Techniques:


Data Compaction

Integrity Check

Backing Up

 

Compress the data so it can be stored more efficiently and quickly

Other Techniques:


Data Compaction

Integrity Check

Backing Up

 

Ensure relationships between entities have been observed

Other Techniques:


Data Compaction

Integrity Check

Backing Up

 

Regularly back your data up!

Recap
Learning Objectives
  • Design and visualise an entity relationship diagram for a database
  • Identify starting points for overcoming project and technical problems
  • Implement Quality Control measures and know how to assure the quality of data uploads and query output
Assignment
Database Design
Use a work-related dataset to design your own relational database. You should describe the dataset, follow the normalisation steps and create an Entity Relationship Diagram (ERD).
Word Count Max 1500 words
Deadline 3 weeks
Deliverables Word Document, PowerPoint, Excel File, PDF, Lucid Chart
Complete Session Attendance Log and Update Your OTJ