Relationship between Database Normalization and RDF

Relationship between Database Normalization and RDF#

Normalization - 1NF, 2NF, 3NF and 4NF, https://www.youtube.com/watch?v=UrYLYV7WSHM, User channel5567

Items: XBOX1, PS4, PSVITA

Bottom up#

Ausgangspunkt: Eine Tabelle. Ziel: mehrere normalisierte Tabellen.

NF 1#

nf1, Table Sales Records, 5:00, columns:

Cust ID, Cust Name, Shipping Address, Newsletter, Supplier, Supplier Phone, Price

NF 2#

nf2, 8:46

#C-CN-SA-N: #Cust ID, Cust Name, Shipping Address, Newsletter
#Item, Supplier, Supplier Phone, Price
C-I: Cust ID, Item (Table Transactions, keine ID)

NF 3#

Wir betrachten nur die rechte Hälfte … 2 Abbildungen, ähnliche Farbgebung wie im Video

2NF, 9:58, Tables:

Table #I-S-SP-P: #Item, has_Supplier, has_Supplier_Phone, has_Price

Wir haben hier eine einzige Tabelle, mithin eine Entity, mit 3 Attributen: Ein Item hat einen Supplier, ein Supplier Phone, einen Price …

Problem (9:58): wir entdecken eine logische Abhängigkeit zwischen den Attribute has_Supplier und has_Supplier_Phone , “seperate it out” … das Telefon des Suppliers ist nicht vom Item, sondern vom Supplier abhängig ; “obviously we have to sepererate it out” .. we will get:

3NF, Tables 12:06

Table #I-S-P: #Item, has_Supplier, has_Price
Table #S-SP: #Supplier, has_Supplier Phone

Was passiert hier? Wir erkennen, dass wir nicht nur eine einzige Entity Item haben, sondern dass eine weitere Entity Supplier gibt. Für diese legen wir eine eigene Tabelle an, und moven die Spalte `has_Supplier_Phone” in diese Tabelle.

Nachteil: Wenn wir in unsere Datenbank ein neues Item einpflegen, müssen wir bei Bedarf auch einen neuen Supplier einpflegen. Vorteil: Vermeidung von Inkonsistenzen wie z.B. zwei unterschiedliche Telefonnummern eine Providers.

ChatGPT#

(We simply upload the former image to ChatGPT withot any other question. This is the answer:)

This diagram is a visual representation of Third Normal Form (3NF) in database normalization. The key principle of 3NF is that:

All fields (columns) must depend only on the primary key, and nothing else.

Here’s how each table supports 3NF:

🟧 Customer Table

Primary Key: Cust ID
Other fields like Cust Name, Shipping Address, and Newsletter are dependent only on Cust ID.
Ensures no transitive dependencies (i.e., no column depends on a non-key column).

customers = [
    {"Cust_ID": "at_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "35 Palm St, Miami", "Newsletter": "Xbox News"},
    {"Cust_ID": "roger25", "Cust_Name": "Roger Banks", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
    {"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "Xbox News"},
    {"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "PlayStation News"},
    {"Cust_ID": "am_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
]

🟥 Customer-Item Relationship Table

Composite Primary Key: Cust ID + Item
Shows which customer bought or subscribes to which item.
Maintains many-to-many relationships between customers and items.

customer_items = [
    {"Cust_ID": "at_smith", "Item": "Xbox One"},
    {"Cust_ID": "roger25", "Item": "PlayStation 4"},
    {"Cust_ID": "wilson44", "Item": "Xbox One"},
    {"Cust_ID": "wilson44", "Item": "PS Vita"},
    {"Cust_ID": "am_smith", "Item": "PlayStation 4"},
]

🟨 Item Table

Primary Key: Item
Has a Foreign Key to the Supplier.
Price is dependent only on Item, not on Supplier, preserving 3NF.

items = [
    {"Item": "Xbox One", "Supplier": "Microsoft", "Price": 250},
    {"Item": "PlayStation 4", "Supplier": "Sony", "Price": 300},
    {"Item": "PS Vita", "Supplier": "Sony", "Price": 200},
]

🟩 Supplier Table

Primary Key: Supplier
Supplier Phone is directly dependent on the Supplier.

suppliers = [
    {"Supplier": "Microsoft", "Supplier_Phone": "(800) BUY-XBOX"},
    {"Supplier": "Sony", "Supplier_Phone": "(800) BUY-SONY"},
]

✅ How It Meets 3NF

No transitive dependencies.
Each non-key attribute is fully functionally dependent on the whole key.
Repetition and redundancy are minimized.
Tables are modular and can be joined by keys when needed.

Of course! Here’s how you can convert each of those Python data structures into pandas DataFrames:

import pandas as pd

# 🟧 Customer Table
customers_df = pd.DataFrame([
    {"Cust_ID": "at_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "35 Palm St, Miami", "Newsletter": "Xbox News"},
    {"Cust_ID": "roger25", "Cust_Name": "Roger Banks", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
    {"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "Xbox News"},
    {"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "PlayStation News"},
    {"Cust_ID": "am_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
])

# 🟥 Customer-Item Table
customer_items_df = pd.DataFrame([
    {"Cust_ID": "at_smith", "Item": "Xbox One"},
    {"Cust_ID": "roger25", "Item": "PlayStation 4"},
    {"Cust_ID": "wilson44", "Item": "Xbox One"},
    {"Cust_ID": "wilson44", "Item": "PS Vita"},
    {"Cust_ID": "am_smith", "Item": "PlayStation 4"},
])

# 🟨 Item Table
items_df = pd.DataFrame([
    {"Item": "Xbox One", "Supplier": "Microsoft", "Price": 250},
    {"Item": "PlayStation 4", "Supplier": "Sony", "Price": 300},
    {"Item": "PS Vita", "Supplier": "Sony", "Price": 200},
])

# 🟩 Supplier Table
suppliers_df = pd.DataFrame([
    {"Supplier": "Microsoft", "Supplier_Phone": "(800) BUY-XBOX"},
    {"Supplier": "Sony", "Supplier_Phone": "(800) BUY-SONY"},
])

You can now run these in any Python environment with pandas installed to manipulate and query the data.

Want an example of joining these tables together (like a SQL JOIN) using pandas?

Here’s how you can join all these tables together using pandas, similar to SQL joins.

We’ll perform the joins in this order:

customer_items_df ⬅️ join with customers_df on Cust_ID
Result ⬅️ join with items_df on Item
Result ⬅️ join with suppliers_df on Supplier

# Step 1: Join customer_items with customers on Cust_ID
merged_df_1 = pd.merge(customer_items_df, customers_df, on="Cust_ID", how="left")

# Step 2: Join the result with items on Item
merged_df_2 = pd.merge(merged_df_1, items_df, on="Item", how="left")

# Step 3: Join the result with suppliers on Supplier
final_df = pd.merge(merged_df_2, suppliers_df, on="Supplier", how="left")

# Show the final merged DataFrame
print(final_df)

🔍 Final Output Columns#

The resulting DataFrame (final_df) will include:

Cust_ID
Item
Cust_Name
Shipping_Address
Newsletter
Supplier
Price
Supplier_Phone

Let me know if you want to filter this data (e.g., only see PlayStation buyers or group by supplier).

NF 4#

Top-Down#

Ausgangspunkt: Ein Verständnis von Welt. Fragestellung: Wie modellieren wir das? ein Team setzt sich zusammen … legt Attribute auf den Tisch: has_Supplier, has_Supplier_Phone, has_Price … Brainstorming: Welche Dinge der Welt werden damit beschrieben? … Weltwissen: Handelswaren, die einen Preis haben, wollen wir Produkt (trading good, allgemeiner: Item) nennen; oder: typischerweise sind es Handelswaren, die einen Preis haben. Zum Attribut Telefonnummer: Typischerweise sind es Menschen oder Unternehmen, die eine Telefonnummer haben; Handelswaren haben i.A. keine Telefonnummer; die Benennungen der Attribut has_Supplier_Phone und has_Supplier weisen darauf hin, dass es auch eine Entity Supplier gibt; also scheint das Attribut has_Supplier_Phone ein Attribut eines Dings Supplier zu sein … wie kommunizieren, wie notieren wir das? Willy Chen hat hier eine Notation entworfen, nämlich ER-Diagramme …

BSP ER-Diagramm Situation 1

BSP ER-Diagramm Situation 2, ausführlich mit Relation has_Supplier; auch in UML

ER-Modelle gehen von einer Welt aus, in der es Dinge gibt, die beschrieben werden (a) durch ihre Eigenschaften und (b) ihre Beziehungen zueinander … Ziel ist ein sog. relationales Datenmodell, eine relationale Datenbank