Relationship between Database Normalization and RDF#
Normalization - 1NF, 2NF, 3NF and 4NF, https://www.youtube.com/watch?v=UrYLYV7WSHM, User channel5567
Items: XBOX1, PS4, PSVITA
Bottom up#
Ausgangspunkt: Eine Tabelle. Ziel: mehrere normalisierte Tabellen.
NF 1#
nf1, Table Sales Records, 5:00, columns:
Cust ID, Cust Name, Shipping Address, Newsletter, Supplier, Supplier Phone, Price
NF 2#
nf2, 8:46
#C-CN-SA-N: #Cust ID, Cust Name, Shipping Address, Newsletter
#Item, Supplier, Supplier Phone, Price
C-I: Cust ID, Item (Table Transactions, keine ID)
NF 3#
Wir betrachten nur die rechte Hälfte … 2 Abbildungen, ähnliche Farbgebung wie im Video
2NF, 9:58, Tables:
Table #I-S-SP-P: #Item, has_Supplier, has_Supplier_Phone, has_Price
Wir haben hier eine einzige Tabelle, mithin eine Entity, mit 3 Attributen: Ein Item hat einen Supplier, ein Supplier Phone, einen Price …
Problem (9:58): wir entdecken eine logische Abhängigkeit zwischen den Attribute has_Supplier und has_Supplier_Phone , “seperate it out” … das Telefon des Suppliers ist nicht vom Item, sondern vom Supplier abhängig ; “obviously we have to sepererate it out” .. we will get:
3NF, Tables 12:06
Table #I-S-P: #Item, has_Supplier, has_Price
Table #S-SP: #Supplier, has_Supplier Phone
Was passiert hier? Wir erkennen, dass wir nicht nur eine einzige Entity Item
haben, sondern dass eine weitere Entity Supplier
gibt. Für diese legen wir eine eigene Tabelle an, und moven die Spalte `has_Supplier_Phone” in diese Tabelle.
Nachteil: Wenn wir in unsere Datenbank ein neues Item einpflegen, müssen wir bei Bedarf auch einen neuen Supplier einpflegen. Vorteil: Vermeidung von Inkonsistenzen wie z.B. zwei unterschiedliche Telefonnummern eine Providers.
ChatGPT#
(We simply upload the former image to ChatGPT withot any other question. This is the answer:)
This diagram is a visual representation of Third Normal Form (3NF) in database normalization. The key principle of 3NF is that:
All fields (columns) must depend only on the primary key, and nothing else.
Here’s how each table supports 3NF:
🟧 Customer Table
Primary Key:
Cust ID
Other fields like
Cust Name
,Shipping Address
, andNewsletter
are dependent only onCust ID
.Ensures no transitive dependencies (i.e., no column depends on a non-key column).
customers = [
{"Cust_ID": "at_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "35 Palm St, Miami", "Newsletter": "Xbox News"},
{"Cust_ID": "roger25", "Cust_Name": "Roger Banks", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
{"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "Xbox News"},
{"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "PlayStation News"},
{"Cust_ID": "am_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
]
🟥 Customer-Item Relationship Table
Composite Primary Key:
Cust ID + Item
Shows which customer bought or subscribes to which item.
Maintains many-to-many relationships between customers and items.
customer_items = [
{"Cust_ID": "at_smith", "Item": "Xbox One"},
{"Cust_ID": "roger25", "Item": "PlayStation 4"},
{"Cust_ID": "wilson44", "Item": "Xbox One"},
{"Cust_ID": "wilson44", "Item": "PS Vita"},
{"Cust_ID": "am_smith", "Item": "PlayStation 4"},
]
🟨 Item Table
Primary Key:
Item
Has a Foreign Key to the Supplier.
Price
is dependent only onItem
, not onSupplier
, preserving 3NF.
items = [
{"Item": "Xbox One", "Supplier": "Microsoft", "Price": 250},
{"Item": "PlayStation 4", "Supplier": "Sony", "Price": 300},
{"Item": "PS Vita", "Supplier": "Sony", "Price": 200},
]
🟩 Supplier Table
Primary Key:
Supplier
Supplier Phone
is directly dependent on theSupplier
.
suppliers = [
{"Supplier": "Microsoft", "Supplier_Phone": "(800) BUY-XBOX"},
{"Supplier": "Sony", "Supplier_Phone": "(800) BUY-SONY"},
]
✅ How It Meets 3NF
No transitive dependencies.
Each non-key attribute is fully functionally dependent on the whole key.
Repetition and redundancy are minimized.
Tables are modular and can be joined by keys when needed.
Of course! Here’s how you can convert each of those Python data structures into pandas DataFrames:
import pandas as pd
# 🟧 Customer Table
customers_df = pd.DataFrame([
{"Cust_ID": "at_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "35 Palm St, Miami", "Newsletter": "Xbox News"},
{"Cust_ID": "roger25", "Cust_Name": "Roger Banks", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
{"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "Xbox News"},
{"Cust_ID": "wilson44", "Cust_Name": "Evan Wilson", "Shipping_Address": "28 Rock Av, Denver", "Newsletter": "PlayStation News"},
{"Cust_ID": "am_smith", "Cust_Name": "Alan Smith", "Shipping_Address": "47 Campus Rd, Boston", "Newsletter": "PlayStation News"},
])
# 🟥 Customer-Item Table
customer_items_df = pd.DataFrame([
{"Cust_ID": "at_smith", "Item": "Xbox One"},
{"Cust_ID": "roger25", "Item": "PlayStation 4"},
{"Cust_ID": "wilson44", "Item": "Xbox One"},
{"Cust_ID": "wilson44", "Item": "PS Vita"},
{"Cust_ID": "am_smith", "Item": "PlayStation 4"},
])
# 🟨 Item Table
items_df = pd.DataFrame([
{"Item": "Xbox One", "Supplier": "Microsoft", "Price": 250},
{"Item": "PlayStation 4", "Supplier": "Sony", "Price": 300},
{"Item": "PS Vita", "Supplier": "Sony", "Price": 200},
])
# 🟩 Supplier Table
suppliers_df = pd.DataFrame([
{"Supplier": "Microsoft", "Supplier_Phone": "(800) BUY-XBOX"},
{"Supplier": "Sony", "Supplier_Phone": "(800) BUY-SONY"},
])
You can now run these in any Python environment with pandas
installed to manipulate and query the data.
Want an example of joining these tables together (like a SQL JOIN) using pandas?
Here’s how you can join all these tables together using pandas, similar to SQL joins.
We’ll perform the joins in this order:
customer_items_df
⬅️ join withcustomers_df
onCust_ID
Result ⬅️ join with
items_df
onItem
Result ⬅️ join with
suppliers_df
onSupplier
# Step 1: Join customer_items with customers on Cust_ID
merged_df_1 = pd.merge(customer_items_df, customers_df, on="Cust_ID", how="left")
# Step 2: Join the result with items on Item
merged_df_2 = pd.merge(merged_df_1, items_df, on="Item", how="left")
# Step 3: Join the result with suppliers on Supplier
final_df = pd.merge(merged_df_2, suppliers_df, on="Supplier", how="left")
# Show the final merged DataFrame
print(final_df)
🔍 Final Output Columns#
The resulting DataFrame (final_df
) will include:
Cust_ID
Item
Cust_Name
Shipping_Address
Newsletter
Supplier
Price
Supplier_Phone
Let me know if you want to filter this data (e.g., only see PlayStation buyers or group by supplier).
NF 4#
Top-Down#
Ausgangspunkt: Ein Verständnis von Welt. Fragestellung: Wie modellieren wir das? ein Team setzt sich zusammen … legt Attribute auf den Tisch: has_Supplier, has_Supplier_Phone, has_Price … Brainstorming: Welche Dinge der Welt werden damit beschrieben? … Weltwissen: Handelswaren, die einen Preis haben, wollen wir Produkt (trading good, allgemeiner: Item) nennen; oder: typischerweise sind es Handelswaren, die einen Preis haben. Zum Attribut Telefonnummer: Typischerweise sind es Menschen oder Unternehmen, die eine Telefonnummer haben; Handelswaren haben i.A. keine Telefonnummer; die Benennungen der Attribut has_Supplier_Phone
und has_Supplier
weisen darauf hin, dass es auch eine Entity Supplier gibt; also scheint das Attribut has_Supplier_Phone
ein Attribut eines Dings Supplier
zu sein … wie kommunizieren, wie notieren wir das? Willy Chen hat hier eine Notation entworfen, nämlich ER-Diagramme …
BSP ER-Diagramm Situation 1
BSP ER-Diagramm Situation 2, ausführlich mit Relation has_Supplier; auch in UML
ER-Modelle gehen von einer Welt aus, in der es Dinge gibt, die beschrieben werden (a) durch ihre Eigenschaften und (b) ihre Beziehungen zueinander … Ziel ist ein sog. relationales Datenmodell, eine relationale Datenbank