19_Pipeline
Code-Dateien
| Dateiname | Aktion |
|---|---|
| CODECode_pizza.zip | Download |
Videos
| Dateiname | Aktion |
|---|---|
| VIDEOVideo_Pizza_D | Abspielen |
Lernmaterialien
Aggregation pipeline
In MongoDB, the pipe concept usually means the aggregation pipeline.
Think of it like an assembly line for your data:
MongoDB takes documents from a collection
sends them through a sequence of steps
each step transforms, filters, groups, or reshapes the data
the output of one step becomes the input of the next step
That is why it is called a pipeline.
Basic idea
A pipeline is an array of stages:
db.orders.aggregate([
{ $match: { status: "paid" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
])Here is what happens:
$matchFilters documents. Only orders withstatus: "paid"continue.$groupGroups remaining documents bycustomerIdand calculates the total amount per customer.$sortSorts the grouped result by total descending.
So the data flows like this:
collection → filter → group → sort → result
Why it is useful
The pipeline lets you do data processing directly in MongoDB, such as:
filtering rows
selecting fields
calculating sums, averages, counts
joining with another collection
transforming document structure
It is similar to:
SQL
WHERE→GROUP BY→ORDER BYor Unix pipes, where output of one command goes into the next
Common stages
Some important pipeline stages are:
$match→ likeWHERE$project→ choose or rename fields$group→ likeGROUP BY$sort→ sort results$limit→ restrict number of results$lookup→ join another collection$unwind→ split array elements into separate documents
Small example
Suppose you have:
{ name: "Anna", city: "Vienna", age: 22 }
{ name: "Ben", city: "Vienna", age: 30 }
{ name: "Clara", city: "Graz", age: 25 }Pipeline:
db.people.aggregate([
{ $match: { city: "Vienna" } },
{ $project: { name: 1, age: 1, _id: 0 } }
])Result:
{ name: "Anna", age: 22 }
{ name: "Ben", age: 30 }Example with group
With your data, that means MongoDB can take customer documents from
customer.json and process nested fields like
adress, order, driver, and
product through a pipeline. Your uploaded structure
contains customers with embedded orders and embedded ordered products,
plus a separate product collection with product metadata and restaurant
information.
Your structure
In your customer data, one customer document looks
roughly like this:
customer data
embedded address
an
orderarrayinside each order: driver data
inside each order: a
productarray
So the structure is:
Customer
├── first_name
├── last_name
├── adress
└── order[]
├── order_date
├── delivery_date
├── order_state
├── delivery_costs
├── driver
└── product[]
├── product_id
├── product_name
├── selling_price_piece
├── number_of_pieces
├── restaurant_nameYour separate product file stores product master data
such as product description, type, alcohol flag, and restaurant
details.
Order
db.customer.aggregate([
{ $limit: 1 },
{ $unwind: "$order" }
]);{
_id: '16',
first_name: 'Anna',
last_name: 'Schmidt',
order: {
order_date: '2025-01-05',
product: [
{ product_id: '5', product_name: 'Margherita Pizza' },
{ product_id: '7', product_name: 'Garlic Bread' }
]
}
}
{
_id: '16',
first_name: 'Anna',
last_name: 'Schmidt',
order: {
order_date: '2025-01-22',
product: [
{ product_id: '6', product_name: 'Pepperoni Pizza' }
]
}
}db.customer.aggregate([
{ $unwind: "$order" },
{ $limit: 1 }
]);{
_id: '16',
first_name: 'Anna',
last_name: 'Schmidt',
order: {
order_date: '2025-01-05',
product: [
{ product_id: '5', product_name: 'Margherita Pizza' },
{ product_id: '7', product_name: 'Garlic Bread' }
]
}
}What “pipe” means in this structure
A pipeline means MongoDB reads these documents and sends them through several operations.
Example idea:
first stage: choose only customers from Berlin
second stage: open the
orderarraythird stage: open the
productarrayfourth stage: group products by restaurant
fifth stage: calculate totals
So the flow is:
customer document → orders → products → grouping/calculation → result
That is the pipe concept.
Why the pipe is useful for your structure
Your structure is nested.
Without a pipeline, it is harder to analyze questions like:
Which customer ordered the most products?
How many times was
Margherita Pizzaordered?What is the total revenue per restaurant?
Which driver delivered the most orders?
Which products are currently
in_transit?
Because order and product are arrays inside
arrays, MongoDB pipelines are very useful here.
Example with your structure
Example question:
How many ordered items came from each restaurant?
For your structure, MongoDB can process it like this:
db.customer.aggregate([
{ $unwind: "$order" },
{ $unwind: "$order.product" },
{
$group: {
_id: "$order.product.restaurant_name",
totalOrderedPieces: {
$sum: "$order.product.number_of_pieces"
}
}
},
{ $sort: { totalOrderedPieces: -1 } }
])Explanation of the stages using your structure
Stage 1:
$unwind: "$order"
Each customer has an order array. This stage splits one
customer document into multiple documents, one for each order.
Input - one document:
"Anna"
orders [
"2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ],
"2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ],
"2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ]
]Output - three documents:
"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }This is needed because in your file, orders are stored inside the customer document.
Stage 2:
$unwind: "$order.product"
Each order has a product array. This stage splits one
order into multiple product rows.
So for Anna’s first order, products like
Margherita Pizza, Garlic Bread,
Cola 0.5L, and Tiramisu become separate
pipeline documents.
Input - three documents:
"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }Output - eight documents:
"Anna" order { "2026-01-10", product {"Margherita", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Bread", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Cola", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Tiramisu", "Pizza Place"}
"Anna" order { "2026-03-20", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-20", product {"Pepperoni", "Mega Pizza"}
"Anna" order { "2026-03-29", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-29", product {"Cola", "Pizza Place"}Stage 3: $group
Now MongoDB groups all product rows by:
"$order.product.restaurant_name"That means all products from "Pizza Palace" go together,
all from "Mega Pizza" go together, and so on. Those
restaurant names appear in your embedded product order data.
Stage 4: $sum
MongoDB adds up:
"$order.product.number_of_pieces"So it calculates how many pieces were ordered from each restaurant.
{ _id: 'Mega Pizza', totalOrderedPieces: 3 }
{ _id: 'Pizza Palace', totalOrderedPieces: 5 }Stage 5: $sort
Finally, the result is sorted from highest to lowest.
{ _id: 'Pizza Palace', totalOrderedPieces: 5 }
{ _id: 'Mega Pizza', totalOrderedPieces: 3 }Example with project
Another example with your structure
Question:
Show only customers from Berlin and their order states
Your structure contains city information in:
adress.city_nameFor example, Anna and Lukas are in Berlin.
Pipeline:
db.customer.aggregate([
{ $match: { "adress.city_name": "Berlin" } },
{ $unwind: "$order" },
{
$project: {
_id: 0,
customer: { $concat: ["$first_name", " ", "$last_name"] },
city: "$adress.city_name",
order_state: "$order.order_state",
order_date: "$order.order_date"
}
}
])Meaning:
$matchfilters only Berlin customers$unwindsplits orders$projectshows only the fields you want
Pipe concept compared to SQL
With your structure, MongoDB pipeline stages are similar to SQL operations:
$match→WHERE$project→SELECT$group→GROUP BY$sort→ORDER BY$unwind→ expand nested arrays into rows$lookup→ join with another collection
This is especially important in your case because your customer documents contain embedded arrays, not flat tables.
Example with $lookup
The idea is:
customercontains the orderswith
$unwindyou openorderandorder.productwith
$lookupyou fetch additional information from theproductcollectionthen you can output things like product type, description, and revenue
In your data, the customer collection stores ordered products with
fields like product_id, product_name,
selling_price_piece, and number_of_pieces,
while the separate product collection stores master data such as
product_type and product_decription.
Example with $lookup
db.customer.aggregate([
{ $unwind: "$order" },
{ $unwind: "$order.product" },
{
$lookup: {
from: "product",
localField: "order.product.product_id",
foreignField: "_id",
as: "productDetails"
}
},
{ $unwind: "$productDetails" },
{
$group: {
_id: {
restaurant: "$order.product.restaurant_name",
productType: "$productDetails.product_type"
},
totalOrderedPieces: {
$sum: "$order.product.number_of_pieces"
},
totalRevenue: {
$sum: {
$multiply: [
"$order.product.selling_price_piece",
"$order.product.number_of_pieces"
]
}
}
}
},
{ $sort: { "_id.restaurant": 1, totalRevenue: -1 } }
])What happens here?
$unwind: "$order" Each customer order becomes a separate
pipeline document.
$unwind: "$order.product" Each product inside an order
becomes a separate pipeline document.
$lookup MongoDB now joins: from customer
with _id from the product collection.
$unwind: "$productDetails" $lookup always
returns an array, so this turns that array into a single embedded
object.
$group Now the pipeline groups by: restaurant and
product type. Those product types come from your product
collection.
Example output
The result could look like this:
{
_id: {
restaurant: "Pizza Palace",
productType: "pizza"
},
totalOrderedPieces: 9,
totalRevenue: 88.1
}
{
_id: {
restaurant: "Burger House",
productType: "burger"
},
totalOrderedPieces: 4,
totalRevenue: 37.4
}Simpler
example: use $lookup just to show extra details
If you do not want to group yet and only want to see how the join works, you can use this version:
db.customer.aggregate([
{ $unwind: "$order" },
{ $unwind: "$order.product" },
{
$lookup: {
from: "product",
localField: "order.product.product_id",
foreignField: "_id",
as: "productDetails"
}
},
{ $unwind: "$productDetails" },
{
$project: {
_id: 0,
customer: { $concat: ["$first_name", " ", "$last_name"] },
restaurant: "$order.product.restaurant_name",
orderedProduct: "$order.product.product_name",
productType: "$productDetails.product_type",
description: "$productDetails.product_decription",
quantity: "$order.product.number_of_pieces",
pricePerPiece: "$order.product.selling_price_piece",
revenue: {
$multiply: [
"$order.product.selling_price_piece",
"$order.product.number_of_pieces"
]
}
}
}
])What this returns
For each ordered product, you will see:
customer
restaurant
ordered product
product type from the
productcollectiondescription from the
productcollectionquantity
revenue
Key idea of $lookup
With $lookup, you join two
collections.
In your case:
customer= order dataproduct= product master data
So the logic is:
order data + product details = richer result
Short version
For your structure, the typical join is:
$lookup: {
from: "product",
localField: "order.product.product_id",
foreignField: "_id",
as: "productDetails"
}This adds the product details from the product
collection to each ordered product in customer.