19_Pipeline

Code-Dateien

Dateiname	Aktion
CODECode_pizza.zip	Download

Videos

Dateiname	Aktion
VIDEOVideo_Pizza_D	Abspielen

Lernmaterialien

Aggregation pipeline

In MongoDB, the pipe concept usually means the aggregation pipeline.

Think of it like an assembly line for your data:

MongoDB takes documents from a collection
sends them through a sequence of steps
each step transforms, filters, groups, or reshapes the data
the output of one step becomes the input of the next step

That is why it is called a pipeline.

Basic idea

A pipeline is an array of stages:

db.orders.aggregate([
  { $match: { status: "paid" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])

Here is what happens:

$match Filters documents. Only orders with status: "paid" continue.
$group Groups remaining documents by customerId and calculates the total amount per customer.
$sort Sorts the grouped result by total descending.

So the data flows like this:

collection → filter → group → sort → result

Why it is useful

The pipeline lets you do data processing directly in MongoDB, such as:

filtering rows
selecting fields
calculating sums, averages, counts
joining with another collection
transforming document structure

It is similar to:

SQL WHERE → GROUP BY → ORDER BY
or Unix pipes, where output of one command goes into the next

Common stages

Some important pipeline stages are:

$match → like WHERE
$project → choose or rename fields
$group → like GROUP BY
$sort → sort results
$limit → restrict number of results
$lookup → join another collection
$unwind → split array elements into separate documents

Small example

Suppose you have:

{ name: "Anna", city: "Vienna", age: 22 }
{ name: "Ben", city: "Vienna", age: 30 }
{ name: "Clara", city: "Graz", age: 25 }

Pipeline:

db.people.aggregate([
  { $match: { city: "Vienna" } },
  { $project: { name: 1, age: 1, _id: 0 } }
])

Result:

{ name: "Anna", age: 22 }
{ name: "Ben", age: 30 }

Example with `group`

With your data, that means MongoDB can take customer documents from customer.json and process nested fields like adress, order, driver, and product through a pipeline. Your uploaded structure contains customers with embedded orders and embedded ordered products, plus a separate product collection with product metadata and restaurant information.

Your structure

In your customer data, one customer document looks roughly like this:

customer data
embedded address
an order array
inside each order: driver data
inside each order: a product array

So the structure is:

Customer
 ├── first_name
 ├── last_name
 ├── adress
 └── order[]
      ├── order_date
      ├── delivery_date
      ├── order_state
      ├── delivery_costs
      ├── driver
      └── product[]
           ├── product_id
           ├── product_name
           ├── selling_price_piece
           ├── number_of_pieces
           ├── restaurant_name

Your separate product file stores product master data such as product description, type, alcohol flag, and restaurant details.

Order

db.customer.aggregate([
  { $limit: 1 },
  { $unwind: "$order" }
]);

{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-05',
    product: [
      { product_id: '5', product_name: 'Margherita Pizza' },
      { product_id: '7', product_name: 'Garlic Bread' }
    ]
  }
}
{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-22',
    product: [
      { product_id: '6', product_name: 'Pepperoni Pizza' }
    ]
  }
}

db.customer.aggregate([
  { $unwind: "$order" },
  { $limit: 1 }
]);

{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-05',
    product: [
      { product_id: '5', product_name: 'Margherita Pizza' },
      { product_id: '7', product_name: 'Garlic Bread' }
    ]
  }
}

What “pipe” means in this structure

A pipeline means MongoDB reads these documents and sends them through several operations.

Example idea:

first stage: choose only customers from Berlin
second stage: open the order array
third stage: open the product array
fourth stage: group products by restaurant
fifth stage: calculate totals

So the flow is:

customer document → orders → products → grouping/calculation → result

That is the pipe concept.

Why the pipe is useful for your structure

Your structure is nested.

Without a pipeline, it is harder to analyze questions like:

Which customer ordered the most products?
How many times was Margherita Pizza ordered?
What is the total revenue per restaurant?
Which driver delivered the most orders?
Which products are currently in_transit?

Because order and product are arrays inside arrays, MongoDB pipelines are very useful here.

Example with your structure

Example question:

How many ordered items came from each restaurant?

For your structure, MongoDB can process it like this:

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $group: {
      _id: "$order.product.restaurant_name",
      totalOrderedPieces: {
        $sum: "$order.product.number_of_pieces"
      }
    }
  },
  { $sort: { totalOrderedPieces: -1 } }
])

Explanation of the stages using your structure

Stage 1: `$unwind: "$order"`

Each customer has an order array. This stage splits one customer document into multiple documents, one for each order.

Input - one document:

"Anna"
orders [
  "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ],
  "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ],
  "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ]
]

Output - three documents:

"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }

This is needed because in your file, orders are stored inside the customer document.

Stage 2: `$unwind: "$order.product"`

Each order has a product array. This stage splits one order into multiple product rows.

So for Anna’s first order, products like Margherita Pizza, Garlic Bread, Cola 0.5L, and Tiramisu become separate pipeline documents.

Input - three documents:

"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }

Output - eight documents:

"Anna" order { "2026-01-10", product {"Margherita", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Bread", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Cola", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Tiramisu", "Pizza Place"}

"Anna" order { "2026-03-20", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-20", product {"Pepperoni", "Mega Pizza"}

"Anna" order { "2026-03-29", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-29", product {"Cola", "Pizza Place"}

Stage 3: `$group`

Now MongoDB groups all product rows by:

"$order.product.restaurant_name"

That means all products from "Pizza Palace" go together, all from "Mega Pizza" go together, and so on. Those restaurant names appear in your embedded product order data.

Stage 4: `$sum`

MongoDB adds up:

"$order.product.number_of_pieces"

So it calculates how many pieces were ordered from each restaurant.

{ _id: 'Mega Pizza', totalOrderedPieces: 3 }
{ _id: 'Pizza Palace', totalOrderedPieces: 5 }

Stage 5: `$sort`

Finally, the result is sorted from highest to lowest.

{ _id: 'Pizza Palace', totalOrderedPieces: 5 }
{ _id: 'Mega Pizza', totalOrderedPieces: 3 }

Example with `project`

Another example with your structure

Question:

Show only customers from Berlin and their order states

Your structure contains city information in:

adress.city_name

For example, Anna and Lukas are in Berlin.

Pipeline:

db.customer.aggregate([
  { $match: { "adress.city_name": "Berlin" } },
  { $unwind: "$order" },
  {
    $project: {
      _id: 0,
      customer: { $concat: ["$first_name", " ", "$last_name"] },
      city: "$adress.city_name",
      order_state: "$order.order_state",
      order_date: "$order.order_date"
    }
  }
])

Meaning:

$match filters only Berlin customers
$unwind splits orders
$project shows only the fields you want

Pipe concept compared to SQL

With your structure, MongoDB pipeline stages are similar to SQL operations:

$match → WHERE
$project → SELECT
$group → GROUP BY
$sort → ORDER BY
$unwind → expand nested arrays into rows
$lookup → join with another collection

This is especially important in your case because your customer documents contain embedded arrays, not flat tables.

Example with `$lookup`

The idea is:

customer contains the orders
with $unwind you open order and order.product
with $lookup you fetch additional information from the product collection
then you can output things like product type, description, and revenue

In your data, the customer collection stores ordered products with fields like product_id, product_name, selling_price_piece, and number_of_pieces, while the separate product collection stores master data such as product_type and product_decription.

Example with `$lookup`

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $lookup: {
      from: "product",
      localField: "order.product.product_id",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  { $unwind: "$productDetails" },
  {
    $group: {
      _id: {
        restaurant: "$order.product.restaurant_name",
        productType: "$productDetails.product_type"
      },
      totalOrderedPieces: {
        $sum: "$order.product.number_of_pieces"
      },
      totalRevenue: {
        $sum: {
          $multiply: [
            "$order.product.selling_price_piece",
            "$order.product.number_of_pieces"
          ]
        }
      }
    }
  },
  { $sort: { "_id.restaurant": 1, totalRevenue: -1 } }
])

What happens here?

$unwind: "$order" Each customer order becomes a separate pipeline document.

$unwind: "$order.product" Each product inside an order becomes a separate pipeline document.

$lookup MongoDB now joins: from customer with _id from the product collection.

$unwind: "$productDetails" $lookup always returns an array, so this turns that array into a single embedded object.

$group Now the pipeline groups by: restaurant and product type. Those product types come from your product collection.

Example output

The result could look like this:

{
  _id: {
    restaurant: "Pizza Palace",
    productType: "pizza"
  },
  totalOrderedPieces: 9,
  totalRevenue: 88.1
}
{
  _id: {
    restaurant: "Burger House",
    productType: "burger"
  },
  totalOrderedPieces: 4,
  totalRevenue: 37.4
}

Simpler example: use `$lookup` just to show extra details

If you do not want to group yet and only want to see how the join works, you can use this version:

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $lookup: {
      from: "product",
      localField: "order.product.product_id",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  { $unwind: "$productDetails" },
  {
    $project: {
      _id: 0,
      customer: { $concat: ["$first_name", " ", "$last_name"] },
      restaurant: "$order.product.restaurant_name",
      orderedProduct: "$order.product.product_name",
      productType: "$productDetails.product_type",
      description: "$productDetails.product_decription",
      quantity: "$order.product.number_of_pieces",
      pricePerPiece: "$order.product.selling_price_piece",
      revenue: {
        $multiply: [
          "$order.product.selling_price_piece",
          "$order.product.number_of_pieces"
        ]
      }
    }
  }
])

What this returns

For each ordered product, you will see:

customer
restaurant
ordered product
product type from the product collection
description from the product collection
quantity
revenue

Key idea of `$lookup`

With $lookup, you join two collections.

In your case:

customer = order data
product = product master data

So the logic is:

order data + product details = richer result

Short version

For your structure, the typical join is:

$lookup: {
  from: "product",
  localField: "order.product.product_id",
  foreignField: "_id",
  as: "productDetails"
}

This adds the product details from the product collection to each ordered product in customer.

Semester_8

19_Pipeline

Code-Dateien

Videos

Lernmaterialien

Aggregation pipeline

Basic idea

Why it is useful

Common stages

Small example

Example with group

Your structure

Order

What “pipe” means in this structure

Why the pipe is useful for your structure

Example with your structure

Example question:

Explanation of the stages using your structure

Stage 1: $unwind: "$order"

Stage 2: $unwind: "$order.product"

Stage 3: $group

Stage 4: $sum

Stage 5: $sort

Example with project

Another example with your structure

Question:

Meaning:

Pipe concept compared to SQL

Example with $lookup

Example with $lookup

What happens here?

Example output

Simpler example: use $lookup just to show extra details

What this returns

Key idea of $lookup

Short version

Example with `group`

Stage 1: `$unwind: "$order"`

Stage 2: `$unwind: "$order.product"`

Stage 3: `$group`

Stage 4: `$sum`

Stage 5: `$sort`

Example with `project`

Example with `$lookup`

Example with `$lookup`

Simpler example: use `$lookup` just to show extra details

Key idea of `$lookup`