19_Pipeline

Code-Dateien

DateinameAktion
CODECode_pizza.zipDownload

Videos

DateinameAktion
VIDEOVideo_Pizza_DAbspielen

Lernmaterialien

Aggregation pipeline

In MongoDB, the pipe concept usually means the aggregation pipeline.

Think of it like an assembly line for your data:

  • MongoDB takes documents from a collection

  • sends them through a sequence of steps

  • each step transforms, filters, groups, or reshapes the data

  • the output of one step becomes the input of the next step

That is why it is called a pipeline.

Basic idea

A pipeline is an array of stages:

db.orders.aggregate([
  { $match: { status: "paid" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])

Here is what happens:

  1. $match Filters documents. Only orders with status: "paid" continue.

  2. $group Groups remaining documents by customerId and calculates the total amount per customer.

  3. $sort Sorts the grouped result by total descending.

So the data flows like this:

collection → filter → group → sort → result

Why it is useful

The pipeline lets you do data processing directly in MongoDB, such as:

  • filtering rows

  • selecting fields

  • calculating sums, averages, counts

  • joining with another collection

  • transforming document structure

It is similar to:

  • SQL WHEREGROUP BYORDER BY

  • or Unix pipes, where output of one command goes into the next

Common stages

Some important pipeline stages are:

  • $match → like WHERE

  • $project → choose or rename fields

  • $group → like GROUP BY

  • $sort → sort results

  • $limit → restrict number of results

  • $lookup → join another collection

  • $unwind → split array elements into separate documents

Small example

Suppose you have:

{ name: "Anna", city: "Vienna", age: 22 }
{ name: "Ben", city: "Vienna", age: 30 }
{ name: "Clara", city: "Graz", age: 25 }

Pipeline:

db.people.aggregate([
  { $match: { city: "Vienna" } },
  { $project: { name: 1, age: 1, _id: 0 } }
])

Result:

{ name: "Anna", age: 22 }
{ name: "Ben", age: 30 }

Example with group

With your data, that means MongoDB can take customer documents from customer.json and process nested fields like adress, order, driver, and product through a pipeline. Your uploaded structure contains customers with embedded orders and embedded ordered products, plus a separate product collection with product metadata and restaurant information.

Your structure

In your customer data, one customer document looks roughly like this:

  • customer data

  • embedded address

  • an order array

  • inside each order: driver data

  • inside each order: a product array

So the structure is:

Customer
 ├── first_name
 ├── last_name
 ├── adress
 └── order[]
      ├── order_date
      ├── delivery_date
      ├── order_state
      ├── delivery_costs
      ├── driver
      └── product[]
           ├── product_id
           ├── product_name
           ├── selling_price_piece
           ├── number_of_pieces
           ├── restaurant_name

Your separate product file stores product master data such as product description, type, alcohol flag, and restaurant details.

Order

db.customer.aggregate([
  { $limit: 1 },
  { $unwind: "$order" }
]);
{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-05',
    product: [
      { product_id: '5', product_name: 'Margherita Pizza' },
      { product_id: '7', product_name: 'Garlic Bread' }
    ]
  }
}
{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-22',
    product: [
      { product_id: '6', product_name: 'Pepperoni Pizza' }
    ]
  }
}
db.customer.aggregate([
  { $unwind: "$order" },
  { $limit: 1 }
]);
{
  _id: '16',
  first_name: 'Anna',
  last_name: 'Schmidt',
  order: {
    order_date: '2025-01-05',
    product: [
      { product_id: '5', product_name: 'Margherita Pizza' },
      { product_id: '7', product_name: 'Garlic Bread' }
    ]
  }
}

What “pipe” means in this structure

A pipeline means MongoDB reads these documents and sends them through several operations.

Example idea:

  • first stage: choose only customers from Berlin

  • second stage: open the order array

  • third stage: open the product array

  • fourth stage: group products by restaurant

  • fifth stage: calculate totals

So the flow is:

customer document → orders → products → grouping/calculation → result

That is the pipe concept.

Why the pipe is useful for your structure

Your structure is nested.

Without a pipeline, it is harder to analyze questions like:

  • Which customer ordered the most products?

  • How many times was Margherita Pizza ordered?

  • What is the total revenue per restaurant?

  • Which driver delivered the most orders?

  • Which products are currently in_transit?

Because order and product are arrays inside arrays, MongoDB pipelines are very useful here.

Example with your structure

Example question:

How many ordered items came from each restaurant?

For your structure, MongoDB can process it like this:

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $group: {
      _id: "$order.product.restaurant_name",
      totalOrderedPieces: {
        $sum: "$order.product.number_of_pieces"
      }
    }
  },
  { $sort: { totalOrderedPieces: -1 } }
])

Explanation of the stages using your structure

Stage 1: $unwind: "$order"

Each customer has an order array. This stage splits one customer document into multiple documents, one for each order.

Input - one document:

"Anna"
orders [
  "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ],
  "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ],
  "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ]
]

Output - three documents:

"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }

This is needed because in your file, orders are stored inside the customer document.

Stage 2: $unwind: "$order.product"

Each order has a product array. This stage splits one order into multiple product rows.

So for Anna’s first order, products like Margherita Pizza, Garlic Bread, Cola 0.5L, and Tiramisu become separate pipeline documents.

Input - three documents:

"Anna" order { "2026-01-10", products [ {"Margherita", "Pizza Place"}, {"Bread", "Pizza Place"}, {"Cola", "Pizza Place"}, {"Tiramisu", "Pizza Place"} ] },
"Anna" order { "2026-03-20", products [ {"Margherita", "Mega Pizza"}, {"Pepperoni", "Mega Pizza"} ] },
"Anna" order { "2026-03-29", products [ {"Margherita", "Mega Pizza"}, {"Cola", "Pizza Place"} ] }

Output - eight documents:

"Anna" order { "2026-01-10", product {"Margherita", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Bread", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Cola", "Pizza Place"}
"Anna" order { "2026-01-10", product {"Tiramisu", "Pizza Place"}

"Anna" order { "2026-03-20", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-20", product {"Pepperoni", "Mega Pizza"}

"Anna" order { "2026-03-29", product {"Margherita", "Mega Pizza"}
"Anna" order { "2026-03-29", product {"Cola", "Pizza Place"}

Stage 3: $group

Now MongoDB groups all product rows by:

"$order.product.restaurant_name"

That means all products from "Pizza Palace" go together, all from "Mega Pizza" go together, and so on. Those restaurant names appear in your embedded product order data.

Stage 4: $sum

MongoDB adds up:

"$order.product.number_of_pieces"

So it calculates how many pieces were ordered from each restaurant.

{ _id: 'Mega Pizza', totalOrderedPieces: 3 }
{ _id: 'Pizza Palace', totalOrderedPieces: 5 }

Stage 5: $sort

Finally, the result is sorted from highest to lowest.

{ _id: 'Pizza Palace', totalOrderedPieces: 5 }
{ _id: 'Mega Pizza', totalOrderedPieces: 3 }

Example with project

Another example with your structure

Question:

Show only customers from Berlin and their order states

Your structure contains city information in:

adress.city_name

For example, Anna and Lukas are in Berlin.

Pipeline:

db.customer.aggregate([
  { $match: { "adress.city_name": "Berlin" } },
  { $unwind: "$order" },
  {
    $project: {
      _id: 0,
      customer: { $concat: ["$first_name", " ", "$last_name"] },
      city: "$adress.city_name",
      order_state: "$order.order_state",
      order_date: "$order.order_date"
    }
  }
])

Meaning:

  • $match filters only Berlin customers

  • $unwind splits orders

  • $project shows only the fields you want

Pipe concept compared to SQL

With your structure, MongoDB pipeline stages are similar to SQL operations:

  • $matchWHERE

  • $projectSELECT

  • $groupGROUP BY

  • $sortORDER BY

  • $unwind → expand nested arrays into rows

  • $lookup → join with another collection

This is especially important in your case because your customer documents contain embedded arrays, not flat tables.

Example with $lookup

The idea is:

  • customer contains the orders

  • with $unwind you open order and order.product

  • with $lookup you fetch additional information from the product collection

  • then you can output things like product type, description, and revenue

In your data, the customer collection stores ordered products with fields like product_id, product_name, selling_price_piece, and number_of_pieces, while the separate product collection stores master data such as product_type and product_decription.

Example with $lookup

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $lookup: {
      from: "product",
      localField: "order.product.product_id",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  { $unwind: "$productDetails" },
  {
    $group: {
      _id: {
        restaurant: "$order.product.restaurant_name",
        productType: "$productDetails.product_type"
      },
      totalOrderedPieces: {
        $sum: "$order.product.number_of_pieces"
      },
      totalRevenue: {
        $sum: {
          $multiply: [
            "$order.product.selling_price_piece",
            "$order.product.number_of_pieces"
          ]
        }
      }
    }
  },
  { $sort: { "_id.restaurant": 1, totalRevenue: -1 } }
])

What happens here?

$unwind: "$order" Each customer order becomes a separate pipeline document.

$unwind: "$order.product" Each product inside an order becomes a separate pipeline document.

$lookup MongoDB now joins: from customer with _id from the product collection.

$unwind: "$productDetails" $lookup always returns an array, so this turns that array into a single embedded object.

$group Now the pipeline groups by: restaurant and product type. Those product types come from your product collection.

Example output

The result could look like this:

{
  _id: {
    restaurant: "Pizza Palace",
    productType: "pizza"
  },
  totalOrderedPieces: 9,
  totalRevenue: 88.1
}
{
  _id: {
    restaurant: "Burger House",
    productType: "burger"
  },
  totalOrderedPieces: 4,
  totalRevenue: 37.4
}

Simpler example: use $lookup just to show extra details

If you do not want to group yet and only want to see how the join works, you can use this version:

db.customer.aggregate([
  { $unwind: "$order" },
  { $unwind: "$order.product" },
  {
    $lookup: {
      from: "product",
      localField: "order.product.product_id",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  { $unwind: "$productDetails" },
  {
    $project: {
      _id: 0,
      customer: { $concat: ["$first_name", " ", "$last_name"] },
      restaurant: "$order.product.restaurant_name",
      orderedProduct: "$order.product.product_name",
      productType: "$productDetails.product_type",
      description: "$productDetails.product_decription",
      quantity: "$order.product.number_of_pieces",
      pricePerPiece: "$order.product.selling_price_piece",
      revenue: {
        $multiply: [
          "$order.product.selling_price_piece",
          "$order.product.number_of_pieces"
        ]
      }
    }
  }
])

What this returns

For each ordered product, you will see:

  • customer

  • restaurant

  • ordered product

  • product type from the product collection

  • description from the product collection

  • quantity

  • revenue

Key idea of $lookup

With $lookup, you join two collections.

In your case:

  • customer = order data

  • product = product master data

So the logic is:

order data + product details = richer result

Short version

For your structure, the typical join is:

$lookup: {
  from: "product",
  localField: "order.product.product_id",
  foreignField: "_id",
  as: "productDetails"
}

This adds the product details from the product collection to each ordered product in customer.