Aggregation functions with examples, part 01 ($group, $sum, $avg, $min, $max, $count, $first, $last, $push, $addToSet, $unwind, $sort, $project, $switch, $toUpper, $toLower)

MongoDB Aggregation Functions with Examples
  • Step 1: Insert a Dataset into MongoDB Collection
  • First, we'll insert a sample dataset into a MongoDB collection named `orders`. The command below is written for the MongoDB shell; the Node.js driver's `insertMany` accepts the same array of documents.
  • Insert the Dataset:

    db.orders.insertMany([
        {
            "_id": 1,
            "customerId": "C001",
            "items": 2,
            "amount": 500,
            "status": "Completed"
        },
        {
            "_id": 2,
            "customerId": "C002",
            "items": 1,
            "amount": 300,
            "status": "Completed"
        },
        {
            "_id": 3,
            "customerId": "C001",
            "items": 4,
            "amount": 1200,
            "status": "Pending"
        },
        {
            "_id": 4,
            "customerId": "C003",
            "items": 5,
            "amount": 1500,
            "status": "Completed"
        },
        {
            "_id": 5,
            "customerId": "C002",
            "items": 3,
            "amount": 900,
            "status": "Pending"
        },
        {
            "_id": 6,
            "customerId": "C001",
            "items": 1,
            "amount": 400,
            "status": "Completed"
        }
    ])

  • Step 2: MongoDB Aggregation Functions
  • MongoDB provides several aggregation functions that help in processing data in various ways. Below, we'll go through common aggregation functions such as `$sum`, `$avg`, `$min`, `$max`, `$count`, `$first`, and `$last`, using the above dataset.
  • $group (Group by Fields): The `$group` stage groups documents by the expression given in `_id` and computes accumulator values, such as a sum or an average, for each group.
  • Example: Group orders by status and calculate the total amount for each status.

    db.orders.aggregate([
        { $group: { _id: "$status", totalAmount: { $sum: "$amount" } } }
    ])

  • Explanation: This query groups the orders by status and calculates the total sum of the amount for each group.
  • Output:

    [
        { _id: 'Completed', totalAmount: 2700 },
        { _id: 'Pending', totalAmount: 2100 }
    ]
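
  • Sketch: For intuition, the grouping above can be reproduced in plain JavaScript. This only illustrates what `$group` with `$sum` computes, not how MongoDB implements it. `groupSum` is a hypothetical helper, and `statusOrders` is a trimmed copy of the sample orders.

```javascript
// Trimmed copy of the sample orders (only the fields used here).
const statusOrders = [
    { _id: 1, amount: 500, status: "Completed" },
    { _id: 2, amount: 300, status: "Completed" },
    { _id: 3, amount: 1200, status: "Pending" },
    { _id: 4, amount: 1500, status: "Completed" },
    { _id: 5, amount: 900, status: "Pending" },
    { _id: 6, amount: 400, status: "Completed" }
];

// Hypothetical helper: group by keyField and sum valueField per group.
function groupSum(docs, keyField, valueField) {
    const totals = new Map();
    for (const doc of docs) {
        const key = doc[keyField];
        totals.set(key, (totals.get(key) || 0) + doc[valueField]);
    }
    // Shape the result like $group output: one document per distinct key.
    return [...totals].map(([key, total]) => ({ _id: key, totalAmount: total }));
}

console.log(groupSum(statusOrders, "status", "amount"));
// [ { _id: 'Completed', totalAmount: 2700 }, { _id: 'Pending', totalAmount: 2100 } ]
```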

  • $sum (Summation): The `$sum` function calculates the total sum of a field across documents or grouped documents.
  • Example: Calculate the total amount spent by each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } }
    ])

  • Explanation: This query groups the documents by `customerId` and calculates the total sum of the `amount` field for each customer.
  • Output:

    [
        { "_id": "C001", "totalAmount": 2100 },
        { "_id": "C002", "totalAmount": 1200 },
        { "_id": "C003", "totalAmount": 1500 }
    ]

  • $avg (Average): The `$avg` function computes the average of the values across documents or grouped documents.
  • Example: Find the average number of items ordered by each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", avgItems: { $avg: "$items" } } }
    ])

  • Explanation: This query calculates the average number of items ordered by each customer by grouping the orders by `customerId`.
  • Output:

    [
        { "_id": "C001", "avgItems": 2.3333333333333335 },
        { "_id": "C002", "avgItems": 2 },
        { "_id": "C003", "avgItems": 5 }
    ]

  • $min (Minimum): The `$min` function returns the minimum value from the documents in the group.
  • Example: Find the minimum amount spent by each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", minAmount: { $min: "$amount" } } }
    ])

  • Explanation: This query finds the minimum amount spent by each customer by grouping the documents by `customerId`.
  • Output:

    [
        { "_id": "C001", "minAmount": 400 },
        { "_id": "C002", "minAmount": 300 },
        { "_id": "C003", "minAmount": 1500 }
    ]

  • $max (Maximum): The `$max` function returns the maximum value from the documents in the group.
  • Example: Find the maximum amount spent by each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", maxAmount: { $max: "$amount" } } }
    ])

  • Explanation: This query finds the maximum amount spent by each customer by grouping the documents by `customerId`.
  • Output:

    [
        { "_id": "C001", "maxAmount": 1200 },
        { "_id": "C002", "maxAmount": 900 },
        { "_id": "C003", "maxAmount": 1500 }
    ]

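  • $count (Document Count): Used as a `$group` accumulator, `$count` returns the number of documents in each group. The accumulator form is available in MongoDB 5.0 and later; on earlier versions, `{ $sum: 1 }` gives the same result.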
  • Example: Count how many orders each customer has placed.

    db.orders.aggregate([
        { $group: { _id: "$customerId", orderCount: { $count: {} } } }
    ])

  • Explanation: This query counts the number of orders placed by each customer by grouping the documents by `customerId`.
  • Output:

    [
        { "_id": "C001", "orderCount": 3 },
        { "_id": "C002", "orderCount": 2 },
        { "_id": "C003", "orderCount": 1 }
    ]
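
  • Sketch: A single `$group` stage can hold several accumulators at once, so the sum, average, minimum, maximum, and count shown so far do not need separate queries. The plain JavaScript below illustrates the per-group bookkeeping in one pass. `amountStats` is a hypothetical helper and `customerOrders` is a trimmed copy of the sample orders; it is an illustration, not MongoDB's implementation.

```javascript
// Trimmed copy of the sample orders (only the fields used here).
const customerOrders = [
    { customerId: "C001", amount: 500 },
    { customerId: "C002", amount: 300 },
    { customerId: "C001", amount: 1200 },
    { customerId: "C003", amount: 1500 },
    { customerId: "C002", amount: 900 },
    { customerId: "C001", amount: 400 }
];

// Hypothetical helper: one pass over the documents, one running state per customer.
function amountStats(docs) {
    const groups = new Map();
    for (const { customerId, amount } of docs) {
        const g = groups.get(customerId) ||
            { sum: 0, count: 0, min: Infinity, max: -Infinity };
        g.sum += amount;                    // like $sum
        g.count += 1;                       // like $count (or { $sum: 1 })
        g.min = Math.min(g.min, amount);    // like $min
        g.max = Math.max(g.max, amount);    // like $max
        groups.set(customerId, g);
    }
    return [...groups].map(([customerId, g]) => ({
        _id: customerId,
        totalAmount: g.sum,
        avgAmount: g.sum / g.count,         // like $avg
        minAmount: g.min,
        maxAmount: g.max,
        orderCount: g.count
    }));
}

console.log(amountStats(customerOrders)[0]);
// { _id: 'C001', totalAmount: 2100, avgAmount: 700, minAmount: 400, maxAmount: 1200, orderCount: 3 }
```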

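  • $first (First Document): The `$first` accumulator returns the value of the given expression from the first document in each group. The result is only meaningful when the documents arrive in a defined order, so sort before grouping.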
  • Example: Get the first order placed by each customer.

    db.orders.aggregate([
        // Sort by _id to ensure the first document is returned based on _id
        { $sort: { _id: 1 } },
        { $group: { _id: "$customerId", firstOrder: { $first: "$_id" } } }
    ])

  • Explanation: This query groups the documents by `customerId` and returns the first order `_id` for each customer.
  • Output:

    [
        { "_id": "C001", "firstOrder": 1 },
        { "_id": "C002", "firstOrder": 2 },
        { "_id": "C003", "firstOrder": 4 }
    ]

  • $last (Last Document): The `$last` accumulator returns the value of the given expression from the last document in each group. As with `$first`, sort before grouping so that "last" is well defined.
  • Example: Get the last order placed by each customer.

    db.orders.aggregate([
        // Sort by _id to ensure the last document is returned based on _id
        { $sort: { _id: 1 } },
        { $group: { _id: "$customerId", lastOrder: { $last: "$_id" } } }
    ])

  • Explanation: This query groups the documents by `customerId` and returns the last order `_id` for each customer.
  • Output:

    [
        { "_id": "C001", "lastOrder": 6 },
        { "_id": "C002", "lastOrder": 5 },
        { "_id": "C003", "lastOrder": 4 }
    ]
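
  • Sketch: The plain JavaScript below shows why the `$sort` stage matters for `$first` and `$last`: the input is deliberately shuffled, and the first and last `_id` per customer only come out right because the documents are sorted before grouping. `firstAndLast` is a hypothetical helper written for illustration.

```javascript
// Sample order ids in a deliberately shuffled order.
const unsortedOrders = [
    { _id: 5, customerId: "C002" },
    { _id: 1, customerId: "C001" },
    { _id: 6, customerId: "C001" },
    { _id: 4, customerId: "C003" },
    { _id: 2, customerId: "C002" },
    { _id: 3, customerId: "C001" }
];

// Hypothetical helper: sort by _id, then remember the first and last _id per customer.
function firstAndLast(docs) {
    // Like { $sort: { _id: 1 } }: copy the array, then sort ascending by _id.
    const sorted = [...docs].sort((a, b) => a._id - b._id);
    const groups = new Map();
    for (const doc of sorted) {
        const g = groups.get(doc.customerId);
        if (g === undefined) {
            // First document seen for this customer: it is both first and last so far.
            groups.set(doc.customerId, { firstOrder: doc._id, lastOrder: doc._id });
        } else {
            // Every later document overwrites lastOrder.
            g.lastOrder = doc._id;
        }
    }
    return [...groups].map(([customerId, g]) => ({ _id: customerId, ...g }));
}

console.log(firstAndLast(unsortedOrders));
// [ { _id: 'C001', firstOrder: 1, lastOrder: 6 },
//   { _id: 'C002', firstOrder: 2, lastOrder: 5 },
//   { _id: 'C003', firstOrder: 4, lastOrder: 4 } ]
```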

  • $push: The $push operator appends a value to an array of values in a group. It is often used when you want to collect all values of a particular field into an array within a grouped result.
  • Example: Collect all order amounts for each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", allAmounts: { $push: "$amount" } } }
    ])

  • Explanation: This groups the documents by customerId and returns an array of all amount values for each customer.
  • Output:

    [
        { "_id": "C001", "allAmounts": [500, 1200, 400] },
        { "_id": "C002", "allAmounts": [300, 900] },
        { "_id": "C003", "allAmounts": [1500] }
    ]

  • $addToSet: The $addToSet operator adds a value to an array only if it doesn’t already exist in the array, so the result array contains no duplicate values. The order of the elements in the result array is not guaranteed.
  • Example: Collect unique order statuses for each customer.

    db.orders.aggregate([
        { $group: { _id: "$customerId", uniqueStatuses: { $addToSet: "$status" } } }
    ])

  • Explanation: This groups the documents by customerId and collects unique status values for each customer.
  • Output:

    [
        { "_id": "C001", "uniqueStatuses": ["Completed", "Pending"] },
        { "_id": "C002", "uniqueStatuses": ["Completed", "Pending"] },
        { "_id": "C003", "uniqueStatuses": ["Completed"] }
    ]
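
  • Sketch: The difference between `$push` and `$addToSet` is easiest to see side by side. The plain JavaScript below collects the statuses of each customer twice, once into an array (duplicates kept) and once into a `Set` (duplicates dropped). `collectStatuses` is a hypothetical helper; unlike this sketch, MongoDB does not promise any particular element order for `$addToSet`.

```javascript
// Trimmed copy of the sample orders (only the fields used here).
const statusDocs = [
    { customerId: "C001", status: "Completed" },
    { customerId: "C002", status: "Completed" },
    { customerId: "C001", status: "Pending" },
    { customerId: "C003", status: "Completed" },
    { customerId: "C002", status: "Pending" },
    { customerId: "C001", status: "Completed" }
];

// Hypothetical helper: collect statuses per customer in both styles.
function collectStatuses(docs) {
    const groups = new Map();
    for (const { customerId, status } of docs) {
        if (!groups.has(customerId)) {
            groups.set(customerId, { pushed: [], unique: new Set() });
        }
        const g = groups.get(customerId);
        g.pushed.push(status);  // like $push: keeps every value, duplicates included
        g.unique.add(status);   // like $addToSet: keeps each distinct value once
    }
    return [...groups].map(([customerId, g]) => ({
        _id: customerId,
        allStatuses: g.pushed,
        uniqueStatuses: [...g.unique]
    }));
}

console.log(collectStatuses(statusDocs)[0]);
// { _id: 'C001',
//   allStatuses: [ 'Completed', 'Pending', 'Completed' ],
//   uniqueStatuses: [ 'Completed', 'Pending' ] }
```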

  • $unwind: The $unwind operator deconstructs an array field from the input documents and outputs one document for each element of the array.
  • Example: Deconstruct orders with multiple items.
  • Assume the orders collection has an items array field:

    {
        "_id": 1,
        "customerId": "C001",
        "items": [
            {
                "product": "A",
                "quantity": 2
            },
            {
                "product": "B",
                "quantity": 1
            }
        ],
        "amount": 500
    }

  • Now, deconstruct this array using $unwind:

    db.orders.aggregate([
        { $unwind: "$items" }
    ])

  • Explanation: This query deconstructs the items array, producing a document for each item.
  • Output:

    [
        {
            "_id": 1,
            "customerId": "C001",
            "items": {
                "product": "A",
                "quantity": 2
            },
            "amount": 500
        },
        {
            "_id": 1,
            "customerId": "C001",
            "items": {
                "product": "B",
                "quantity": 1
            },
            "amount": 500
        }
    ]
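
  • Sketch: In plain JavaScript, `$unwind` behaves much like `flatMap`: each array element becomes its own copy of the parent document. `unwind` below is a hypothetical helper that mimics the stage's default behaviour, where documents whose field is missing, null, or an empty array produce no output.

```javascript
// The example document from above, with items stored as an array.
const orderWithItems = {
    _id: 1,
    customerId: "C001",
    items: [
        { product: "A", quantity: 2 },
        { product: "B", quantity: 1 }
    ],
    amount: 500
};

// Hypothetical helper: emit one copy of each document per array element.
function unwind(docs, field) {
    return docs.flatMap((doc) => {
        const value = doc[field];
        // Missing or null field: the document is dropped (default $unwind behaviour).
        if (value === undefined || value === null) return [];
        // A non-array value is treated as a single-element array.
        if (!Array.isArray(value)) return [doc];
        // An empty array maps to no documents; otherwise one document per element.
        return value.map((element) => ({ ...doc, [field]: element }));
    });
}

const unwound = unwind([orderWithItems], "items");
console.log(unwound.map((doc) => doc.items.product)); // [ 'A', 'B' ]
```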

  • $sort: The $sort operator sorts the documents based on the specified fields in ascending (1) or descending (-1) order.
  • Example: Sort orders by amount in descending order.

    db.orders.aggregate([
        { $sort: { amount: -1 } }
    ])

  • Explanation: This query sorts the orders by the amount field in descending order.
  • Output:

    [
        {
            "_id": 4,
            "customerId": "C003",
            "items": 5,
            "amount": 1500,
            "status": "Completed"
        },
        {
            "_id": 3,
            "customerId": "C001",
            "items": 4,
            "amount": 1200,
            "status": "Pending"
        },
        {
            "_id": 5,
            "customerId": "C002",
            "items": 3,
            "amount": 900,
            "status": "Pending"
        },
        {
            "_id": 1,
            "customerId": "C001",
            "items": 2,
            "amount": 500,
            "status": "Completed"
        },
        {
            "_id": 6,
            "customerId": "C001",
            "items": 1,
            "amount": 400,
            "status": "Completed"
        },
        {
            "_id": 2,
            "customerId": "C002",
            "items": 1,
            "amount": 300,
            "status": "Completed"
        }
    ]

  • $project: The $project operator is used to include, exclude, or reshape fields in the output documents.
  • Example: Project only the customerId and amount fields.

    db.orders.aggregate([
        { $project: { customerId: 1, amount: 1, _id: 0 } }
    ])

  • Explanation: This query projects only the customerId and amount fields, excluding the _id field from the output.
  • Output:

    [
        { "customerId": "C001", "amount": 500 },
        { "customerId": "C002", "amount": 300 },
        { "customerId": "C001", "amount": 1200 },
        { "customerId": "C003", "amount": 1500 },
        { "customerId": "C002", "amount": 900 },
        { "customerId": "C001", "amount": 400 }
    ]

  • $switch (Multiple Conditional Expressions): The $switch operator allows for multiple conditional branches (similar to a switch-case statement).
  • Example: Classify orders based on the amount.

    db.orders.aggregate([
        {
            $project: {
                amountClassification: {
                    $switch: {
                        branches: [{
                            case: {
                                $lt: ["$amount", 500]
                            }, then: "Small"
                        }, {
                            case: {
                                $and: [{
                                    $gte: ["$amount", 500]
                                }, {
                                    $lt: ["$amount", 1000]
                                }]
                            }, then: "Medium"
                        }, {
                            case: {
                                $gte: ["$amount", 1000]
                            }, then: "Large"
                        }],
                        default: "Unknown"
                    }
                }
            }
        }
    ])

  • Explanation: This query classifies each order based on its amount into "Small", "Medium", or "Large".
  • Output:

    [
        { _id: 1, amountClassification: 'Medium' },
        { _id: 2, amountClassification: 'Small' },
        { _id: 3, amountClassification: 'Large' },
        { _id: 4, amountClassification: 'Large' },
        { _id: 5, amountClassification: 'Medium' },
        { _id: 6, amountClassification: 'Small' }
    ]
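
  • Sketch: `$switch` checks its branches in order and uses the first one whose `case` is true. The plain JavaScript below mimics that control flow; `switchExpr` and `classify` are hypothetical helpers. Because the branches are ordered, the "Medium" branch here only needs the upper bound: any amount below 500 has already been taken by "Small". The sketch does not model how MongoDB compares missing or non-numeric values.

```javascript
// Hypothetical helper: return `then` of the first branch whose case matches.
function switchExpr(branches, defaultValue, doc) {
    for (const branch of branches) {
        if (branch.case(doc)) return branch.then;
    }
    return defaultValue;
}

// Same thresholds as the query above, relying on branch order.
const amountBranches = [
    { case: (doc) => doc.amount < 500, then: "Small" },
    { case: (doc) => doc.amount < 1000, then: "Medium" },
    { case: (doc) => doc.amount >= 1000, then: "Large" }
];

function classify(doc) {
    return switchExpr(amountBranches, "Unknown", doc);
}

console.log([500, 300, 1200, 1500, 900, 400].map((amount) => classify({ amount })));
// [ 'Medium', 'Small', 'Large', 'Large', 'Medium', 'Small' ]
```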

  • `$toUpper` and `$toLower` (String Case Conversion)
    • $toUpper converts a string to uppercase.
    • $toLower converts a string to lowercase.
  • Example: Convert the status field to uppercase.

    db.orders.aggregate([
        { $project: { upperCaseStatus: { $toUpper: "$status" } } }
    ])

  • Explanation: This query converts the status field to uppercase.
  • Output:

    [
        { _id: 1, upperCaseStatus: 'COMPLETED' },
        { _id: 2, upperCaseStatus: 'COMPLETED' },
        { _id: 3, upperCaseStatus: 'PENDING' },
        { _id: 4, upperCaseStatus: 'COMPLETED' },
        { _id: 5, upperCaseStatus: 'PENDING' },
        { _id: 6, upperCaseStatus: 'COMPLETED' }
    ]




Aggregation framework in MongoDB

  • The aggregation framework in MongoDB is a powerful tool for processing data and transforming it into a desired result. It allows you to perform complex data manipulations such as filtering, grouping, sorting, reshaping, and calculating aggregate values directly within the database.
  • The aggregation framework works by creating a pipeline of operations that documents pass through. Each stage in the pipeline transforms the documents in some way, either by filtering them, modifying them, or aggregating values. The documents then proceed to the next stage, where further transformations can be applied.
Key Concepts of MongoDB Aggregation Framework
  • Aggregation Pipeline:
    • A sequence of stages that are processed in order.
    • Each stage transforms the documents that pass through it.
    • The output of one stage is the input to the next.
  • Stages:
    • Each stage in the pipeline performs a specific operation on the data (e.g., filtering, grouping, sorting).
    • Some common stages include $match, $group, $project, $sort, and $limit.
  • Operators:
    • Operators define the transformations or operations applied in each stage.
    • Examples include $sum, $avg, $min, $max, $addToSet, $push, etc.
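  • Sketch: The pipeline idea can be modelled in plain JavaScript as a list of functions, each taking an array of documents and returning a new one. The helper names (`matchStage`, `sortStage`, `limitStage`, `runPipeline`) are hypothetical and exist only for this illustration; MongoDB's own execution is more sophisticated.

```javascript
// Each stage is a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Run the stages in order: the output of one stage is the input to the next.
function runPipeline(docs, stages) {
    return stages.reduce((current, stage) => stage(current), docs);
}

const pipelineInput = [
    { _id: 1, amount: 500, status: "Completed" },
    { _id: 2, amount: 300, status: "Completed" },
    { _id: 3, amount: 1200, status: "Pending" },
    { _id: 4, amount: 1500, status: "Completed" }
];

// Roughly: $match on status, $sort by amount descending, $limit 2.
const topCompleted = runPipeline(pipelineInput, [
    matchStage((doc) => doc.status === "Completed"),
    sortStage((a, b) => b.amount - a.amount),
    limitStage(2)
]);

console.log(topCompleted.map((doc) => doc._id)); // [ 4, 1 ]
```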
Aggregation Pipeline Stages
  • $match:
    • Filters documents based on the given criteria, similar to a find() query.
    • Place it as early in the pipeline as possible: it reduces the number of documents that later stages must process, and a $match at the start of a pipeline can use indexes.
  • Example:

    db.orders.aggregate([
        { $match: { status: "shipped" } }
    ])

  • This query filters documents in the orders collection where status is "shipped".
  • $group:
    • Groups documents by a specified field and applies aggregate functions (e.g., sum, average) to each group.
    • It is similar to the GROUP BY clause in SQL.
  • Example:

    db.orders.aggregate([
        { $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } }
    ])

  • This groups the documents by customerId and calculates the total amount for each customer by summing the amount field.
  • $project:
    • Reshapes documents by including or excluding fields.
    • You can also create new computed fields in this stage.
  • Example:

    db.orders.aggregate([
        {
            $project: {
                customerId: 1,
                orderDate: 1,
                totalAmount: {
                    $multiply: ["$quantity", "$price"]
                }
            }
        }
    ])

  • This keeps the customerId and orderDate fields (plus _id, which is included by default) and creates a new totalAmount field by multiplying quantity and price.
  • $sort:
    • Sorts the documents in the pipeline by the specified field in either ascending (1) or descending (-1) order.
  • Example:

    db.orders.aggregate([
        { $sort: { orderDate: -1 } }
    ])

  • This sorts the orders by orderDate in descending order.
  • $limit:
    • Limits the number of documents passing through the pipeline.
  • Example:

    db.orders.aggregate([
        { $limit: 5 }
    ])

  • This limits the results to the first 5 documents.
  • $skip:
    • Skips a specified number of documents before passing the remaining documents to the next stage.
  • Example:

    db.orders.aggregate([
        { $skip: 10 }
    ])

  • This skips the first 10 documents.
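  • Sketch: `$skip` and `$limit` are commonly combined to page through results. The helper below, a hypothetical `pagePipeline`, only builds the pipeline array; it assumes pages are ordered by `_id` and that page numbers start at 1. The returned array could then be passed to `db.orders.aggregate(...)`.

```javascript
// Hypothetical helper: build a pipeline that returns one page of results.
function pagePipeline(pageNumber, pageSize) {
    if (!Number.isInteger(pageNumber) || pageNumber < 1) {
        throw new RangeError("pageNumber must be an integer >= 1");
    }
    if (!Number.isInteger(pageSize) || pageSize < 1) {
        throw new RangeError("pageSize must be an integer >= 1");
    }
    return [
        { $sort: { _id: 1 } },                      // stable order, so pages do not overlap
        { $skip: (pageNumber - 1) * pageSize },     // drop the documents of earlier pages
        { $limit: pageSize }                        // keep at most one page
    ];
}

console.log(JSON.stringify(pagePipeline(3, 10)));
// [{"$sort":{"_id":1}},{"$skip":20},{"$limit":10}]
```

  • The order of the stages matters: `$skip` comes before `$limit`, otherwise the limit would be applied first and the skip would then discard part of the page.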
  • $unwind:
    • Deconstructs an array field in each document into multiple documents, each containing a single element of the array.
  • Example:

    db.orders.aggregate([
        { $unwind: "$items" }
    ])

  • If the items field is an array, this operation will create a separate document for each item in the array.
  • $lookup:
    • Performs a left outer join with another collection.
  • Example:

    db.orders.aggregate([
        {
            $lookup: {
                from: "customers",
                localField: "customerId",
                foreignField: "_id",
                as: "customerDetails"
            }
        }
    ])

  • This joins the orders collection with the customers collection, matching the customerId field with the _id field in the customers collection.
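  • Sketch: The plain JavaScript below illustrates the shape of a `$lookup` result: every input document is kept, and the matches are attached as an array under the `as` name, which is empty when nothing matches. The `lookup` helper, the customer names, and the order with customerId "C999" are made up for this illustration.

```javascript
// Hypothetical sample data: the last order has no matching customer.
const lookupOrders = [
    { _id: 1, customerId: "C001", amount: 500 },
    { _id: 2, customerId: "C002", amount: 300 },
    { _id: 7, customerId: "C999", amount: 50 }
];
const customers = [
    { _id: "C001", name: "Alice" },
    { _id: "C002", name: "Bob" }
];

// Hypothetical helper: left outer join, matches stored as an array field.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
    return localDocs.map((doc) => ({
        ...doc,
        [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField])
    }));
}

const joined = lookup(lookupOrders, customers, "customerId", "_id", "customerDetails");
console.log(joined.map((doc) => doc.customerDetails.length)); // [ 1, 1, 0 ]
```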
  • $addFields:
    • Adds new fields to documents or modifies existing fields.
  • Example:

    db.orders.aggregate([
        { $addFields: { totalCost: { $multiply: ["$price", "$quantity"] } } }
    ])

  • This creates a new field totalCost by multiplying the price and quantity fields.
  • $out:
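    • Writes the resulting documents to a collection. It must be the last stage in the pipeline, and if the target collection already exists, its contents are replaced.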
  • Example:

    db.orders.aggregate([
        { $match: { status: "shipped" } },
        { $out: "shippedOrders" }
    ])

  • This writes all documents with a status of "shipped" to a new collection called shippedOrders.
Aggregation Operators
  • Mathematical Operators:
    • $sum: Adds values together.
    • $avg: Calculates the average.
    • $min/$max: Returns the minimum or maximum value.
  • Example:

    db.orders.aggregate([
        { $group: { _id: null, totalAmount: { $sum: "$amount" }, avgAmount: { $avg: "$amount" } } }
    ])

  • Array Operators:
    • $push: Adds elements to an array.
    • $addToSet: Adds unique elements to an array.
  • Example:

    db.orders.aggregate([
        { $group: { _id: "$customerId", orders: { $push: "$_id" } } }
    ])

  • String Operators:
    • $concat: Concatenates strings.
    • $substr: Extracts substrings (deprecated since MongoDB 3.4; use $substrBytes or $substrCP instead).
  • Example:

    db.users.aggregate([
        { $project: { fullName: { $concat: ["$firstName", " ", "$lastName"] } } }
    ])

  • Conditional Operators:
    • $cond: Conditional logic (like an if-else).
    • $ifNull: Returns a default value if the field is null or missing.
  • Example:

    db.orders.aggregate([
        {
            $project: {
                status: {
                    $cond: {
                        if: {
                            $gt: ["$amount", 100]
                        },
                        then: "High",
                        else: "Low"
                    }
                }
            }
        }
    ])
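
  • Sketch: `$cond` and `$ifNull` map closely onto the JavaScript conditional operator and the `??` operator. The hypothetical `labelOrder` helper below combines the two: a missing or null amount falls back to 0 before the same "greater than 100" test as in the query above.

```javascript
// Hypothetical helper combining the two conditional operators.
function labelOrder(doc) {
    // Like { $ifNull: ["$amount", 0] }: fall back to 0 when amount is null or missing.
    const amount = doc.amount ?? 0;
    // Like { $cond: { if: { $gt: [amount, 100] }, then: "High", else: "Low" } }.
    return amount > 100 ? "High" : "Low";
}

console.log(labelOrder({ amount: 500 }));  // High
console.log(labelOrder({ amount: 100 }));  // Low (the test is strictly greater than 100)
console.log(labelOrder({}));               // Low (missing amount falls back to 0)
```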


Example Aggregation Pipeline
  • Let's say you want to find the total sales per customer for orders that were placed in 2023, and you want the results sorted by total sales.

    db.orders.aggregate([
        {
            $match: {
                orderDate: {
                    $gte: ISODate("2023-01-01"),
                    // Exclusive upper bound, so orders placed during 31 December are included
                    $lt: ISODate("2024-01-01")
                }
            }
        },
        {
            $group: {
                _id: "$customerId",
                totalSales: {
                    $sum: "$amount"
                }
            }
        },
        {
            $sort: {
                totalSales: -1
            }
        }
    ])

  • $match: Filters the documents to only include orders placed in 2023.
  • $group: Groups the documents by customerId and calculates the total sales for each customer.
  • $sort: Sorts the results by totalSales in descending order.
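  • Sketch: The three steps can be traced in plain JavaScript on a handful of made-up orders. The `salesPerCustomer` helper and the dates are hypothetical. The sketch filters with a half-open range (from 1 January 2023 up to, but not including, 1 January 2024, in UTC) so that an order placed during 31 December is still counted.

```javascript
// Hypothetical orders around the 2023 year boundaries.
const datedOrders = [
    { customerId: "C001", amount: 500, orderDate: new Date("2023-03-15T10:00:00Z") },
    { customerId: "C002", amount: 300, orderDate: new Date("2023-12-31T18:30:00Z") },
    { customerId: "C001", amount: 1200, orderDate: new Date("2024-01-01T00:00:00Z") },
    { customerId: "C002", amount: 900, orderDate: new Date("2022-12-31T23:59:59Z") },
    { customerId: "C003", amount: 1500, orderDate: new Date("2023-07-04T08:00:00Z") }
];

// Hypothetical helper: total sales per customer for one calendar year (UTC).
function salesPerCustomer(docs, year) {
    const start = new Date(Date.UTC(year, 0, 1));
    const end = new Date(Date.UTC(year + 1, 0, 1));
    const totals = new Map();
    for (const doc of docs) {
        // $match: keep only orders with start <= orderDate < end.
        if (doc.orderDate < start || doc.orderDate >= end) continue;
        // $group with $sum: add the amount to the customer's running total.
        totals.set(doc.customerId, (totals.get(doc.customerId) || 0) + doc.amount);
    }
    // $sort: highest total first.
    return [...totals]
        .map(([customerId, totalSales]) => ({ _id: customerId, totalSales }))
        .sort((a, b) => b.totalSales - a.totalSales);
}

console.log(salesPerCustomer(datedOrders, 2023));
// [ { _id: 'C003', totalSales: 1500 },
//   { _id: 'C001', totalSales: 500 },
//   { _id: 'C002', totalSales: 300 } ]
```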
Benefits of Aggregation Framework
  • Efficiency: Aggregation operations are performed on the database server, reducing the amount of data that needs to be transferred to the client.
  • Flexibility: The aggregation framework provides a wide variety of stages and operators, allowing for complex data transformations and calculations.
  • Powerful Data Processing: It can perform tasks like grouping, filtering, joining, and reshaping data, eliminating the need to perform these operations in the application code.
Limitations of Aggregation Framework
  • Complexity: Pipelines with many stages or deeply nested expressions can be hard to design and debug.
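  • Memory Usage: Stages that must hold many documents in memory, such as $group and $sort, can be resource-intensive. By default each stage is limited to 100 MB of RAM; the allowDiskUse option lets such stages write temporary files to disk instead.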
  • Sharding: Aggregation operations can be more challenging to optimize when working with sharded collections.
Conclusion
  • The MongoDB Aggregation Framework is a robust and flexible tool for data analysis and transformation. It allows you to create powerful pipelines that can process and manipulate data in various ways, all within the database. By using different stages like $match, $group, $project, and operators, you can build complex queries that provide meaningful insights from your data.
