MongoDB Aggregation Functions with Examples
- Step 1: Insert a Dataset into MongoDB Collection
- First, we'll insert a sample dataset into a MongoDB collection named `orders`. Here’s how you can insert the data using the MongoDB shell or a Node.js script.
- Insert the Dataset:
    db.orders.insertMany([
        {
            "_id": 1,
            "customerId": "C001",
            "items": 2,
            "amount": 500,
            "status": "Completed"
        },
        {
            "_id": 2,
            "customerId": "C002",
            "items": 1,
            "amount": 300,
            "status": "Completed"
        },
        {
            "_id": 3,
            "customerId": "C001",
            "items": 4,
            "amount": 1200,
            "status": "Pending"
        },
        {
            "_id": 4,
            "customerId": "C003",
            "items": 5,
            "amount": 1500,
            "status": "Completed"
        },
        {
            "_id": 5,
            "customerId": "C002",
            "items": 3,
            "amount": 900,
            "status": "Pending"
        },
        {
            "_id": 6,
            "customerId": "C001",
            "items": 1,
            "amount": 400,
            "status": "Completed"
        }
    ])
- Step 2: MongoDB Aggregation Functions
- MongoDB provides several aggregation functions that help in processing data in various ways. Below, we'll go through common aggregation functions such as `$sum`, `$avg`, `$min`, `$max`, `$count`, `$first`, and `$last`, using the above dataset.
- $group (Group by Fields): The $group operator groups documents by a specified identifier field and allows for calculations like sum, average, etc.
- Example: Group orders by status and calculate the total amount for each status.
    db.orders.aggregate([
        { $group: { _id: "$status", totalAmount: { $sum: "$amount" } } }
    ])
- Explanation: This query groups the orders by status and calculates the total sum of the amount for each group.
    [
        { _id: 'Completed', totalAmount: 2700 },
        { _id: 'Pending', totalAmount: 2100 }
    ]
- $sum (Summation): The `$sum` function calculates the total sum of a field across documents or grouped documents.
- Example: Calculate the total amount spent by each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } }
    ])
- Explanation: This query groups the documents by `customerId` and calculates the total sum of the `amount` field for each customer.
- Output:
    [
        { "_id": "C001", "totalAmount": 2100 },
        { "_id": "C002", "totalAmount": 1200 },
        { "_id": "C003", "totalAmount": 1500 }
    ]
- $avg (Average): The `$avg` function computes the average of the values across documents or grouped documents.
- Example: Find the average number of items ordered by each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", avgItems: { $avg: "$items" } } }
    ])
- Explanation: This query calculates the average number of items ordered by each customer by grouping the orders by `customerId`.
- Output:
    [
        { "_id": "C001", "avgItems": 2.3333333333333335 },
        { "_id": "C002", "avgItems": 2 },
        { "_id": "C003", "avgItems": 5 }
    ]
- $min (Minimum): The `$min` function returns the minimum value from the documents in the group.
- Example: Find the minimum amount spent by each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", minAmount: { $min: "$amount" } } }
    ])
- Explanation: This query finds the minimum amount spent by each customer by grouping the documents by `customerId`.
- Output:
    [
        { "_id": "C001", "minAmount": 400 },
        { "_id": "C002", "minAmount": 300 },
        { "_id": "C003", "minAmount": 1500 }
    ]
- $max (Maximum): The `$max` function returns the maximum value from the documents in the group.
- Example: Find the maximum amount spent by each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", maxAmount: { $max: "$amount" } } }
    ])
- Explanation: This query finds the maximum amount spent by each customer by grouping the documents by `customerId`.
- Output:
    [
        { "_id": "C001", "maxAmount": 1200 },
        { "_id": "C002", "maxAmount": 900 },
        { "_id": "C003", "maxAmount": 1500 }
    ]
- $count (Document Count): The `$count` operator counts the number of documents in a group.
- Example: Count how many orders each customer has placed.
    db.orders.aggregate([
        { $group: { _id: "$customerId", orderCount: { $count: {} } } }
    ])
- Explanation: This query counts the number of orders placed by each customer by grouping the documents by `customerId`.
- Output:
    [
        { "_id": "C001", "orderCount": 3 },
        { "_id": "C002", "orderCount": 2 },
        { "_id": "C003", "orderCount": 1 }
    ]
- $first (First Document): The `$first` function returns the first document in each group based on the sort order.
- Example: Get the first order placed by each customer.
    db.orders.aggregate([
        // Sort by _id to ensure the first document is returned based on _id
        { $sort: { _id: 1 } }, 
        { $group: { _id: "$customerId", firstOrder: { $first: "$_id" } } }
    ])
- Explanation: This query groups the documents by `customerId` and returns the first order `_id` for each customer.
- Output:
    [
        { "_id": "C001", "firstOrder": 1 },
        { "_id": "C002", "firstOrder": 2 },
        { "_id": "C003", "firstOrder": 4 }
    ]
- $last (Last Document): The `$last` function returns the last document in each group based on the sort order.
- Example: Get the last order placed by each customer.
    db.orders.aggregate([
        // Sort by _id to ensure the last document is returned based on _id
        { $sort: { _id: 1 } }, 
        { $group: { _id: "$customerId", lastOrder: { $last: "$_id" } } }
    ])
- Explanation: This query groups the documents by `customerId` and returns the last order `_id` for each customer.
- Output:
    [
        { "_id": "C001", "lastOrder": 6 },
        { "_id": "C002", "lastOrder": 5 },
        { "_id": "C003", "lastOrder": 4 }
    ]
- $push: The $push operator appends a value to an array of values in a group. It is often used when you want to collect all values of a particular field into an array within a grouped result.
- Example: Collect all order amounts for each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", allAmounts: { $push: "$amount" } } }
    ])
- Explanation: This groups the documents by customerId and returns an array of all amount values for each customer.
- Output:
    [
        { "_id": "C001", "allAmounts": [500, 1200, 400] },
        { "_id": "C002", "allAmounts": [300, 900] },
        { "_id": "C003", "allAmounts": [1500] }
    ]
- $addToSet: The $addToSet operator adds a value to an array only if it doesn’t already exist in the array. It ensures there are no duplicate values in the result array.
- Example: Collect unique order statuses for each customer.
    db.orders.aggregate([
        { $group: { _id: "$customerId", uniqueStatuses: { $addToSet: "$status" } } }
    ])
- Explanation: This groups the documents by customerId and collects unique status values for each customer.
- Output:
    [
        { "_id": "C001", "uniqueStatuses": ["Completed", "Pending"] },
        { "_id": "C002", "uniqueStatuses": ["Completed", "Pending"] },
        { "_id": "C003", "uniqueStatuses": ["Completed"] }
    ]
- $unwind: The $unwind operator deconstructs an array field from the input documents and outputs one document for each element of the array.
- Example: Deconstruct orders with multiple items.
- Assume the orders collection has an items array field:
    {
        "_id": 1,
        "customerId": "C001",
        "items": [
            {
                "product": "A",
                "quantity": 2
            },
            {
                "product": "B",
                "quantity": 1
            }
        ],
        "amount": 500
    }
- Now, deconstruct this array using $unwind:
    db.orders.aggregate([
        { $unwind: "$items" }
    ])
- Explanation: This query deconstructs the items array, producing a document for each item.
- Output:
    [
        {
            "_id": 1,
            "customerId": "C001",
            "items": {
                "product": "A",
                "quantity": 2
            },
            "amount": 500
        },
        {
            "_id": 1,
            "customerId": "C001",
            "items": {
                "product": "B",
                "quantity": 1
            },
            "amount": 500
        }
    ]
- $sort: The $sort operator sorts the documents based on the specified fields in ascending (1) or descending (-1) order.
- Example: Sort orders by amount in descending order.
    db.orders.aggregate([
        { $sort: { amount: -1 } }
    ])
- Explanation: This query sorts the orders by the amount field in descending order.
- Output:
    [
        {
            "_id": 4,
            "customerId": "C003",
            "items": 5,
            "amount": 1500,
            "status": "Completed"
        },
        {
            "_id": 3,
            "customerId": "C001",
            "items": 4,
            "amount": 1200,
            "status": "Pending"
        },
        {
            "_id": 5,
            "customerId": "C002",
            "items": 3,
            "amount": 900,
            "status": "Pending"
        },
        {
            "_id": 1,
            "customerId": "C001",
            "items": 2,
            "amount": 500,
            "status": "Completed"
        },
        {
            "_id": 6,
            "customerId": "C001",
            "items": 1,
            "amount": 400,
            "status": "Completed"
        },
        {
            "_id": 2,
            "customerId": "C002",
            "items": 1,
            "amount": 300,
            "status": "Completed"
        }
    ]
- $project: The $project operator is used to include, exclude, or reshape fields in the output documents.
- Example: Project only the customerId and amount fields.
    db.orders.aggregate([
        { $project: { customerId: 1, amount: 1, _id: 0 } }
    ])
- Explanation: This query projects only the customerId and amount fields, excluding the _id field from the output.
- Output:
    [
        { "customerId": "C001", "amount": 500 },
        { "customerId": "C002", "amount": 300 },
        { "customerId": "C001", "amount": 1200 },
        { "customerId": "C003", "amount": 1500 },
        { "customerId": "C002", "amount": 900 },
        { "customerId": "C001", "amount": 400 }
    ]
- $switch (Multiple Conditional Expressions): The $switch operator allows for multiple conditional branches (similar to a switch-case statement).
- Example: Classify orders based on the amount.
    db.orders.aggregate([
        {
            $project: {
                amountClassification: {
                    $switch: {
                        branches: [{
                            case: {
                                $lt: ["$amount", 500]
                            }, then: "Small"
                        }, {
                            case: {
                                $and: [{
                                    $gte: ["$amount", 500]
                                }, {
                                    $lt: ["$amount", 1000]
                                }]
                            }, then: "Medium"
                        }, {
                            case: {
                                $gte: ["$amount", 1000]
                            }, then: "Large"
                        }],
                        default: "Unknown"
                    }
                }
            }
        }
    ])
- Explanation: This query classifies each order based on its amount into "Small", "Medium", or "Large".
- Output:
    [
        { _id: 1, amountClassification: 'Medium' },
        { _id: 2, amountClassification: 'Small' },
        { _id: 3, amountClassification: 'Large' },
        { _id: 4, amountClassification: 'Large' },
        { _id: 5, amountClassification: 'Medium' },
        { _id: 6, amountClassification: 'Small' }
    ]
- '$toUpper' and '$toLower' (String Case Conversion)
- $toUpper converts a string to uppercase.
- $toLower converts a string to lowercase.
- Example: Convert the status field to uppercase.
    db.orders.aggregate([
        { $project: { upperCaseStatus: { $toUpper: "$status" } } }
    ])
- Explanation: This query converts the status field to uppercase.
    [
        { _id: 1, upperCaseStatus: 'COMPLETED' },
        { _id: 2, upperCaseStatus: 'COMPLETED' },
        { _id: 3, upperCaseStatus: 'PENDING' },
        { _id: 4, upperCaseStatus: 'COMPLETED' },
        { _id: 5, upperCaseStatus: 'PENDING' },
        { _id: 6, upperCaseStatus: 'COMPLETED' }
    ]
No comments:
Post a Comment