In MongoDB, a one-to-many relationship can be modeled in two main ways:
- Embedding: Embed many related documents in an array within the parent document.
- Referencing: Store related documents in a separate collection and reference them via an identifier.
The examples below show both approaches in the MongoDB shell.
Scenario: Modeling Authors and Books
We will model an author who writes multiple books, a classic one-to-many relationship.
1. Embedding Approach (One-to-Many)
- In this approach, the books are embedded directly inside the author document as an array.
- Step 1: Inserting Data (Embedding)
  // Switch to the database (MongoDB creates it on first write)
  use libraryDB
  db.authors.insertOne({
    _id: 1,
    name: "George Orwell",
    age: 46,
    books: [
      {
        title: "1984",
        genre: "Dystopian",
        published_year: 1949
      },
      {
        title: "Animal Farm",
        genre: "Political Satire",
        published_year: 1945
      }
    ]
  })
- In this case, the books field is an array, and each book is stored as an embedded document within the author document.
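Because the books live in an array on the author document, adding another book is an update to that one document. The following is a minimal sketch, assuming a hypothetical helper named `buildAddBookUpdate` (not part of MongoDB) that only builds the update document. The extra book is illustrative data that does not appear in the example above.

```javascript
// Hypothetical helper: builds the update document that appends one book
// to the embedded `books` array. It only constructs a plain object.
function buildAddBookUpdate(book) {
  return { $push: { books: book } };
}

// Illustrative book, not part of the original example data.
const update = buildAddBookUpdate({
  title: "Homage to Catalonia",
  genre: "Memoir",
  published_year: 1938
});

console.log(JSON.stringify(update, null, 2));

// In the MongoDB shell, the object would be passed to updateOne:
// db.authors.updateOne({ _id: 1 }, update)
```

Keeping the update document in a plain function makes it easy to inspect before it is sent to the database.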
- Step 2: Querying Data (Embedding)
- To retrieve an author along with their books, you can simply query the authors collection:
  db.authors.findOne({ _id: 1 })
  // Output:
  {
    "_id": 1,
    "name": "George Orwell",
    "age": 46,
    "books": [
      {
        "title": "1984",
        "genre": "Dystopian",
        "published_year": 1949
      },
      {
        "title": "Animal Farm",
        "genre": "Political Satire",
        "published_year": 1945
      }
    ]
  }
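You can also search on fields inside the embedded array using dot notation. The sketch below assumes a hypothetical helper named `buildBookTitleQuery` that builds a filter and a projection as plain objects. The `$elemMatch` projection limits the returned `books` array to the first element that matches.

```javascript
// Hypothetical helper: builds the arguments for a find() that locates
// authors by the title of one of their embedded books.
function buildBookTitleQuery(title) {
  return {
    // Dot notation reaches into each element of the embedded array.
    filter: { "books.title": title },
    // Return the author's name plus only the first matching book.
    projection: { name: 1, books: { $elemMatch: { title: title } } }
  };
}

const query = buildBookTitleQuery("1984");
console.log(JSON.stringify(query, null, 2));

// In the MongoDB shell:
// db.authors.find(query.filter, query.projection)
```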
Explanation of Embedding:
Advantages:
- Simpler data retrieval: Since the books are stored directly within the author document, you don’t need to perform additional queries.
- Single-document updates: The author and their books can be updated in one operation, and a write to a single document is atomic.
Disadvantages:
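- Document size limit: MongoDB limits each document to 16 MB. An author with a very large or ever-growing list of books could approach that limit, and every read of the author carries the full books array unless you project it out.
- Data duplication: If other entities (e.g., publishers) also need the same books, each must store its own copy, and the copies can drift out of sync.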
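To get a feel for how far an embedded array is from the 16 MB limit, you can estimate a document's size. This is a rough sketch that runs in Node.js. The helpers `approxDocBytes` and `approxMaxEmbeddedBooks` are hypothetical, and they measure the UTF-8 length of the JSON form, which only approximates the BSON size MongoDB actually stores.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

// Rough estimate: UTF-8 byte length of the JSON form of the document.
// Assumption: this approximates, but does not equal, the BSON size.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Estimate how many similar-sized embedded books fit before the limit.
function approxMaxEmbeddedBooks(author, sampleBook) {
  const baseBytes = approxDocBytes({ ...author, books: [] });
  const perBookBytes = approxDocBytes(sampleBook) + 1; // +1 for the separating comma
  return Math.floor((MAX_DOC_BYTES - baseBytes) / perBookBytes);
}

const author = { _id: 1, name: "George Orwell", age: 46 };
const sampleBook = { title: "1984", genre: "Dystopian", published_year: 1949 };

console.log("approx. bytes with one book:", approxDocBytes({ ...author, books: [sampleBook] }));
console.log("approx. books that would fit:", approxMaxEmbeddedBooks(author, sampleBook));
```

For small sub-documents like these the limit is far away. It matters more when each embedded item is large (for example, full text or long review threads) or when the array has no natural upper bound.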
2. Referencing Approach (One-to-Many)
- In this approach, the books are stored in a separate collection, and the author document holds an array of their _id values (book_ids). This avoids embedding large amounts of data inside a single document.
- Step 1: Inserting Data (Referencing)
- Insert data into the books collection:
    db.books.insertMany([
      {
        _id: 101,
        title: "1984",
        genre: "Dystopian",
        published_year: 1949
      },
      {
        _id: 102,
        title: "Animal Farm",
        genre: "Political Satire",
        published_year: 1945
      }
    ])
- Insert data into the authors collection with references to book_ids:
    db.authors.insertOne({
      _id: 1,
      name: "George Orwell",
      age: 46,
      book_ids: [101, 102]  // Array of references to the books
    })
- Step 2: Querying Data (Referencing)
- To get an author and their books, you need two queries: first query the authors collection to get the book_ids, then use those book_ids to query the books collection.
- Step 2.1: Find the author:
    var author = db.authors.findOne({ _id: 1 })
- Step 2.2: Find the books using the book_ids:
    db.books.find({ _id: { $in: author.book_ids } })
    // Output (from the books collection):
    [
      {
        "_id": 101,
        "title": "1984",
        "genre": "Dystopian",
        "published_year": 1949
      },
      {
        "_id": 102,
        "title": "Animal Farm",
        "genre": "Political Satire",
        "published_year": 1945
      }
    ]
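The two-step lookup above can be mimicked without a database, which makes the logic easy to see. This is an in-memory sketch only: the arrays stand in for the collections, `findBooksForAuthor` is a hypothetical helper, and the third book is illustrative data added to show that unrelated books are filtered out.

```javascript
// In-memory stand-ins for the two collections (illustration only).
const authors = [
  { _id: 1, name: "George Orwell", age: 46, book_ids: [101, 102] }
];
const books = [
  { _id: 101, title: "1984", genre: "Dystopian", published_year: 1949 },
  { _id: 102, title: "Animal Farm", genre: "Political Satire", published_year: 1945 },
  // Illustrative extra book that the author does not reference.
  { _id: 103, title: "Brave New World", genre: "Dystopian", published_year: 1932 }
];

function findBooksForAuthor(authorId) {
  // Step 1: find the author to get the list of referenced ids.
  const author = authors.find((a) => a._id === authorId);
  if (!author) return [];
  // Step 2: keep the books whose _id is in book_ids, which is what the
  // { _id: { $in: author.book_ids } } filter expresses.
  const wanted = new Set(author.book_ids);
  return books.filter((b) => wanted.has(b._id));
}

console.log(findBooksForAuthor(1).map((b) => b.title));
```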
- Step 3: Using $lookup to Join Collections
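- Alternatively, you can use the $lookup aggregation stage to join the authors and books collections in a single query. Because book_ids is an array, $lookup matches each of its elements against _id in the books collection. Without a preceding filter, the pipeline below joins every author in the collection: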
    db.authors.aggregate([
      {
        $lookup: {
          from: "books",         // Collection to join with
          localField: "book_ids", // Field in the authors collection
          foreignField: "_id",    // Field in the books collection
          as: "books"             // Output array field name
        }
      }
    ])
    // Output:
    [
      {
        "_id": 1,
        "name": "George Orwell",
        "age": 46,
        "book_ids": [101, 102],
        "books": [
          {
            "_id": 101,
            "title": "1984",
            "genre": "Dystopian",
            "published_year": 1949
          },
          {
            "_id": 102,
            "title": "Animal Farm",
            "genre": "Political Satire",
            "published_year": 1945
          }
        ]
      }
    ]
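To fetch a single author rather than joining the whole collection, a `$match` stage can be placed before `$lookup`. The following sketch assumes a hypothetical helper named `buildAuthorWithBooksPipeline` that returns the pipeline as a plain array.

```javascript
// Hypothetical helper: builds an aggregation pipeline that first narrows
// to one author, then joins that author's books.
function buildAuthorWithBooksPipeline(authorId) {
  return [
    // Filter first so the join runs only for the matching author.
    { $match: { _id: authorId } },
    {
      $lookup: {
        from: "books",          // Collection to join with
        localField: "book_ids", // Array of ids on the author document
        foreignField: "_id",    // Field in the books collection
        as: "books"             // Output array field name
      }
    }
  ];
}

const pipeline = buildAuthorWithBooksPipeline(1);
console.log(JSON.stringify(pipeline, null, 2));

// In the MongoDB shell:
// db.authors.aggregate(pipeline)
```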
Explanation of Referencing:
Advantages:
- Flexible and scalable: Each new book adds only an ID to the author's book_ids array, so the author document grows far more slowly than it would with embedded books.
- No duplication: Since the books are stored in a separate collection, other entities (like publishers or libraries) can reference them without duplicating data.
Disadvantages:
- More complex queries: You need to perform multiple queries or use $lookup to retrieve related data.
- Data consistency: MongoDB does not enforce foreign-key constraints, so an author can reference a book that doesn't exist (for example, one that was deleted) unless you enforce checks at the application level.
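One way to enforce such a check in application code is to compare the referenced ids against the books that actually exist. This is a minimal in-memory sketch. `findDanglingBookIds` is a hypothetical helper, and in a real application the `books` argument would come from a query such as the `$in` lookup shown earlier.

```javascript
// Hypothetical helper: returns the ids in author.book_ids that have no
// matching document in the given list of books.
function findDanglingBookIds(author, books) {
  const existing = new Set(books.map((b) => b._id));
  return author.book_ids.filter((id) => !existing.has(id));
}

// Illustrative data: id 999 points at a book that does not exist.
const author = { _id: 1, name: "George Orwell", book_ids: [101, 102, 999] };
const books = [
  { _id: 101, title: "1984" },
  { _id: 102, title: "Animal Farm" }
];

console.log(findDanglingBookIds(author, books));
```

A check like this can run before writes or as a periodic audit. It detects broken references but does not prevent them.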
Conclusion: When to Use Embedding vs. Referencing in One-to-Many Relationships
Use Embedding when:
- The related data (e.g., books) is always accessed with the parent document (e.g., author).
- The size of the embedded data is small and won’t grow indefinitely.
- You want simplicity in your data model with fewer collections to manage.
Use Referencing when:
- The related data (e.g., books) might be accessed independently of the parent document (e.g., author).
- The size of the related data is large or could grow over time.
- You want to share related data between different entities (e.g., a book is written by an author but also published by a publisher).
- You need to avoid hitting the document size limit (16 MB in MongoDB).
Either approach can model a one-to-many relationship. Choose based on how the data is accessed, how large it can grow, and whether it is shared with other entities.