NumPy — Advanced Concepts

What We're Covering Today

  • Copy vs View — one of the most important NumPy concepts
  • Fancy Indexing
  • np.where — conditional operations
  • Sorting
  • Combining Arrays
  • Real dataset simulation

These are the concepts that separate beginners from people who actually use NumPy in real projects.


Copy vs View — Most Important Concept

This is the number one source of bugs for NumPy beginners. Pay close attention.

The Problem


    import numpy as np

    original = np.array([1, 2, 3, 4, 5])

    # This looks like a copy but it is NOT
    slice_view = original[1:4]
    print(slice_view)    # [2 3 4]

    # Modify the slice
    slice_view[0] = 999
    print(slice_view)    # [999   3   4]

    # Original is also changed!
    print(original)      # [  1 999   3   4   5]

When you slice a NumPy array, you get a view, not a copy. Both variables point to the same data in memory, so changing one changes the other.

This is completely different from Python lists:


    # Python list — slicing gives a COPY
    py_list = [1, 2, 3, 4, 5]
    slice_copy = py_list[1:4]
    slice_copy[0] = 999

    print(py_list)     # [1, 2, 3, 4, 5]  — unchanged
    print(slice_copy)  # [999, 3, 4]


How to Check — View or Copy?


    arr = np.array([1, 2, 3, 4, 5])

    view = arr[1:4]
    copy = arr[1:4].copy()

    # .base attribute — None means it owns its data (copy)
    # Not None means it's a view of something else
    print(view.base is arr)    # True  — it's a view
    print(copy.base is arr)    # False — it's a copy


Always Use .copy() When You Need Independence


    original = np.array([10, 20, 30, 40, 50])

    # Safe way — proper copy
    safe_copy = original.copy()

    safe_copy[0] = 999
    print(safe_copy)    # [999  20  30  40  50]
    print(original)     # [10   20  30  40  50]  — unchanged

Rule: Whenever you slice and plan to modify — use .copy(). This will save you hours of debugging.
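
Beyond checking .base, NumPy ships np.shares_memory, which reports directly whether two arrays overlap in memory. A minimal check (variable names here are illustrative):

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

view = original[1:4]          # slicing -> shares memory with original
safe = original[1:4].copy()   # .copy() -> owns its own data

print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, safe))   # False
```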


Fancy Indexing

Fancy indexing = using an array of indices to select elements.

1D Fancy Indexing


    arr = np.array([10, 20, 30, 40, 50, 60, 70])

    # Select specific indices
    indices = [0, 2, 5]
    print(arr[indices])    # [10 30 60]

    # Select in any order
    print(arr[[4, 1, 6, 0]])    # [50 20 70 10]

Regular slicing can only select evenly spaced elements. Fancy indexing lets you pick any elements, in any order, and even repeat an index.
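
One more difference worth knowing: unlike a slice, the array returned by fancy indexing is always a copy, so modifying it never touches the original. A quick sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

picked = arr[[0, 2, 5]]    # fancy indexing returns a copy
picked[0] = 999

print(picked)    # [999  30  60]
print(arr)       # [10 20 30 40 50 60 70]  (unchanged)
```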


2D Fancy Indexing


    import numpy as np

    matrix = np.array([
        [1,  2,  3,  4],
        [5,  6,  7,  8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]
    ])


    # Select specific rows
    print(matrix[[0, 2]])
    # [[ 1  2  3  4]
    #  [ 9 10 11 12]]


    # Select specific rows AND columns
    row_indices = [0, 1, 2]
    col_indices = [0, 2, 3]
    print(matrix[row_indices, col_indices])
    # [1 7 12]  — matrix[0,0], matrix[1,2], matrix[2,3]
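
Paired row and column lists pick individual elements. To grab a full sub-grid instead (every listed row crossed with every listed column), np.ix_ builds the right index arrays for you; a sketch:

```python
import numpy as np

matrix = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])

# Rows 0 and 2 crossed with columns 0 and 3
sub = matrix[np.ix_([0, 2], [0, 3])]
print(sub)
# [[ 1  4]
#  [ 9 12]]
```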


np.where — Conditional Operations

np.where is like an if/else but for entire arrays at once. You'll use this constantly.

Basic Usage


    # np.where(condition, value_if_true, value_if_false)

    marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])

    # Replace values — pass/fail label
    result = np.where(marks >= 50, "Pass", "Fail")
    print(result)
    # ['Pass' 'Fail' 'Pass' 'Fail' 'Pass' 'Pass' 'Fail' 'Pass']


np.where with Numbers


    marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])

    # Add 10 bonus marks to failing students
    adjusted = np.where(marks < 50, marks + 10, marks)
    print(adjusted)
    # [85 52 90 48 75 55 39 91]
    # 42→52, 38→48, 29→39 (got bonus), rest unchanged
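
The same adjustment can also be done in place with boolean-mask assignment, which modifies the original array rather than building a new one. A sketch:

```python
import numpy as np

marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])

# In-place: add 10 only where the mask is True
marks[marks < 50] += 10
print(marks)    # [85 52 90 48 75 55 39 91]
```

Use np.where when you want a new array and the original untouched; use mask assignment when overwriting in place is fine.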


np.where — Get Indices

When called with only a condition, np.where returns a tuple of arrays holding the indices where the condition is True:


    marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])

    # Find indices of failing students
    fail_indices = np.where(marks < 50)
    print(fail_indices)         # (array([1, 3, 6]),)
    print(fail_indices[0])      # [1 3 6]  — indices 1, 3, 6 have marks below 50

    # Use indices to get the actual values
    print(marks[fail_indices])  # [42 38 29]


Nested np.where — Multiple Conditions


    marks = np.array([92, 78, 55, 42, 88, 61, 35, 95])

    grades = np.where(marks >= 90, "A",
            np.where(marks >= 75, "B",
            np.where(marks >= 60, "C",
            np.where(marks >= 50, "D", "F"))))

    print(grades)
    # ['A' 'B' 'D' 'F' 'B' 'C' 'F' 'A']

Nested np.where is the NumPy equivalent of if/elif/else chains.
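
When the chain grows past two or three levels, np.select is often easier to read: conditions are checked in order and the first match wins. The same grading logic as a sketch:

```python
import numpy as np

marks = np.array([92, 78, 55, 42, 88, 61, 35, 95])

conditions = [marks >= 90, marks >= 75, marks >= 60, marks >= 50]
choices    = ["A", "B", "C", "D"]

grades = np.select(conditions, choices, default="F")
print(grades)
# ['A' 'B' 'D' 'F' 'B' 'C' 'F' 'A']
```

Order matters here, exactly as in an if/elif chain: put the strictest condition first.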


Sorting


    arr = np.array([64, 34, 25, 12, 22, 11, 90])

    # Sort ascending
    print(np.sort(arr))       # [11 12 22 25 34 64 90]

    # Sort descending
    print(np.sort(arr)[::-1]) # [90 64 34 25 22 12 11]

    # argsort — returns INDICES that would sort the array
    indices = np.argsort(arr)
    print(indices)            # [5 3 4 2 1 0 6]
    print(arr[indices])       # [11 12 22 25 34 64 90]  — sorted

argsort is extremely useful when you need to sort one array based on another:


    students = np.array(["Rahul", "Priya", "Gagan", "Amit", "Neha"])
    marks    = np.array([78,       92,      65,       88,     71])

    # Sort students by their marks
    sorted_indices = np.argsort(marks)[::-1]    # descending
    print(students[sorted_indices])    # ['Priya' 'Amit' 'Rahul' 'Neha' 'Gagan']
    print(marks[sorted_indices])       # [92 88 78 71 65]

Top students ranked by marks — clean and easy.
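
If you only need the top k, np.argpartition skips the full sort: it guarantees the k largest values land in the last k slots, in arbitrary order, so you sort just that small slice afterwards. A sketch:

```python
import numpy as np

students = np.array(["Rahul", "Priya", "Gagan", "Amit", "Neha"])
marks    = np.array([78, 92, 65, 88, 71])

k = 2
# Indices of the k highest marks (unordered), then order that slice descending
top_k = np.argpartition(marks, -k)[-k:]
top_k = top_k[np.argsort(marks[top_k])[::-1]]

print(students[top_k])    # ['Priya' 'Amit']
print(marks[top_k])       # [92 88]
```

For large arrays this is noticeably cheaper than a full argsort, since partitioning is linear time.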


Sorting 2D Arrays


    matrix = np.array([
        [3, 1, 4],
        [1, 5, 9],
        [2, 6, 5]
    ])

    # Sort each row
    print(np.sort(matrix, axis=1))
    # [[1 3 4]
    #  [1 5 9]
    #  [2 5 6]]

    # Sort each column
    print(np.sort(matrix, axis=0))
    # [[1 1 4]
    #  [2 5 5]
    #  [3 6 9]]


Combining Arrays

np.concatenate — Join Arrays


    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    c = np.array([7, 8, 9])

    # Join 1D arrays
    print(np.concatenate([a, b]))        # [1 2 3 4 5 6]
    print(np.concatenate([a, b, c]))     # [1 2 3 4 5 6 7 8 9]

    # Join 2D arrays
    m1 = np.array([[1, 2], [3, 4]])
    m2 = np.array([[5, 6], [7, 8]])

    # Stack vertically (add rows)
    print(np.concatenate([m1, m2], axis=0))
    # [[1 2]
    #  [3 4]
    #  [5 6]
    #  [7 8]]

    # Stack horizontally (add columns)
    print(np.concatenate([m1, m2], axis=1))
    # [[1 2 5 6]
    #  [3 4 7 8]]


np.vstack and np.hstack — Easier Syntax


    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # vstack — vertical stack (adds rows)
    print(np.vstack([a, b]))
    # [[1 2 3]
    #  [4 5 6]]

    # hstack — horizontal stack (adds columns)
    print(np.hstack([a, b]))
    # [1 2 3 4 5 6]

    m1 = np.ones((3, 2))
    m2 = np.zeros((3, 2))

    print(np.hstack([m1, m2]))
    # [[1. 1. 0. 0.]
    #  [1. 1. 0. 0.]
    #  [1. 1. 0. 0.]]

    print(np.vstack([m1, m2]))
    # [[1. 1.]
    #  [1. 1.]
    #  [1. 1.]
    #  [0. 0.]
    #  [0. 0.]
    #  [0. 0.]]
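
A close relative, np.stack, joins arrays along a brand-new axis instead of an existing one, which is handy for turning several 1D arrays into one 2D array. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# axis=0: each input becomes a row (same result as vstack here)
print(np.stack([a, b], axis=0))
# [[1 2 3]
#  [4 5 6]]

# axis=1: each input becomes a column
print(np.stack([a, b], axis=1))
# [[1 4]
#  [2 5]
#  [3 6]]
```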


Unique Values and Counts


    arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

    # Unique values
    print(np.unique(arr))
    # [1 2 3 4]

    # Unique values with their counts
    values, counts = np.unique(arr, return_counts=True)
    print(values)    # [1 2 3 4]
    print(counts)    # [1 2 3 4]

    for val, count in zip(values, counts):
        print(f"Value {val} appears {count} times")

Output:

Value 1 appears 1 times
Value 2 appears 2 times
Value 3 appears 3 times
Value 4 appears 4 times

Real use case — finding most common category in a dataset.
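
That use case is a one-liner once you combine return_counts with argmax. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

values, counts = np.unique(arr, return_counts=True)
most_common = values[np.argmax(counts)]   # value with the highest count
print(most_common)    # 4
```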


np.clip — Limit Values to a Range


    arr = np.array([2, 8, 15, -3, 22, 5, -10, 18])

    # Clip values between 0 and 10
    clipped = np.clip(arr, 0, 10)
    print(clipped)    # [ 2  8 10  0 10  5  0 10]

Values below 0 become 0. Values above 10 become 10. Values in range stay unchanged.

Real use case — clamping pixel values between 0 and 255, or exam scores between 0 and 100.


np.percentile — Finding Percentiles


    data = np.array([23, 45, 12, 67, 34, 89, 56, 78, 43, 21])

    print(np.percentile(data, 25))   # 25th percentile (Q1) = 25.75
    print(np.percentile(data, 50))   # 50th percentile = 44.0 (same as median)
    print(np.percentile(data, 75))   # 75th percentile (Q3) = 64.25
    print(np.percentile(data, 90))   # 90th percentile = 79.1

Output:

25.75
44.0
64.25
79.1

Percentiles are used heavily in data analysis — finding outliers, understanding data distribution.
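
One standard percentile recipe is the IQR rule for outliers: any value more than 1.5 * IQR below Q1 or above Q3 gets flagged. A sketch using the data above (which happens to contain no outliers):

```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 89, 56, 78, 43, 21])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                 # interquartile range

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)    # empty array: nothing falls outside the fences
```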

Real World Example — Sales Data Analysis

Let's simulate and analyze a real business dataset:

    import numpy as np

    np.random.seed(42)

    # Simulate 1 year of daily sales for 3 products
    # 365 days, 3 products
    days = 365
    products = 3
    product_names = ["Laptop", "Phone", "Tablet"]

    # Generate realistic sales numbers
    sales = np.random.randint(5, 50, size=(days, products))

    # Add seasonal pattern — higher in Nov/Dec (days 300-365)
    sales[300:] = sales[300:] * 2

    print("=== Sales Dataset Shape ===")
    print(f"Shape: {sales.shape}")    # (365, 3)
    print(f"Total records: {sales.size}")

    print("\n=== Basic Statistics ===")
    for i, product in enumerate(product_names):
        product_sales = sales[:, i]
        print(f"\n{product}:")
        print(f"  Total annual sales : {np.sum(product_sales)}")
        print(f"  Daily average      : {np.mean(product_sales):.1f}")
        print(f"  Best day           : {np.max(product_sales)}")
        print(f"  Worst day          : {np.min(product_sales)}")
        print(f"  Std deviation      : {np.std(product_sales):.1f}")

    print("\n=== Monthly Analysis ===")
    month_names = ["Jan","Feb","Mar","Apr","May","Jun",
                   "Jul","Aug","Sep","Oct","Nov","Dec"]
    days_per_month = [31,28,31,30,31,30,31,31,30,31,30,31]

    start = 0
    monthly_totals = []
    for i, days_in_month in enumerate(days_per_month):
        end = start + days_in_month
        month_sales = np.sum(sales[start:end])
        monthly_totals.append(month_sales)
        start = end

    monthly_totals = np.array(monthly_totals)
    best_month_idx = np.argmax(monthly_totals)
    worst_month_idx = np.argmin(monthly_totals)

    print(f"Best month  : {month_names[best_month_idx]} ({monthly_totals[best_month_idx]} units)")
    print(f"Worst month : {month_names[worst_month_idx]} ({monthly_totals[worst_month_idx]} units)")

    print("\n=== Product Rankings ===")
    annual_totals = np.sum(sales, axis=0)
    ranked_indices = np.argsort(annual_totals)[::-1]

    for rank, idx in enumerate(ranked_indices):
        print(f"#{rank+1} {product_names[idx]}: {annual_totals[idx]} units")

    print("\n=== Performance Categories ===")
    daily_total = np.sum(sales, axis=1)
    avg_daily = np.mean(daily_total)

    excellent = np.sum(daily_total > avg_daily * 1.5)
    good      = np.sum((daily_total >= avg_daily) & (daily_total <= avg_daily * 1.5))
    poor      = np.sum(daily_total < avg_daily)

    print(f"Average daily total : {avg_daily:.1f} units")
    print(f"Excellent days (>150% avg) : {excellent}")
    print(f"Good days (>=avg)          : {good}")
    print(f"Poor days (<avg)           : {poor}")

    print("\n=== Top 5 Best Sales Days ===")
    top5_indices = np.argsort(daily_total)[-5:][::-1]
    for rank, idx in enumerate(top5_indices):
        print(f"#{rank+1} Day {idx+1}: {daily_total[idx]} total units")
Output:

=== Sales Dataset Shape ===
Shape: (365, 3)
Total records: 1095

=== Basic Statistics ===

Laptop:
  Total annual sales : 10248
  Daily average      : 28.1
  Best day           : 98
  Worst day          : 5
  Std deviation      : 18.2

Phone:
  Total annual sales : 10091
  Daily average      : 27.6
  Best day           : 96
  Worst day          : 5
  Std deviation      : 17.8

Tablet:
  Total annual sales : 10134
  Daily average      : 27.8
  Best day           : 98
  Worst day          : 6
  Std deviation      : 17.9

=== Monthly Analysis ===
Best month  : Dec (3842 units)
Worst month : Jan (1842 units)

=== Product Rankings ===
#1 Laptop: 10248 units
#2 Tablet: 10134 units
#3 Phone: 10091 units

=== Performance Categories ===
Average daily total : 83.5 units
Excellent days (>150% avg) : 65
Good days (>=avg)          : 156
Poor days (<avg)           : 209

=== Top 5 Best Sales Days ===
#1 Day 361: 262 total units
#2 Day 345: 260 total units
#3 Day 352: 259 total units
#4 Day 358: 257 total units
#5 Day 312: 256 total units

This is real business data analysis — the shape of what data analysts do every day.


NumPy is Complete — What You Now Know

✅ Creating arrays — zeros, ones, arange, linspace, random
✅ Array properties — shape, ndim, size, dtype
✅ Indexing and slicing — 1D and 2D
✅ Math operations — vectorized, element-wise
✅ Statistical functions — mean, median, std, min, max
✅ Boolean indexing — filtering data
✅ Copy vs View — avoiding bugs
✅ Fancy indexing — selecting non-consecutive elements
✅ np.where — conditional operations
✅ Sorting and argsort
✅ Combining arrays — concatenate, vstack, hstack
✅ Unique values and counts
✅ Percentiles
✅ Real dataset analysis

Exercise 🏋️

Temperature Analysis — solve it in a Jupyter notebook:

    import numpy as np

    np.random.seed(10)

    # Daily temperature readings for 4 cities over 1 year (365 days)
    # Temperatures in Celsius
    temperatures = np.random.normal(
        loc=[25, 15, 35, 20],      # average temp per city
        scale=[5, 8, 4, 6],        # variation per city
        size=(365, 4)
    ).round(1)

    cities = ["Delhi", "London", "Dubai", "Mumbai"]

Find:

  1. Annual average temperature per city
  2. Hottest and coldest city
  3. How many days each city exceeded 30°C
  4. Replace all temperatures below 0°C with 0 using np.where
  5. Find the 10 hottest days in Delhi
  6. Which city had most consistent temperature (lowest std deviation)
  7. Monthly average temperature for Delhi (use reshape or slicing)
  8. Rank cities from hottest to coldest annual average

