What We're Covering Today
- Copy vs View — one of the most important NumPy concepts
- Fancy Indexing
- np.where — conditional operations
- Sorting
- Combining Arrays
- Real dataset simulation
These are the concepts that separate beginners from people who actually use NumPy in real projects.
Copy vs View — Most Important Concept
This is one of the most common sources of bugs for NumPy beginners. Pay close attention.
The Problem
import numpy as np
original = np.array([1, 2, 3, 4, 5])
# This looks like a copy but it is NOT
slice_view = original[1:4]
print(slice_view)  # [2 3 4]
# Modify the slice
slice_view[0] = 999
print(slice_view)  # [999   3   4]
# Original is also changed!
print(original)  # [  1 999   3   4   5]
When you slice a NumPy array, you get a view, not a copy. Both variables refer to the same data in memory, so changing one changes the other.
This is completely different from Python lists:
# Python list: slicing gives a COPY
py_list = [1, 2, 3, 4, 5]
slice_copy = py_list[1:4]
slice_copy[0] = 999
print(py_list)  # [1, 2, 3, 4, 5] (unchanged)
print(slice_copy)  # [999, 3, 4]
How to Check — View or Copy?
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4]
copy = arr[1:4].copy()
# .base attribute: None means the array owns its data (a copy)
# Not None means it's a view of something else
print(view.base is arr)  # True, it's a view
print(copy.base is arr)  # False, it's a copy
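One caveat with the `.base is arr` check: it only works when `arr` owns its own data. As a hedged sketch, assuming all you want to know is whether two arrays overlap in memory, `np.shares_memory` answers that directly. The names `grid`, `row`, and `row_copy` are made up for this illustration.

```python
import numpy as np

# grid is itself a view: reshape reuses the buffer created by arange
grid = np.arange(10).reshape(2, 5)
row = grid[0]              # basic indexing gives a view
row_copy = grid[0].copy()  # independent copy

# .base points at the array that owns the buffer, which is not grid here
base_check = row.base is grid
overlap_view = np.shares_memory(row, grid)
overlap_copy = np.shares_memory(row_copy, grid)

print(base_check)    # False, even though row is a view into grid's data
print(overlap_view)  # True
print(overlap_copy)  # False
```

So `.base` is fine for quick checks on arrays you created yourself, while `np.shares_memory` is the safer question to ask when you do not know where an array came from.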
Always Use .copy() When You Need Independence
original = np.array([10, 20, 30, 40, 50])
# Safe way: proper copy
safe_copy = original.copy()
safe_copy[0] = 999
print(safe_copy)  # [999  20  30  40  50]
print(original)  # [10 20 30 40 50] (unchanged)
Rule: Whenever you slice and plan to modify the result without affecting the original, use .copy(). This will save you hours of debugging.
Fancy Indexing
Fancy indexing = using an array of indices to select elements.
1D Fancy Indexing
arr = np.array([10, 20, 30, 40, 50, 60, 70])
# Select specific indices
indices = [0, 2, 5]
print(arr[indices])  # [10 30 60]
# Select in any order
print(arr[[4, 1, 6, 0]])  # [50 20 70 10]
With regular slicing you can only select elements at a regular stride (consecutive, every second, and so on). Fancy indexing lets you pick any elements in any order.
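Fancy indexing also behaves differently from slicing when it comes to copy vs view: reading with an index list gives you a copy, while assigning through an index list still writes into the original. A small sketch (the variable name `picked` is just for illustration):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

# Reading with fancy indexing returns a COPY, unlike slicing
picked = arr[[0, 2, 5]]
picked[0] = 999
print(arr.tolist())                   # [10, 20, 30, 40, 50, 60, 70]
print(np.shares_memory(picked, arr))  # False

# Assigning THROUGH fancy indexing modifies the original in place
arr[[0, 2, 5]] = 0
print(arr.tolist())                   # [0, 20, 0, 40, 50, 0, 70]
```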
2D Fancy Indexing
import numpy as np
matrix = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
# Select specific rows
print(matrix[[0, 2]])
# [[ 1  2  3  4]
#  [ 9 10 11 12]]
# Select specific rows AND columns (paired element by element)
row_indices = [0, 1, 2]
col_indices = [0, 2, 3]
print(matrix[row_indices, col_indices])
# [ 1  7 12], i.e. matrix[0,0], matrix[1,2], matrix[2,3]
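Note that passing two index lists pairs them up element by element; it does not give you the block where those rows and columns cross. If the block is what you want, one option is `np.ix_`. A minimal sketch, reusing the same 4x4 matrix:

```python
import numpy as np

matrix = np.arange(1, 17).reshape(4, 4)  # same values as the matrix above

# Paired: picks matrix[0, 1] and matrix[2, 3]
paired = matrix[[0, 2], [1, 3]]
print(paired.tolist())   # [2, 12]

# Block: rows 0 and 2 crossed with columns 1 and 3
block = matrix[np.ix_([0, 2], [1, 3])]
print(block.tolist())    # [[2, 4], [10, 12]]
```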
np.where — Conditional Operations
np.where is like an if/else but for entire arrays at once. You'll use this constantly.
Basic Usage
# np.where(condition, value_if_true, value_if_false)
marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])
# Replace values: pass/fail label
result = np.where(marks >= 50, "Pass", "Fail")
print(result)
# ['Pass' 'Fail' 'Pass' 'Fail' 'Pass' 'Pass' 'Fail' 'Pass']
np.where with Numbers
marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])
# Add 10 bonus marks to failing students
adjusted = np.where(marks < 50, marks + 10, marks)
print(adjusted)  # [85 52 90 48 75 55 39 91]
# 42→52, 38→48, 29→39 (got bonus), rest unchanged
np.where — Get Indices
When called with only a condition, np.where returns a tuple of index arrays (one per dimension) marking where the condition is True:
marks = np.array([85, 42, 90, 38, 75, 55, 29, 91])
# Find indices of failing students
fail_indices = np.where(marks < 50)
print(fail_indices)  # (array([1, 3, 6]),)
print(fail_indices[0])  # [1 3 6], so indices 1, 3, 6 have marks below 50
# Use indices to get the actual values
print(marks[fail_indices])  # [42 38 29]
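The result is a tuple because np.where returns one index array per dimension. With a 2D array you get row indices and column indices separately. A small sketch with a made-up `grid` of marks (rows are students, columns are subjects):

```python
import numpy as np

# Hypothetical marks: 2 students x 3 subjects
grid = np.array([[85, 42, 90],
                 [38, 75, 55]])

rows, cols = np.where(grid < 50)
print(rows.tolist())  # [0, 1]
print(cols.tolist())  # [1, 0]

# Pair them up to get (row, column) positions of failing marks
positions = list(zip(rows.tolist(), cols.tolist()))
print(positions)                  # [(0, 1), (1, 0)]
print(grid[rows, cols].tolist())  # [42, 38]
```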
Nested np.where — Multiple Conditions
marks = np.array([92, 78, 55, 42, 88, 61, 35, 95])
grades = np.where(marks >= 90, "A",
         np.where(marks >= 75, "B",
         np.where(marks >= 60, "C",
         np.where(marks >= 50, "D", "F"))))
print(grades)  # ['A' 'B' 'D' 'F' 'B' 'C' 'F' 'A']
Nested np.where is the NumPy equivalent of if/elif/else chains.
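Deep nesting gets hard to read once you have more than two or three levels. One alternative worth knowing is `np.select`, which takes a list of conditions and a matching list of values, and uses the first condition that is True for each element. A sketch of the same grading logic (the names `conditions` and `labels` are just for illustration):

```python
import numpy as np

marks = np.array([92, 78, 55, 42, 88, 61, 35, 95])

# Conditions are checked in order; the first True one wins
conditions = [marks >= 90, marks >= 75, marks >= 60, marks >= 50]
labels = ["A", "B", "C", "D"]

grades = np.select(conditions, labels, default="F")
print(grades.tolist())  # ['A', 'B', 'D', 'F', 'B', 'C', 'F', 'A']
```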
Sorting
arr = np.array([64, 34, 25, 12, 22, 11, 90])
# Sort ascending
print(np.sort(arr))  # [11 12 22 25 34 64 90]
# Sort descending
print(np.sort(arr)[::-1])  # [90 64 34 25 22 12 11]
# argsort returns the INDICES that would sort the array
indices = np.argsort(arr)
print(indices)  # [5 3 4 2 1 0 6]
print(arr[indices])  # [11 12 22 25 34 64 90] (sorted)
argsort is extremely useful when you need to sort one array based on another:
students = np.array(["Rahul", "Priya", "Gagan", "Amit", "Neha"])
marks = np.array([78, 92, 65, 88, 71])
# Sort students by their marks
sorted_indices = np.argsort(marks)[::-1]  # descending
print(students[sorted_indices])  # ['Priya' 'Amit' 'Rahul' 'Neha' 'Gagan']
print(marks[sorted_indices])  # [92 88 78 71 65]
Top students ranked by marks — clean and easy.
Sorting 2D Arrays
matrix = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5]
])
# Sort each row
print(np.sort(matrix, axis=1))
# [[1 3 4]
#  [1 5 9]
#  [2 5 6]]
# Sort each column
print(np.sort(matrix, axis=0))
# [[1 1 4]
#  [2 5 5]
#  [3 6 9]]
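Keep in mind that np.sort with an axis sorts every row (or column) independently, so the values in a row no longer belong together. If each row is a record and you want to reorder whole rows by one column, a common approach is argsort on that column. A minimal sketch with the same matrix:

```python
import numpy as np

matrix = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5]
])

# Order of rows that sorts by the first column
order = np.argsort(matrix[:, 0])
print(order.tolist())        # [1, 2, 0]

# Rows stay intact, only their order changes
sorted_rows = matrix[order]
print(sorted_rows.tolist())  # [[1, 5, 9], [2, 6, 5], [3, 1, 4]]
```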
Combining Arrays
np.concatenate — Join Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])
# Join 1D arrays
print(np.concatenate([a, b]))  # [1 2 3 4 5 6]
print(np.concatenate([a, b, c]))  # [1 2 3 4 5 6 7 8 9]
# Join 2D arrays
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
# Stack vertically (add rows)
print(np.concatenate([m1, m2], axis=0))
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
# Stack horizontally (add columns)
print(np.concatenate([m1, m2], axis=1))
# [[1 2 5 6]
#  [3 4 7 8]]
np.vstack and np.hstack — Easier Syntax
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# vstack: vertical stack (adds rows)
print(np.vstack([a, b]))
# [[1 2 3]
#  [4 5 6]]
# hstack: horizontal stack (adds columns)
print(np.hstack([a, b]))
# [1 2 3 4 5 6]
m1 = np.ones((3, 2))
m2 = np.zeros((3, 2))
print(np.hstack([m1, m2]))
# [[1. 1. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 0. 0.]]
print(np.vstack([m1, m2]))
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]
#  [0. 0.]
#  [0. 0.]
#  [0. 0.]]
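A common stumbling block when combining arrays is mismatched dimensions, for example adding a 1D array as a new column to a 2D table. A hedged sketch of the problem and one way around it; `table` and `new_col` are made-up names:

```python
import numpy as np

table = np.ones((3, 2))
new_col = np.array([7, 8, 9])  # shape (3,), one value per row

# A 2D array next to a 1D array does not line up
try:
    np.hstack([table, new_col])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# Reshape to (3, 1) so it becomes a proper column
widened = np.hstack([table, new_col.reshape(-1, 1)])
print(widened.shape)    # (3, 3)
```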
Unique Values and Counts
arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
# Unique values
print(np.unique(arr))  # [1 2 3 4]
# Unique values with their counts
values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3 4]
print(counts)  # [1 2 3 4]
for val, count in zip(values, counts):
    print(f"Value {val} appears {count} times")
Output:
Value 1 appears 1 times
Value 2 appears 2 times
Value 3 appears 3 times
Value 4 appears 4 times
Real use case — finding most common category in a dataset.
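As a sketch of that use case, here is one way to find the most common category by combining return_counts with argmax. The `categories` array is made-up sample data:

```python
import numpy as np

# Hypothetical product category per order
categories = np.array(["phone", "laptop", "phone", "tablet", "phone", "laptop"])

values, counts = np.unique(categories, return_counts=True)
print(values.tolist())  # ['laptop', 'phone', 'tablet']
print(counts.tolist())  # [2, 3, 1]

# Index of the largest count gives the most common value
most_common = values[np.argmax(counts)]
print(most_common)      # phone
```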
np.clip — Limit Values to a Range
Values below 0 become 0. Values above 10 become 10. Values in range stay unchanged.
Real use case — clamping pixel values between 0-255, clamping scores between 0-100.
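A minimal sketch of the clamping described above, using a made-up `readings` array and the 0 to 10 range:

```python
import numpy as np

# Made-up values, some outside the 0-10 range
readings = np.array([-5, 3, 12, 7, -1, 10, 15])

# np.clip(array, minimum, maximum)
clipped = np.clip(readings, 0, 10)
print(clipped.tolist())   # [0, 3, 10, 7, 0, 10, 10]

# The original array is not modified
print(readings.tolist())  # [-5, 3, 12, 7, -1, 10, 15]
```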
np.percentile — Finding Percentiles
Output:
25.75 44.0 64.25 79.1
Percentiles are used heavily in data analysis, for example to find outliers and to understand how data is distributed.
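A minimal sketch of np.percentile, using a small made-up `scores` array (not the data behind the output above). By default NumPy interpolates linearly between neighbouring values:

```python
import numpy as np

# Made-up data, chosen so the results are easy to verify by hand
scores = np.array([10, 20, 30, 40, 50])

p25 = np.percentile(scores, 25)
p50 = np.percentile(scores, 50)  # same as the median
p75 = np.percentile(scores, 75)
p90 = np.percentile(scores, 90)
print(p25, p50, p75, p90)        # 20.0 30.0 40.0 46.0

# Interquartile range: the spread of the middle 50% of the data
iqr = p75 - p25
print(iqr)                       # 20.0
```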
Real World Example — Sales Data Analysis
Let's simulate a realistic business dataset and analyze it:
import numpy as np
np.random.seed(42)
# Simulate 1 year of daily sales for 3 products
# 365 days, 3 products
days = 365
products = 3
product_names = ["Laptop", "Phone", "Tablet"]
# Generate realistic sales numbers
sales = np.random.randint(5, 50, size=(days, products))
# Add seasonal pattern: double sales from day 301 onward (roughly Nov/Dec)
sales[300:] = sales[300:] * 2
print("=== Sales Dataset Shape ===")
print(f"Shape: {sales.shape}") # (365, 3)
print(f"Total records: {sales.size}")
print("\n=== Basic Statistics ===")
for i, product in enumerate(product_names):
    product_sales = sales[:, i]
    print(f"\n{product}:")
    print(f" Total annual sales : {np.sum(product_sales)}")
    print(f" Daily average : {np.mean(product_sales):.1f}")
    print(f" Best day : {np.max(product_sales)}")
    print(f" Worst day : {np.min(product_sales)}")
    print(f" Std deviation : {np.std(product_sales):.1f}")
print("\n=== Monthly Analysis ===")
month_names = ["Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"]
days_per_month = [31,28,31,30,31,30,31,31,30,31,30,31]
start = 0
monthly_totals = []
for i, days_in_month in enumerate(days_per_month):
    end = start + days_in_month
    month_sales = np.sum(sales[start:end])
    monthly_totals.append(month_sales)
    start = end
monthly_totals = np.array(monthly_totals)
best_month_idx = np.argmax(monthly_totals)
worst_month_idx = np.argmin(monthly_totals)
print(f"Best month : {month_names[best_month_idx]} ({monthly_totals[best_month_idx]} units)")
print(f"Worst month : {month_names[worst_month_idx]} ({monthly_totals[worst_month_idx]} units)")
print("\n=== Product Rankings ===")
annual_totals = np.sum(sales, axis=0)
ranked_indices = np.argsort(annual_totals)[::-1]
for rank, idx in enumerate(ranked_indices):
    print(f"#{rank+1} {product_names[idx]}: {annual_totals[idx]} units")
print("\n=== Performance Categories ===")
daily_total = np.sum(sales, axis=1)
avg_daily = np.mean(daily_total)
excellent = np.sum(daily_total > avg_daily * 1.5)
good = np.sum((daily_total >= avg_daily) & (daily_total <= avg_daily * 1.5))
poor = np.sum(daily_total < avg_daily)
print(f"Average daily total : {avg_daily:.1f} units")
print(f"Excellent days (>150% avg) : {excellent}")
print(f"Good days (>=avg) : {good}")
print(f"Poor days (<avg) : {poor}")
print("\n=== Top 5 Best Sales Days ===")
top5_indices = np.argsort(daily_total)[-5:][::-1]
for rank, idx in enumerate(top5_indices):
    print(f"#{rank+1} Day {idx+1}: {daily_total[idx]} total units")
Output (illustrative; run the script to get the exact figures):
=== Sales Dataset Shape ===
Shape: (365, 3)
Total records: 1095
=== Basic Statistics ===
Laptop:
Total annual sales : 10248
Daily average : 28.1
Best day : 98
Worst day : 5
Std deviation : 18.2
Phone:
Total annual sales : 10091
Daily average : 27.6
Best day : 96
Worst day : 5
Std deviation : 17.8
Tablet:
Total annual sales : 10134
Daily average : 27.8
Best day : 98
Worst day : 6
Std deviation : 17.9
=== Monthly Analysis ===
Best month : Dec (3842 units)
Worst month : Jan (1842 units)
=== Product Rankings ===
#1 Laptop: 10248 units
#2 Tablet: 10134 units
#3 Phone: 10091 units
=== Performance Categories ===
Average daily total : 83.5 units
Excellent days (>150% avg): 65
Good days (>=avg) : 156
Poor days (<avg) : 209
=== Top 5 Best Sales Days ===
#1 Day 361: 262 total units
#2 Day 345: 260 total units
#3 Day 352: 259 total units
#4 Day 358: 257 total units
#5 Day 312: 256 total units
This is the shape of what data analysts do every day, applied here to simulated data.
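The monthly totals in the script are built with a Python loop. As an optional variation, and only a sketch, the same totals can be computed without the loop using `np.add.reduceat`, which sums between consecutive start indices. The loop is kept here purely as a cross-check:

```python
import numpy as np

np.random.seed(42)
sales = np.random.randint(5, 50, size=(365, 3))
days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Total units per day across all products
daily_total = sales.sum(axis=1)

# Index where each month starts: 0, 31, 59, ...
month_starts = np.cumsum([0] + days_per_month[:-1])

# Sum daily_total between consecutive start indices (last month runs to the end)
monthly_totals = np.add.reduceat(daily_total, month_starts)

# Cross-check against the loop version
start = 0
loop_totals = []
for n in days_per_month:
    loop_totals.append(int(sales[start:start + n].sum()))
    start += n

print(monthly_totals.tolist() == loop_totals)  # True
```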
NumPy is Complete — What You Now Know
✅ Creating arrays — zeros, ones, arange, linspace, random
✅ Array properties — shape, ndim, size, dtype
✅ Indexing and slicing — 1D and 2D
✅ Math operations — vectorized, element-wise
✅ Statistical functions — mean, median, std, min, max
✅ Boolean indexing — filtering data
✅ Copy vs View — avoiding bugs
✅ Fancy indexing — selecting non-consecutive elements
✅ np.where — conditional operations
✅ Sorting and argsort
✅ Combining arrays — concatenate, vstack, hstack
✅ Unique values and counts
✅ Percentiles
✅ Real dataset analysis
Exercise 🏋️
Temperature Analysis — solve in Jupyter notebook:
np.random.seed(10)
# Daily temperature readings for 4 cities over 1 year (365 days)
# Temperatures in Celsius
temperatures = np.random.normal(
    loc=[25, 15, 35, 20],  # average temp per city
    scale=[5, 8, 4, 6],    # variation per city
    size=(365, 4)
).round(1)
cities = ["Delhi", "London", "Dubai", "Mumbai"]
Find:
- Annual average temperature per city
- Hottest and coldest city
- How many days each city exceeded 30°C
- Replace all temperatures below 0°C with 0 using np.where
- Find the 10 hottest days in Delhi
- Which city had most consistent temperature (lowest std deviation)
- Monthly average temperature for Delhi (use reshape or slicing)
- Rank cities from hottest to coldest annual average