Day 21 : NumPy Fundamentals
🎯 Enterprise Objective
Welcome to Phase 2. Data analytics means crunching millions of numbers, and pure-Python loops over lists are too slow for that. NumPy bridges Python with fast compiled C code through the N-dimensional array (ndarray). Mastering vectorized math is the gateway to Pandas and machine learning.
Strategic Overview
| # | Topic | Concept |
|---|---|---|
| 1 | ndarrays | Homogeneous matrices |
| 2 | Generation | arange, zeros, linspace |
| 3 | Vectorization | Fast math without loops |
1. NumPy Arrays (ndarrays) : The Core of Data Science
NumPy (Numerical Python) is the foundation of Pandas and scikit-learn. Its core structure is the ndarray (N-dimensional array), which can be up to roughly 50x faster than a Python list for numeric work, depending on the operation and the array size, because it stores its elements as a contiguous, fixed-type C array under the hood.
import numpy as np
# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
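As a small illustration, the array created above can be inspected through its attributes. The 2x3 matrix named grid is a hypothetical example that is not part of the lesson; it is included only to show how the values change with a second dimension.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])           # the 1D array from the lesson
grid = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical 2x3 matrix for comparison

print(arr.shape, arr.ndim, arr.size)      # (5,) 1 5
print(grid.shape, grid.ndim, grid.size)   # (2, 3) 2 6
print(grid.dtype)                         # an integer dtype; the exact width depends on the platform
```

Note that .shape is always a tuple, even for a 1D array, which is why the first result is written (5,).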
💼 Why Data Analysts Care
• Performance: Processing millions of data points in a fraction of the time a Python loop would take
• Memory Efficiency: NumPy arrays take up significantly less RAM than standard Python lists
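The memory claim can be checked with a rough sketch like the one below. It assumes CPython, where a list stores pointers to separate int objects, so the list total is an estimate rather than an exact figure. The names py_list and np_arr are illustrative only.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds pointers to individual int objects, so count both parts.
# This is a rough estimate: some small ints are shared by the interpreter.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# .nbytes reports only the raw data buffer: n elements * 8 bytes each.
array_bytes = np_arr.nbytes

print(f"list : {list_bytes / 1e6:.2f} MB")
print(f"array: {array_bytes / 1e6:.2f} MB")
```

The exact ratio varies with the Python build and the chosen dtype, so treat the printed numbers as indicative.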
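⚠️ Mixed Data Types
Unlike a Python list, which can hold [1, 'a', True], a NumPy array must be homogeneous: every element has the same type (e.g., all floats or all ints). If you mix types, NumPy upcasts everything to one common type that can represent all the values: ints mixed with floats become floats, and if any element is a string, every element becomes a string.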
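The upcasting behavior described above can be seen in a short sketch. The variable names are illustrative, and the exact integer width reported for dtype depends on the platform.

```python
import numpy as np

ints_and_floats = np.array([1, 2.5, 3])   # ints are promoted to float64
ints_and_bools = np.array([1, True, 0])   # bools are promoted to integers
with_string = np.array([1, "two", 3])     # one string turns everything into strings
forced = np.array([1, 2, 3], dtype=float) # request a dtype explicitly

print(ints_and_floats.dtype)    # float64
print(ints_and_bools.dtype)     # an integer dtype
print(with_string.dtype.kind)   # U (Unicode string)
print(with_string)              # ['1' 'two' '3']
print(forced.dtype)             # float64
```

Checking .dtype right after loading data is a cheap way to catch a stray string that silently turned a numeric column into text.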
🧪 Concept Checks: NumPy Basics
Q1. Import numpy as np. Create an array from [10, 20, 30]. Print it.
Q2. Create a 2D array (matrix) representing a 3x3 grid of numbers. Print its .shape.
Q3. Try creating an array from [1, "two", 3]. Print its .dtype. Notice how the numbers became strings.
Q4. Print the number of dimensions .ndim of the 3x3 array.
Q5. Print the total number of elements in the array using .size.
2. Array Generation : Built-in Constructors
You rarely build massive arrays by manually typing lists. NumPy provides powerful generation functions like np.zeros(), np.arange(), and np.linspace().
| Function | Purpose | Example |
|---|---|---|
| np.zeros(shape) | Array of 0s | np.zeros((3,3)) |
| np.ones(shape) | Array of 1s | np.ones(5) |
| np.arange(start, stop, step) | Like Python range (stop is excluded) | np.arange(0, 10, 2) |
| np.linspace(start, stop, num) | Evenly spaced points (stop is included) | np.linspace(0, 1, 5) |
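The four constructors in the table can be tried side by side. This is a minimal sketch using the same example calls as the table; the variable names are illustrative.

```python
import numpy as np

zeros = np.zeros((3, 3))        # 3x3 matrix of 0.0 (float64 by default)
ones = np.ones(5)               # five 1.0 values
evens = np.arange(0, 10, 2)     # [0 2 4 6 8]: stop (10) is excluded
points = np.linspace(0, 1, 5)   # [0.   0.25 0.5  0.75 1.  ]: stop (1) is included

print(zeros.shape, zeros.dtype)
print(ones)
print(evens)
print(points)
```

The main difference to remember is that arange takes a step size while linspace takes a number of points.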
💼 Why Data Analysts Care
• Initialization: Creating placeholder matrices to fill in later, for example with model weights
• Plotting: Using linspace to generate evenly spaced X-axis values for charts
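🧠 Pro Tip
np.arange() accepts float steps (e.g., step=0.5), unlike Python's built-in range(), which only takes integers. With a non-integer step, floating-point rounding can make the length of the result hard to predict, so prefer np.linspace() when you need an exact number of points.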
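A hedged sketch of float steps in practice follows. A step of 0.5 is exactly representable in binary, while 0.1 is not, so the second call may return one more element than expected on typical IEEE-754 platforms. The variable names are illustrative.

```python
import numpy as np

halves = np.arange(0, 2, 0.5)     # [0.  0.5 1.  1.5]: 0.5 is exact in binary
tricky = np.arange(1, 1.3, 0.1)   # 0.1 is not exact, so the length can surprise
safe = np.linspace(1, 1.3, 4)     # ask for the number of points instead

print(halves)
print(len(tricky), tricky)        # may include a value very close to the "excluded" stop
print(len(safe), safe)            # always exactly 4 points, ending at 1.3
```

When the number of points matters more than the exact step, linspace is the more predictable choice.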
🧪 Concept Checks: Generators
Q1. Create an array of 10 zeros. Print it.
Q2. Create a 3x3 matrix of ones using np.ones((3, 3)).
Q3. Use np.arange() to create an array of numbers from 10 to 50, stepping by 5.
Q4. Use np.linspace() to generate exactly 11 evenly spaced points between 0 and 1.
Q5. Generate a 2x2 matrix of random integers from 1 to 99 using np.random.randint(1, 100, size=(2,2)). The upper bound is exclusive, so pass 101 if 100 itself should be a possible value.
3. Vectorized Operations : No More For-Loops
The most important concept in NumPy is vectorization: you perform math on entire arrays at once without writing for loops. The loop still happens, but inside optimized, compiled C code rather than in the Python interpreter, which is what makes it fast.
# ❌ Bad (Python loop)
for i in range(len(arr)):
    arr[i] *= 2
# ✅ Good (vectorized)
arr = arr * 2
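To see the difference on your own machine, the two approaches can be timed with the standard timeit module. This is a minimal sketch; the function names are hypothetical, and the measured times depend on hardware and NumPy version, so no specific speedup is claimed here.

```python
import timeit
import numpy as np

data = np.arange(100_000)

def double_with_loop(values):
    """Double every element one at a time in the Python interpreter."""
    out = values.copy()
    for i in range(len(out)):
        out[i] *= 2
    return out

def double_vectorized(values):
    """Double every element with a single vectorized expression."""
    return values * 2

# Both approaches must produce the same result before timing means anything.
same = np.array_equal(double_with_loop(data), double_vectorized(data))
print("results match:", same)

loop_time = timeit.timeit(lambda: double_with_loop(data), number=5)
vec_time = timeit.timeit(lambda: double_vectorized(data), number=5)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```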
💼 Why Data Analysts Care
• Financial Math: Applying a 5% interest rate to a million bank accounts in a single expression (balances * 1.05)
• Image Processing: Brightening an image matrix by adding a constant (pixels + 50)
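Both use cases can be sketched with tiny made-up inputs. The balances and the 2x2 grayscale image are hypothetical data. The clipping step is an added precaution that the lesson does not mention: image data is commonly stored as uint8 (0 to 255), and uint8 arithmetic wraps around on overflow.

```python
import numpy as np

balances = np.array([100.0, 2500.0, 40000.0])   # hypothetical account balances
with_interest = balances * 1.05                 # 5% applied to every account at once

pixels = np.array([[0, 100], [200, 250]], dtype=np.uint8)  # hypothetical grayscale image
# Widen to int16 before adding, then clip back into 0..255 so 250 + 50 does not wrap.
brighter = np.clip(pixels.astype(np.int16) + 50, 0, 255).astype(np.uint8)

print(with_interest)
print(brighter)   # [[ 50 150]
                  #  [250 255]]
```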
🧠 Pro Tip
Arithmetic on arrays is element-wise. When you add two arrays, A + B, they must have exactly the same shape or shapes that are compatible under broadcasting. NumPy adds position 0 to position 0, position 1 to position 1, and so on.
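The three cases from the tip (same shape, broadcast-compatible shapes, and incompatible shapes) can be sketched as follows. The array names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)         # [11 22 33]: same shape, added position by position

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix + a)    # shapes (2, 3) and (3,): a is reused for each row
                     # [[2 4 6]
                     #  [5 7 9]]

try:
    np.array([1, 2]) + np.array([3, 4, 5])   # shapes (2,) and (3,) cannot be matched
except ValueError as err:
    print("ValueError:", err)
```

Broadcasting never copies the smaller array in memory; it only behaves as if the values were repeated.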
🧪 Concept Checks: Vectorization
Q1. Create arr = np.array([10, 20, 30]). Divide every element by 10 and print the result.
Q2. Given A = np.array([1, 2]) and B = np.array([10, 20]), multiply them together. Print the result.
Q3. Create an array using arange(1, 6). Square all elements (**2). Print the result.
Q4. Write a boolean operation: arr > 15. Print the result (you get a boolean array!).
Q5. Measure the speed difference between sum(range(100000)) and np.arange(100000).sum().
🛠️ Professional Practice Tasks
Theory is useless without muscle memory. Complete these tasks to solidify your understanding.
Task 1 (Celsius to Fahrenheit): Create an array of Celsius temperatures: [0, 10, 20, 30, 40]. Use vectorized math to convert them to Fahrenheit (C * 9/5) + 32. Print the result.
Task 2 (Distance Formula): Given two 1D arrays p1 = np.array([1, 2, 3]) and p2 = np.array([4, 5, 6]), calculate the Euclidean distance using vector math: sqrt(sum((p1 - p2)**2)). (Use np.sqrt and .sum().)
Task 3 (Identity Matrix): Research and use np.eye() to create a 5x5 Identity matrix (1s on the diagonal, 0s elsewhere). Multiply it by 5 to make the diagonal 5s.
Task 4 (Random Noise): Create a 100-element array using np.linspace(0, 10, 100). Add random uniform noise to it using np.random.rand(100). Plotting this is the foundation of data visualization!
Task 5 (Shape Manipulation): Create a 1D array of 12 elements. Use .reshape((3, 4)) to turn it into a 3x4 matrix. Print the new matrix and its .shape.
💻 Pure Coding Interview Questions
Q1. Why is NumPy so much faster than standard Python lists?
Q2. What is the difference between an ndarray and a Python list?
Q3. Explain what 'Vectorization' means in NumPy.
Q4. What happens if you try to put a string into an array of integers? Explain dtype casting.
Q5. What does the .shape attribute return? What type of object is it?
Q6. How do you create a 3D array in NumPy? What does its shape look like?
Q7. Explain the difference between np.arange() and np.linspace().
Q8. What is an 'Element-wise' operation?
Q9. How does NumPy handle missing data (NaN) compared to Python None?
Q10. Write a one-liner to generate an array of 100 random numbers drawn from a normal distribution (np.random.randn).
Q11. What is the output of np.array([1, 2]) + np.array([3, 4, 5])? Explain what happens.
Q12. How do you check the data type of a NumPy array?
Q13. How do you explicitly force a NumPy array to be floats instead of ints upon creation? (Hint: dtype=float).
Q14. Explain what np.zeros_like(arr) does.
Q15. What is the difference between np.random.rand() and np.random.randint()?
Q16. How do you find the total memory consumed by a NumPy array? (Hint: .nbytes).
Q17. Why is it a bad idea to use append() in a loop with NumPy arrays?
Q18. What is broadcasting in NumPy (briefly)?
Q19. How do you flatten a 2D matrix into a 1D array? (.flatten() or .ravel()).
Q20. Explain the difference between deep copy and shallow copy (views) in NumPy arrays.
Q21. What happens when you do arr > 5? What does it return?
Q22. How do you get the transpose of a matrix in NumPy? (Hint: .T).
Q23. What is the dot product of two vectors, and how do you compute it in NumPy? (np.dot or @).
Q24. How do you find the maximum value in an array? How do you find its index? (argmax).
Q25. What is the difference between np.nan and np.inf?
Day 21 Executive Summary
| # | Topic | Key Takeaway |
|---|---|---|
| 1 | ndarrays | shape, ndim, and dtype define the structure |
| 2 | Generation | Never build big arrays by hand |
| 3 | Vectorization | Replace for loops with arr * 2. It is often tens of times faster on large arrays. |
✅ Instructor's End-of-Day Checklist
• [ ] I can create 1D and 2D arrays.
• [ ] I can generate arrays using linspace and arange.
• [ ] I understand vectorized math vs for-loops.