Lesson 27: Introduction to Data Analysis with pandas

How to load, explore, clean, and analyze data with pandas, Python's widely used data-analysis library

What Is pandas?

pandas is a Python library for working with structured (tabular) data. Its core object, the DataFrame, is a table of labeled rows and columns, and pandas provides fast, flexible tools for loading, cleaning, transforming, and analyzing such tables. It is widely used in data science, finance, research, and automation.

Installing pandas

pip install pandas

Loading Data

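pandas can load CSV, Excel, JSON, SQL databases, and other formats through its read_* functions, such as read_csv, read_excel, read_json, and read_sql.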

Loading a CSV File

import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())
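To try read_csv without a file on disk, you can pass it any file-like object. The sketch below wraps a small CSV string in io.StringIO; the column names Name, Age, and City and their values are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,City
Alice,34,Boston
Bob,17,Denver
Carol,,Austin
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)  # Age is parsed as float64 because one value is missing
```

If inference picks the wrong types, read_csv parameters such as dtype= and parse_dates= let you state them explicitly.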

Loading an Excel File

# Reading .xlsx files requires an engine such as openpyxl (pip install openpyxl)
df = pd.read_excel("data.xlsx")
print(df.head())
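JSON and SQL sources follow the same pattern. The sketch below is self-contained under two assumptions: the JSON is a hypothetical list of records held in a string, and an in-memory SQLite database (via Python's built-in sqlite3 module) stands in for a real database with a hypothetical people table.

```python
import io
import sqlite3

import pandas as pd

# JSON: a list of records, one object per row (hypothetical data)
json_text = '[{"Name": "Alice", "Age": 34}, {"Name": "Bob", "Age": 17}]'
from_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database stands in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 34), ("Bob", 17)])
conn.commit()

# read_sql runs the query and returns the result set as a DataFrame
from_sql = pd.read_sql("SELECT Name, Age FROM people WHERE Age >= 18", conn)
conn.close()

print(from_json)
print(from_sql)
```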

Exploring Data

df.head()      # first 5 rows
df.tail()      # last 5 rows
df.info()      # column types, non-null counts, and memory usage
df.describe()  # summary statistics for numeric columns
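Because info() prints its report instead of returning it, a few attributes are handy when you want the same facts as values. This sketch uses a small hypothetical DataFrame built in memory so it runs without a data file.

```python
import pandas as pd

# Small hypothetical dataset built in memory
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, 17, 29, 45],
    "Score": [88.5, 92.0, 79.5, 85.0],
})

print(df.shape)             # (rows, columns)
print(df.dtypes)            # data type of each column
print(df.columns.tolist())  # column names as a plain list

# describe() summarizes numeric columns by default;
# include="all" also covers text columns such as Name
summary = df.describe()
print(summary)
print(df.describe(include="all"))
```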

Selecting Data

Select columns by name, and select rows by label with .loc or by integer position with .iloc.

Selecting Columns

df["Name"]
df[["Name", "Age"]]
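The two forms above return different types: a single name gives a Series (one column), while a list of names gives a DataFrame. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [34, 17],
    "City": ["Boston", "Denver"],
})

one = df["Name"]                # single brackets -> Series
several = df[["Name", "Age"]]   # list of names -> DataFrame

print(type(one).__name__)       # Series
print(type(several).__name__)   # DataFrame
```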

Selecting Rows

df.loc[0]      # row whose index label is 0
df.iloc[0]     # first row by integer position
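With the default index (0, 1, 2, ...) .loc and .iloc look interchangeable, but they diverge as soon as the row order changes. In this sketch, sorting hypothetical data by Age keeps the original labels while reordering the rows.

```python
import pandas as pd

# Hypothetical data; sorting keeps the original index labels
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol"], "Age": [34, 17, 29]}
).sort_values("Age")

# After sorting, the row order is Bob (label 1), Carol (label 2), Alice (label 0)
by_label = df.loc[0]       # row whose index label is 0 -> Alice
by_position = df.iloc[0]   # first row in the current order -> Bob

print(by_label["Name"])     # Alice
print(by_position["Name"])  # Bob

# .loc also accepts a row label and a column name together
print(df.loc[2, "Age"])     # 29
```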

Filtering Data

adults = df[df["Age"] >= 18]
print(adults)
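The condition inside the brackets is a boolean Series, one True/False per row. To combine conditions, use & (and), | (or), and ~ (not) with each condition in parentheses; Python's own and/or raise an error on a Series. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, 17, 29, 15],
    "City": ["Boston", "Denver", "Boston", "Austin"],
})

# Both conditions must hold; each one needs its own parentheses
boston_adults = df[(df["Age"] >= 18) & (df["City"] == "Boston")]

# isin() keeps rows whose value appears in the given list
west = df[df["City"].isin(["Denver", "Austin"])]

print(boston_adults)
print(west)
```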

Adding and Modifying Columns

df["AgePlus10"] = df["Age"] + 10
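Column arithmetic like the line above is vectorized: it applies to every row at once, with no loop. The sketch below extends the idea with a boolean column, a conditional update through .loc, and assign(), which returns a new DataFrame. The IsAdult, Group, and AgeMonths columns are hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [34, 17, 29]})

# New column computed from an existing one
df["AgePlus10"] = df["Age"] + 10

# Boolean column from a comparison
df["IsAdult"] = df["Age"] >= 18

# Set a default, then overwrite only the rows matching a condition
df["Group"] = "adult"
df.loc[df["Age"] < 18, "Group"] = "minor"

# assign() returns a new DataFrame and leaves df unchanged
with_months = df.assign(AgeMonths=df["Age"] * 12)

print(df)
print(with_months)
```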

Handling Missing Data

clean = df.dropna()     # new DataFrame without rows that contain missing values
filled = df.fillna(0)   # new DataFrame with missing values replaced by 0
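Before dropping or filling, it usually helps to count what is missing. This sketch, on hypothetical data, counts missing values per column and fills each column differently by passing fillna() a dict; note that df itself is not modified.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [34, np.nan, 29],
    "City": ["Boston", "Denver", None],
})

# Count missing values per column
print(df.isna().sum())

# Both calls return new DataFrames; df keeps its missing values
complete_rows = df.dropna()
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})

print(complete_rows)
print(filled)
```

Filling Age with the column mean is only one option; whether a mean, a constant, or dropping is appropriate depends on the data.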

Sorting Data

df.sort_values("Age")                    # youngest first (returns a new DataFrame)
df.sort_values("Age", ascending=False)   # oldest first
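sort_values also accepts a list of columns, with a matching list for ascending, to break ties. A sketch with hypothetical data that sorts by City and then by Age, oldest first within each city:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "City": ["Boston", "Denver", "Boston", "Denver"],
    "Age": [34, 17, 29, 45],
})

# City A to Z, then Age descending within each city
ordered = df.sort_values(["City", "Age"], ascending=[True, False])

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
ordered = ordered.reset_index(drop=True)
print(ordered)
```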

Grouping and Aggregating

grouped = df.groupby("Category")["Value"].mean()
print(grouped)
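groupby splits the rows by each distinct Category, applies the aggregation to each group, and combines the results. To compute several aggregates at once, agg() supports named aggregation, as in this sketch on hypothetical data (the output names total, average, and rows are arbitrary).

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 20, 30, 40, 50],
})

# One aggregate per group: a Series indexed by Category
means = df.groupby("Category")["Value"].mean()

# Several aggregates at once: output_name=(column, function)
stats = df.groupby("Category").agg(
    total=("Value", "sum"),
    average=("Value", "mean"),
    rows=("Value", "count"),
)

print(means)
print(stats)
```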

Merging and Joining

merged = pd.merge(df1, df2, on="id")   # inner join: keeps ids present in both df1 and df2
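By default pd.merge performs an inner join. The how parameter selects other join types, and indicator=True adds a _merge column recording where each row came from. A sketch using two hypothetical tables that share an id column:

```python
import pandas as pd

# Hypothetical tables sharing an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "Name": ["Alice", "Bob", "Carol"]})
df2 = pd.DataFrame({"id": [1, 3, 4], "Score": [88, 79, 95]})

# Inner (default): only ids found in both tables
inner = pd.merge(df1, df2, on="id")

# Left: every row of df1; Score is NaN where df2 has no match
left = pd.merge(df1, df2, on="id", how="left")

# Outer: all ids from either table; _merge shows the source of each row
outer = pd.merge(df1, df2, on="id", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```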

Plotting with pandas

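pandas' .plot() methods use matplotlib by default, so install it first (pip install matplotlib).

import matplotlib.pyplot as plt
df["Age"].plot(kind="hist")              # histogram of ages
df.plot(x="Name", y="Age", kind="bar")   # one bar per name
plt.show()                               # display the figures when running a script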

Why pandas Is Essential

Next Steps

Now that you can analyze data with pandas, you're ready to explore more advanced topics in Lesson 28: Introduction to NumPy for Numerical Computing.
