How to load, explore, clean, and analyze data using the pandas library
pandas is a Python library designed for working with structured data. It provides fast, flexible tools for loading, cleaning, transforming, and analyzing datasets. It is widely used in data science, finance, research, and automation.
pip install pandas
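pandas can load data from CSV, Excel, JSON, SQL databases, and other sources through reader functions such as read_csv, read_excel, read_json, and read_sql.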
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
df = pd.read_excel("data.xlsx") # .xlsx files need an Excel engine such as openpyxl installed
print(df.head())
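The loading calls above assume files named data.csv and data.xlsx exist on disk. As a minimal sketch that runs without any file, the example below reads CSV text from an in-memory buffer instead. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "Name,Age\nAda,36\nGrace,45\nLinus,17\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # type pandas inferred for each column
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed the way you expected.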
df.head() # first 5 rows
df.tail() # last 5 rows
df.info() # column types and memory usage
df.describe() # summary statistics
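Select columns by name, and rows by index label or by integer position.
df["Name"] # one column, returned as a Series
df[["Name", "Age"]] # several columns, returned as a DataFrame
df.loc[0] # row whose index label is 0
df.iloc[0] # first row, by integer position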
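With the default index (0, 1, 2, ...), loc[0] and iloc[0] return the same row, which hides the difference between them. The sketch below uses a made-up DataFrame with a custom index to show where the two diverge.

```python
import pandas as pd

# Hypothetical data with index labels that are not 0, 1, 2
df = pd.DataFrame(
    {"Name": ["Ada", "Grace", "Linus"], "Age": [36, 45, 17]},
    index=[10, 20, 30],
)

by_label = df.loc[10]     # row whose index label is 10
by_position = df.iloc[0]  # first row, whatever its label is

# Both expressions point at the same row here
same_row = by_label.equals(by_position)

# Negative positions count from the end, as with Python lists
last_name = df.iloc[-1]["Name"]

print(same_row, last_name)
```

In this DataFrame df.loc[0] raises a KeyError because no row carries the label 0, while df.iloc[0] still works.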
adults = df[df["Age"] >= 18]
print(adults)
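Filters can combine several conditions. The sketch below assumes an extra City column that is not part of the original example. Each condition needs its own parentheses, and the operators are & (and) and | (or) rather than the Python keywords and / or.

```python
import pandas as pd

# Hypothetical data; the City column is added for illustration
df = pd.DataFrame(
    {
        "Name": ["Ada", "Grace", "Linus"],
        "Age": [36, 45, 17],
        "City": ["London", "New York", "Helsinki"],
    }
)

# Rows where both conditions hold
adults_in_london = df[(df["Age"] >= 18) & (df["City"] == "London")]

# Rows where at least one condition holds
minors_or_over_40 = df[(df["Age"] < 18) | (df["Age"] > 40)]

print(adults_in_london)
print(minors_or_over_40)
```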
df["AgePlus10"] = df["Age"] + 10
df.dropna() # returns a new DataFrame without the rows that contain missing values
df.fillna(0) # returns a new DataFrame with missing values replaced by 0
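Neither call changes df itself, so the result has to be assigned to a name to be kept. A minimal sketch with made-up data follows. Filling with the column mean is just one possible choice, not a recommendation from this lesson.

```python
import pandas as pd

# Hypothetical data with one missing Age
df = pd.DataFrame(
    {"Name": ["Ada", "Grace", "Linus"], "Age": [36.0, float("nan"), 17.0]}
)

# Count missing values per column before deciding what to do
missing_per_column = df.isna().sum()

# Keep the results by assigning them; df is left untouched
dropped = df.dropna()
filled = df.fillna({"Age": df["Age"].mean()})  # mean() skips missing values

print(missing_per_column)
print(filled)
```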
df.sort_values("Age") # ascending order (the default); returns a new DataFrame
df.sort_values("Age", ascending=False) # descending order
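Sorting can also use several columns at once, which helps when the first column has ties. The sketch below uses made-up data, sorting by Age from oldest to youngest and breaking ties alphabetically by Name.

```python
import pandas as pd

# Hypothetical data with a tie on Age
df = pd.DataFrame(
    {"Name": ["Ada", "Grace", "Linus", "Alan"], "Age": [36, 45, 17, 36]}
)

# One ascending flag per sort column; reset_index renumbers the rows
sorted_df = df.sort_values(
    ["Age", "Name"], ascending=[False, True]
).reset_index(drop=True)

print(sorted_df)
```

Without reset_index(drop=True), the sorted rows keep their original index labels, which can be confusing when you later select rows with loc.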
grouped = df.groupby("Category")["Value"].mean()
print(grouped)
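The grouping above computes one statistic per category. As a sketch with made-up data, agg can compute several statistics in a single pass.

```python
import pandas as pd

# Hypothetical data using the same column names as the example above
df = pd.DataFrame(
    {
        "Category": ["A", "B", "A", "B", "A"],
        "Value": [10, 20, 30, 40, 50],
    }
)

# One row per category, one column per statistic
summary = df.groupby("Category")["Value"].agg(["mean", "sum", "count"])

print(summary)
```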
merged = pd.merge(df1, df2, on="id")
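The merge call above assumes two DataFrames, df1 and df2, that share an id column. The sketch below builds two small made-up tables to show what the default inner join keeps and how how="left" differs.

```python
import pandas as pd

# Hypothetical tables sharing an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "Name": ["Ada", "Grace", "Linus"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "Score": [88, 92, 75]})

# Default is an inner join: only ids present in both tables survive
inner = pd.merge(df1, df2, on="id")

# A left join keeps every row of df1; unmatched scores become NaN
left = pd.merge(df1, df2, on="id", how="left")

print(inner)
print(left)
```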
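pandas uses matplotlib as its default plotting backend, so matplotlib must be installed (pip install matplotlib) before these quick visualizations will work.
import matplotlib.pyplot as plt
df["Age"].plot(kind="hist") # histogram of the Age column
df.plot(x="Name", y="Age", kind="bar") # bar chart of Age for each Name
plt.show() # display the figures when running as a script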
Now that you can analyze data with pandas, you're ready to explore more advanced topics in Lesson 28: Introduction to NumPy for Numerical Computing.