Data Wrangling & Cleaning
Master data preparation with Pandas — handling dirty data, transformations, missing values, and normalization techniques.
Enter raw data with missing values, outliers, and invalid entries. See how it gets cleaned using IQR-based outlier removal and median imputation.
import pandas as pd
import numpy as np

# df is assumed to be a DataFrame that has already been loaded

# Replace common missing value placeholders with NaN
df = df.replace(['?', 'NA', ''], np.nan)

# Fill numeric missing values with median (robust to outliers)
numeric_cols = df.select_dtypes(include=[np.number]).columns
for col in numeric_cols:
    # Assign the result back; a chained inplace fillna may not update df
    df[col] = df[col].fillna(df[col].median())

# Remove outliers using IQR method
def remove_outliers(df, column):
    """Return the rows whose value in `column` lies within the 1.5 * IQR fences."""
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    return df[(df[column] >= lower) & (df[column] <= upper)]
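To make the cleaning steps above concrete, here is a minimal worked sketch on a tiny made-up column. The `price` column and its values are invented for illustration, and the `pd.to_numeric` conversion is an added assumption: a column that held text placeholders such as `?` is usually still stored as text after the replacement, so it would not be picked up as numeric until it is converted.

```python
import numpy as np
import pandas as pd

# Made-up sample: one missing-value placeholder and one extreme value.
raw = pd.DataFrame({"price": ["10", "12", "?", "11", "13", "500", "12"]})

# Placeholders become NaN; the column is still text, so convert it to numbers.
clean = raw.replace(["?", "NA", ""], np.nan)
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# Median imputation: fill the gap with the median of the observed values.
median_price = clean["price"].median()
clean["price"] = clean["price"].fillna(median_price)

# IQR fences: keep rows within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = clean["price"].quantile(0.25)
q3 = clean["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
filtered = clean[(clean["price"] >= lower) & (clean["price"] <= upper)]

print(f"median used for imputation: {median_price}")
print(f"fences: [{lower}, {upper}], rows kept: {len(filtered)} of {len(clean)}")
```

The median is barely affected by the extreme value, which is why it is preferred over the mean for imputation here; the extreme value is then dropped by the IQR filter.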
Compare different mathematical transformations and their effect on data distribution.
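One way to compare transformations numerically is to look at skewness before and after. The sketch below assumes a synthetic right-skewed sample (log-normal, generated with a fixed seed) and two common transformations, square root and `log1p`; the choice of these two is illustrative rather than a list taken from this page.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data (an assumption for illustration only).
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000))

# Apply each transformation to the same values.
transformed = pd.DataFrame({
    "raw": values,
    "sqrt": np.sqrt(values),    # mild compression of large values
    "log1p": np.log1p(values),  # log(1 + x), stronger compression
})

# Skewness near 0 indicates a roughly symmetric distribution.
skews = transformed.skew()
print(skews.round(2))
```

On data like this, the logarithm typically pulls the long right tail in the most, with the square root sitting between the raw and log-transformed versions.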
EDA & Visualization
Explore data relationships and create insightful visualizations using correlation analysis, distributions, and KDE plots.
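As a rough sketch of the numbers behind such plots, the example below computes a Pearson correlation matrix with pandas and evaluates a kernel density estimate with SciPy's `gaussian_kde`. The columns `x`, `y`, and `z` are synthetic and exist only for this illustration; plotting is left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic data: y depends on x, z is independent noise.
rng = np.random.default_rng(42)
x = rng.normal(size=500)
data = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=500),
    "z": rng.normal(size=500),
})

# Pairwise Pearson correlation (the pandas default).
corr = data.corr()
print(corr.round(2))

# Kernel density estimate of y, evaluated on an evenly spaced grid.
kde = gaussian_kde(data["y"].to_numpy())
grid = np.linspace(data["y"].min(), data["y"].max(), 200)
density = kde(grid)
```

The `grid` and `density` arrays are what a KDE plot draws as a smooth curve, and the `corr` table is what a correlation heatmap colors.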
Model Development & Evaluation
Build, fit, and evaluate machine learning models with Scikit-learn — from simple linear regression to regularization techniques.
Drag the alpha slider to see how L2 regularization shrinks model coefficients.
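The same effect can be reproduced offline with Scikit-learn's `Ridge`. This is a minimal sketch on synthetic data; the feature matrix, the "true" coefficients, and the list of alpha values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.5])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Fit the same model with increasing L2 penalty strength.
alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: coefficients={np.round(model.coef_, 3)}")
```

As alpha grows, the penalty on the squared coefficient sizes dominates the fit, so every coefficient is pulled toward zero without being set exactly to zero.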
Statistical Tests & Inference
Perform hypothesis tests and understand statistical significance with interactive Chi-Square, T-Test, ANOVA, and Pearson calculators.
Test for independence between two categorical variables. Enter a 2-row contingency table.
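For reference, a test of independence on a 2-row table can be sketched with SciPy's `chi2_contingency`. The counts below are invented for illustration. Note that for a 2x2 table SciPy applies Yates' continuity correction by default (`correction=True`), so the statistic may differ slightly from an uncorrected hand calculation.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 2x2 contingency table: rows are groups, columns are outcomes.
observed = np.array([
    [30, 10],
    [20, 40],
])

# Returns the test statistic, p-value, degrees of freedom,
# and the counts expected if the two variables were independent.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
print("expected counts under independence:")
print(expected)
```

A small p-value suggests the row and column variables are unlikely to be independent; the expected counts show what the table would look like if they were.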
Practice Datasets
Explore real-world datasets interactively — visualize different features and understand the data before running analysis.
1. Correlation between RAM and price
2. Average price by laptop category
3. Weight vs price scatter analysis
4. Build a price prediction model
5. Feature importance ranking
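The first two tasks could be approached along these lines. This is only a sketch: the tiny table and the column names `category`, `ram_gb`, and `price` are hypothetical stand-ins, not the actual schema or contents of the practice dataset.

```python
import pandas as pd

# Hypothetical stand-in for the laptop dataset (invented rows and column names).
laptops = pd.DataFrame({
    "category": ["Ultrabook", "Gaming", "Notebook", "Gaming", "Ultrabook", "Notebook"],
    "ram_gb": [8, 16, 4, 32, 16, 8],
    "price": [900, 1500, 400, 2200, 1300, 600],
})

# Task 1: Pearson correlation between RAM and price.
ram_price_corr = laptops["ram_gb"].corr(laptops["price"])

# Task 2: average price per laptop category.
avg_price = laptops.groupby("category")["price"].mean()

print(f"RAM vs price correlation: {ram_price_corr:.2f}")
print(avg_price)
```

With the real dataset, the same two calls apply once the column names are swapped for the ones it actually uses.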
Practice Arena
Test your data analysis knowledge with concept questions and data challenges. Track your progress as you go.