This project performs exploratory data analysis (EDA) on the Titanic passenger dataset using Python and Pandas. The objective is to clean the data, analyze passenger characteristics, and identify factors that influenced survival rates.
- Load and explore the Titanic dataset
- Perform data cleaning and preprocessing
- Analyze passenger demographics
- Investigate survival patterns
- Generate summary statistics
- Create visualizations for insights
- Python
- Pandas
- NumPy
- Matplotlib
The dataset contains information about Titanic passengers, including:
- Survived
- Passenger Class (Pclass)
- Name
- Sex
- Age
- Siblings/Spouses Aboard
- Parents/Children Aboard
- Fare
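
As a rough sketch of the loading step, the snippet below reads a CSV with pandas and checks for the fields listed above. The column names are assumed from that list and may not match the header row in `Titanic.csv`. `load_passengers` is a hypothetical helper, and the two rows are made-up placeholders rather than real records.

```python
import io

import pandas as pd

# Assumed column names, copied from the field list above; the actual header
# row in Titanic.csv may differ.
EXPECTED_COLUMNS = [
    "Survived", "Pclass", "Name", "Sex", "Age",
    "Siblings/Spouses Aboard", "Parents/Children Aboard", "Fare",
]


def load_passengers(source):
    """Read the CSV and list any expected columns that are absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    return df, missing


# Toy stand-in for Titanic.csv (placeholder rows, not real passengers).
sample_csv = io.StringIO(
    "Survived,Pclass,Name,Sex,Age,"
    "Siblings/Spouses Aboard,Parents/Children Aboard,Fare\n"
    "0,3,Passenger A,male,22,1,0,7.25\n"
    "1,1,Passenger B,female,38,1,0,71.28\n"
)

df, missing = load_passengers(sample_csv)
print(df.shape)    # dataset shape
print(df.dtypes)   # column information
print(df.head())   # first few records
print("Missing expected columns:", missing)
```

With the real file, `load_passengers("Titanic.csv")` would replace the in-memory sample.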
The following preprocessing steps were performed:
- Checked for missing values
- Filled missing Age values using the median age
- Verified data types
- Removed inconsistencies where necessary
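
The first three steps above might look roughly like the following in pandas. This is a minimal sketch, not necessarily what `data_analysis.py` does: `clean_passengers` is a hypothetical helper, the column names `Age`, `Survived`, and `Pclass` are assumed, and the toy frame holds placeholder values only.

```python
import numpy as np
import pandas as pd


def clean_passengers(df):
    """Return a cleaned copy plus the per-column missing-value counts."""
    cleaned = df.copy()

    # Step 1: count missing values per column before changing anything.
    missing_before = cleaned.isna().sum()

    # Step 2: fill missing ages with the median of the observed ages.
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

    # Step 3: make sure the categorical codes are stored as integers.
    cleaned["Survived"] = cleaned["Survived"].astype(int)
    cleaned["Pclass"] = cleaned["Pclass"].astype(int)

    return cleaned, missing_before


# Placeholder data with one missing age (not real passenger records).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Age": [22.0, np.nan, 30.0, 40.0],
})

cleaned, missing_before = clean_passengers(toy)
print(missing_before)
print(cleaned)
```

The median is used here because the README names it; it is less sensitive to a few very old or very young passengers than the mean would be.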
The analysis includes:
- Dataset shape
- Column information
- First few records
- Summary statistics
- Overall survival rate
- Survival by gender
- Survival by passenger class
- Average fare by passenger class
- Fare distribution insights
- Passenger age distribution
- Relationship between age and survival
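
The survival and fare breakdowns listed above can be expressed as group-wise means. The sketch below assumes the columns are named `Survived`, `Pclass`, `Sex`, and `Fare`, and runs on a small placeholder frame, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Placeholder rows for illustration only (not real passenger records).
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# Survived is coded 0/1, so its mean is the fraction of survivors.
overall_rate = toy["Survived"].mean()

# The same mean computed within each group gives per-group survival rates.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

# Average fare paid within each passenger class.
fare_by_class = toy.groupby("Pclass")["Fare"].mean()

print(f"Overall survival rate: {overall_rate:.2f}")
print(rate_by_sex)
print(rate_by_class)
print(fare_by_class)
```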
- Female passengers had a significantly higher survival rate than male passengers.
- First-class passengers had better survival chances than passengers in lower classes.
- Passengers paying higher fares generally showed higher survival rates.
- Age showed some influence on survival outcomes.
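
One way to look at the age finding is to group passengers into age bands and compare survival rates per band. The band edges and labels below are arbitrary assumptions chosen for illustration, not values taken from the project, and the data is a placeholder.

```python
import pandas as pd

# Placeholder rows for illustration only (not real passenger records).
toy = pd.DataFrame({
    "Age": [5, 10, 30, 40, 70],
    "Survived": [1, 1, 0, 1, 0],
})

# Hypothetical age bands; each interval is open on the left and closed on
# the right, e.g. "child" covers ages in (0, 12].
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]
toy["AgeBand"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# observed=True keeps only the bands that actually occur in the data.
rate_by_age = toy.groupby("AgeBand", observed=True)["Survived"].mean()
print(rate_by_age)
```

The same pattern would apply to fares, for example by swapping `pd.cut` for `pd.qcut` to split `Fare` into quartiles.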
The project includes visualizations such as:
- Survival Rate by Gender
- Survival Rate by Passenger Class
- Age Distribution
- Fare Distribution
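
A bar chart of survival rates could be drawn with Matplotlib along these lines. This is a sketch under assumptions: `plot_survival_rate` is a hypothetical helper, the column names are assumed, the data is a placeholder, and the styling is not meant to reproduce the project's screenshots.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def plot_survival_rate(df, by, ax):
    """Draw a bar chart of the mean of Survived for each value of `by`."""
    rates = df.groupby(by)["Survived"].mean()
    ax.bar(rates.index.astype(str), rates.values)
    ax.set_xlabel(by)
    ax.set_ylabel("Survival rate")
    ax.set_ylim(0, 1)
    ax.set_title(f"Survival Rate by {by}")
    return rates


# Placeholder rows for illustration only (not real passenger records).
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "female", "male", "female", "male"],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
plot_survival_rate(toy, "Sex", axes[0])
plot_survival_rate(toy, "Pclass", axes[1])
fig.tight_layout()
# Call fig.savefig(...) with a path of your choice to write the image to disk.
```

For the age and fare distributions, `ax.hist(df["Age"])` or `ax.hist(df["Fare"])` would be the analogous starting point.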
data_analysis_project/
│
├── Titanic.csv
├── data_analysis.py
├── README.md
├── requirements.txt
└── screenshots/
    ├── dataset_preview.png
    ├── summary_statistics.png
    ├── survival_by_gender.png
    └── survival_by_class.png
Clone the repository:

    git clone <repository-url>

Install dependencies:

    pip install -r requirements.txt

Execute the analysis script:

    python data_analysis.py

The analysis provides meaningful insights into passenger survival patterns and demonstrates the use of Python for real-world data exploration and statistical analysis.
Sinchana L Gowda