Titanic Data Analysis Project

Overview

This project performs exploratory data analysis (EDA) on the Titanic passenger dataset using Python and Pandas. The objective is to clean the data, analyze passenger characteristics, and identify factors that influenced survival rates.

Objectives

  • Load and explore the Titanic dataset
  • Perform data cleaning and preprocessing
  • Analyze passenger demographics
  • Investigate survival patterns
  • Generate summary statistics
  • Create visualizations for insights

Technologies Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib

Dataset Information

The dataset contains information about Titanic passengers, including:

  • Survived
  • Passenger Class (Pclass)
  • Name
  • Sex
  • Age
  • Siblings/Spouses Aboard
  • Parents/Children Aboard
  • Fare

Data Cleaning

The following preprocessing steps were performed:

  • Checked for missing values
  • Filled missing Age values using the median age
  • Verified data types
  • Removed inconsistencies where necessary
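The median-fill step above can be sketched as follows. This is a minimal illustration, not the project's actual script: the small DataFrame is made-up sample data, and the column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`) are assumed from the dataset description.

```python
import pandas as pd

# Made-up sample rows standing in for Titanic.csv; the real script would
# presumably load the file with pd.read_csv("Titanic.csv") (assumption).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, None, 54.0, None],
    "Fare": [7.25, 71.28, 7.92, 51.86, 8.05],
})

# Count missing values per column before cleaning.
missing_before = df.isna().sum()

# Fill missing ages with the median of the observed ages.
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)

# Verify data types and confirm that no missing ages remain.
print(df.dtypes)
print("Missing ages before:", missing_before["Age"])
print("Missing ages after:", df["Age"].isna().sum())
```

The median is less sensitive to a few very old or very young passengers than the mean, which is a common reason to prefer it for imputing ages.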

Exploratory Data Analysis

The analysis includes:

1. Dataset Overview

  • Dataset shape
  • Column information
  • First few records
  • Summary statistics
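A rough sketch of how these four overview items might be produced with Pandas. The CSV text below is invented sample data read from memory so the snippet runs on its own; the actual script would presumably read `Titanic.csv`, and the column names are assumptions.

```python
import io

import pandas as pd

# Invented rows in the assumed column layout, read from an in-memory string.
csv_text = """Survived,Pclass,Name,Sex,Age,Fare
0,3,Passenger A,male,22,7.25
1,1,Passenger B,female,38,71.28
1,3,Passenger C,female,26,7.92
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # dataset shape as (rows, columns)
df.info()             # column names, dtypes, and non-null counts
print(df.head())      # first few records
print(df.describe())  # summary statistics for numeric columns
```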

2. Survival Analysis

  • Overall survival rate
  • Survival by gender
  • Survival by passenger class
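Because `Survived` is coded as 0 or 1, the mean of that column is a survival rate, so the three figures above reduce to one overall mean and two grouped means. The sketch below uses made-up rows and assumed column names; its numbers do not reflect the real dataset.

```python
import pandas as pd

# Made-up sample rows; the values are for illustration only.
df = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 0, 1, 0, 1],
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "Pclass": [1, 2, 3, 3, 3, 1, 2, 1],
})

# Overall survival rate: the mean of a 0/1 column.
overall_rate = df["Survived"].mean()

# Survival rate within each gender and each passenger class.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

print(f"Overall survival rate: {overall_rate:.2f}")
print(rate_by_sex)
print(rate_by_class)
```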

3. Fare Analysis

  • Average fare by passenger class
  • Fare distribution insights
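One way the fare figures might be computed, shown on made-up fares with assumed column names (`Pclass`, `Fare`):

```python
import pandas as pd

# Made-up fares for illustration; not taken from the real dataset.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 30.0, 8.0, 7.0],
})

# Average fare within each passenger class.
mean_fare_by_class = df.groupby("Pclass")["Fare"].mean()

# Basic view of the fare distribution: count, mean, spread, and quartiles.
fare_summary = df["Fare"].describe()
median_fare = df["Fare"].median()

print(mean_fare_by_class)
print(fare_summary)
```

Comparing the mean and the median of `Fare` is a quick check for skew: a mean well above the median suggests that a few expensive tickets pull the average up.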

4. Age Analysis

  • Passenger age distribution
  • Relationship between age and survival
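A possible way to relate age to survival is to bin ages into groups and compare survival rates across the bins. The bin edges and labels below are arbitrary choices for illustration, the rows are made up, and the project itself may have used a different approach.

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "Age": [4, 10, 25, 30, 45, 70],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# Hypothetical age bands; pd.cut uses right-closed intervals by default,
# so these are (0, 12], (12, 18], (18, 60], and (60, 120].
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Survival rate per age group; observed=True skips groups with no passengers.
rate_by_age_group = df.groupby("AgeGroup", observed=True)["Survived"].mean()

print(df["Age"].describe())  # age distribution summary
print(rate_by_age_group)
```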

Key Findings

  • Female passengers had a significantly higher survival rate than male passengers.
  • First-class passengers had better survival chances than passengers in lower classes.
  • Passengers paying higher fares generally showed higher survival rates.
  • Age showed some influence on survival outcomes.

Visualizations

The project includes visualizations such as:

  • Survival Rate by Gender
  • Survival Rate by Passenger Class
  • Age Distribution
  • Fare Distribution
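As an illustration, a chart like "Survival Rate by Gender" could be drawn with Matplotlib along these lines. The data is made up, the column names are assumed, and the output file name `survival_by_gender_example.png` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# Survival rate per gender, then one bar per group.
rates = df.groupby("Sex")["Survived"].mean()

fig, ax = plt.subplots()
ax.bar(list(rates.index), rates.values)
ax.set_title("Survival Rate by Gender")
ax.set_xlabel("Sex")
ax.set_ylabel("Survival rate")
ax.set_ylim(0, 1)  # rates are proportions between 0 and 1

fig.savefig("survival_by_gender_example.png")  # hypothetical output path
```

The same pattern applies to the class chart by grouping on `Pclass`, and `ax.hist(df["Age"])` or `ax.hist(df["Fare"])` would give the distribution plots.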

Project Structure

data_analysis_project/
│
├── Titanic.csv
├── data_analysis.py
├── README.md
├── requirements.txt
└── screenshots/
    ├── dataset_preview.png
    ├── summary_statistics.png
    ├── survival_by_gender.png
    └── survival_by_class.png

Installation

Clone the repository:

git clone <repository-url>

Install dependencies:

pip install -r requirements.txt

Running the Project

Execute the analysis script:

python data_analysis.py

Results

The analysis shows how survival varied with gender, passenger class, fare, and age, and demonstrates a Python and Pandas workflow for exploring and summarizing a real-world dataset.

Author

Sinchana L Gowda
