Titanic Machine Learning Prediction

Machine Learning
Classification
Python
SVM

Developed a machine learning model to predict whether a Titanic passenger survived, using features such as age, gender, class, and fare. The project used the Titanic dataset from Kaggle and compared several machine learning algorithms to find the most effective model.

Titanic Machine Learning Project
Colab Notebook

Overview

The Titanic Machine Learning project aimed to predict the survival outcomes of passengers aboard the ill-fated RMS Titanic. Using the Kaggle Titanic dataset, we built a classification model leveraging features such as age, gender, passenger class, and more. The project involved extensive data preprocessing, feature engineering, and evaluation of various machine learning algorithms to find the most accurate model.

Implementation Details
  1. Dataset Overview

    • Training Set (train.csv): Includes labeled data with survival outcomes.
    • Test Set (test.csv): Includes unlabeled data for testing the trained model.
    • Features include demographic data and ticket class. The survival status (the Survived column) is the target label and appears only in the training set.
  2. Data Preprocessing

    • Missing Data: Handled missing values in features like Age, Cabin, and Embarked.
    • Feature Engineering: Created new features such as ‘Age_Class’ and ‘Fare_Per_Person’, and converted categorical features like ‘Sex’ and ‘Embarked’ into numeric values.
    • Feature Scaling: Normalized numerical features so that scale-sensitive models such as SVM, KNN, and SGD are not dominated by large-valued features like fare.
  3. Model Selection

    • Tried several machine learning algorithms, including:
      • Stochastic Gradient Descent (SGD)
      • Random Forest
      • Logistic Regression
      • K-Nearest Neighbors (KNN)
      • Gaussian Naive Bayes
      • Support Vector Machine (SVM)
  4. Model Evaluation

    • Evaluated the models using accuracy, precision, recall, F1 score, and cross-validation.
    • The SVM model performed the best, achieving an accuracy of 77%.
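
The preprocessing, feature engineering, and evaluation steps above could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the notebook's actual code: the definitions of Age_Class (Age times Pclass) and Fare_Per_Person (Fare divided by family size), the median and most-frequent imputation strategies, one-hot encoding, and the helper names add_engineered_features, build_pipeline, and compare_models are all assumed. Column names follow the Kaggle train.csv.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

NUMERIC = ["Age", "Fare", "Pclass", "Age_Class", "Fare_Per_Person"]
CATEGORICAL = ["Sex", "Embarked"]
METRICS = ["accuracy", "precision", "recall", "f1"]


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two derived columns (definitions are assumed)."""
    out = df.copy()
    # Assumed definition: interaction between age and ticket class.
    out["Age_Class"] = out["Age"] * out["Pclass"]
    # Assumed definition: fare split across the passenger and their relatives.
    family_size = out["SibSp"] + out["Parch"] + 1
    out["Fare_Per_Person"] = out["Fare"] / family_size
    return out


def build_pipeline(model) -> Pipeline:
    """Impute, scale, and encode inside the pipeline so each CV fold is fit separately."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])  # columns not listed (e.g. SibSp, Parch) are dropped by default
    return Pipeline([("preprocess", preprocess), ("model", model)])


def compare_models(df: pd.DataFrame, models: dict, cv: int = 5) -> dict:
    """Cross-validate each model and return the mean of every metric per model."""
    X = add_engineered_features(df.drop(columns="Survived"))
    y = df["Survived"]
    results = {}
    for name, model in models.items():
        scores = cross_validate(build_pipeline(model), X, y, cv=cv, scoring=METRICS)
        results[name] = {m: scores[f"test_{m}"].mean() for m in METRICS}
    return results


# Hypothetical usage with the Kaggle training file:
# train = pd.read_csv("train.csv")
# print(compare_models(train, {"svm": SVC(), "logreg": LogisticRegression(max_iter=1000)}))
```

Keeping imputation and scaling inside the pipeline means their statistics are learned only from the training portion of each fold, which avoids leaking information from the validation portion into the scores.
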
Technologies Used
  • Python: For machine learning implementation and data analysis.
  • Pandas: For data manipulation and analysis.
  • NumPy: For numerical operations.
  • Scikit-learn: For building machine learning models and evaluations.
  • Seaborn & Matplotlib: For data visualization.
  • Google Colab: For cloud-based development and model training.
Results & Findings
  • The SVM model achieved the highest accuracy of the algorithms tested, at 77%.
  • Feature engineering improved model performance by creating relevant features like ‘Age_Class’ and ‘Fare_Per_Person’.
  • The project showed that machine learning can provide valuable insights into complex, historical datasets.
Future Improvements
  • Explore gradient-boosted models such as XGBoost, and revisit Random Forest with tuned hyperparameters, for potentially better performance.
  • Implement more advanced feature selection and hyperparameter tuning.
  • Apply model explainability techniques (e.g., SHAP) to improve transparency.
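
As one possible starting point for the hyperparameter tuning mentioned above, a grid search over the SVM could look like the sketch below. The helper name tune_svm and the parameter grid values are illustrative assumptions, not settings taken from the project, and the input is assumed to be an already numeric feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X, y, cv: int = 5) -> GridSearchCV:
    """Grid-search SVM hyperparameters with scaling refit inside every fold."""
    pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
    # Illustrative grid only; sensible ranges depend on the data.
    param_grid = {
        "svm__C": [0.1, 1, 10],
        "svm__gamma": ["scale", 0.01, 0.1],
        "svm__kernel": ["rbf", "linear"],
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Synthetic stand-in data so the sketch runs without the Kaggle files.
    X, y = make_classification(n_samples=200, n_features=6, random_state=0)
    search = tune_svm(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```

The best cross-validated score reported by the search is still an optimistic estimate, so a held-out set or nested cross-validation would be needed for a fair comparison against the untuned models.
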

Contributors
  • Imad-Eddine NACIRI
  • Achraf Berriane
  • Errouji Oussama