Welcome to FOSP Cancer Data Project’s documentation!
Introduction
This documentation describes a project that uses data on all types of cancer provided by FOSP (Fundação Oncocentro de São Paulo).
The project uses machine learning models to predict the death and survival of cancer patients in the state of São Paulo, Brazil, and to identify the characteristics that most influence these outcomes. The data were extracted from the Deputy Directorate of Information and Epidemiology of Fundação Oncocentro de São Paulo, which coordinates the Hospital Cancer Registry of São Paulo, and cover patients treated between 2000 and 2021.
The project was developed in Python, using two machine learning models: Random Forest and XGBoost.
The complete notebooks are available on GitHub.
Summary
The project is divided into topics for better organization and understanding. The steps are:
Libraries and Functions;
Data analysis, creation of new columns and first preprocessing;
Classifiers: preprocessing, training, and validation of machine learning models for five labels:
obito_geral;
obito_cancer;
vivo_ano1;
vivo_ano3;
vivo_ano5.
For each label, several scenarios were used to analyze model performance, such as grouping the years and removing some columns from the data.
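The label-by-scenario workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data are random numbers standing in for the registry, the feature names (`feature_a` to `feature_d`) and the two scenarios are hypothetical, and only Random Forest from scikit-learn is shown (an XGBoost classifier could be swapped in at the same place). Only the five label names come from the documentation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# The five labels listed in the summary.
LABELS = ["obito_geral", "obito_cancer", "vivo_ano1", "vivo_ano3", "vivo_ano5"]

# Hypothetical feature names, NOT the real FOSP columns.
FEATURES = ["feature_a", "feature_b", "feature_c", "feature_d"]

# Hypothetical scenarios: scenario name -> features removed before training.
SCENARIOS = {"all_columns": [], "without_feature_d": ["feature_d"]}

# Synthetic data standing in for the registry (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(FEATURES)))
y = {
    label: (X[:, i % len(FEATURES)] + rng.normal(scale=0.5, size=400) > 0).astype(int)
    for i, label in enumerate(LABELS)
}


def evaluate(label, dropped):
    """Train a Random Forest for one label in one scenario.

    Returns the balanced accuracy on a held-out split and the
    feature importances of the fitted model, keyed by feature name.
    """
    kept = [name for name in FEATURES if name not in dropped]
    columns = [FEATURES.index(name) for name in kept]
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, columns], y[label], test_size=0.25, stratify=y[label], random_state=0
    )
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    score = balanced_accuracy_score(y_test, model.predict(X_test))
    importances = dict(zip(kept, model.feature_importances_))
    return score, importances


# One model per (label, scenario) pair.
results = {
    (label, name): evaluate(label, dropped)
    for label in LABELS
    for name, dropped in SCENARIOS.items()
}

for (label, name), (score, importances) in results.items():
    top_feature = max(importances, key=importances.get)
    print(f"{label:13s} {name:18s} balanced accuracy = {score:.3f}, top feature = {top_feature}")
```

Keeping the scenarios in a dictionary makes it easy to compare the same label across different column sets, and the feature importances give a first view of which characteristics the model relies on most.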
Libraries and functions
Data Analysis
Classification obito_geral
Classification obito_cancer
Classification vivo_ano1
Classification vivo_ano3
Classification vivo_ano5