Welcome to FOSP Cancer Data Project’s documentation!

Authors

Gisele Fernandes - Epidemiology and Statistics on Cancer Group, International Research Center, A.C. Camargo Cancer Center
Lucas Buk Cardoso - Research Engineer at Núcleo de Sistemas Eletrônicos Embarcados, Instituto Mauá de Tecnologia
Maria Paula Curado - Epidemiology and Statistics on Cancer Group, International Research Center, A.C. Camargo Cancer Center
Stela Verzinhasse Peres - Director of Information and Epidemiology, Fundação Oncocentro de São Paulo
Tatiana Natasha Toporcov - Epidemiology Department, Faculdade de Saúde Pública, Universidade de São Paulo
Vanderlei Cunha Parro - Coordinator of Núcleo de Sistemas Eletrônicos Embarcados, Instituto Mauá de Tecnologia

Introduction

This documentation describes the project, which uses data on all types of cancer provided by FOSP (Fundação Oncocentro de São Paulo).

The project uses machine learning models to predict death and survival among cancer patients in the state of São Paulo, Brazil, and to identify the characteristics that most influence those outcomes. The data were extracted from the Deputy Directorate of Information and Epidemiology of Fundação Oncocentro de São Paulo, coordinator of the Hospital Cancer Registry of São Paulo, and cover patients treated between 2000 and 2021.

The project was developed in Python with two machine learning models: Random Forest and XGBoost.

The complete notebooks are available on GitHub.

Summary

The project is divided into topics to make it easier to follow. The steps are:

Libraries and Functions;
Data analysis, creation of new columns and first preprocessing;
Classifiers: preprocessing, training and validation of machine learning models, using five labels:
- obito_geral;
- obito_cancer;
- vivo_ano1;
- vivo_ano3;
- vivo_ano5.
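
As a rough illustration of training one classifier per label, the sketch below fits a separate Random Forest for each of the five labels and reports its held-out accuracy. It is not the project's actual pipeline: the helper `train_per_label`, the synthetic features and targets, and the hyperparameters are all assumptions chosen only to make the example runnable. It relies on scikit-learn and NumPy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The five labels listed in this documentation.
LABELS = ["obito_geral", "obito_cancer", "vivo_ano1", "vivo_ano3", "vivo_ano5"]


def train_per_label(X, targets, seed=0):
    """Fit one Random Forest per label (hypothetical helper).

    Returns a dict mapping each label to accuracy on a held-out split.
    """
    scores = {}
    for label in LABELS:
        y = targets[label]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=seed, stratify=y
        )
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X_train, y_train)
        scores[label] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in data (assumption): 200 rows, 4 numeric features,
# and one binary target per label. The real FOSP data are not used here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
targets = {
    label: (X[:, i % 4] + rng.normal(scale=0.5, size=200) > 0).astype(int)
    for i, label in enumerate(LABELS)
}

scores = train_per_label(X, targets)
for label, acc in scores.items():
    print(f"{label}: accuracy = {acc:.2f}")
```

The same loop could be repeated with an XGBoost classifier in place of the Random Forest; it is left out here to keep the sketch short.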

For each label, several scenarios were used to analyze the performance of the models, such as grouping the years and removing some columns from the data.
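
The two kinds of scenario mentioned above, grouping years and removing columns, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helpers `group_year` and `apply_scenario`, the five-year bin width, and the column names `ano_diag`, `idade` and `hospital_id` are hypothetical and are not taken from the project's notebooks.

```python
def group_year(year, width=5, start=2000):
    """Map a year to the first year of its `width`-year bin, counted from `start`."""
    return start + ((year - start) // width) * width


def apply_scenario(records, year_col=None, width=5, drop=()):
    """Return new records with the year column binned and the `drop` columns removed.

    The input records are left unmodified.
    """
    result = []
    for rec in records:
        # Keep every column that is not explicitly dropped.
        row = {key: value for key, value in rec.items() if key not in drop}
        # Replace the exact year by the start of its bin, if requested.
        if year_col is not None and year_col in row:
            row[year_col] = group_year(row[year_col], width)
        result.append(row)
    return result


# Hypothetical records; column names and values are made up for the example.
records = [
    {"ano_diag": 2003, "idade": 61, "hospital_id": 12},
    {"ano_diag": 2017, "idade": 48, "hospital_id": 7},
]

scenario = apply_scenario(records, year_col="ano_diag", width=5, drop=("hospital_id",))
print(scenario)
```

Each scenario then yields a different version of the data on which the same models can be trained and compared.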

Libraries and functions

Data Analysis

Classification obito_geral

Classification obito_cancer

Classification vivo_ano1

Classification vivo_ano3

Classification vivo_ano5