The lifelines package is a well-documented, easy-to-use Python package for survival analysis.
I had never done any survival analysis, but the package's great documentation encouraged me to venture into the field. From the documentation I was able to understand the key concepts of survival analysis and run a few simple analyses on clinical data gathered by our collaborators from a cohort of cancer patients. This obviously does not replace proper study of the field, but I nonetheless highly recommend reading the whole documentation to beginners on the topic, and the package itself to anyone working in the field.
Getting our hands dirty
Note: although these data were already anonymized, I have added some jitter so that the values shown differ from the real ones.
Although all one needs for survival analysis is two arrays, one with the length of time each patient was observed and one stating whether death occurred during that time, in reality you’re more likely to get from clinicians an Excel file with dates of birth, diagnosis, and death, along with other relevant information on the clinical cohort.
Let’s read some data in and transform those fields into the time we have been observing the patient (from diagnosis to the last checkup):
Hint: make sure you tell pandas which columns hold dates and the format they are in for correct date parsing.
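As a minimal sketch of this step, the snippet below parses the three date columns with an explicit format and derives the observation time. It reads a small inline CSV holding the dates of the first three rows shown in this post instead of the real spreadsheet; with an actual Excel file you would presumably call `pd.read_excel` on it. The `duration_days` and `observed` column names are my own assumptions, not part of the original data.

```python
import io

import pandas as pd

# Inline stand-in for the clinicians' spreadsheet (dates of the first three
# rows shown in this post). With a real file, pd.read_excel would replace this.
raw = io.StringIO(
    "patient_last_checkup_date,diagnosis_date,patient_death_date,t1\n"
    "2011-12-05,1977-08-23,2011-12-19,F\n"
    "2015-01-15,1997-08-06,,M\n"
    "2011-11-14,1987-03-11,,F\n"
)
df = pd.read_csv(raw)

date_columns = ["patient_last_checkup_date", "diagnosis_date", "patient_death_date"]
for col in date_columns:
    # An explicit format avoids day/month ambiguity; empty cells become NaT.
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")

# Time under observation: from diagnosis to the last checkup.
df["duration"] = df["patient_last_checkup_date"] - df["diagnosis_date"]

# Assumed helper columns: lifelines works with plain numbers and a boolean
# event flag, so keep the duration in days and mark patients with a death date.
df["duration_days"] = df["duration"].dt.days
df["observed"] = df["patient_death_date"].notna()

print(df[["duration", "duration_days", "observed"]])
```

Patients without a death date are treated as censored here, which is the usual convention but still an assumption about how the cohort was recorded.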
| | patient_last_checkup_date | diagnosis_date | patient_death_date | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 | t12 | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-12-05 | 1977-08-23 | 2011-12-19 | F | A0 | 1 | 1 | 1 | False | False | False | True | False | False | False | 12522 days |
| 1 | 2015-01-15 | 1997-08-06 | NaT | M | A0 | 1 | NaN | NaN | True | False | False | False | True | False | False | 6371 days |
| 2 | 2011-11-14 | 1987-03-11 | NaT | F | A0 | 1 | 1 | 1 | True | False | False | False | False | False | False | 9014 days |
| 3 | 2008-11-15 | 1992-04-27 | 2008-12-7 | F | A0 | 1 | 2 | 1 | True | False | True | False | False | False | False | 6046 days |
| 4 | 2008-10-09 | 1994-07-19 | 2009-12-22 | M | A0 | 2 | 2 | 2 | True | True | False | False | False | False | False | 5196 days |
Let’s check globally how our patients are doing:
Now we want to split our cohort according to the values of several variables (e.g. gender, age, presence/absence of a clinical marker), check how survival progresses in each group, and test whether the differences between groups are significant.
We can also investigate hazard over time instead of survival:
Great, so if we make the code more general and wrap it into a function, we can see how the survival or hazard of patients with certain traits differs.
We can also investigate variables with more than one class and compare them in a pairwise fashion.