Covid in Ontario, Canada

The following analysis uses data published by the Government of Ontario.

https://data.ontario.ca/

At some point during the pandemic, I started feeling that news outlets were not reporting on the things I cared about. I care about numbers and actual data, not some news outlet's interpretation. Even worse is editorialized content that always puts a spin on the data to push an agenda. I don't care about any of that; I just want to know what is going on.

The best way to do this is to download the data and analyse it yourself. Even if you don't know how to program, you could easily import this data into Excel and do something similar.

Since I am a Python hobbyist, this felt like a great use case for pandas, with Matplotlib and Seaborn for visualizations.

I did my best to interpret the data in an unbiased way. However, it's easy to make mistakes, so if you see something that doesn't make sense or that you disagree with, please drop me an email. I would like to hear from you.

You can reach out to me at alaudet@linuxnorth.org

Feedback is always welcome.

Load the libraries

In [1]:
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("whitegrid")

Import the data

In [2]:
# Dataset #1 - Covid Cases in Ontario
df = pd.read_csv('../data/conposcovidloc.csv', index_col="Row_ID")
# The conposcovidloc.csv file is over 100 MB.
# If you prefer to download it directly from the source, use this instead:
# df = pd.read_csv('https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv', index_col="Row_ID")

# schema_df source: https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/a2ea0536-1eae-4a17-aa04-e5a1ab89ca9a/download/conposcovidloc_data_dictionary.xlsx
# converted from xlsx to csv and available on linuxnorth.org
# on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False
schema_df = pd.read_csv('https://www.linuxnorth.org/pandas/data/conposcovidloc_data_dictionary.csv', index_col="Variable Name", encoding="ISO-8859-1", on_bad_lines='skip')


# Dataset #2 - Covid Retransmission Rate in Ontario
dfre = pd.read_csv('https://data.ontario.ca/dataset/8da73272-8078-4cbd-ae35-1b5c60c57796/resource/1ffdf824-2712-4f64-b7fc-f8b2509f9204/download/re_estimates_on.csv')

# Dataset #3 - Vaccine data for Ontario
dfvaccine = pd.read_csv('https://data.ontario.ca/dataset/752ce2b7-c15a-4965-a3dc-397bf405e7cc/resource/8a89caa9-511c-4568-af89-7f2174b4378c/download/vaccine_doses.csv')

# Dataset #4 - Vaccine Status
dfvacstatus = pd.read_csv('https://data.ontario.ca/dataset/752ce2b7-c15a-4965-a3dc-397bf405e7cc/resource/eed63cf2-83dd-4598-b337-b288c0a89a16/download/vac_status.csv.csv')

Dataset 1: Analysing Covid in Ontario

In [3]:
# taking a peek
df.head(10)
Out[3]:
Accurate_Episode_Date Case_Reported_Date Test_Reported_Date Specimen_Date Age_Group Client_Gender Case_AcquisitionInfo Outcome1 Outbreak_Related Reporting_PHU_ID Reporting_PHU Reporting_PHU_Address Reporting_PHU_City Reporting_PHU_Postal_Code Reporting_PHU_Website Reporting_PHU_Latitude Reporting_PHU_Longitude
Row_ID
1 2019-05-30 2020-05-05 2020-05-05 2020-05-03 50s FEMALE CC Resolved NaN 2260 Simcoe Muskoka District Health Unit 15 Sperling Drive Barrie L4M 6K9 www.simcoemuskokahealth.org 44.410713 -79.686306
2 2019-11-20 2020-10-21 2020-11-21 2019-11-20 20s FEMALE NO KNOWN EPI LINK Resolved NaN 4913 Southwestern Public Health 1230 Talbot Street St. Thomas N5P 1G9 www.swpublichealth.ca 42.777804 -81.151156
3 2020-01-01 2020-04-24 2020-04-24 2020-04-23 80s MALE NO KNOWN EPI LINK Resolved NaN 2234 Haldimand-Norfolk Health Unit 12 Gilbertson Drive Simcoe N3Y 4N5 www.hnhu.org 42.847825 -80.303815
4 2020-01-01 2020-05-17 2020-05-17 2020-05-15 50s MALE CC Resolved NaN 2265 Region of Waterloo, Public Health 99 Regina Street South Waterloo N2J 4V3 www.regionofwaterloo.ca 43.462876 -80.520913
5 2020-01-01 2021-05-26 2021-03-31 2021-03-28 UNKNOWN MALE TRAVEL Resolved NaN 2263 Timiskaming Health Unit 247 Whitewood Avenue, Unit 43 New Liskeard P0J 1P0 www.timiskaminghu.com 47.509284 -79.681632
6 2020-01-10 2020-06-10 2020-06-10 2020-06-09 50s MALE CC Resolved NaN 2234 Haldimand-Norfolk Health Unit 12 Gilbertson Drive Simcoe N3Y 4N5 www.hnhu.org 42.847825 -80.303815
7 2020-01-13 2021-01-23 2021-01-23 2021-01-22 30s MALE NO KNOWN EPI LINK Resolved NaN 2260 Simcoe Muskoka District Health Unit 15 Sperling Drive Barrie L4M 6K9 www.simcoemuskokahealth.org 44.410713 -79.686306
8 2020-01-16 2020-10-08 2020-10-08 2020-10-06 50s FEMALE NO KNOWN EPI LINK Resolved NaN 2258 Eastern Ontario Health Unit 1000 Pitt Street Cornwall K6J 5T1 www.eohu.ca 45.029152 -74.736298
9 2020-01-21 2020-01-23 2020-01-26 2020-01-23 50s MALE TRAVEL Resolved NaN 3895 Toronto Public Health 277 Victoria Street, 5th Floor Toronto M5B 1W2 www.toronto.ca/community-people/health-wellnes... 43.656591 -79.379358
10 2020-01-22 2020-01-23 2020-01-27 2020-01-25 50s FEMALE TRAVEL Resolved NaN 3895 Toronto Public Health 277 Victoria Street, 5th Floor Toronto M5B 1W2 www.toronto.ca/community-people/health-wellnes... 43.656591 -79.379358
In [4]:
# Dataframe size (rows, columns)
df.shape
Out[4]:
(578048, 17)
In [5]:
# Looking at the schema provided
# Keep only the descriptive columns, sorted by variable name
schema_df = schema_df[['Definition', 'Additional Notes']].sort_index()
schema_df
Out[5]:
Definition Additional Notes
Variable Name
Accurate_Episode_Date The field uses a number of dates entered in th... Blank records may exist where a Public Health ...
Age_Group Age group of the patient. Patient ages are clustered in 10-year interval...
Case_AcquisitionInfo Suspected method of exposure to COVID-19, if k... As of June 17, 2020, values include: ‘CC’ (clo...
Case_Reported_Date The date that the case was reported to the loc... NaN
Client_Gender Gender information of the patient. Values Include: 'FEMALE', 'MALE', 'GENDER DIV...
Outbreak_Related Describes whether a confirmed positive case is... A confirmed positive case that is associated w...
Outcome1 Patient outcome. Values include: Resolved, Not Resolved, Fatal.
Reporting_PHU Public Health Unit (PHU) where confirmed posit... For a list of Ontario's Public Health Units, p...
Reporting_PHU_Address Official physical street address of Public Hea... This variable does not indicate the specfic ph...
Reporting_PHU_City Official city of Public Health Unit (PHU). This variable does not indicate the specfic ci...
Reporting_PHU_ID Public Health Unit (PHU) ID where confirmed po... NaN
Reporting_PHU_Latitude Latitude of Public Health Unit (PHU) physical ... This variable does not indicate the specfic co...
Reporting_PHU_Longitude Longitude of Public Health Unit (PHU) physical... This variable does not indicate the specfic co...
Reporting_PHU_Postal_Code Official postal code of Public Health Unit (PHU). This variable does not indicate the specfic po...
Reporting_PHU_Website Official website of Public Health Unit (PHU). NaN
Row_ID Identifier for each individual row/record with... The values under this variable are not continu...
Specimen_Date Set to the earliest specimen date on record fo... NaN
Test_Reported_Date The test reported date as indicated on the lab... NaN
In [6]:
# How many missing values in each column
df.isna().sum()
Out[6]:
Accurate_Episode_Date             0
Case_Reported_Date                0
Test_Reported_Date            12665
Specimen_Date                  2382
Age_Group                         0
Client_Gender                     0
Case_AcquisitionInfo              0
Outcome1                          0
Outbreak_Related             480431
Reporting_PHU_ID                  0
Reporting_PHU                     0
Reporting_PHU_Address             0
Reporting_PHU_City                0
Reporting_PHU_Postal_Code         0
Reporting_PHU_Website             0
Reporting_PHU_Latitude            0
Reporting_PHU_Longitude           0
dtype: int64
In [7]:
# Looking only at columns of interest
columns_of_interest = ['Accurate_Episode_Date', 'Case_Reported_Date', 'Age_Group', 'Client_Gender', 'Case_AcquisitionInfo', 
                       'Outcome1', 'Outbreak_Related', 'Reporting_PHU_ID', 'Reporting_PHU']
# .copy() so the column edits below don't raise SettingWithCopyWarning
df = df[columns_of_interest].copy()
df.columns = ['adate','rdate', 'age', 'gender', 'source', 'outcome', 'outbreak', 'phuid', 'phu']
In [8]:
df.dtypes
Out[8]:
adate       object
rdate       object
age         object
gender      object
source      object
outcome     object
outbreak    object
phuid        int64
phu         object
dtype: object
In [9]:
# Dates are stored as strings. Change them to pandas datetime
df['rdate'] = pd.to_datetime(df['rdate'])
df['adate'] = pd.to_datetime(df['adate'])
In [10]:
df.dtypes
Out[10]:
adate       datetime64[ns]
rdate       datetime64[ns]
age                 object
gender              object
source              object
outcome             object
outbreak            object
phuid                int64
phu                 object
dtype: object
In [11]:
# Take another peek....that's better
df.tail()
Out[11]:
adate rdate age gender source outcome outbreak phuid phu
Row_ID
578044 2021-09-16 2021-09-16 60s MALE MISSING INFORMATION Not Resolved NaN 2253 Peel Public Health
578045 2021-09-16 2021-09-16 <20 FEMALE MISSING INFORMATION Not Resolved NaN 2253 Peel Public Health
578046 2021-09-16 2021-09-16 <20 FEMALE MISSING INFORMATION Not Resolved NaN 2227 Brant County Health Unit
578047 2021-09-16 2021-09-16 <20 FEMALE MISSING INFORMATION Not Resolved NaN 2227 Brant County Health Unit
578048 2021-09-16 2021-09-16 <20 FEMALE MISSING INFORMATION Not Resolved NaN 3895 Toronto Public Health
In [12]:
# Total number of covid cases reported in Ontario all time.
len(df)
Out[12]:
578048
In [13]:
# Change '<20' to '0-19'.  This will make age distribution charts easier to read later.
df['age'] = df['age'].replace(['<20'],'0-19')
df.head(2)
Out[13]:
adate rdate age gender source outcome outbreak phuid phu
Row_ID
1 2019-05-30 2020-05-05 50s FEMALE CC Resolved NaN 2260 Simcoe Muskoka District Health Unit
2 2019-11-20 2020-10-21 20s FEMALE NO KNOWN EPI LINK Resolved NaN 4913 Southwestern Public Health

Case distribution by date since the beginning of the pandemic

We can see three distinct waves of Covid spread in Ontario: an initial smaller wave in March/April 2020 that devastated the elderly, followed by two larger waves in January and May 2021 that were mostly spread by younger people.

In [55]:
plt.figure(figsize=(14,6))
plt.title('Ontario Covid Waves - Daily Cases', fontsize=20)
sns.lineplot(data=df['rdate'].value_counts())
plt.ylabel('Cases', fontsize=15)
plt.xlabel('Date', fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()

Gender breakdown of Covid Cases in Ontario

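Cases are split almost evenly between males and females (288,058 vs 285,966), with a small number of cases recorded as unspecified or gender diverse.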
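Because the categories differ so much in size, percentages make the near-even split easier to read than raw counts. The following is a minimal sketch that rebuilds the gender counts printed by the next cell into a small Series (so it runs without the 100 MB file); on the real data, `df['gender'].value_counts(normalize=True)` gives the same fractions directly.

```python
import pandas as pd

# Case counts by gender, copied from the value_counts() output of the next cell
gender_counts = pd.Series(
    {"MALE": 288058, "FEMALE": 285966, "UNSPECIFIED": 3990, "GENDER DIVERSE": 34},
    name="gender",
)

# Convert raw counts to a percentage of all cases, rounded to one decimal
gender_pct = (gender_counts / gender_counts.sum() * 100).round(1)
print(gender_pct)
```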

In [15]:
print(df['gender'].value_counts())
# Keep every recorded gender category
gender_filter = df['gender'].isin(['MALE', 'FEMALE', 'UNSPECIFIED', 'GENDER DIVERSE'])
gdf = df[gender_filter]
plt.figure(figsize=(10,6))
plt.title("Ontario - Covid Infections by Gender", fontsize=20)
sns.countplot(data=gdf, x='gender')
plt.xlabel('Gender', fontsize=13)
plt.ylabel('Count', fontsize=13)
plt.show()
MALE              288058
FEMALE            285966
UNSPECIFIED         3990
GENDER DIVERSE        34
Name: gender, dtype: int64

Region specific covid cases

My hometown is Timmins and I am originally from Sudbury, so let's compare Covid cases in the two communities. Timmins is represented by the Porcupine Health Unit area.

The Porcupine Health Unit area had an explosion of cases in May, especially in the James Bay area.

You can easily compare multiple areas the same way.
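Comparing more areas just means repeating the filter-and-count step for each PHU. Here is a minimal sketch of a helper that does this in one pass; `daily_cases_by_phu` is a hypothetical name, and it assumes the `rdate` and `phu` columns created earlier. The toy DataFrame is made up and stands in for the real data.

```python
import pandas as pd

def daily_cases_by_phu(cases: pd.DataFrame, phus: list) -> pd.DataFrame:
    """Return a date-by-PHU table of daily case counts.

    Assumes `cases` has the 'rdate' (report date) and 'phu' columns used in
    this notebook. Dates with no cases in a PHU are filled with 0.
    """
    subset = cases[cases["phu"].isin(phus)]
    table = (
        subset.groupby(["rdate", "phu"])
        .size()
        .unstack("phu", fill_value=0)
        .sort_index()
    )
    # Keep the column order the caller asked for; PHUs with no cases become all-zero columns
    return table.reindex(columns=phus, fill_value=0)

# Tiny made-up example in place of the real dataset
toy = pd.DataFrame({
    "rdate": pd.to_datetime(["2021-05-01", "2021-05-01", "2021-05-02", "2021-05-02"]),
    "phu": ["Porcupine Health Unit", "Sudbury & District Health Unit",
            "Porcupine Health Unit", "Porcupine Health Unit"],
})
print(daily_cases_by_phu(toy, ["Porcupine Health Unit", "Sudbury & District Health Unit"]))
```

On the real data, something like `sns.lineplot(data=daily_cases_by_phu(df, ['Porcupine Health Unit', 'Sudbury & District Health Unit']))` should draw one line per PHU from the wide table. Unlike `value_counts()`, this fills days with no reported cases with 0, so quiet periods show up as zeros instead of being skipped.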

In [16]:
df_tim = df[df.phu == "Porcupine Health Unit"]
df_sud = df[df.phu == "Sudbury & District Health Unit"]
df_wat = df[df.phu == "Region of Waterloo, Public Health"]
In [17]:
plt.figure(figsize=(14,6))
plt.title('Cases in Porcupine and Sudbury Health Unit Areas', fontsize=20)
plt.xlabel("")
plt.ylabel("Daily Cases", fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=df_tim['rdate'].value_counts(), label="Porcupine Health Unit")
sns.lineplot(data=df_sud['rdate'].value_counts(), label="Sudbury & District Health Unit")
#sns.lineplot(data=df_wat['rdate'].value_counts(), label="Region of Waterloo, Public Health")
plt.show()

Distribution of cases by age group

We can see that younger age groups account for a large share of infections.

In [18]:
plt.figure(figsize=(10,6))
plt.title("Ontario - Infections by Age Category", fontsize=18)
# The oldest group is labelled '90+' in the data (not '90s')
sns.countplot(data=df, x='age', order=['0-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90+'])
plt.xlabel('Age Group', fontsize=15)
plt.ylabel('Infections', fontsize=15)
plt.show()

Tracking the age distribution of infections during the Ontario Covid waves.

Note: These dates are approximate, estimated by eye from the daily cases graph above.

  • Wave 1 - March to May 2020
  • Wave 2 - October 2020 to February 2021
  • Wave 3 - April 2021 to May 2021
  • Wave 4 - July 26, 2021 onward
In [19]:
wave1 = (df['rdate'] > '2020-03-01') & (df['rdate'] < '2020-05-30')
wave2 = (df['rdate'] > '2020-10-01') & (df['rdate'] < '2021-02-28')
wave3 = (df['rdate'] > '2021-04-01') & (df['rdate'] < '2021-05-21')
wave4 = (df['rdate'] > '2021-07-26')

dfwave1 = df[wave1].sort_values(by='age')
dfwave2 = df[wave2].sort_values(by='age')
dfwave3 = df[wave3].sort_values(by='age')
dfwave4 = df[wave4].sort_values(by='age')

We can see a clear trend of age distributions moving towards younger generations with each wave. There is a lot of speculation, and people are quick to criticize younger Canadians for not following Covid guidelines such as social distancing and avoiding group gatherings. I don't think that is entirely fair, as Ontario has been prioritizing older Ontarians during the vaccine rollout.

Also, more transmissible variants have taken hold, and many young Canadians work in the service sector and therefore may not have the luxury of working from home. They have no choice but to get out there.

Finally, as we see in the latest wave, children under 12 (part of the "0-19" group) have not had access to vaccination. The under-30 group now accounts for almost three quarters of new cases.
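Raw counts make waves of different sizes hard to compare, so converting each wave to percentages is a more direct way to check the shift toward younger age groups and the "almost three quarters" figure. A minimal sketch, assuming the `age` labels used in this notebook ('0-19' after the rename, '20s', ...); `age_share` and `under_30_share` are hypothetical helper names and the toy data is made up.

```python
import pandas as pd

# Age groups treated as "under 30" (labels as used after the '<20' -> '0-19' rename)
UNDER_30 = {"0-19", "20s"}

def age_share(wave_cases: pd.DataFrame) -> pd.Series:
    """Percentage of a wave's cases in each age group ('UNKNOWN' included)."""
    return wave_cases["age"].value_counts(normalize=True).sort_index() * 100

def under_30_share(wave_cases: pd.DataFrame) -> float:
    """Percentage of a wave's cases in the 0-19 and 20s groups."""
    return wave_cases["age"].isin(UNDER_30).mean() * 100

# Made-up example: 3 of 4 cases are under 30
toy_wave = pd.DataFrame({"age": ["0-19", "20s", "20s", "50s"]})
print(under_30_share(toy_wave))  # 75.0
print(age_share(toy_wave))
```

Applied to the wave DataFrames, e.g. `under_30_share(dfwave4)`, this gives the under-30 percentage for each wave. Cases with an 'UNKNOWN' age stay in the denominator.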

In [20]:
# One bar chart per wave so the age distributions can be compared
wave_plots = [
    (dfwave1, "Wave 1 - Ontario - Infections by Age Category", "Wave 1 Infections"),
    (dfwave2, "Wave 2 - Ontario - Infections by Age Category", "Wave 2 Infections"),
    (dfwave3, "Wave 3 - Ontario - Infections by Age Category", "Wave 3 Infections"),
    (dfwave4, "Wave 4 - Ontario - Infections by Age Category since July 26", "Wave 4 Infections"),
]

for wave_df, title, ylabel in wave_plots:
    plt.figure(figsize=(10,6))
    plt.title(title, fontsize=18)
    sns.countplot(data=wave_df, x='age')
    plt.xlabel('Age Group', fontsize=15)
    plt.ylabel(ylabel, fontsize=15)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)

plt.show()
In [21]:
#df.age.value_counts().sort_index()

Next, look at how each age group is being infected (the reported source of infection).

In [22]:
print('Missing Information and Unspecified EPI Link have been omitted')
plt.figure(figsize=(14,6))
plt.title("Ontario - Source of Infection by Age Category", fontsize=18)
ax = sns.countplot(data=df, x='age', hue='source', hue_order=['CC', 'NO KNOWN EPI LINK', 'OB', 'TRAVEL'],
                   order=['0-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90+'])
# Relabel the existing legend entries; the labels must follow hue_order
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles, ['Contact of a Case', 'No Known Link', 'Outbreak', 'Travel'],
          title='Source of Infection', loc=7)
plt.xlabel('Age Group', fontsize=15)
plt.ylabel('Infections', fontsize=15)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
Missing Information and Unspecified EPI Link have been omitted

Tracking Deaths

The risk of dying from Covid rises steeply with age. Although most infections have occurred in younger Ontarians, the elderly have suffered the most deaths.
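Death counts alone mix two things: how dangerous Covid is for an age group and how many people in that group were infected. Dividing deaths by reported cases in each group (a case fatality rate) separates the two. A minimal sketch, assuming the `age` and `outcome` columns used in this notebook; `fatality_rate_by_age` is a hypothetical helper and the toy rows are made up. Cases still marked 'Not Resolved' may later become fatal, so recent figures can be slightly understated.

```python
import pandas as pd

def fatality_rate_by_age(cases: pd.DataFrame) -> pd.Series:
    """Deaths per 100 reported cases in each age group.

    Assumes 'outcome' takes the values 'Resolved', 'Not Resolved' or 'Fatal'.
    """
    is_fatal = cases["outcome"] == "Fatal"
    # Mean of a boolean column = fraction of cases that were fatal
    return (is_fatal.groupby(cases["age"]).mean() * 100).round(2)

# Made-up example data
toy = pd.DataFrame({
    "age": ["20s", "20s", "20s", "20s", "80s", "80s"],
    "outcome": ["Resolved", "Resolved", "Resolved", "Fatal", "Fatal", "Resolved"],
})
print(fatality_rate_by_age(toy))
```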

In [23]:
dfdeath = df[df.outcome == 'Fatal'].age.value_counts().sort_index()
print(dfdeath)
plt.figure(figsize=(10,6))
plt.title('Deaths by Age Group', fontsize=20)
plt.ylabel('Number of Deaths', fontsize=15)
plt.xlabel('')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
dfdeath.plot(kind='bar')
#sns.countplot(data=df, x='age', hue='outcome', hue_order=['Fatal'], order=df.age.value_counts().index)
plt.show()
0-19          5
20s          28
30s          66
40s         157
50s         489
60s        1129
70s        1998
80s        3260
90+        2504
UNKNOWN       1
Name: age, dtype: int64

Deaths by time period

With over 9,600 deaths in Ontario since the beginning of the Covid pandemic, the vast majority (about 80%, or 7,762 of 9,637) have been among people aged 70 and over. Despite the increasing number of cases throughout the second and third waves, deaths have dropped dramatically as infections moved to younger individuals, who are less likely to die from an infection.

Vaccination is also contributing to lower death rates.
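One way to check whether deaths fell relative to cases is to track the share of cases that were fatal over time. A minimal sketch, grouping cases by the month they were reported (`rdate`); `monthly_fatality_rate` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def monthly_fatality_rate(cases: pd.DataFrame) -> pd.Series:
    """Deaths per 100 cases, grouped by the month each case was reported."""
    is_fatal = cases["outcome"] == "Fatal"
    month = cases["rdate"].dt.to_period("M")
    return is_fatal.groupby(month).mean() * 100

# Made-up example data
toy = pd.DataFrame({
    "rdate": pd.to_datetime(["2020-04-10", "2020-04-20", "2021-04-10",
                             "2021-04-12", "2021-04-15", "2021-04-20"]),
    "outcome": ["Fatal", "Resolved", "Resolved", "Resolved", "Resolved", "Fatal"],
})
print(monthly_fatality_rate(toy))
```

On the real data, `monthly_fatality_rate(df).plot()` would show this as a line over the course of the pandemic.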

In [24]:
df_fatal = df[df.outcome == 'Fatal'].sort_index()
df_fatal = df_fatal.sort_values(by=['rdate'])
print('There have been',len(df_fatal), 'Deaths Total.')
There have been 9637 Deaths Total.
In [25]:
plt.figure(figsize=(14,6))
plt.title('Deaths Since Beginning of Covid Pandemic', fontsize=20)
plt.ylabel('Deaths', fontsize=15)
plt.xlabel('Date', fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
# Sort by date so the line is drawn in time order (value_counts sorts by count)
df_fatal['rdate'].value_counts().sort_index().plot()
plt.show()
In [26]:
df['outcome'].unique()
Out[26]:
array(['Resolved', 'Fatal', 'Not Resolved'], dtype=object)
In [27]:
df['outcome'].value_counts()
Out[27]:
Resolved        562172
Fatal             9637
Not Resolved      6239
Name: outcome, dtype: int64

The hardest hit regions in Ontario

Unsurprisingly, large urban centres have the highest case counts.

In [28]:
plt.figure(figsize=(10,6))
plt.title("Infections by Top 10 PHU Area", fontsize=20)
sns.countplot(data=df, y=df['phu'], order=df.phu.value_counts().iloc[:10].index)
plt.ylabel('Area', fontsize=15)
plt.xlabel('Count', fontsize=15)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
In [29]:
df.phu.value_counts().iloc[:10].index
Out[29]:
Index(['Toronto Public Health', 'Peel Public Health',
       'York Region Public Health Services', 'Ottawa Public Health',
       'Durham Region Health Department', 'Hamilton Public Health Services',
       'Region of Waterloo, Public Health', 'Windsor-Essex County Health Unit',
       'Halton Region Health Department',
       'Niagara Region Public Health Department'],
      dtype='object')

Ontario hotspots (name and phuid) where the Delta variant has taken hold:

Toronto 3895, Peel 2253, York 2270, Durham 2230, Hamilton 2237, Waterloo 2265, Halton 2236, Porcupine 2256, Wellington-Dufferin-Guelph 2266, Simcoe-Muskoka 2260 and Grey Bruce 2233

In [30]:
# PHU IDs from the list above, including Grey Bruce (2233)
hotspot_ids = [3895, 2253, 2270, 2230, 2237, 2265, 2236, 2256, 2266, 2260, 2233]
hotspots = df['phuid'].isin(hotspot_ids)

dfhot = df.loc[hotspots]
dfhot.tail()
Out[30]:
adate rdate age gender source outcome outbreak phuid phu
Row_ID
578038 2021-09-16 2021-09-16 50s FEMALE MISSING INFORMATION Not Resolved NaN 2270 York Region Public Health Services
578041 2021-09-16 2021-09-16 30s MALE MISSING INFORMATION Not Resolved NaN 2253 Peel Public Health
578044 2021-09-16 2021-09-16 60s MALE MISSING INFORMATION Not Resolved NaN 2253 Peel Public Health
578045 2021-09-16 2021-09-16 0-19 FEMALE MISSING INFORMATION Not Resolved NaN 2253 Peel Public Health
578048 2021-09-16 2021-09-16 0-19 FEMALE MISSING INFORMATION Not Resolved NaN 3895 Toronto Public Health
In [31]:
since_july = dfhot['rdate'] > "2021-07-01"
# Sort by date so the line is drawn in time order
dfhot.loc[since_july, 'rdate'].value_counts().sort_index().plot()
Out[31]:
<AxesSubplot:>

4th Wave Timmins and Sudbury

In [32]:
df4 = df[wave4]  # cases reported since July 26, 2021 (same rows as dfwave4)
In [33]:
df_tim4 = df4[df4.phu == "Porcupine Health Unit"]
df_sud4 = df4[df4.phu == "Sudbury & District Health Unit"]
In [34]:
plt.figure(figsize=(14,6))
plt.title('4th wave Cases in Porcupine and Sudbury Health Unit Areas', fontsize=20)
plt.xlabel("")
plt.ylabel("Daily Cases", fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=df_tim4['rdate'].value_counts(), label="Porcupine Health Unit")
sns.lineplot(data=df_sud4['rdate'].value_counts(), label="Sudbury & District Health Unit")
plt.show()

2. Effective reproduction number (Re) for COVID-19 in Ontario

An estimate of the average number of people that one person with COVID-19 will infect.

Source: https://data.ontario.ca/dataset/effective-reproduction-number-re-for-covid-19-in-ontario

Note: An Re above 1 means Covid case numbers are rising; an Re below 1 means cases are shrinking.
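Why 1 is the threshold: if every infected person infects Re others on average, each generation of infections is roughly Re times the size of the previous one, so after n generations cases ≈ initial cases × Re^n. The sketch below is a deliberately naive illustration of that compounding (a hypothetical `project_cases` helper that ignores immunity, vaccination and the time between generations), not a forecast.

```python
def project_cases(initial_cases: float, re: float, generations: int) -> list:
    """Naive projection: each generation is `re` times the size of the previous one."""
    cases = [initial_cases]
    for _ in range(generations):
        cases.append(cases[-1] * re)
    return cases

print([round(c) for c in project_cases(1000, 1.2, 3)])  # [1000, 1200, 1440, 1728]
print([round(c) for c in project_cases(1000, 0.8, 3)])  # [1000, 800, 640, 512]
```

Even a modest Re of 1.2 compounds quickly, while 0.8 shrinks cases by the same logic.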

In [35]:
# Make date_start and date_end Pandas datetime objects instead of strings.
dfre['date_start'] = pd.to_datetime(dfre['date_start'])
dfre['date_end'] = pd.to_datetime(dfre['date_end'])
In [36]:
dfre.dtypes
Out[36]:
region                object
date_start    datetime64[ns]
date_end      datetime64[ns]
Re                   float64
lower_CI             float64
upper_CI             float64
dtype: object

Create a Baseline Re rate of 1

In [37]:
dfre['Re_baseline'] = 1  # constant reference line at Re = 1

Set date_end as the index of the dataframe.

The Re number is provided as a rolling average of the past 7 days in Ontario's data.

In [38]:
dfre.set_index('date_end', inplace=True)
In [39]:
dfre.tail()
Out[39]:
region date_start Re lower_CI upper_CI Re_baseline
date_end
2021-09-10 Ontario 2021-09-04 0.98 0.96 1.01 1
2021-09-11 Ontario 2021-09-05 0.97 0.94 1.00 1
2021-09-12 Ontario 2021-09-06 0.97 0.95 1.00 1
2021-09-13 Ontario 2021-09-07 0.98 0.95 1.01 1
2021-09-14 Ontario 2021-09-08 0.99 0.97 1.02 1

Re rate observations

The Re rate can be a powerful predictor of whether the number of cases is headed up or down. Vaccination of Ontarians started in December 2020 and really picked up steam in April, May and June. The Re rate seems to reflect this and has been on a continuous decline since April. However, it may still be too early to tell for sure with the Delta variant taking hold.

We see a similar trend from January to the end of February, before the third wave hit. Vaccination was not yet a factor at that time.

It will be interesting to follow the Re rate over the coming months, given high vaccination rates but also increased spread of the Delta variant (and future unknown variants). If vaccination manages to contain Re, then we can get ahead of Covid and return to a more normal way of life. The wildcard in this will be variants. While vaccination appears to be working against current strains, new variants could take hold and push Re back up, resulting in more waves.

Prediction

Looking at the graph below and the upward trend, I predict that the decline in cases will stop in August and numbers will climb by September (assuming nothing else changes).

Wildcards: the Delta variant, success in getting first doses into arms, and opening immunization to children under 12. All of these can affect Re.

In [40]:
# Re Graph
plt.figure(figsize=(14, 6))
plt.title("Ontario Covid Reproduction Rate (Re) vs Cases", fontsize=20)
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
sns.lineplot(data=dfre[['Re', 'Re_baseline']])
plt.xlabel("")
plt.ylabel("Re Number", fontsize=15)

# Ontario Covid case graph for comparison.

# Line up the dates with the Re dataset first.
# A separate variable keeps the full df intact for any later cells.
df_cases = df[df['rdate'] > '2020-03-19']

plt.figure(figsize=(14,6))
sns.lineplot(data=df_cases['rdate'].value_counts())
plt.ylabel('Cases', fontsize=15)
plt.xlabel('')
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()

Vaccination Analysis

The following looks at vaccination rates in Ontario. We can see that Ontarians overall are being vaccinated in large numbers. As of June 27, 2021, we have not yet seen a plateau, although rates are expected to slow down.

In [42]:
#dfvaccine.tail()
In [43]:
# Create a 7 day rolling average column of daily vaccinations.
dfvaccine['7day'] = dfvaccine['previous_day_total_doses_administered'].rolling(window=7).mean()
In [44]:
dfvaccine[['report_date', 'previous_day_at_least_one', 'previous_day_fully_vaccinated',
           'previous_day_total_doses_administered', '7day']].set_index('report_date').tail(10)
Out[44]:
previous_day_at_least_one previous_day_fully_vaccinated previous_day_total_doses_administered 7day
report_date
2021-09-08 17447.0 20727.0 38174.0 33033.285714
2021-09-09 18043.0 20348.0 38391.0 33496.000000
2021-09-10 16477.0 19367.0 35844.0 32351.571429
2021-09-11 16532.0 23688.0 40220.0 31542.142857
2021-09-12 11733.0 17449.0 29182.0 31075.285714
2021-09-13 6616.0 9226.0 15842.0 30292.000000
2021-09-14 12538.0 16119.0 28657.0 32330.000000
2021-09-15 15171.0 20520.0 35691.0 31975.285714
2021-09-16 15271.0 20192.0 35463.0 31557.000000
2021-09-17 14865.0 20420.0 35285.0 31477.142857
In [45]:
# Make report_date a pandas datetime object instead of a string.
dfvaccine['report_date'] = pd.to_datetime(dfvaccine['report_date'])
#dfvaccine.dtypes

Interestingly, numbers drop sharply on Sundays. Because each report counts the previous day's doses, this shows up as consistently lower numbers in Monday's report.
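The Sunday dip can be checked by averaging doses by the day of the week of the report. This minimal sketch uses only the ten rows from the table above, which is a very small sample; on the full `dfvaccine` DataFrame the same `groupby` on `report_date.dt.day_name()` would give a more reliable picture. It also recomputes the 7-day rolling mean as a sanity check that `previous_day_total_doses_administered` is the column behind the `7day` values.

```python
import pandas as pd

# The ten rows shown in the table above (report_date, previous_day_total_doses_administered)
doses = pd.DataFrame({
    "report_date": pd.date_range("2021-09-08", "2021-09-17", freq="D"),
    "previous_day_total_doses_administered": [38174, 38391, 35844, 40220, 29182,
                                              15842, 28657, 35691, 35463, 35285],
})

# Each report covers the previous day, so a Monday report reflects Sunday's doses
by_report_day = (
    doses.groupby(doses["report_date"].dt.day_name())["previous_day_total_doses_administered"]
    .mean()
)
print(by_report_day.sort_values())
print("Lowest report day:", by_report_day.idxmin())  # Monday

# Sanity check: reproduces the '7day' value shown for 2021-09-17
col = doses["previous_day_total_doses_administered"]
print(round(col.rolling(window=7).mean().iloc[-1], 2))  # 31477.14
```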

In [46]:
plt.figure(figsize=(14,6))
plt.title('Daily Vaccine Doses - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=dfvaccine, x='report_date', y='7day', label='7 Day Rolling Average')
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_total_doses_administered', label='Daily Dose Count')
plt.xlabel('Date',fontsize=15)
plt.ylabel('Doses Administered',fontsize=15)
plt.show()

Show the trend of first and second doses

In [47]:
plt.figure(figsize=(14,6))
plt.title('First and Second Dose Daily Counts - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_at_least_one', label='First Dose')
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_fully_vaccinated', label='Second Dose')
plt.xlabel('')
plt.ylabel('Number Vaccinated',fontsize=15)
plt.show()
In [48]:
total_doses = dfvaccine['previous_day_total_doses_administered'].sum()
total_fully_vaccinated = dfvaccine['total_individuals_fully_vaccinated'].max()
# Assumes a two-dose regimen: every dose that is not a second dose is a first dose,
# so this approximates the number of people with at least one dose
total_first_doses = total_doses - total_fully_vaccinated
population = 14734014 # See sources (2)
eligible_pop = population - 1961438 # estimated population under 12, see sources (3)
vaccine_rate = (total_first_doses / eligible_pop) * 100
vaccine_rate_tot = (total_first_doses / population) * 100
full_vaccine_rate = (total_fully_vaccinated / eligible_pop) * 100
full_vaccine_rate_tot = (total_fully_vaccinated / population) * 100
total_unvaccinated = int(eligible_pop - dfvaccine['total_individuals_at_least_one'].max())
unvaccinated_percentage = round((total_unvaccinated / eligible_pop) * 100, 1)
In [49]:
print('Fact Sheet')
print("----------")
print("Data Published:", str(dfvaccine['report_date'].iloc[-1])[0:10])
print()
print('Eligible Population - 12 and over')
print('---------------------------------')
print("At Least One Dose:", round((vaccine_rate),1),"%")
print("Fully Vaccinated: ", round((full_vaccine_rate),1),"%")
print()

print('Total Population')
print('----------------')
print("At Least One Dose:", round((vaccine_rate_tot),1),"%")
print("Fully Vaccinated: ", round((full_vaccine_rate_tot),1),"%")
print()

print("Maximum doses administered in one day:", int(dfvaccine['previous_day_total_doses_administered'].max()))
print("Doses administered yesterday:", int(dfvaccine['previous_day_total_doses_administered'].iloc[-1]))
print()
print("Total individuals with at least one dose:", int(dfvaccine['total_individuals_at_least_one'].max()))
print("Total individuals fully vaccinated:", int(dfvaccine['total_individuals_fully_vaccinated'].max()))
print()
print("Percentage of eligible population unvaccinated:", unvaccinated_percentage,"%")
print("Estimated eligible population forgoing vaccination:", total_unvaccinated)
Fact Sheet
----------
Data Published: 2021-09-17

Eligible Population - 12 and over
---------------------------------
At Least One Dose: 86.5 %
Fully Vaccinated:  80.3 %

Total Population
----------------
At Least One Dose: 75.0 %
Fully Vaccinated:  69.6 %

Maximum doses administered in one day: 268884
Doses administered yesterday: 35285

Total individuals with at least one dose: 11061902
Total individuals fully vaccinated: 10256563

Percentage of eligible population unvaccinated: 13.4 %
Estimated eligible population forgoing vaccination: 1710674

Sources

(1) Vaccine Data from Ontario Open Data Portal

(2) Statistics Canada. Table 17-10-0005-01 Population estimates on July 1st, by age and sex

(3) 1,950,000 is an estimate of the population under 12 based on source (2) above. Statistics Canada only lists population for the 10-14 age group as a whole. 1,961,438 represents 60% of that age group, assuming an even distribution of ages.
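The 86.5% first-dose figure above is derived from total doses minus fully vaccinated individuals, which assumes every fully vaccinated person received exactly two doses. As a cross-check, here is a minimal sketch recomputing the rates from the published individual totals (2021-09-17) and the population figures from sources (2) and (3); `pct` is a hypothetical helper.

```python
# Published totals from the 2021-09-17 data shown above
at_least_one = 11_061_902
fully_vaccinated = 10_256_563
population = 14_734_014          # source (2)
under_12_estimate = 1_961_438    # source (3), an estimate
eligible = population - under_12_estimate

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(part / whole * 100, 1)

print("At least one dose (12+):", pct(at_least_one, eligible))        # 86.6
print("Fully vaccinated (12+):", pct(fully_vaccinated, eligible))     # 80.3
print("Unvaccinated (12+):", pct(eligible - at_least_one, eligible))  # 13.4
```

Using `total_individuals_at_least_one` directly gives 86.6% rather than 86.5%; the small gap presumably comes from doses that don't fit the two-dose assumption.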

In [50]:
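# Convert Date strings to datetimes so the plots get a proper time axis
dfvacstatus['Date'] = pd.to_datetime(dfvacstatus['Date'])
dfvacstatus.set_index('Date', inplace=True)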
In [51]:
# DataFrame.plot creates its own figure, so pass figsize here instead of calling plt.figure()
dfvacstatus[['covid19_cases_unvac', 'covid19_cases_partial_vac', 'covid19_cases_full_vac']].describe().plot(kind='bar', figsize=(14,6))
plt.show()
In [52]:
plt.figure(figsize=(16,6))
plt.title('Cases by Vaccine Status - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12, rotation=30)
plt.xlabel('')
plt.ylabel('Cases',fontsize=15,)
sns.lineplot(data=dfvacstatus[['covid19_cases_unvac', 'covid19_cases_partial_vac', 'covid19_cases_full_vac']])
plt.show()
In [53]:
plt.figure(figsize=(16,6))
plt.title('Cases per 100k - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12, rotation=30)
plt.xlabel('')
plt.ylabel('Cases per 100,000', fontsize=15)
sns.lineplot(data=dfvacstatus[['cases_unvac_rate_per100K', 'cases_partial_vac_rate_per100K',
       'cases_full_vac_rate_per100K']])
plt.show()