How did Covid-19 affect Europe economically?

To answer this question, we will first download data from Eurostat to assess the economic development, in particular the hh-income in the EU-27 for 2019 and 2020.

A bigger version of this plot can be found at the bottom of this notebook.

Corona_Map_Europe.png

# Imports 
import eurostat
import pandas as pd

# for mapping
import geopandas as gpd
# import shapefile
import geoplot as gplt
import geoplot.crs as gcrs
from tqdm.auto import tqdm
from shapely.geometry import Point, Polygon
from matplotlib_scalebar.scalebar import ScaleBar
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patheffects as pe
# eurostat TOC - Dataframe
toc_df = eurostat.get_toc_df()
eurostat.subset_toc_df(toc_df, "income").head()
title code type last update of data last table structure change data start data end
279 Income of households by NUTS 2 regions NAMA_10R_2HHINC dataset 2023-02-21T23:00:00+0100 2023-02-21T23:00:00+0100 1995 2021
288 Gross value added and income by A*10 industry ... NAMA_10_A10 dataset 2023-04-27T23:00:00+0200 2023-02-20T23:00:00+0100 1975 2022
300 GDP and main components (, expenditure a... NAMA_10_GDP dataset 2023-04-27T23:00:00+0200 2023-02-16T23:00:00+0100 1975 2022
310 GNI (gross national income) per capita in PPS NAMA_10_PP dataset 2022-12-16T23:00:00+0100 2022-12-16T23:00:00+0100 2018 2021
311 Gross value added and income A*10 industry bre... NAMQ_10_A10 dataset 2023-04-27T23:00:00+0200 2023-02-16T23:00:00+0100 1975-Q1 2022-Q4
# Income of households by NUTS 2 regions
hh_income_nuts = eurostat.get_data_df("nama_10r_2hhinc", flags=False)
hh_income_nuts.head(2)
freq unit direct na_item geo\TIME_PERIOD 1995 1996 1997 1998 1999 ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
0 A EUR_HAB BAL B5N AT NaN NaN NaN NaN NaN ... 24500.0 24600.0 25000.0 25400.0 25600.0 26500.0 27600.0 28500.0 26900.0 NaN
1 A EUR_HAB BAL B5N AT1 NaN NaN NaN NaN NaN ... 24700.0 24600.0 25000.0 25200.0 25400.0 26100.0 27200.0 28100.0 26600.0 NaN

2 rows × 32 columns

Injection: NUTS

NUTS: Nomenclature of territorial units for statistics

https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts

The NUTS level 2 is in layman's words a unified district level for European countris. We will look at available data on this NUTS2 level. The data has been previously downloaded as .shp.

Data Preparation

The following gives an overview about the dataset. To adress our question, we will filter the years 2020 and 2019.

# as can be seen, in 2020 264 regions have not reported their data yet. 
hh_income_nuts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11390 entries, 0 to 11389
Data columns (total 32 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   freq             11390 non-null  object 
 1   unit             11390 non-null  object 
 2   direct           11390 non-null  object 
 3   na_item          11390 non-null  object 
 4   geo\TIME_PERIOD  11390 non-null  object 
 5   1995             1944 non-null   float64
 6   1996             1944 non-null   float64
 7   1997             1944 non-null   float64
 8   1998             1944 non-null   float64
 9   1999             1944 non-null   float64
 10  2000             11108 non-null  float64
 11  2001             11108 non-null  float64
 12  2002             11108 non-null  float64
 13  2003             11108 non-null  float64
 14  2004             11101 non-null  float64
 15  2005             11101 non-null  float64
 16  2006             11114 non-null  float64
 17  2007             11114 non-null  float64
 18  2008             11150 non-null  float64
 19  2009             11150 non-null  float64
 20  2010             11150 non-null  float64
 21  2011             11390 non-null  float64
 22  2012             11390 non-null  float64
 23  2013             11390 non-null  float64
 24  2014             11390 non-null  float64
 25  2015             11342 non-null  float64
 26  2016             11342 non-null  float64
 27  2017             11342 non-null  float64
 28  2018             11342 non-null  float64
 29  2019             11342 non-null  float64
 30  2020             11078 non-null  float64
 31  2021             2096 non-null   float64
dtypes: float64(27), object(5)
memory usage: 2.8+ MB
hh_income_recent = hh_income_nuts[["unit", "direct", "na_item", 
                                   "geo\TIME_PERIOD", "2020", "2019", "2018"]]

There are four categorial variables, and some data variables, starting from 1995 up until 2020. Since there are only 371 NUTS-2 regions and we have 11452 values in 2018, we probably should filter the data.

hh_income_recent["unit"].value_counts()
MIO_EUR              4558
MIO_NAC              4558
EUR_HAB               762
PPS_EU27_2020_HAB     762
MIO_PPS_EU27_2020     750
Name: unit, dtype: int64
# as unit we will select MIO_EUR
nuts2_income_euro = hh_income_recent[hh_income_recent["unit"] == "MIO_EUR"]
nuts2_income_euro["direct"].value_counts()
PAID    1721
RECV    1670
BAL     1167
Name: direct, dtype: int64

As direction we will chose received.

nuts2_income_euro_rec = nuts2_income_euro[nuts2_income_euro["direct"]=="RECV"]

The variable na_item is not about missing values, but about national accounts indicators.

nuts2_income_euro_rec["CC"] = nuts2_income_euro_rec["geo\TIME_PERIOD"].apply(lambda x: x[:2])
/tmp/ipykernel_5671/1167040726.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nuts2_income_euro_rec["CC"] = nuts2_income_euro_rec["geo\TIME_PERIOD"].apply(lambda x: x[:2])

Unfortunately, this needs to be adressed, since the data was reported for different purposes. A list of na_items can be found here: https://dd.eionet.europa.eu/vocabulary/eurostat/na_item/view

We will include only D1 (Compensation of employees), D4 (Property income). The code D7 (Other current transfers) is a bit unclear and will not be included for further analysis.

nuts2_income_euro_na = nuts2_income_euro_rec[nuts2_income_euro_rec["na_item"].isin(["D1", "D4"])]

Now, we will select only NUTS-2 regions.

# Download and preparation of NUTS 2 Codes. 
nuts_region = pd.read_excel("https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021.xlsx", 
              sheet_name="NUTS & SR 2021", nrows=1846) # Careful: below statistical regions start
nuts2_codes = nuts_region[~nuts_region["NUTS level 2"].isna()]
nuts2_codes_nn = nuts2_codes[~nuts2_codes["Code 2021"].isna()]
nuts2_income = nuts2_income_euro_na[nuts2_income_euro_na["geo\TIME_PERIOD"]\
                                    .isin(nuts2_codes_nn["Code 2021"])]

Checking, if many NA-values are present

nuts2_income.groupby(["direct", "na_item"]).agg({np.mean, lambda x: x.isnull().sum()})
/tmp/ipykernel_5671/1579563077.py:1: FutureWarning: ['unit', 'geo\\TIME_PERIOD', 'CC'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
  nuts2_income.groupby(["direct", "na_item"]).agg({np.mean, lambda x: x.isnull().sum()})
2020 2019 2018
<lambda_0> mean <lambda_0> mean <lambda_0> mean
direct na_item
RECV D1 2 26422.948612 1 26734.023862 1 25748.720650
D4 1 3370.212593 1 3870.998313 1 3891.949794

Now, both the property income as well as the labour income will be added.

nuts2_income_final = nuts2_income.groupby(["geo\TIME_PERIOD"]).sum().reset_index()
nuts2_income_final.rename(columns={"geo\TIME_PERIOD":"NUTS_Code"}, inplace=True)
nuts2_income_final.head()
/tmp/ipykernel_5671/2980442314.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  nuts2_income_final = nuts2_income.groupby(["geo\TIME_PERIOD"]).sum().reset_index()
NUTS_Code 2020 2019 2018
0 AT11 6621.0 7019.0 6751.0
1 AT12 40437.0 42744.0 41233.0
2 AT13 42842.0 45370.0 43286.0
3 AT21 11975.0 12820.0 12371.0
4 AT22 27689.0 29658.0 28500.0

We got a coverage of 250 NUTS 2 regions. Now we will calculate a simple corona indicator as the difference between 2020 and 2019, relative to 2019. The lower this number, the lower was the difference in economic growth between those 2 years.

nuts2_income_final["corona"] = (nuts2_income_final["2020"] - nuts2_income_final["2019"]) /\
    nuts2_income_final["2019"] * 100
nuts2_income_final.head()
NUTS_Code 2020 2019 2018 corona
0 AT11 6621.0 7019.0 6751.0 -5.670323
1 AT12 40437.0 42744.0 41233.0 -5.397249
2 AT13 42842.0 45370.0 43286.0 -5.571964
3 AT21 11975.0 12820.0 12371.0 -6.591264
4 AT22 27689.0 29658.0 28500.0 -6.639018

Mapping of the data

# read the NUTS shapefile and extract the polygons for a individual countries
nuts = gpd.read_file("./Data/Nuts2/NUTS_RG_20M_2021_3035.shp")
nuts.crs = "EPSG:3035"
# representitive_point for labelling (in die Spalte coords)
nuts['coords'] = nuts['geometry'].apply(lambda x: x.representative_point().coords[:])
nuts['coords'] = [coords[0] for coords in nuts['coords']]
countries = nuts[nuts["LEVL_CODE"]==0]
countries_sel = countries[~countries["NUTS_ID"].isin(["TR", "IS", "UK", "LI", "NO", 
                                                      "AL", "ME", "MK", "RS", "CH"])]
# countries_sel = countries
# Data Selection, remove duplicates
countries_lv2 = nuts[nuts["LEVL_CODE"]==2]
countries_lv2_unique = countries_lv2[~countries_lv2.duplicated(subset="NUTS_ID")]
# Combination of Eurostat-Data with NUTS2-map 
df_merged = countries_lv2_unique.merge(nuts2_income_final, left_on='FID', 
                       right_on='NUTS_Code')
# European Bounding Box 
# To reduce the map size (and not to include e.g. Caribean islands)
# http://bboxfinder.com/#35.317366,-24.082031,71.413177,46.230469
x1 = 1303552.3576+1000000
y1 = 2047374.7227-1000000
x2 = 5556364.6588+1500000
y2 = 5701700.2014

np1 = (x1, y1)
np2 = (x1, y2)
np3 = (x2, y2)
np4 = (x2, y1)

bb_polygon = Polygon([np1, np2, np3, np4])
bb = gpd.GeoDataFrame(gpd.GeoSeries(bb_polygon), columns=['geometry'])
bb.crs = "EPSG:3035"
europa = gpd.overlay(df_merged, bb, how='intersection')
europa_countries = gpd.overlay(countries_sel, bb, how='intersection')
map = europa.plot(column = 'corona', 
                      legend = True, 
                      figsize = [20,12],
                      vmax=35,
                      cmap="BrBG", norm=mcolors.TwoSlopeNorm(vcenter = 0, 
                                                             vmin = europa.corona.min(), 
                                                             vmax = europa.corona.max()), 
                      legend_kwds = { 
                                     'format': "%.0f"
                                    }
           )
map.get_xaxis().set_visible(False)
map.get_yaxis().set_visible(False)

for x, y, label in zip(countries_sel.coords.str[0], 
                       countries_sel.coords.str[1], 
                       countries_sel.CNTR_CODE):
    map.annotate(label, xy=(x, y), c="black",  
                 path_effects=[pe.withStroke(linewidth=1, foreground="white")])

# countries_sel.plot(ax=map, column="geometry", edgecolor='black')
europa_countries.geometry.boundary.plot(color=None,edgecolor='k', 
                                        linewidth = 0.5,  ax=map)

plt.suptitle("How Covid-19 hit Household Incomes in the EU", fontsize=25)
map.set_title("Difference in HH-income (received) of EU-27 between 2020 and 2019 [in % 2019] \nAvailable data reported on NUTS-2 level. Level of completeness: 97.5 %. "
              ,fontsize=15, loc="left")
# map.patch.set_facecolor('black')
map.add_artist(ScaleBar(1))
map.text(y=y1, x = x2-23e5, 
         s="© EuroGeographics for the administrative boundaries. \nData: Eurostat. C. Brandstätter www.brandata.at, April 2023") 
plt.savefig('Corona_Map_Europe.png')

Corona Map bigger