How did Covid-19 affect Europe economically?

To answer this question, we will first download data from Eurostat to assess the economic development, in particular the hh-income in the EU-27 for 2019 and 2020.

A bigger version of this plot can be found at the bottom of this notebook.

# Imports 
import eurostat
import pandas as pd

# for mapping
import geopandas as gpd
# import shapefile
import geoplot as gplt
import geoplot.crs as gcrs
from tqdm.auto import tqdm
from shapely.geometry import Point, Polygon
from matplotlib_scalebar.scalebar import ScaleBar
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patheffects as pe

# eurostat TOC - Dataframe
toc_df = eurostat.get_toc_df()

eurostat.subset_toc_df(toc_df, "income").head()

	title	code	type	last update of data	last table structure change	data start	data end
279	Income of households by NUTS 2 regions	NAMA_10R_2HHINC	dataset	2023-02-21T23:00:00+0100	2023-02-21T23:00:00+0100	1995	2021
288	Gross value added and income by A*10 industry ...	NAMA_10_A10	dataset	2023-04-27T23:00:00+0200	2023-02-20T23:00:00+0100	1975	2022
300	GDP and main components (, expenditure a...	NAMA_10_GDP	dataset	2023-04-27T23:00:00+0200	2023-02-16T23:00:00+0100	1975	2022
310	GNI (gross national income) per capita in PPS	NAMA_10_PP	dataset	2022-12-16T23:00:00+0100	2022-12-16T23:00:00+0100	2018	2021
311	Gross value added and income A*10 industry bre...	NAMQ_10_A10	dataset	2023-04-27T23:00:00+0200	2023-02-16T23:00:00+0100	1975-Q1	2022-Q4

# Income of households by NUTS 2 regions
hh_income_nuts = eurostat.get_data_df("nama_10r_2hhinc", flags=False)
hh_income_nuts.head(2)

	freq	unit	direct	na_item	geo\TIME_PERIOD	1995	1996	1997	1998	1999	...	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
0	A	EUR_HAB	BAL	B5N	AT	NaN	NaN	NaN	NaN	NaN	...	24500.0	24600.0	25000.0	25400.0	25600.0	26500.0	27600.0	28500.0	26900.0	NaN
1	A	EUR_HAB	BAL	B5N	AT1	NaN	NaN	NaN	NaN	NaN	...	24700.0	24600.0	25000.0	25200.0	25400.0	26100.0	27200.0	28100.0	26600.0	NaN

2 rows × 32 columns

Injection: NUTS

NUTS: Nomenclature of territorial units for statistics

https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts

The NUTS level 2 is in layman's words a unified district level for European countris. We will look at available data on this NUTS2 level. The data has been previously downloaded as .shp.

Data Preparation

The following gives an overview about the dataset. To adress our question, we will filter the years 2020 and 2019.

# as can be seen, in 2020 264 regions have not reported their data yet. 
hh_income_nuts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11390 entries, 0 to 11389
Data columns (total 32 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   freq             11390 non-null  object 
 1   unit             11390 non-null  object 
 2   direct           11390 non-null  object 
 3   na_item          11390 non-null  object 
 4   geo\TIME_PERIOD  11390 non-null  object 
 5   1995             1944 non-null   float64
 6   1996             1944 non-null   float64
 7   1997             1944 non-null   float64
 8   1998             1944 non-null   float64
 9   1999             1944 non-null   float64
 10  2000             11108 non-null  float64
 11  2001             11108 non-null  float64
 12  2002             11108 non-null  float64
 13  2003             11108 non-null  float64
 14  2004             11101 non-null  float64
 15  2005             11101 non-null  float64
 16  2006             11114 non-null  float64
 17  2007             11114 non-null  float64
 18  2008             11150 non-null  float64
 19  2009             11150 non-null  float64
 20  2010             11150 non-null  float64
 21  2011             11390 non-null  float64
 22  2012             11390 non-null  float64
 23  2013             11390 non-null  float64
 24  2014             11390 non-null  float64
 25  2015             11342 non-null  float64
 26  2016             11342 non-null  float64
 27  2017             11342 non-null  float64
 28  2018             11342 non-null  float64
 29  2019             11342 non-null  float64
 30  2020             11078 non-null  float64
 31  2021             2096 non-null   float64
dtypes: float64(27), object(5)
memory usage: 2.8+ MB

hh_income_recent = hh_income_nuts[["unit", "direct", "na_item", 
                                   "geo\TIME_PERIOD", "2020", "2019", "2018"]]

There are four categorial variables, and some data variables, starting from 1995 up until 2020. Since there are only 371 NUTS-2 regions and we have 11452 values in 2018, we probably should filter the data.

hh_income_recent["unit"].value_counts()

MIO_EUR              4558
MIO_NAC              4558
EUR_HAB               762
PPS_EU27_2020_HAB     762
MIO_PPS_EU27_2020     750
Name: unit, dtype: int64

# as unit we will select MIO_EUR
nuts2_income_euro = hh_income_recent[hh_income_recent["unit"] == "MIO_EUR"]

nuts2_income_euro["direct"].value_counts()

PAID    1721
RECV    1670
BAL     1167
Name: direct, dtype: int64

As direction we will chose received.

nuts2_income_euro_rec = nuts2_income_euro[nuts2_income_euro["direct"]=="RECV"]

The variable na_item is not about missing values, but about national accounts indicators.

nuts2_income_euro_rec["CC"] = nuts2_income_euro_rec["geo\TIME_PERIOD"].apply(lambda x: x[:2])

/tmp/ipykernel_5671/1167040726.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nuts2_income_euro_rec["CC"] = nuts2_income_euro_rec["geo\TIME_PERIOD"].apply(lambda x: x[:2])

Unfortunately, this needs to be adressed, since the data was reported for different purposes. A list of na_items can be found here: https://dd.eionet.europa.eu/vocabulary/eurostat/na_item/view

We will include only D1 (Compensation of employees), D4 (Property income). The code D7 (Other current transfers) is a bit unclear and will not be included for further analysis.

nuts2_income_euro_na = nuts2_income_euro_rec[nuts2_income_euro_rec["na_item"].isin(["D1", "D4"])]

Now, we will select only NUTS-2 regions.

# Download and preparation of NUTS 2 Codes. 
nuts_region = pd.read_excel("https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021.xlsx", 
              sheet_name="NUTS & SR 2021", nrows=1846) # Careful: below statistical regions start
nuts2_codes = nuts_region[~nuts_region["NUTS level 2"].isna()]
nuts2_codes_nn = nuts2_codes[~nuts2_codes["Code 2021"].isna()]

nuts2_income = nuts2_income_euro_na[nuts2_income_euro_na["geo\TIME_PERIOD"]\
                                    .isin(nuts2_codes_nn["Code 2021"])]

Checking, if many NA-values are present

nuts2_income.groupby(["direct", "na_item"]).agg({np.mean, lambda x: x.isnull().sum()})

/tmp/ipykernel_5671/1579563077.py:1: FutureWarning: ['unit', 'geo\\TIME_PERIOD', 'CC'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
  nuts2_income.groupby(["direct", "na_item"]).agg({np.mean, lambda x: x.isnull().sum()})

		2020		2019		2018
		<lambda_0>	mean	<lambda_0>	mean	<lambda_0>	mean
direct	na_item
RECV	D1	2	26422.948612	1	26734.023862	1	25748.720650
RECV	D4	1	3370.212593	1	3870.998313	1	3891.949794

Now, both the property income as well as the labour income will be added.

nuts2_income_final = nuts2_income.groupby(["geo\TIME_PERIOD"]).sum().reset_index()
nuts2_income_final.rename(columns={"geo\TIME_PERIOD":"NUTS_Code"}, inplace=True)
nuts2_income_final.head()

/tmp/ipykernel_5671/2980442314.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  nuts2_income_final = nuts2_income.groupby(["geo\TIME_PERIOD"]).sum().reset_index()

	NUTS_Code	2020	2019	2018
0	AT11	6621.0	7019.0	6751.0
1	AT12	40437.0	42744.0	41233.0
2	AT13	42842.0	45370.0	43286.0
3	AT21	11975.0	12820.0	12371.0
4	AT22	27689.0	29658.0	28500.0

We got a coverage of 250 NUTS 2 regions. Now we will calculate a simple corona indicator as the difference between 2020 and 2019, relative to 2019. The lower this number, the lower was the difference in economic growth between those 2 years.

nuts2_income_final["corona"] = (nuts2_income_final["2020"] - nuts2_income_final["2019"]) /\
    nuts2_income_final["2019"] * 100

nuts2_income_final.head()

	NUTS_Code	2020	2019	2018	corona
0	AT11	6621.0	7019.0	6751.0	-5.670323
1	AT12	40437.0	42744.0	41233.0	-5.397249
2	AT13	42842.0	45370.0	43286.0	-5.571964
3	AT21	11975.0	12820.0	12371.0	-6.591264
4	AT22	27689.0	29658.0	28500.0	-6.639018

Mapping of the data

# read the NUTS shapefile and extract the polygons for a individual countries
nuts = gpd.read_file("./Data/Nuts2/NUTS_RG_20M_2021_3035.shp")
nuts.crs = "EPSG:3035"

# representitive_point for labelling (in die Spalte coords)
nuts['coords'] = nuts['geometry'].apply(lambda x: x.representative_point().coords[:])
nuts['coords'] = [coords[0] for coords in nuts['coords']]

countries = nuts[nuts["LEVL_CODE"]==0]

countries_sel = countries[~countries["NUTS_ID"].isin(["TR", "IS", "UK", "LI", "NO", 
                                                      "AL", "ME", "MK", "RS", "CH"])]

# countries_sel = countries

# Data Selection, remove duplicates
countries_lv2 = nuts[nuts["LEVL_CODE"]==2]
countries_lv2_unique = countries_lv2[~countries_lv2.duplicated(subset="NUTS_ID")]

# Combination of Eurostat-Data with NUTS2-map 
df_merged = countries_lv2_unique.merge(nuts2_income_final, left_on='FID', 
                       right_on='NUTS_Code')

# European Bounding Box 
# To reduce the map size (and not to include e.g. Caribean islands)
# http://bboxfinder.com/#35.317366,-24.082031,71.413177,46.230469
x1 = 1303552.3576+1000000
y1 = 2047374.7227-1000000
x2 = 5556364.6588+1500000
y2 = 5701700.2014

np1 = (x1, y1)
np2 = (x1, y2)
np3 = (x2, y2)
np4 = (x2, y1)

bb_polygon = Polygon([np1, np2, np3, np4])
bb = gpd.GeoDataFrame(gpd.GeoSeries(bb_polygon), columns=['geometry'])
bb.crs = "EPSG:3035"
europa = gpd.overlay(df_merged, bb, how='intersection')
europa_countries = gpd.overlay(countries_sel, bb, how='intersection')

map = europa.plot(column = 'corona', 
                      legend = True, 
                      figsize = [20,12],
                      vmax=35,
                      cmap="BrBG", norm=mcolors.TwoSlopeNorm(vcenter = 0, 
                                                             vmin = europa.corona.min(), 
                                                             vmax = europa.corona.max()), 
                      legend_kwds = { 
                                     'format': "%.0f"
                                    }
           )
map.get_xaxis().set_visible(False)
map.get_yaxis().set_visible(False)

for x, y, label in zip(countries_sel.coords.str[0], 
                       countries_sel.coords.str[1], 
                       countries_sel.CNTR_CODE):
    map.annotate(label, xy=(x, y), c="black",  
                 path_effects=[pe.withStroke(linewidth=1, foreground="white")])

# countries_sel.plot(ax=map, column="geometry", edgecolor='black')
europa_countries.geometry.boundary.plot(color=None,edgecolor='k', 
                                        linewidth = 0.5,  ax=map)

plt.suptitle("How Covid-19 hit Household Incomes in the EU", fontsize=25)
map.set_title("Difference in HH-income (received) of EU-27 between 2020 and 2019 [in % 2019] \nAvailable data reported on NUTS-2 level. Level of completeness: 97.5 %. "
              ,fontsize=15, loc="left")
# map.patch.set_facecolor('black')
map.add_artist(ScaleBar(1))
map.text(y=y1, x = x2-23e5, 
         s="© EuroGeographics for the administrative boundaries. \nData: Eurostat. C. Brandstätter www.brandata.at, April 2023") 
plt.savefig('Corona_Map_Europe.png')

Corona Map bigger