R - Jupyter script
File: Comtrade_Recode.R
Author: Christian Brandstätter
Contact: bran.chri@gmail.com
Date: 26.1.2019
Copyright (C) 2019
Description: The combined data need to be harmonized: there are several different classification systems. In the comtrade database the data is stored the way it was created.
library(plyr)
library(dplyr)
library(openxlsx) # zum Importieren der großen Excel-Daten
library(ggplot2)
library(ggthemes)
library(reshape2)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:plyr’:
arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
fa_clean <- function(x) {
varname <- deparse(substitute(x))
require(dplyr)
x <- x %>% mutate_if(is.factor, as.character)
assign(varname, x, envir = .GlobalEnv)
}
setwd("../")
# read previously generated Data
ctdat <- fa_clean(read.csv("./2_pipeline/Combine_Downloads/combo_ct_dat.csv"))
head(ctdat,2)
Classification | Year | Aggregate_Level | Is_Leaf_Code | Trade_Flow | Reporter | Partner | Commodity_Code | Commodity | Qty_Unit_Code | Qty_Unit | Alt_Qty_Unit | Netweight_kg | Trade_Value_USD | Flag |
S1 | 1963 | 1 | 0 | Import | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 194876768 | 0 |
S1 | 1963 | 1 | 0 | Export | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 71546640 | 0 |
# As mentioned before, there are seven different classification systems in use.
entries_per_class <- ctdat %>%
group_by(Classification) %>%
summarise(count = n())
ggplot(entries_per_class,aes(x=Classification,y=count, fill = Classification))+
geom_bar(stat="identity")+
scale_fill_brewer(palette="Greens")+
labs(title="Austria Comtrade Commodities", subtitle= "Autria vs. World, Import and Export, Entries per Classification")+
theme_minimal(12)

# Thus, we need a mapping between the Classification Systems to harmonize our tables.
# Downloaded only once, then the following download was commented.
#download.file("https://unstats.un.org/unsd/trade/classifications/tables/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx",
# "./0_data/Commodity Classifications/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx")
corr_table <- read.xlsx("./0_data/Commodity Classifications/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx")
head(corr_table,2)
HS92 | HS96 | HS02 | HS07 | HS12 | HS17 | BEC4 | SITC1 | SITC2 | SITC3 | SITC4 |
010111 | 010111 | 010110 | 010110 | 010121 | 010121 | 41 | 0015 | 0015 | 00151 | 0015 |
010111 | 010111 | 010110 | 010110 | 010121 | 010121 | 41 | 0015 | 0015 | 00152 | 0015 |
# The downloaded correlation table contains all the codes, we also need to ensure the proper naming of the Commodity Categories in the system we choose.
# Here is a table containing all the commodities (incl. Description) of Comtrade in all Classification Systems.
# Downloaded only once, then the following download was commented.
# download.file("http://unstats.un.org/unsd/tradekb/Attachment439.aspx?AttachmentType=1",
# "./0_data/Commodity Classifications/UN_Comtrade_Commodity_Classifications.xlsx")
class_table <- read.xlsx("./0_data/Commodity Classifications/UN_Comtrade_Commodity_Classifications.xlsx")
head(class_table,2)
Classification | Code | Description | Code.Parent | Level | isLeaf |
BE | 1 | Food and beverages | TOTAL | 1 | 0 |
BE | 11 | Food and beverages, primary | 1 | 2 | 0 |
# Now we can proceed with the harmonizing: We select, without much reasoning, the newest system SITC4.
# The mapping must occur in our downloaded data. For that we create a new variable:
ctdat$newClass <- NA # new variable for the chosen System
head(ctdat) # A quick look at the class_table for recoding
Classification | Year | Aggregate_Level | Is_Leaf_Code | Trade_Flow | Reporter | Partner | Commodity_Code | Commodity | Qty_Unit_Code | Qty_Unit | Alt_Qty_Unit | Netweight_kg | Trade_Value_USD | Flag | newClass |
S1 | 1963 | 1 | 0 | Import | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 194876768 | 0 | NA |
S1 | 1963 | 1 | 0 | Export | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 71546640 | 0 | NA |
S1 | 1963 | 2 | 0 | Import | Austria | World | 00 | Live animals | 1 | No Quantity | NA | NA | 8827853 | 0 | NA |
S1 | 1963 | 2 | 0 | Export | Austria | World | 00 | Live animals | 1 | No Quantity | NA | NA | 40104828 | 0 | NA |
S1 | 1963 | 3 | 0 | Import | Austria | World | 001 | Live animals | 8 | Weight in kilograms | 21781116 | 21781116 | 8827853 | 0 | NA |
S1 | 1963 | 3 | 0 | Export | Austria | World | 001 | Live animals | 8 | Weight in kilograms | 87052088 | 87052088 | 40104828 | 0 | NA |
str(ctdat)
'data.frame': 426813 obs. of 16 variables:
$ Classification : chr "S1" "S1" "S1" "S1" ...
$ Year : int 1963 1963 1963 1963 1963 1963 1963 1963 1963 1963 ...
$ Aggregate_Level: int 1 1 2 2 3 3 4 4 4 4 ...
$ Is_Leaf_Code : int 0 0 0 0 0 0 1 1 1 1 ...
$ Trade_Flow : chr "Import" "Export" "Import" "Export" ...
$ Reporter : chr "Austria" "Austria" "Austria" "Austria" ...
$ Partner : chr "World" "World" "World" "World" ...
$ Commodity_Code : chr "0" "0" "00" "00" ...
$ Commodity : chr "Food and live animals" "Food and live animals" "Live animals" "Live animals" ...
$ Qty_Unit_Code : int 1 1 1 1 8 8 8 8 8 8 ...
$ Qty_Unit : chr "No Quantity" "No Quantity" "No Quantity" "No Quantity" ...
$ Alt_Qty_Unit : num NA NA NA NA 21781116 ...
$ Netweight_kg : num NA NA NA NA 21781116 ...
$ Trade_Value_USD: num 1.95e+08 7.15e+07 8.83e+06 4.01e+07 8.83e+06 ...
$ Flag : int 0 0 0 0 0 0 0 0 0 0 ...
$ newClass : logi NA NA NA NA NA NA ...
# Actual recoding with the handy mapvalues function from the plyr package: here the different classification codes are mapped against the classification table.
# It is better to work with the code instead of the description, when available, as the description may contain typing mistakes or different writings, etc.
ctdat$newClass[ctdat$Classification=="H0"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H0"],corr_table$HS92,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H1"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H1"],corr_table$HS96,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H2"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H2"],corr_table$HS02,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H3"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H3"],corr_table$HS07,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H4"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H4"],corr_table$HS12,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H5"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H5"],corr_table$HS17,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S1"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S1"],corr_table$SITC1,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S2"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S2"],corr_table$SITC2,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S3"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S3"],corr_table$SITC3,corr_table$SITC4,warn_missing=FALSE)
S4_classes <- class_table[which(class_table$Classification=="S4"),]
# str(S4_classes)
ctdat$CatS4 <- mapvalues(ctdat$newClass,S4_classes$Code,S4_classes$Description,warn_missing=FALSE)
head(ctdat,5)
Classification | Year | Aggregate_Level | Is_Leaf_Code | Trade_Flow | Reporter | Partner | Commodity_Code | Commodity | Qty_Unit_Code | Qty_Unit | Alt_Qty_Unit | Netweight_kg | Trade_Value_USD | Flag | newClass | CatS4 |
S1 | 1963 | 1 | 0 | Import | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 194876768 | 0 | 0 | Food and live animals |
S1 | 1963 | 1 | 0 | Export | Austria | World | 0 | Food and live animals | 1 | No Quantity | NA | NA | 71546640 | 0 | 0 | Food and live animals |
S1 | 1963 | 2 | 0 | Import | Austria | World | 00 | Live animals | 1 | No Quantity | NA | NA | 8827853 | 0 | 00 | Live animals other than animals of division 03 |
S1 | 1963 | 2 | 0 | Export | Austria | World | 00 | Live animals | 1 | No Quantity | NA | NA | 40104828 | 0 | 00 | Live animals other than animals of division 03 |
S1 | 1963 | 3 | 0 | Import | Austria | World | 001 | Live animals | 8 | Weight in kilograms | 21781116 | 21781116 | 8827853 | 0 | 001 | Live animals other than animals of division 03 |
categories_S4_unique <- unique(ctdat$CatS4)
egglist <- categories_S4_unique[grep("egg",categories_S4_unique)]
egglist
- 'Dairy products and birds’ eggs'
- 'Eggs, birds\', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin'
- 'Birds\' eggs, in shell, fresh, preserved/cooked'
- 'Parts of footwear (including uppers, whether/not attached to soles other than outer soles); removable insoles, heel cushions & similar articles; gaiters, leggings & similar articles & parts thereof'
- 'Birds\' eggs, not in shell, & egg yolks, dried'
- 'Harvesting or threshing machinery (including straw or fodder balers); grass or hay mowers; machines for cleaning, sorting or grading seed or grain or for grading eggs, fruit or other agricultural produce (other than milling machinery of heading 727.11); p'
- 'Machines for cleaning, sorting/grading eggs, fruit/other agricultural produce'
- 'Birds\' eggs, not in shell, and egg yolks'
- 'Birds\' eggs, not in shell, & egg yolks, other than dried'
- 'Fish, prepared or preserved, n.e.s.; caviar and caviar substitutes prepared from fish eggs.'
- 'Caviar & caviar substitutes prepared from fish eggs'
- 'Albumins (including concentrates of two/more whey proteins, containing by weight > 80% whey proteins, calculated on the dry matter), other than egg albumin; albuminates & other albumin derivatives'
egglist_select <- egglist[c(1,2,3,5,8,9)]
eggdat_comtrade <- ctdat[(ctdat$CatS4 %in% egglist_select),]
head(eggdat_comtrade)
| Classification | Year | Aggregate_Level | Is_Leaf_Code | Trade_Flow | Reporter | Partner | Commodity_Code | Commodity | Qty_Unit_Code | Qty_Unit | Alt_Qty_Unit | Netweight_kg | Trade_Value_USD | Flag | newClass | CatS4 |
49 | S1 | 1963 | 2 | 0 | Import | Austria | World | 02 | Dairy products and eggs | 1 | No Quantity | NA | NA | 13429270 | 0 | 02 | Dairy products and birds’ eggs |
50 | S1 | 1963 | 2 | 0 | Export | Austria | World | 02 | Dairy products and eggs | 1 | No Quantity | NA | NA | 18056338 | 0 | 02 | Dairy products and birds’ eggs |
66 | S1 | 1963 | 3 | 0 | Import | Austria | World | 025 | Eggs | 8 | Weight in kilograms | 15815741 | 15815741 | 9414903 | 0 | 025 | Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin |
67 | S1 | 1963 | 3 | 0 | Export | Austria | World | 025 | Eggs | 8 | Weight in kilograms | 10608 | 10608 | 16229 | 0 | 025 | Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin |
68 | S1 | 1963 | 4 | 0 | Import | Austria | World | 0250 | Eggs | 8 | Weight in kilograms | 15815741 | 15815741 | 9414903 | 0 | 0251 | Birds' eggs, in shell, fresh, preserved/cooked |
69 | S1 | 1963 | 4 | 0 | Export | Austria | World | 0250 | Eggs | 8 | Weight in kilograms | 10608 | 10608 | 16229 | 0 | 0251 | Birds' eggs, in shell, fresh, preserved/cooked |
table(eggdat_comtrade$CatS4)
Birds' eggs, in shell, fresh, preserved/cooked
158
Birds' eggs, not in shell, & egg yolks, dried
127
Birds' eggs, not in shell, & egg yolks, other than dried
105
Birds' eggs, not in shell, and egg yolks
12
Dairy products and birds’ eggs
110
Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin
62
eggdat_comtrade$CatS4[grep("albumin",eggdat_comtrade$CatS4)] <- "Unspecified egg"
# The category of dairy products and bird's eggs is already hard to distinguish between egg and dairy products.
# This category is even less useful, when no weight data ist given.
egg_na <- eggdat_comtrade[is.na(eggdat_comtrade$Netweight_kg),]
table(egg_ct$CatS4)
Birds' eggs, in shell, fresh, preserved/cooked
1
Dairy products and birds’ eggs
106
# For now, we have to exclude these data.
egg_set <- eggdat_comtrade[!is.na(eggdat_comtrade$Netweight_kg),]
head(egg_set,2)
| Classification | Year | Aggregate_Level | Is_Leaf_Code | Trade_Flow | Reporter | Partner | Commodity_Code | Commodity | Qty_Unit_Code | Qty_Unit | Alt_Qty_Unit | Netweight_kg | Trade_Value_USD | Flag | newClass | CatS4 |
66 | S1 | 1963 | 3 | 0 | Import | Austria | World | 025 | Eggs | 8 | Weight in kilograms | 15815741 | 15815741 | 9414903 | 0 | 025 | Unspecified egg |
67 | S1 | 1963 | 3 | 0 | Export | Austria | World | 025 | Eggs | 8 | Weight in kilograms | 10608 | 10608 | 16229 | 0 | 025 | Unspecified egg |
table(egg_set$CatS4)
Birds' eggs, in shell, fresh, preserved/cooked
157
Birds' eggs, not in shell, & egg yolks, dried
127
Birds' eggs, not in shell, & egg yolks, other than dried
105
Birds' eggs, not in shell, and egg yolks
12
Dairy products and birds’ eggs
4
Unspecified egg
62
egg_plot <- egg_set %>%
group_by(Year, CatS4,Trade_Flow) %>%
summarize(Netweight_kg = sum(Netweight_kg))
head(egg_plot)
Year | CatS4 | Trade_Flow | Netweight_kg |
1963 | Birds' eggs, in shell, fresh, preserved/cooked | Export | 10608 |
1963 | Birds' eggs, in shell, fresh, preserved/cooked | Import | 15815741 |
1963 | Unspecified egg | Export | 10608 |
1963 | Unspecified egg | Import | 15815741 |
1964 | Birds' eggs, in shell, fresh, preserved/cooked | Export | 53886 |
1964 | Birds' eggs, in shell, fresh, preserved/cooked | Import | 14164452 |
# Now we can create a timeline of egg import and export:
ggplot(egg_plot,aes(x=Year,y=Netweight_kg/1000, fill=CatS4))+
geom_area()+theme_minimal()+
facet_grid(cols=vars(Trade_Flow))+
labs(title="Egg Trade Balance Austria:World",
subtitle= "Stacked Area Plot",
caption="Data Source: Comtrade",
color = "",
y = "Weight [1,000 kg]")+
theme(legend.position="bottom",
legend.text=element_text(size=7))+
guides(fill=guide_legend(nrow=3, byrow=TRUE))+
scale_fill_brewer(palette="Spectral")
ggsave("./3_output/first_egg_balance.png",width=13,height=10)

And we see now, that Austria imports by far more eggs (raw or dried) than it exports.
To continue, we could try to approximate the amount of nitrogen and phosphorous that enters and exists Austria via the egg-pathway.
We could relate that to fertilizer import and export to get a measure of magnitude.