R - Jupyter script
File: Comtrade_Recode.R
Author: Christian Brandstätter
Contact: bran.chri@gmail.com
Date: 26.1.2019
Copyright (C) 2019

Description: The combined data need to be harmonized: there are several different classification systems. In the comtrade database the data is stored the way it was created.

library(plyr)
library(dplyr)
library(openxlsx) # zum Importieren der großen Excel-Daten
library(ggplot2)
library(ggthemes)
library(reshape2)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:plyr’:

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
fa_clean <- function(x) {
  varname <- deparse(substitute(x))
  require(dplyr)
  x <- x %>% mutate_if(is.factor, as.character) 
  assign(varname, x, envir = .GlobalEnv)
}
setwd("../")
# read previously generated Data 
ctdat <- fa_clean(read.csv("./2_pipeline/Combine_Downloads/combo_ct_dat.csv"))
head(ctdat,2)
ClassificationYearAggregate_LevelIs_Leaf_CodeTrade_FlowReporterPartnerCommodity_CodeCommodityQty_Unit_CodeQty_UnitAlt_Qty_UnitNetweight_kgTrade_Value_USDFlag
S1 1963 1 0 Import Austria World 0 Food and live animals1 No Quantity NA NA 194876768 0
S1 1963 1 0 Export Austria World 0 Food and live animals1 No Quantity NA NA 71546640 0
# As mentioned before, there are seven different classification systems in use. 
entries_per_class <-  ctdat %>%
group_by(Classification) %>%
summarise(count = n())

ggplot(entries_per_class,aes(x=Classification,y=count, fill = Classification))+
geom_bar(stat="identity")+
scale_fill_brewer(palette="Greens")+
labs(title="Austria Comtrade Commodities", subtitle= "Autria vs. World, Import and Export, Entries per Classification")+
theme_minimal(12)

png

# Thus, we need a mapping between the Classification Systems to harmonize our tables. 
# Downloaded only once, then the following download was commented.
#download.file("https://unstats.un.org/unsd/trade/classifications/tables/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx",
#              "./0_data/Commodity Classifications/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx")
corr_table <- read.xlsx("./0_data/Commodity Classifications/CompleteCorrelationsOfHS-SITC-BEC_20170606.xlsx")
head(corr_table,2)
HS92HS96HS02HS07HS12HS17BEC4SITC1SITC2SITC3SITC4
01011101011101011001011001012101012141 0015 0015 00151 0015
01011101011101011001011001012101012141 0015 0015 00152 0015
# The downloaded correlation table contains all the codes, we also need to ensure the proper naming of the Commodity Categories in the system we choose. 
# Here is a table containing all the commodities (incl. Description) of Comtrade in all Classification Systems. 
# Downloaded only once, then the following download was commented.
# download.file("http://unstats.un.org/unsd/tradekb/Attachment439.aspx?AttachmentType=1",
#                "./0_data/Commodity Classifications/UN_Comtrade_Commodity_Classifications.xlsx")
class_table <- read.xlsx("./0_data/Commodity Classifications/UN_Comtrade_Commodity_Classifications.xlsx")
head(class_table,2)
ClassificationCodeDescriptionCode.ParentLevelisLeaf
BE 1 Food and beverages TOTAL 1 0
BE 11 Food and beverages, primary1 2 0
# Now we can proceed with the harmonizing: We select, without much reasoning, the newest system SITC4.
# The mapping must occur in our downloaded data. For that we create a new variable: 
ctdat$newClass <- NA # new variable for the chosen System 
head(ctdat) # A quick look at the class_table for recoding
ClassificationYearAggregate_LevelIs_Leaf_CodeTrade_FlowReporterPartnerCommodity_CodeCommodityQty_Unit_CodeQty_UnitAlt_Qty_UnitNetweight_kgTrade_Value_USDFlagnewClass
S1 1963 1 0 Import Austria World 0 Food and live animals1 No Quantity NA NA 194876768 0 NA
S1 1963 1 0 Export Austria World 0 Food and live animals1 No Quantity NA NA 71546640 0 NA
S1 1963 2 0 Import Austria World 00 Live animals 1 No Quantity NA NA 8827853 0 NA
S1 1963 2 0 Export Austria World 00 Live animals 1 No Quantity NA NA 40104828 0 NA
S1 1963 3 0 Import Austria World 001 Live animals 8 Weight in kilograms 21781116 21781116 8827853 0 NA
S1 1963 3 0 Export Austria World 001 Live animals 8 Weight in kilograms 87052088 87052088 40104828 0 NA
str(ctdat)
'data.frame':   426813 obs. of  16 variables:
 $ Classification : chr  "S1" "S1" "S1" "S1" ...
 $ Year           : int  1963 1963 1963 1963 1963 1963 1963 1963 1963 1963 ...
 $ Aggregate_Level: int  1 1 2 2 3 3 4 4 4 4 ...
 $ Is_Leaf_Code   : int  0 0 0 0 0 0 1 1 1 1 ...
 $ Trade_Flow     : chr  "Import" "Export" "Import" "Export" ...
 $ Reporter       : chr  "Austria" "Austria" "Austria" "Austria" ...
 $ Partner        : chr  "World" "World" "World" "World" ...
 $ Commodity_Code : chr  "0" "0" "00" "00" ...
 $ Commodity      : chr  "Food and live animals" "Food and live animals" "Live animals" "Live animals" ...
 $ Qty_Unit_Code  : int  1 1 1 1 8 8 8 8 8 8 ...
 $ Qty_Unit       : chr  "No Quantity" "No Quantity" "No Quantity" "No Quantity" ...
 $ Alt_Qty_Unit   : num  NA NA NA NA 21781116 ...
 $ Netweight_kg   : num  NA NA NA NA 21781116 ...
 $ Trade_Value_USD: num  1.95e+08 7.15e+07 8.83e+06 4.01e+07 8.83e+06 ...
 $ Flag           : int  0 0 0 0 0 0 0 0 0 0 ...
 $ newClass       : logi  NA NA NA NA NA NA ...
# Actual recoding with the handy mapvalues function from the plyr package: here the different classification codes are mapped against the classification table.
# It is better to work with the code instead of the description, when available, as the description may contain typing mistakes or different writings, etc. 
ctdat$newClass[ctdat$Classification=="H0"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H0"],corr_table$HS92,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H1"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H1"],corr_table$HS96,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H2"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H2"],corr_table$HS02,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H3"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H3"],corr_table$HS07,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H4"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H4"],corr_table$HS12,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="H5"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="H5"],corr_table$HS17,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S1"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S1"],corr_table$SITC1,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S2"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S2"],corr_table$SITC2,corr_table$SITC4,warn_missing=FALSE)
ctdat$newClass[ctdat$Classification=="S3"] <- mapvalues(ctdat$Commodity_Code[ctdat$Classification=="S3"],corr_table$SITC3,corr_table$SITC4,warn_missing=FALSE)
S4_classes <- class_table[which(class_table$Classification=="S4"),]
# str(S4_classes)
ctdat$CatS4 <- mapvalues(ctdat$newClass,S4_classes$Code,S4_classes$Description,warn_missing=FALSE)
head(ctdat,5)
ClassificationYearAggregate_LevelIs_Leaf_CodeTrade_FlowReporterPartnerCommodity_CodeCommodityQty_Unit_CodeQty_UnitAlt_Qty_UnitNetweight_kgTrade_Value_USDFlagnewClassCatS4
S1 1963 1 0 Import Austria World 0 Food and live animals 1 No Quantity NA NA 194876768 0 0 Food and live animals
S1 1963 1 0 Export Austria World 0 Food and live animals 1 No Quantity NA NA 71546640 0 0 Food and live animals
S1 1963 2 0 Import Austria World 00 Live animals 1 No Quantity NA NA 8827853 0 00 Live animals other than animals of division 03
S1 1963 2 0 Export Austria World 00 Live animals 1 No Quantity NA NA 40104828 0 00 Live animals other than animals of division 03
S1 1963 3 0 Import Austria World 001 Live animals 8 Weight in kilograms 21781116 21781116 8827853 0 001 Live animals other than animals of division 03
categories_S4_unique <- unique(ctdat$CatS4)
egglist <- categories_S4_unique[grep("egg",categories_S4_unique)]
egglist
  1. 'Dairy products and birds’ eggs'
  2. 'Eggs, birds\', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin'
  3. 'Birds\' eggs, in shell, fresh, preserved/cooked'
  4. 'Parts of footwear (including uppers, whether/not attached to soles other than outer soles); removable insoles, heel cushions & similar articles; gaiters, leggings & similar articles & parts thereof'
  5. 'Birds\' eggs, not in shell, & egg yolks, dried'
  6. 'Harvesting or threshing machinery (including straw or fodder balers); grass or hay mowers; machines for cleaning, sorting or grading seed or grain or for grading eggs, fruit or other agricultural produce (other than milling machinery of heading 727.11); p'
  7. 'Machines for cleaning, sorting/grading eggs, fruit/other agricultural produce'
  8. 'Birds\' eggs, not in shell, and egg yolks'
  9. 'Birds\' eggs, not in shell, & egg yolks, other than dried'
  10. 'Fish, prepared or preserved, n.e.s.; caviar and caviar substitutes prepared from fish eggs.'
  11. 'Caviar & caviar substitutes prepared from fish eggs'
  12. 'Albumins (including concentrates of two/more whey proteins, containing by weight > 80% whey proteins, calculated on the dry matter), other than egg albumin; albuminates & other albumin derivatives'
egglist_select <- egglist[c(1,2,3,5,8,9)]
eggdat_comtrade <- ctdat[(ctdat$CatS4 %in% egglist_select),]
head(eggdat_comtrade)
ClassificationYearAggregate_LevelIs_Leaf_CodeTrade_FlowReporterPartnerCommodity_CodeCommodityQty_Unit_CodeQty_UnitAlt_Qty_UnitNetweight_kgTrade_Value_USDFlagnewClassCatS4
49S1 1963 2 0 Import Austria World 02 Dairy products and eggs 1 No Quantity NA NA 13429270 0 02 Dairy products and birds’ eggs
50S1 1963 2 0 Export Austria World 02 Dairy products and eggs 1 No Quantity NA NA 18056338 0 02 Dairy products and birds’ eggs
66S1 1963 3 0 Import Austria World 025 Eggs 8 Weight in kilograms 15815741 15815741 9414903 0 025 Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin
67S1 1963 3 0 Export Austria World 025 Eggs 8 Weight in kilograms 10608 10608 16229 0 025 Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin
68S1 1963 4 0 Import Austria World 0250 Eggs 8 Weight in kilograms 15815741 15815741 9414903 0 0251 Birds' eggs, in shell, fresh, preserved/cooked
69S1 1963 4 0 Export Austria World 0250 Eggs 8 Weight in kilograms 10608 10608 16229 0 0251 Birds' eggs, in shell, fresh, preserved/cooked
table(eggdat_comtrade$CatS4)
                                                 Birds' eggs, in shell, fresh, preserved/cooked 
                                                                                            158 
                                                  Birds' eggs, not in shell, & egg yolks, dried 
                                                                                            127 
                                       Birds' eggs, not in shell, & egg yolks, other than dried 
                                                                                            105 
                                                       Birds' eggs, not in shell, and egg yolks 
                                                                                             12 
                                                                 Dairy products and birds’ eggs 
                                                                                            110 
Eggs, birds', and egg yolks, fresh, dried or otherwise preserved, sweetened or not; egg albumin 
                                                                                             62 
eggdat_comtrade$CatS4[grep("albumin",eggdat_comtrade$CatS4)] <- "Unspecified egg"
# The category of dairy products and bird's eggs is already hard to distinguish between egg and dairy products.
# This category is even less useful, when no weight data ist given. 
egg_na <- eggdat_comtrade[is.na(eggdat_comtrade$Netweight_kg),]
table(egg_ct$CatS4)
Birds' eggs, in shell, fresh, preserved/cooked 
                                             1 
                Dairy products and birds’ eggs 
                                           106 
# For now, we have to exclude these data.
egg_set <- eggdat_comtrade[!is.na(eggdat_comtrade$Netweight_kg),]
head(egg_set,2)
ClassificationYearAggregate_LevelIs_Leaf_CodeTrade_FlowReporterPartnerCommodity_CodeCommodityQty_Unit_CodeQty_UnitAlt_Qty_UnitNetweight_kgTrade_Value_USDFlagnewClassCatS4
66S1 1963 3 0 Import Austria World 025 Eggs 8 Weight in kilograms15815741 15815741 9414903 0 025 Unspecified egg
67S1 1963 3 0 Export Austria World 025 Eggs 8 Weight in kilograms 10608 10608 16229 0 025 Unspecified egg
table(egg_set$CatS4)
          Birds' eggs, in shell, fresh, preserved/cooked 
                                                     157 
           Birds' eggs, not in shell, & egg yolks, dried 
                                                     127 
Birds' eggs, not in shell, & egg yolks, other than dried 
                                                     105 
                Birds' eggs, not in shell, and egg yolks 
                                                      12 
                          Dairy products and birds’ eggs 
                                                       4 
                                         Unspecified egg 
                                                      62 
egg_plot <- egg_set %>%
group_by(Year, CatS4,Trade_Flow) %>%
summarize(Netweight_kg = sum(Netweight_kg))
head(egg_plot)
YearCatS4Trade_FlowNetweight_kg
1963 Birds' eggs, in shell, fresh, preserved/cookedExport 10608
1963 Birds' eggs, in shell, fresh, preserved/cookedImport 15815741
1963 Unspecified egg Export 10608
1963 Unspecified egg Import 15815741
1964 Birds' eggs, in shell, fresh, preserved/cookedExport 53886
1964 Birds' eggs, in shell, fresh, preserved/cookedImport 14164452
# Now we can create a timeline of egg import and export:
ggplot(egg_plot,aes(x=Year,y=Netweight_kg/1000, fill=CatS4))+
geom_area()+theme_minimal()+
facet_grid(cols=vars(Trade_Flow))+
labs(title="Egg Trade Balance Austria:World",
     subtitle= "Stacked Area Plot",
     caption="Data Source: Comtrade",
     color = "",
     y = "Weight [1,000 kg]")+
theme(legend.position="bottom",
legend.text=element_text(size=7))+
guides(fill=guide_legend(nrow=3, byrow=TRUE))+
scale_fill_brewer(palette="Spectral")
ggsave("./3_output/first_egg_balance.png",width=13,height=10)

png

And we see now, that Austria imports by far more eggs (raw or dried) than it exports.
To continue, we could try to approximate the amount of nitrogen and phosphorous that enters and exists Austria via the egg-pathway.
We could relate that to fertilizer import and export to get a measure of magnitude.