Comtrade Analysis Step 2

Description: The downloaded dataset from before will be combined into one file. In total, the period from 1963 to 2017 was chosen. The dataset contains all imports and exports of all commodities of Austria to the world and it was reported from Statistic Austria to the UN Comtrade database.

library(readr)
library(dplyr)
library(WriteXLS)
library(rjson)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

# setting directories like before (in the download file)
setwd("../")
loc_fol <- paste0(getwd(),"/0_data/comtrade")

down_files <- list.files(path=loc_fol,pattern="*.csv",full.names=TRUE) # get all downloaded csv-files
combination <- lapply(down_files,read.csv)
df <- do.call("rbind", combination)
dim(df)
head(df, 2)

426813
35

pfCode	yr	period	periodDesc	aggrLevel	IsLeaf	rgCode	rgDesc	rtCode	rtTitle	⋯	qtAltCode	qtAltDesc	TradeQuantity	AltQuantity	NetWeight	GrossWeight	TradeValue	CIFValue	FOBValue	estCode
S1	1963	1963	1963	1	0	1	Import	40	Austria	⋯	NA	NA	NA	NA	NA	NA	194876768	NA	NA	0
S1	1963	1963	1963	1	0	2	Export	40	Austria	⋯	NA	NA	NA	NA	NA	NA	71546640	NA	NA	0

# The data still contains the names from comtrade. Here the names get changed: 
names <- c("Classification","Year","Period","Period Desc.","Aggregate Level","Is Leaf Code","Trade Flow Code","Trade Flow","Reporter Code","Reporter","Reporter ISO","Partner Code","Partner","Partner ISO","2nd Partner Code","2nd Partner","2nd Partner ISO","Customs Proc. Code","Customs","Mode of Transport Code","Mode of Transport","Commodity Code","Commodity","Qty Unit Code","Qty Unit","Qty","Alt Qty Unit Code","Alt Qty Unit","Alt Qty","Netweight (kg)","Gross weight (kg)","Trade Value (USD)","CIF Trade Value (USD)","FOB Trade Value (USD)","Flag")
colnames(df) <- names

head(df,5)

Classification	Year	Period	Period Desc.	Aggregate Level	Trade Flow Code	Trade Flow	Reporter Code	Reporter	⋯	Qty	Alt Qty Unit Code	Alt Qty Unit	Alt Qty	Netweight (kg)	Gross weight (kg)	Trade Value (USD)	CIF Trade Value (USD)	FOB Trade Value (USD)
S1	1963	1963	1963	1	1	Import	40	Austria	⋯	NA	NA	NA	NA	NA	NA	194876768	NA	NA
S1	1963	1963	1963	1	2	Export	40	Austria	⋯	NA	NA	NA	NA	NA	NA	71546640	NA	NA
S1	1963	1963	1963	2	1	Import	40	Austria	⋯	NA	NA	NA	NA	NA	NA	8827853	NA	NA
S1	1963	1963	1963	2	2	Export	40	Austria	⋯	NA	NA	NA	NA	NA	NA	40104828	NA	NA
S1	1963	1963	1963	3	1	Import	40	Austria	⋯	NA	NA	21781116	NA	21781116	NA	8827853	NA	NA

# After having a little look on the data, some columns are not necessary for our analysis.
# Since it is yearly data period is not relevant. The trade flow code is redundand to Trade Flow. 
# Partner and Reporter codes are not important, as we predefined them: Austria and World. 
# So we define a name vector with redundand information: 
col_removal <- c("Period","Period Desc.","Trade Flow Code","Reporter Code", "Reporter ISO", "Partner Code", "Partner ISO")  

# Then remove that columns with the following line: 
cleandat_red <- df[ , !(colnames(df) %in% col_removal)] # Removal Step one - redundand information

# The data type factor often shows some unwanted side-effects, so we change factors into characters. 
cleandat_fac <- cleandat_red %>% mutate_if(is.factor, as.character) # Removal Step 2: factor to character

# Now in addition we remove all columns that only contain NA-values (-> empty columns). 
cleandat_na <- Filter(function(x) !all(is.na(x)), cleandat_fac) # Removal Step 3: remove empty colums

head(cleandat_na,2) # Now we have a more compact data set.

Classification	Year	Aggregate Level	Is Leaf Code	Trade Flow	Reporter	Partner	Commodity Code	Commodity	Qty Unit Code	Qty Unit	Alt Qty Unit	Netweight (kg)	Trade Value (USD)	Flag
S1	1963	1	0	Import	Austria	World	0	Food and live animals	1	No Quantity	NA	NA	194876768	0
S1	1963	1	0	Export	Austria	World	0	Food and live animals	1	No Quantity	NA	NA	71546640	0

# Export Data: To interact with other people and distribute spreadsheets the formats .csv and .xlsx are both good. 
write_fol <- paste0(getwd(),"/2_pipeline/Combine_Downloads")
WriteXLS(cleandat_na,paste0(write_fol,"/combo_ct_dat.xlsx"),row.names=FALSE)
write.csv(cleandat_na,paste0(write_fol,"/combo_ct_dat.csv"),row.names=FALSE)

Now we combined all the downloaded datasets and removed some bloat. Next we have to make sure, that the product categories (commodities) match over the whole period. For this we have to map all of them to the same reference system. The people of Comtrade thankfully provide lists for that.