Description: The downloaded dataset from before will be combined into one file. In total, the period from 1963 to 2017 was chosen. The dataset contains all imports and exports of all commodities of Austria to the world and it was reported from Statistic Austria to the UN Comtrade database.

library(readr)
library(dplyr)
library(WriteXLS)
library(rjson)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
# setting directories like before (in the download file)
setwd("../")
loc_fol <- paste0(getwd(),"/0_data/comtrade")
down_files <- list.files(path=loc_fol,pattern="*.csv",full.names=TRUE) # get all downloaded csv-files
combination <- lapply(down_files,read.csv)
df <- do.call("rbind", combination)
dim(df)
head(df, 2)
  1. 426813
  2. 35
pfCodeyrperiodperiodDescaggrLevelIsLeafrgCodergDescrtCodertTitleqtAltCodeqtAltDescTradeQuantityAltQuantityNetWeightGrossWeightTradeValueCIFValueFOBValueestCode
S1 1963 1963 1963 1 0 1 Import 40 Austria NA NA NA NA NA NA 194876768NA NA 0
S1 1963 1963 1963 1 0 2 Export 40 Austria NA NA NA NA NA NA 71546640NA NA 0
# The data still contains the names from comtrade. Here the names get changed: 
names <- c("Classification","Year","Period","Period Desc.","Aggregate Level","Is Leaf Code","Trade Flow Code","Trade Flow","Reporter Code","Reporter","Reporter ISO","Partner Code","Partner","Partner ISO","2nd Partner Code","2nd Partner","2nd Partner ISO","Customs Proc. Code","Customs","Mode of Transport Code","Mode of Transport","Commodity Code","Commodity","Qty Unit Code","Qty Unit","Qty","Alt Qty Unit Code","Alt Qty Unit","Alt Qty","Netweight (kg)","Gross weight (kg)","Trade Value (USD)","CIF Trade Value (USD)","FOB Trade Value (USD)","Flag")
colnames(df) <- names
head(df,5)
ClassificationYearPeriodPeriod Desc.Aggregate LevelIs Leaf CodeTrade Flow CodeTrade FlowReporter CodeReporterQtyAlt Qty Unit CodeAlt Qty UnitAlt QtyNetweight (kg)Gross weight (kg)Trade Value (USD)CIF Trade Value (USD)FOB Trade Value (USD)Flag
S1 1963 1963 1963 1 0 1 Import 40 Austria NA NA NA NA NA NA 194876768NA NA 0
S1 1963 1963 1963 1 0 2 Export 40 Austria NA NA NA NA NA NA 71546640NA NA 0
S1 1963 1963 1963 2 0 1 Import 40 Austria NA NA NA NA NA NA 8827853NA NA 0
S1 1963 1963 1963 2 0 2 Export 40 Austria NA NA NA NA NA NA 40104828NA NA 0
S1 1963 1963 1963 3 0 1 Import 40 Austria NA NA 21781116 NA 21781116 NA 8827853NA NA 0
# After having a little look on the data, some columns are not necessary for our analysis.
# Since it is yearly data period is not relevant. The trade flow code is redundand to Trade Flow. 
# Partner and Reporter codes are not important, as we predefined them: Austria and World. 
# So we define a name vector with redundand information: 
col_removal <- c("Period","Period Desc.","Trade Flow Code","Reporter Code", "Reporter ISO", "Partner Code", "Partner ISO")  

# Then remove that columns with the following line: 
cleandat_red <- df[ , !(colnames(df) %in% col_removal)] # Removal Step one - redundand information

# The data type factor often shows some unwanted side-effects, so we change factors into characters. 
cleandat_fac <- cleandat_red %>% mutate_if(is.factor, as.character) # Removal Step 2: factor to character

# Now in addition we remove all columns that only contain NA-values (-> empty columns). 
cleandat_na <- Filter(function(x) !all(is.na(x)), cleandat_fac) # Removal Step 3: remove empty colums

head(cleandat_na,2) # Now we have a more compact data set. 
ClassificationYearAggregate LevelIs Leaf CodeTrade FlowReporterPartnerCommodity CodeCommodityQty Unit CodeQty UnitAlt Qty UnitNetweight (kg)Trade Value (USD)Flag
S1 1963 1 0 Import Austria World 0 Food and live animals1 No Quantity NA NA 194876768 0
S1 1963 1 0 Export Austria World 0 Food and live animals1 No Quantity NA NA 71546640 0
# Export Data: To interact with other people and distribute spreadsheets the formats .csv and .xlsx are both good. 
write_fol <- paste0(getwd(),"/2_pipeline/Combine_Downloads")
WriteXLS(cleandat_na,paste0(write_fol,"/combo_ct_dat.xlsx"),row.names=FALSE)
write.csv(cleandat_na,paste0(write_fol,"/combo_ct_dat.csv"),row.names=FALSE)

Now we combined all the downloaded datasets and removed some bloat. Next we have to make sure, that the product categories (commodities) match over the whole period. For this we have to map all of them to the same reference system. The people of Comtrade thankfully provide lists for that.