World Bank is an ideal place for us to access global development data. We can always search suitable open data by relevant keyword and download it in any format we prefer (CSV, XML, EXCEL). However, if you are the guy who prefers to wrangle data with R, then there is another option for you to deal with it instead of download open data from World Bank into your pc and import into R manually by using read.csv( ), read.xlsx( ) functions.
Installation
To download and install ‘WDI’ package from CRAN, use install.packages( ) function.
install.packages('WDI')
# Install R package, 'WDI'.
Search for data
By using WDIsearch( ) function from ‘WDI’ package, we are allowed to search interested datasets from World Bank web API by inserting relevant keywords. But before we make use of this function, do remember to load the package in our R console after the installation of ‘WDI’ package.
library('WDI')
# Load R package.
WDIsearch('keyword')
# Insert relevant keyword inside single quote.
Let’s say we’re looking for global greenhouse gas emissions data from World Bank for our assignment purpose. We can start to search the relevant data by key in:
- greenhouse
- green
inside the single quote.
WDIsearch('greenhouse')
# 'greenhouse' keyword
WDIsearch('green')
# 'green' keyword
Please be noted that the keyword we insert into the function inside single quote(not wrong for using a double quote, " “) should not consist of white space. For an example,
- greenhouse gas emissions
- green house gas emissions
If you wish to do these, use ‘.*’ for each white space between the words. For an instance,
- greenhouse.*gass.*emissions
- green.*house.*gass.*emissions
WDIsearch('greenhouse.\*gass.\*emissions')
# 'greenhouse.\*gass.\*emissions' keyword
WDIsearch('green.\*house.\*gass.\*emissions')
# 'green.\*house.\*gass.\*emissions' keyword
All of them produce the same output which is a matrix which consists of a list of indicators and name of datasets.
indicator
[1,] "EN.ATM.GHGT.ZG"
[2,] "EN.ATM.GHGT.KT.CE"
[3,] "EN.ATM.GHGO.ZG"
[4,] "EN.ATM.GHGO.KT.CE"
name
[1,] "Total greenhouse gas emissions (% change from 1990)"
[2,] "Total greenhouse gas emissions (kt of CO2 equivalent)"
[3,] "Other greenhouse gas emissions (% change from 1990)"
[4,] "Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)"
Convert class of searched output(optional)
It seems like the output produced above is a little bit messy. So, let’s convert the class of the matrix into another data type, data frame. Before that, we have to store the output produced by a named variable.
data= WDIsearch('greenhouse')
# Store in 'data'
Use as.data.frame( ) function to convert any data type into data frame.
as.data.frame(data)
# Convert class
Or perform these two steps in one line.
data= as.data.frame(WDIsearch('greenhouse'))
Both of the above ways produce the same output in data frame format.
indicator
1 EN.ATM.GHGT.ZG
2 EN.ATM.GHGT.KT.CE
3 EN.ATM.GHGO.ZG
4 EN.ATM.GHGO.KT.CE
name
1 Total greenhouse gas emissions (% change from 1990)
2 Total greenhouse gas emissions (kt of CO2 equivalent)
3 Other greenhouse gas emissions (% change from 1990)
4 Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)
Because of limited space here, the ‘name’ variable is shifted a little bit downward, but if you follow the steps in R console, this should not be an issue.
Download and insert data into R
If you have read my first post on this blog, you will know how complicated I wrangled the downloaded data from World Bank web API in R during analytics competition. However, one of the wonderful features for this R package is WDI( ) function allows us to make use of downloaded data from World Bank in tidy format directly (one variable in one column and every observation have its row).
variable_name= WDI(indicator='keyword', country= c('country_code1','country_code2','country_code3'),start= start_year, end= end_year)
Let’s talk back about global greenhouse gas emissions. We have obtained the list of greenhouse gas emissions data which available on World Bank.
indicator | name |
---|---|
EN.ATM.GHGT.ZG | Total greenhouse gas emissions (% change from 1990) |
EN.ATM.GHGT.KT.CE | Total greenhouse gas emissions (kt of CO2 equivalent) |
EN.ATM.GHGO.ZG | Other greenhouse gas emissions (% change from 1990) |
EN.ATM.GHGO.KT.CE | Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent) |
Indicator for each of dataset is important as it is used as a keyword which can be included in WDI( ) function.
We have a scenario here. Let’s say we want to download the total greenhouse gas emissions in kilotonnes of carbon dioxide equivalent for Malaysia, Singapore, Canada from the year 2000 to 2004, then we need to include the indicator for this dataset, country code, start year and end year inside the WDI( ) function.
- indicator= EN.ATM.GHGT.KT.CE (from searched result)
- country code= Malaysia= MY, Singapore= SG, Canada= CA
- start year= 2000
- end year= 2004
ghge= WDI(indicator='EN.ATM.GHGT.KT.CE', country= c('MY','SG','CA'),start= 2000,end= 2004)
# Download data and store into 'ghge'.
Now, we have our dataset in tidy format. If you notice, you will found that total greenhouse gas emissions in kilotonnes in different years for these three countries are combined under a named ‘year’ column.
iso2c country EN.ATM.GHGT.KT.CE year
1 CA Canada 1040136.77 2004
2 CA Canada 951373.04 2003
3 CA Canada 908686.15 2002
4 CA Canada 718534.46 2001
5 CA Canada 760643.00 2000
6 MY Malaysia 216283.73 2004
7 MY Malaysia 195386.72 2003
8 MY Malaysia 199983.48 2002
9 MY Malaysia 176214.37 2001
10 MY Malaysia 171393.52 2000
11 SG Singapore 44836.63 2004
12 SG Singapore 43165.65 2003
13 SG Singapore NA 2002
14 SG Singapore 49851.52 2001
15 SG Singapore NA 2000
Renaming columns of data
To change the name for each column of data, use colnames( ) function.
columnnames=c('Country Code','Country','Total Greenhouse Gass Emissions in kt','Year')
# Renaming columns of data and store in 'columnnames'
colnames(ghge)= columnnames
# change columns name of 'ghge' with the new names we set
Country Code Country Total Greenhouse Gass Emissions in kt Year
1 CA Canada 1040136.77 2004
2 CA Canada 951373.04 2003
3 CA Canada 908686.15 2002
4 CA Canada 718534.46 2001
5 CA Canada 760643.00 2000
6 MY Malaysia 216283.73 2004
7 MY Malaysia 195386.72 2003
8 MY Malaysia 199983.48 2002
9 MY Malaysia 176214.37 2001
10 MY Malaysia 171393.52 2000
11 SG Singapore 44836.63 2004
12 SG Singapore 43165.65 2003
13 SG Singapore NA 2002
14 SG Singapore 49851.52 2001
15 SG Singapore NA 2000