--- title: "(Down)load data from external partners" author: "Stijn Van Hoey" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{(Down)load data from external partners} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Introduction We regularly get data sets (e.g. `csv`) from external partners with specific data formats. To overcome redundant work in writing custom functions to load these data sets, this vignette provides some examples for custom data readers provided by the `inborutils` package. ```{r} library(inborutils) ``` ## `KNMI` data The Dutch meteorological institute, `KNMI`, provides a webservice to query and download data. [This tutorial](https://www.knmi.nl/kennis-en-datacentrum/achtergrond/data-ophalen-vanuit-een-script) provides more information about their service. For the hourly data, the R function `download_knmi_data_hour` facilitates the download of the data. To download data, the required inputs are the `stations`, `variables`, a `start_date` and an `end_date`. In order to have an idea about the measurement stations that can be used, `KNMI` provides an overview list [here](http://projects.knmi.nl/klimatologie/metadata/stationslijst.html). The variables, for which both a group name or a variable name can be used, also provided in the [`KNMI` documentation](https://www.knmi.nl/kennis-en-datacentrum/achtergrond/data-ophalen-vanuit-een-script): | group name | variable name | description | |---|---|---| | WIND | DD:FH:FF:FX | wind | | TEMP | T:T10N:TD | temperature | | SUNR | SQ:Q | sunlight duration, global radiation | | PRCP | DR:RH | rainfall, pet | | VICL | VV:N:U | sight, cloudiness, relative humidity | | WEER | M:R:S:O:Y:WW | weather types | | ALL | | all variables | Hence, for a given start (e.g. January 1st 2012) and end date (February 1st 2012), the data download for rainfall data in `Vlissingen` (`310`) and `Wesdorpe` (`319`), writing the output to a file called `knmi_download.csv`, can be started as follows: ```{r eval=FALSE, include=TRUE} response <- download_knmi_data_hour(c(310, 319), "PRCP", "2012-01-01", "2012-02-01", output_file = "knmi_download.csv") ``` When chosen the rainfall data only, the package already includes a specific function to read the rainfall data (a more general functionality is on the `todo`-list, feel free to extend the existing function): ```{r eval=FALSE, include=TRUE} rain_knmi_2012 <- read_knmi_data("./knmi_download.csv") ``` ```{r include=FALSE} rain_knmi_2012 <- inborutils::rain_knmi_2012 ``` ```{r} head(rain_knmi_2012) ``` From which a time series plot can be made: ```{r fig.width = 7} library(ggplot2) ggplot(rain_knmi_2012, aes(x = datetime, y = value)) + geom_line() + xlab("") + ylab("mm") ``` From the figure, it gets clear that `KNMI` uses `-1` to define Nan values. ## `MOW-HIC` data When receiving data from MOW (apart from using the `waterinfo` API), the file format of MOW data sets looks as follows: ``` Station Name: Destelbergen SF/Zeeschelde Station Number: zes57n-SF-CM River: Zeeschelde Operator: - Easting: 109591 Northing: 192793 Datum: 0.000 Parameter Name: Cond Parameter Type: Cond Time series Name: Destelbergen SF/Zeeschelde / Cond / zes57n-SF.Cond.5 Time series Unit: µS/cm Time level: High-resolution Time series Type: Instantaneous value Time series equidistant: yes Time series value distance: 5 Minute(s) Time series quality: 2 Time series measuring system: --- Date Time Cond [µS/cm] Quality flag Comments 01/04/2015 00:00:00 631.996 G 01/04/2015 00:05:00 631.007 G 01/04/2015 00:10:00 632.993 G 01/04/2015 00:15:00 631.004 G 01/04/2015 00:20:00 631.996 G 01/04/2015 00:25:00 631.004 G 01/04/2015 00:30:00 632.000 G 01/04/2015 00:35:00 632.000 G ... ``` A lot of the information is provided in the header, which we would like to combine with the time series itself. The function `read_mow_data` is a tailor-made function to load this file format into a data.frame: ```{r include=FALSE} fpath <- system.file("extdata", "mow_example.txt", package = "inborutils") ``` ```{r eval=FALSE, include=TRUE} fpath <- "mow_example.txt" ``` ```{r} conductivity_mow <- read_mow_data(fpath) head(conductivity_mow) ``` (*Remark*: this example file is provided by the package itself, see also on [GitHub](https://github.com/inbo/inborutils/tree/master/inst/extdata)) A time series plot can be made for these data as well: ```{r fig.width = 7} library(ggplot2) ggplot(conductivity_mow, aes(x = datetime, y = value)) + geom_line() + xlab("") + ylab("µS/cm") + scale_x_datetime(date_labels = "%H:%M\n%Y-%m-%d", date_breaks = "4 hours") ``` ## `KMI` data When receiving data from the Belgian Meteorological Institute, `KMI`, the format of the data file looks as follows (at least for some project we did): ``` date;JAAR;MAAND;DAG;UUR;STATION;NEERSLAG(mm) 2012-1-1_1;2012;1;1;1;SINT_KATELIJNE_WAVER;0 2012-1-1_2;2012;1;1;2;SINT_KATELIJNE_WAVER;0 2012-1-1_3;2012;1;1;3;SINT_KATELIJNE_WAVER;0 2012-1-1_4;2012;1;1;4;SINT_KATELIJNE_WAVER;0 2012-1-1_5;2012;1;1;5;SINT_KATELIJNE_WAVER;1.1 2012-1-1_6;2012;1;1;6;SINT_KATELIJNE_WAVER;0.2 2012-1-1_7;2012;1;1;7;SINT_KATELIJNE_WAVER;0 2012-1-1_8;2012;1;1;8;SINT_KATELIJNE_WAVER;0 2012-1-1_9;2012;1;1;9;SINT_KATELIJNE_WAVER;0 2012-1-1_10;2012;1;1;10;SINT_KATELIJNE_WAVER;0.8 2012-1-1_11;2012;1;1;11;SINT_KATELIJNE_WAVER;0.1 ... ``` To read the data and provide it into a similar format as the previous time series, the function `read_kmi_data` is available in the `inborutils` package: ```{r include=FALSE} rpath <- system.file("extdata", "kmi_example.txt", package = "inborutils") ``` ```{r eval=FALSE, include=TRUE} rpath <- "kmi_example.txt" ``` ```{r} rainfall_kmi <- read_kmi_data(rpath) head(rainfall_kmi) ``` (*Remark*: this example file is provided by the package itself, see also on [GitHub](https://github.com/inbo/inborutils/tree/master/inst/extdata)) ## Google maps `kml` files To extract coordinate and date information from a `kml` file, the function `load_kml` To read the data and provide it into a similar format as the previous time series, the function `read_kml_file` is available in the `inborutils` package: ```{r include=FALSE} rpath <- system.file("extdata", "kml_example.kml", package = "inborutils") ``` ```{r eval=FALSE, include=TRUE} rpath <- "kml_example.kml" ``` ```{r} tracks <- read_kml_file(rpath) head(tracks) ``` (*Remark*: this example file is provided by the package itself, see also on [GitHub](https://github.com/inbo/inborutils/tree/master/inst/extdata)) ## Closure Feel free to add other potentially useful data formats reader functions and associated documentation!