--- title: "How to retrieve data from the INBOVEG database" author: "Els De Bie, Hans Van Calster, Jo Loos" date: 2020-07-03 output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{How to retrieve data from the INBOVEG database} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} if (requireNamespace("dbplyr", quietly = TRUE)) { library("dbplyr") # Example with pkg } else { message("'dbplyr' not available") } tryCatch( con <- inbodb::connect_inbo_dbase("D0010_00_Cydonia"), error = function(e) e, finally = database_access <- exists("con") ) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = database_access, cache = FALSE ) if (!database_access) { load("get_data_inboveg_results.Rda") } ``` # Introduction The Flemish vegetation database, INBOVEG, is an application developed to provide a repository of relevés and makes the relevés available for future use. More information? Check out https://www.vlaanderen.be/inbo/en-gb/data-applications/inboveg/ INBOVEG supports different types of recordings: `BioHab` recordings (protocol of Natura 2000 monitoring) and the classic relevés. The classic relevés can stand alone, be an element of a collection or element of a chain where the linkage is used to give information about the relative position of recording within a series. Ample selection and export functions toward analysis tools are provided. It also provides standardized lists of species, habitats, life forms, scales ... Original observations are preserved and a full history of subsequent identifications is saved. More information? Check out [INBOVEG-website](https://www.vlaanderen.be/inbo/en-gb/data-applications/inboveg/) # Aim We make functions available to query data directly from the INBOVEG SQL-server database. This avoids writing your own queries or to copy/paste them from the access-frontend for INBOVEG. We have provided functions to query * survey (INBOVEG-projects) * metadata of recordings (header info) * recordings (vegetation relevés) * layer data (cover or qualifiers) * classification (Natura2000 or local classification like BWK) * qualifiers (management and site characteristics) * relations between recordings (relation information on recordings for one or more surveys based on Parent (classic-chain/bucket) and Child (classic) relationships) # Packages and connection The main functions that we will use in this tutorial all start with `get_inboveg_*`. These functions are made available by loading the `inbodb` package. ```{r setup} library(inbodb) ``` These functions will only work for people with access to the INBO network. As an INBO employee, you should make sure you have reading-rights for CYDONIA, otherwise place an ICT-call. The following R-code can be used to establish a connection to INBOVEG by using `connect_inbo_dbase` of the inbodb-package with the database 'Cydonia' on the `inbo-sql07-prd` server: ```{r connection_inbo} con <- connect_inbo_dbase("D0010_00_Cydonia") ``` # Functionality ## Survey information The function `get_inboveg_survey` queries the INBOVEG database for survey information (metadata about surveys) for one or more survey(s) by the name of the survey. Three examples are given, this can be used as base to continue selecting the data you require. Get information of a specific survey and collect data. ```{r examples_survey} survey_info <- get_inboveg_survey( con, survey_name = "OudeLanden_1979", collect = TRUE) ``` ```{r eval=TRUE} survey_info ``` Get information of all surveys. This time we will not use `collect = TRUE`, which will return a [lazy query](https://docs.lucee.org/guides/cookbooks/lazy_queries.html): ```{r} allsurveys <- get_inboveg_survey(con) ``` ```{r eval=TRUE} class(allsurveys) ``` If only a part of the survey name is known, you can make use of wildcards such as `%`. ```{r} partsurveys <- get_inboveg_survey( con, survey_name = "%MILKLIM%", collect = TRUE) ``` ```{r eval=TRUE} partsurveys ``` ## Header information The function `get_inboveg_header` queries the INBOVEG database for header information (metadata for a vegetation-relevé) for one or more survey by the name of the survey(s) and the recorder type (`classic`, `classic-emmer`, `classic-ketting`, `biohab`, `ppa`). All relevés are selected, also those with 'needs work' indicated (0 = no work needed, 1 = needs work). It is important to realize that the relevé is not finished yet, species lists can be incomplete, or not all species have been identified yet. For certain analysis these relevés can better be excluded (by using dplyr -> filter). Three examples are given, this can be used as base to continue selecting the data you require. Get header information from a specific survey and a specific recording type and collect the data: ```{r examples_header} header_info <- get_inboveg_header( con, survey_name = "OudeLanden_1979", rec_type = "Classic", collect = TRUE) ``` ```{r eval=TRUE} head(header_info, 10) ``` Get header information from several specific surveys by using `multiple = TRUE`. ```{r, eval = FALSE} header_severalsurveys <- get_inboveg_header( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), multiple = TRUE) ``` Get header information of all surveys without collecting the data: ```{r, eval = FALSE} all_header_info <- get_inboveg_header(con) ``` ## Recording information The function `get_inboveg_recording()` queries the INBOVEG database for relevé information (which species were recorded in which plots and in which vegetation layers with which cover) for one or more surveys. Several examples are given, this can be used as base to continue selecting the data you require. First, we show how to get the relevés from one survey and collect the data: ```{r examples_recordings} recording_heischraal2012 <- get_inboveg_recording( con, survey_name = "MILKLIM_Heischraal2012", collect = TRUE) ``` ```{r eval=TRUE} recording_heischraal2012 ``` Here is how to get the relevé for a specific `RecordingGivid`: ```{r, eval = FALSE} recording_specific_givid <- get_inboveg_recording( con, recording_givid = "IV2012080609161322", collect = TRUE) ``` Or for multiple `RecordingGivid`: ```{r} recording_specific_givid <- get_inboveg_recording( con, recording_givid = c("IV2012080609161322", "IV2012081611384756"), collect = TRUE) ``` The above is very useful in tandem with `get_inboveg_header()` to first search for all relevés belonging to one or more surveys and further filter these records to only the relevés you require based on the relevé metadata (such as date, observer, ...). Or based on a user reference: ```{r, eval = FALSE} recording_specific_userref <- get_inboveg_recording( con, user_reference = "HS_1001", collect = TRUE) ``` Get all recordings from MILKLIM surveys (partial matching), without collecting the data: ```{r, eval = FALSE} recording_milkim <- get_inboveg_recording( con, survey_name = "%MILKLIM%") ``` Get recordings from several specific surveys: ```{r, eval = FALSE} recording_severalsurveys <- get_inboveg_recording( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), collect = TRUE) ``` Get all relevés of all surveys, without collecting the data: ```{r, eval = FALSE} allrecordings <- get_inboveg_recording(con) ``` The function `get_inboveg_ppa()` is similar in use as `get_inboveg_recording()`, but can be used specifically to get point to plant distance type data (PPA) from the INBOVEG database. ```{r eval=FALSE} allppa <- get_inboveg_ppa(con) ``` ## Classification information The function `get_inboveg_classification` queries the INBOVEG database for information on the field classification (N2000 or BWK-code) of the relevé for one or more survey(s) by the name of the survey. Two examples are given, this can be used as base to continue selecting the data you require. Get a specific classification from a survey and collect the data: ```{r examples_classification} classif_info <- get_inboveg_classification( con, survey_name = "MILKLIM_Heischraal2012", classif = "4010", collect = TRUE) ``` ```{r eval=TRUE} classif_info ``` Get the classification from several specific surveys ```{r, eval = FALSE} classif_info <- get_inboveg_classification( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), multiple = TRUE) classif_info ``` Get all surveys, all classifications without collecting the data: ```{r, eval = FALSE} all_codes <- get_inboveg_classification(con) ``` ## Qualifiers information The function `get_inboveg_qualifier()`queries the INBOVEG database for qualifier information on recordings for one or more surveys. These qualifiers give information on management (management qualifier `MQ`) or location description (site qualifier `SQ`). Get the qualifiers from one survey: ```{r examples_qualifiers} qualifiers_heischraal2012 <- get_inboveg_qualifier( con, survey_name = "MILKLIM_Heischraal2012") ``` ```{r eval=TRUE} head(qualifiers_heischraal2012) ``` Get all site qualifiers (SQ) from MILKLIM surveys (partial matching): ```{r, eval = FALSE} qualifiers_milkim <- get_inboveg_qualifier( con, survey_name = "%MILKLIM%", qualifier_type = "SQ") ``` Get qualifiers from several specific surveys with `multiple = TRUE`: ```{r, eval = FALSE} qualifiers_severalsurveys <- get_inboveg_qualifier( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), multiple = TRUE) ``` Get all qualifiers of all surveys: ```{r, eval = FALSE} allqualifiers <- get_inboveg_qualifier(con) ``` ## Layer information The function `get_inboveg_layer_cover` queries the INBOVEG database for cover information per layer on recordings for one or more surveys. Get the layer cover information from one survey ```{r, eval = FALSE} layerinfo_heischraal2012 <- get_inboveg_layer_cover( con, survey_name = "MILKLIM_Heischraal2012") ``` Get all layer cover information from MILKLIM surveys (partial matching) ```{r, eval = FALSE} layerinfo_milkim <- get_inboveg_layer_cover( con, survey_name = "%MILKLIM%") ``` Get layer cover information from several specific surveys ```{r, eval = FALSE} layerinfo_severalsurveys <- get_inboveg_layer_cover( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), multiple = TRUE) ``` Get all layer cover information of all surveys ```{r, eval = FALSE} all_layerinfo <- get_inboveg_layer_cover(con) ``` ## Layer qualifiers The function `get_inboveg_layer_qualifier()`queries the INBOVEG database for layer qualifier information on recordings for one or more surveys. Get the layer qualifiers from one survey ```{r, eval = FALSE} layerqualifier_gagealutea <- get_inboveg_layer_qualifier( con, survey_name = "GageaLutea_1980") ``` Get all layer qualifiers from MILKLIM surveys (partial matching) ```{r, eval = FALSE} layerqualifiers_milkim <- get_inboveg_layer_qualifier( con, survey_name = "%MILKLIM%") ``` Get layer qualifiers from several specific surveys ```{r, eval = FALSE} layerqualifiers_severalsurveys <- get_inboveg_layer_qualifier( con, survey_name = c("MILKLIM_Heischraal2012", "NICHE Vlaanderen"), multiple = TRUE) ``` Get all layer qualifiers of all surveys ```{r, eval = FALSE} alllayerqualifiers <- get_inboveg_layer_qualifier(con) ``` ## Parent-child relationships between recordings The function `get_inboveg_relation_recording()` queries the INBOVEG database for relational information on recordings for one or more surveys based on parent (classic-chain/bucket) and Child (classic) relationships. Get all parent-child-relations from `N2000meetnet` surveys (partial matching): ```{r} relations_n2000meetnet <- get_inboveg_relation_recording( con, survey_name = "%N2000meetnet%") relations_n2000meetnet ``` ```{r echo=FALSE} output_relations_n2000meetnet <- capture.output(relations_n2000meetnet) ``` ```{r echo=FALSE, eval=TRUE} if (!database_access) { cat(output_relations_n2000meetnet, sep = "\n") } ``` Get the parent-child-relations from several specific surveys: ```{r, eval = FALSE} relations_severalsurveys <- get_inboveg_relation_recording( con, survey_name = c("DeBlankaart-1985-Beheer", "N2000meetnet_Grasland"), multiple = TRUE) ``` # Processing the retrieved data When you have retrieved the data from INBOVEG using one of the `inbodb` functions in the previous section, you can easily work with these data in R or export the data. We recommend to use the [`dplyr`](https://dplyr.tidyverse.org/) package for basic as well as advanced data wrangling. For instance: - `dplyr::mutate()` adds new variables that are functions of existing variables - `dplyr::select()` picks variables based on their names. - `dplyr::filter()` picks cases (rows) based on their values. - `dplyr::summarise()` reduces multiple values down to a single summary. - `dplyr::arrange()` changes the ordering of the rows. For exporting the data we can make use of the [`readr`](https://readr.tidyverse.org/) package. Here is a simple example: ```{r eval=FALSE, echo=TRUE} library(inbodb) library(dplyr) library(readr) # get the data con <- connect_inbo_dbase("D0010_00_Cydonia") recording_heischraal2012 <- get_inboveg_recording( con, survey_name = "MILKLIM_Heischraal2012", collect = TRUE) # close the connection dbDisconnect(con) rm(con) # do some data wrangling with dplyr # here we remove one column output_heischraal2012 <- recording_heischraal2012 %>% select(-OriginalName) # export the data to a csv file that can be easily read by excel write_excel_csv2( x = output_heischraal2012, file = "data/output_heischraal2012.csv" ) ``` # More complex queries These functions give most of the relevant basic information that is available in INBOVEG. In the future additional functions can be added to this package to help the INBOVEG-users. # Closing the connection Close the connection when done ```{r closing_conn} dbDisconnect(con) rm(con) ``` ```{r echo=FALSE, include=FALSE} rm(database_access) save.image(file = "get_data_inboveg_results.Rda") ```