---
title: "Working with geospatial data sources of habitat (sub)types and RIBs"
author: "Floris Vanderhaeghe, Toon Westra, Jan Wouters, Cécile Herr"
date: "2021-05-12"
bibliography: references.bib
output:
  rmarkdown::html_vignette:
    pandoc_args:
    - --csl
    - research-institute-for-nature-and-forest.csl
vignette: >
  %\VignetteIndexEntry{025. Working with geospatial data sources of habitat (sub)types and RIBs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
options(stringsAsFactors = FALSE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  paged.print = FALSE,
  eval = FALSE
)
```

_General note: this vignette contains frozen output of 12 May 2021._
_This makes it possible to build the package with vignettes without access to the data sources._

```{r eval=TRUE, results='hide', message=FALSE, warning=FALSE}
library(n2khab)
library(sf)
library(dplyr)
library(units)
```

## Intro

This vignette teaches you:

- the differences between the various geospatial N2KHAB data sources on the occurrence of types in Flanders (types being habitat (sub)types or regionally important biotopes (RIBs); see `read_types()`);
- how to load these data sources into R;
- how to optimally match an interpreted data source with a 'types' column in your own data frame.

## Data sources

The N2KHAB data sources below are available.
For practical information on data storage and locations, see `vignette("v020_datastorage")`.

- **raw data sources**:
    - **`habitatmap`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3354381)): geospatial polygons of BWK and Natura 2000 habitat types in the Flemish Region, originally published by the Research Institute for Nature and Forest (INBO) [@de_saeger_biologische_2020] and distributed by 'Informatie Vlaanderen'.
    - **`habitatstreams`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3386245)): geospatial lines of the Natura 2000 habitat type `3260` that correspond with its presence in running water segments in the Flemish Region, originally published by INBO [@leyssen_indicatieve_2020] and distributed by 'Informatie Vlaanderen'.
    - **`habitatsprings`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3550994)): geospatial points that correspond with the presence or absence of the Natura 2000 habitat type `7220` in springs and running water segments in the Flemish Region.
      The data source is produced, owned and administered by INBO.
    - **`habitatquarries`** ([Zenodo-link](https://doi.org/10.5281/zenodo.4072967)): geospatial polygons that correspond with the presence or absence of the Natura 2000 habitat type `8310` in underground marl quarries in the Flemish Region (and border areas).
      The data source is produced, owned and administered by INBO.
- **processed data sources**:
    - **`habitatmap_stdized`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3355192)): derived from `habitatmap`.
      This data source only contains polygons with habitat (sub)type or RIB codes.
      All codes conform to the `types` reference list (see `read_types()`).
      The data source is a GeoPackage with essentially two objects: 1) the geospatial polygons; 2) a long-formatted table with the associated types (multiple records can occur per `polygon_id`).
      For more details see `read_habitatmap_stdized()`.
    - **`habitatmap_terr`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3468948)): the further interpreted, terrestrial part of `habitatmap_stdized` ('terrestrial' includes 'semi-terrestrial').
      Amongst other properties, it excludes polygons _without_ terrestrial types, it excludes rows that most probably are not habitat or RIB at all, and several main type codes were translated to the subtype they almost always represent.
      The data source is a GeoPackage, organized in the same way as `habitatmap_stdized`.
      For more details see `read_habitatmap_terr()`.
    - **`watersurfaces_hab`** ([Zenodo-link](https://doi.org/10.5281/zenodo.3374645)): a combination of the environmental data source `watersurfaces` (@leyssen_watervlakken_2020; see `read_watersurfaces()`) and the above data source `habitatmap_stdized`.
      It represents polygons (from `watersurfaces` or `habitatmap_stdized`) that completely or partly contain standing water habitat types or RIBs (i.e. `2190_a`, '`31..`' types or `rbbah`; types `3260` and `7220` are not included).
      The data source is a GeoPackage, organized in the same way as `habitatmap_stdized`.
      For more details see `read_watersurfaces_hab()`.

## Using the `read_...()` functions

The R code below assumes that an `n2khab_data` folder is present in your working directory or in a directory up to 10 levels higher, with the relevant data sources in the right place.
See `vignette("v020_datastorage")` for the setup.

The advantage of using the `read_...()` functions is the uniform approach to reading the above data sources in R, which enhances collaboration and speeds up your work.

- Geospatial layers are represented as an `sf` object, in the coordinate reference system (CRS) 'Belge 1972 / Belgian Lambert 72' (EPSG code [31370](https://epsg.io/31370)).
- Data frames are returned as tibbles. ^[A tibble is a data frame that makes working in the tidyverse a little [easier](https://r4ds.had.co.nz/tibbles.html).]
- English & [tidyverse-styled](https://style.tidyverse.org/) variable names encourage you to always produce internationalized code, tables and figures.
- The `type` variable (if present) is a factor with the same levels as the `type` variable of the `types` reference list (see `read_types()`).
- Some variables that normally have no added value may be discarded, e.g. values that duplicate already existing information.
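You can check these conventions on any object you read. As a minimal sketch, assuming only the `sf` and `units` packages, the chunk below builds a hypothetical toy layer (`toy`, not an N2KHAB data source) in the same CRS. It then checks the EPSG code and shows that an area computed from the geometry carries explicit units.

```{r}
library(sf)
library(units)

# Hypothetical toy layer (not an N2KHAB data source): one 100 m x 100 m square,
# with coordinates in Belge 1972 / Belgian Lambert 72 (EPSG 31370)
toy_square <- st_polygon(list(rbind(
  c(150000, 200000), c(150100, 200000),
  c(150100, 200100), c(150000, 200100),
  c(150000, 200000)
)))
toy <- st_sf(polygon_id = "toy_1", geometry = st_sfc(toy_square, crs = 31370))

# The same check works on any object returned by a read_...() function
st_crs(toy)$epsg

# Areas computed from a projected sf geometry carry explicit units (m^2) ...
area <- st_area(toy)
# ... so they can be converted safely, e.g. to hectares (10 000 m^2 = 1 ha)
set_units(area, "ha")
```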
### Returning raw data sources

The functions only do basic preprocessing, so that (by default) the returned object closely reflects the structure of the raw data source and contains all its records.

```{r}
habitatmap <- read_habitatmap()
habitatmap
#> Simple feature collection with 646589 features and 30 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 21991.38 ymin: 153058.3 xmax: 258871.8 ymax: 244027.3
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 646,589 x 31
#>    polygon_id eval eenh1 eenh2 eenh3 eenh4 eenh5 eenh6 eenh7 eenh8 v1 v2
#>  *
#>  1 000098_v20… m b
#>  2 000132_v20… m bl
#>  3 000135_v20… m bl
#>  4 000136_v20… m bl
#>  5 000142_v20… m bl
#>  6 000150_v20… m bl
#>  7 000297_v20… m bl
#>  8 000991_v20… m bl
#>  9 000999_v20… m bl
#> 10 001000_v20… m bl
#> # … with 646,579 more rows, and 19 more variables: v3, source,
#> #   info, bwk_label, hab1, phab1, hab2,
#> #   phab2, hab3, phab3, hab4, phab4, hab5,
#> #   phab5, source_hab, source_phab, hab_legend,
#> #   area_m2, geometry
```

The `habitatmap` object is a very wide data frame, which is not easy to handle in R.
For analytical work on habitat types and RIBs, you are advised to use the tidied data source `habitatmap_stdized`; see [below](#processed).
`read_habitatmap(filter_hab = TRUE)` only retains the polygons that occur in `habitatmap_stdized`, i.e. those that contain habitat types or RIBs.

The data sources `habitatstreams`, `habitatsprings` and `habitatquarries` have a straightforward data structure, so there was no need to generate derived data sources.
The meaning of their columns is described in the function documentation.
Not all function arguments are discussed in this vignette: do take the time to look at the documentation!
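Returning to `habitatmap` for a moment: its wide layout, with one column pair per type (`hab1`/`phab1`, `hab2`/`phab2`, ...), is what makes it awkward to analyse. The sketch below uses a hypothetical two-polygon table (`wide`) that mimics those column pairs and reshapes it with `tidyr::pivot_longer()` into one row per polygon and type. This resembles the long table of `habitatmap_stdized`, but it is only an illustration: it is not how that data source is produced, since its codes are also standardized to the `types` reference list.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical wide attribute table mimicking the hab1/phab1/hab2/phab2 columns
wide <- tibble(
  polygon_id = c("p1", "p2"),
  hab1 = c("9130_end", "rbbsp"),
  phab1 = c(70, 100),
  hab2 = c("rbbsp", NA),
  phab2 = c(30, NA)
)

# ".value" sends 'hab' and 'phab' to separate columns; 'rank' keeps the suffix
long <- wide %>%
  pivot_longer(
    cols = -polygon_id,
    names_to = c(".value", "rank"),
    names_pattern = "(p?hab)(\\d)"
  ) %>%
  # drop empty slots, e.g. polygon p2 has only one type
  filter(!is.na(hab)) %>%
  select(polygon_id, type = hab, phab)
long
```

In the long form, questions such as "in which polygons does `rbbsp` occur?" become a single `filter()`, whatever column slot the type was stored in.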
```{r}
habitatstreams <- read_habitatstreams()
habitatstreams
#> Simple feature collection with 560 features and 3 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 33097.92 ymin: 157529.6 xmax: 254039 ymax: 243444.6
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 560 x 4
#>    river_name source_id type geometry
#>
#>  1 Wolfputbeek VMM 3260 (127857.1 167681.2, 127854.9 167684.5, 127844 1…
#>  2 Oude Kale VMM 3260 (95737.01 196912.9, 95732.82 196912.4, 95710.38…
#>  3 Venloop EcoInv 3260 (169352.7 209314.9, 169358.8 209290.5, 169326.2…
#>  4 Venloop EcoInv 3260 (169633.6 209293.5, 169625 209289.2, 169594.4 2…
#>  5 Kleine Nete EcoInv 3260 (181087.1 208607.2, 181088.6 208608.1, 181089 2…
#>  6 Kleine Nete EcoInv 3260 (180037.4 208360.4, 180038.3 208377.5, 180038.3…
#>  7 Kleine Nete EcoInv 3260 (180520 208595.7, 180540.5 208607.4, 180541.2 2…
#>  8 Kleine Nete EcoInv 3260 (187379.9 209998.8, 187381.3 209998.5, 187381.6…
#>  9 Raamdonkseb… extrapol 3260 (183545.5 192409, 183541.9 192406.7, 183541.9 1…
#> 10 Kleine Nete EcoInv 3260 (183516.4 208261.7, 183567.3 208279.2, 183567.3…
#> # … with 550 more rows
```

With `read_habitatstreams(source_text = TRUE)`, a second object `sources` is returned with the meaning of the `source_id` codes:

```{r}
read_habitatstreams(source_text = TRUE) %>%
  .$sources
#> # A tibble: 7 x 2
#>   source_id source_text
#>
#> 1 VMM "Gegevens afgeleid van macrofyteninventarisaties uitgevoer…
#> 2 EcoInv "Tijdens ecologische inventarisatiestudies uitgevoerd in o…
#> 3 extrapol "De conclusie van het nabijgelegen geïnventariseerde segme…
#> 4 Van Belleghem (20… "Macrofytengegevens afgeleid van Van Belleghem S., Bal K.,…
#> 5 Leyssen ea (2005) "Macrofytengegevens afgeleid van Leyssen A., Adriaens P., …
#> 6 WrnBe "Waarnemingen afkomstig van Waarnemingen.be, de website vo…
#> 7 INBO "Tijdens veldbezoeken werd de aan- of afwezigheid van het …
```

```{r}
habitatsprings <- read_habitatsprings()
habitatsprings
#> Simple feature collection with 104 features and 11 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 36407.84 ymin: 155249.7 xmax: 258371.1 ymax: 179732
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 104 x 12
#>    point_id name system_type code_orig type certain unit_id area_m2 year
#>
#>  1 1 Steenputb… rivulet 7220 7220 TRUE 1 50 2014
#>  2 2 Steenputb… rivulet 7220 7220 TRUE 1 20 2014
#>  3 3 Duling 7230 7230 TRUE NA NA 2014
#>  4 4 Kapittelb… gh TRUE NA NA 2014
#>  5 5 Remersdaa… rivulet 7220 7220 TRUE 2 200 2014
#>  6 6 Remersdaa… rivulet 7220 7220 TRUE 2 500 2014
#>  7 7 Kesterbeek unknown 7220,gh 7220 FALSE 32 NA NA
#>  8 8 Krindaal rivulet 7220 7220 TRUE 3 800 2014
#>  9 9 Bois de B… rivulet 7220 7220 TRUE 4 50 2014
#> 10 10 KwintenHo… mire 7220 7220 TRUE 5 10 2014
#> # … with 94 more rows, and 3 more variables: in_sac, source,
#> #   geometry
```

Do note that both the `habitatsprings` and `habitatquarries` data sources also contain records that do not represent a habitat type.
You can retain only the habitat types by setting the `filter_hab` argument to `TRUE` (not shown).

```{r}
habitatquarries <- read_habitatquarries()
habitatquarries
#> Simple feature collection with 45 features and 6 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 221427.3 ymin: 160393.5 xmax: 243211.1 ymax: 168965.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 45 x 7
#>    polygon_id unit_id name code_orig type extra_reference
#>
#>  1 4 4 Avergat - … gh Lahaye 2018
#>  2 6 6 Avergat - … 8310 8310 Lahaye 2018
#>  3 5 5 Avergat - … gh Lahaye 2018
#>  4 20 20 Coolen 8310 8310 Limburgs Landschap 2020; pers…
#>  5 21 21 Coolen gh Limburgs Landschap 2020
#>  6 29 29 Groeve Lin… 8310 8310
#>  7 31 31 Grote berg… 8310 8310 De Haan & Lahaye 2018
#>  8 37 37 Grote berg… 8310 8310 De Haan & Lahaye 2018
#>  9 24 24 Henisdael … 8310 8310 Dusar et al. 2007
#> 10 34 34 Henisdael … 8310 8310 Dusar et al. 2007
#> # … with 35 more rows, and 1 more variable: geom
```

The `habitatquarries` data source also includes the literature references.
These can be added as a second data frame, in which case a list is returned:

```{r}
habitatquarries2 <- read_habitatquarries(references = TRUE)
class(habitatquarries2)
#> [1] "list"
```

```{r}
names(habitatquarries2)
#> [1] "habitatquarries"  "extra_references"
```

```{r}
all.equal(habitatquarries, habitatquarries2$habitatquarries)
#> [1] TRUE
```

```{r}
habitatquarries2$extra_references
#> # A tibble: 9 x 23
#>   category bibtexkey address author booktitle journal month note number pages
#>
#> 1 BOOK de_haan_… Brusse… De Ha…
#> 2 INCOLLE… dusar_me… Genk Dusar… Likona j… 6-13
#> 3 BOOK jenneken… Riemst Jenne…
#> 4 INCOLLE… lahaye_g… Riemst Lahay… 12
#> 5 BOOK verhoeve… Weert Verho… 1769
#> 6 BOOK walschot… Walsc…
#> 7 MISC wikipedi… {Wiki… jan Page…
#> 8 MISC limburgs… {Limb… apr Libr…
#> 9 ARTICLE silverta… Silve… Natuur… 12 334-…
#> # … with 13 more variables: publisher, series, title,
#> #   volume, year, url, isbn, copyright,
#> #   abstract, language, urldate, issn, keywords
```

These extra references can also be printed to the R console in BibTeX format by specifying `bibtex = TRUE`.

### Returning processed data sources {#processed}

The reading functions for the three processed data sources always return a **list**:

- the first element holds the geospatial information as an **sf object**. It has unique feature IDs that can be joined with rows of the second element;
- the second element is a **tibble** that presents attributes of the features as _**[tidy](https://r4ds.had.co.nz/tidy-data.html#tidy-data-1) data**_.

Let's see how this works!
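Before turning to the real data, here is a minimal sketch of that list structure with hypothetical toy objects. `toy_polygons`, `toy_types` and the helper `make_square()` are made up for illustration; the real element names appear below. It shows how the shared `polygon_id` links both elements in either direction, assuming that `sf` and `dplyr` are loaded.

```{r}
library(sf)
library(dplyr)

# Hypothetical helper: a size x size square with lower-left corner (x0, y0)
make_square <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy first element: sf polygons with a unique polygon_id (Belgian Lambert 72)
toy_polygons <- st_sf(
  polygon_id = c("p1", "p2"),
  geom = st_sfc(
    make_square(150000, 200000, 100),
    make_square(151000, 200000, 200),
    crs = 31370
  )
)

# Toy second element: long table with one row per polygon and type
toy_types <- tibble(
  polygon_id = c("p1", "p1", "p2"),
  type = c("9130_end", "rbbsp", "rbbsp"),
  phab = c(70, 30, 100)
)

# From types to geometries: the polygons in which 9130_end occurs
ids_9130 <- toy_types %>%
  filter(type == "9130_end") %>%
  pull(polygon_id)
pol_9130 <- toy_polygons %>%
  filter(polygon_id %in% ids_9130)

# Attaching types to geometries: each geometry is repeated once per type row
pol_types <- toy_polygons %>%
  inner_join(toy_types, by = "polygon_id")
```

Because the second join repeats geometries, summaries such as area per type are simpler to compute by taking the area first and dropping the geometry before joining, as in the examples that follow.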
#### read_habitatmap_stdized()

A tidy representation of the `habitatmap` data, restricted to the polygons that contain habitat types or RIBs, is returned by `read_habitatmap_stdized()`:

```{r}
hms <- read_habitatmap_stdized()
```

```{r}
hms_pol <- hms$habitatmap_polygons
hms_pol
#> Simple feature collection with 87781 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22003.2 ymin: 153084.4 xmax: 258871.8 ymax: 243446.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 87,781 x 3
#>    polygon_id description_orig geom
#>  *
#>  1 130153_v20… 70% 9120_qb; 30% … (((150669.4 227248.6, 150668.1 227242.9, 1505…
#>  2 130815_v20… 70% gh; 30% 9160 (((258338.4 158696.1, 258336.5 158693.5, 2583…
#>  3 137826_v20… 100% 6430,rbbhf (((181587.2 234938, 181613.1 234933.1, 181646…
#>  4 170624_v20… 100% rbbmr (((145876.8 229686.8, 145701.7 229680.6, 1456…
#>  5 203261_v20… 70% gh; 30% rbbmr (((117137.2 210307.9, 117136.3 210288.3, 1171…
#>  6 204352_v20… 80% gh; 20% rbbsp (((116357.3 159278.3, 116340.5 159259.8, 1163…
#>  7 204376_v20… 90% gh; 10% rbbsg (((116110.1 210545.5, 116102.4 210541.1, 1160…
#>  8 205188_v20… 60% rbbsp; 40% gh (((232114.7 161594.5, 232122.7 161590.5, 2321…
#>  9 205291_v20… 70% gh; 30% rbbsp (((191253.5 160641.5, 191254.8 160636.1, 1912…
#> 10 205756_v20… 70% gh; 30% rbbsf (((216258.8 156749, 216260.9 156750, 216287.7…
#> # … with 87,771 more rows
```

```{r}
hms_occ <- hms$habitatmap_types
hms_occ
#> # A tibble: 110,485 x 5
#>    polygon_id type certain code_orig phab
#>
#>  1 000038_v2016 91E0_va TRUE 91E0_va 100
#>  2 000043_v2016 9130_end TRUE 9130_end 100
#>  3 000064_v2020 9130_end TRUE 9130_end 100
#>  4 000132_v2016 9130_end TRUE 9130_end 100
#>  5 000204_v2016 91E0_vn TRUE 91E0_vn 100
#>  6 000255_v2016 91E0_vc TRUE 91E0_vc 100
#>  7 000297_v2016 rbbsp TRUE rbbsp 10
#>  8 000311_v2016 9130_end TRUE 9130_end 70
#>  9 000311_v2016 rbbsp TRUE rbbsp 30
#> 10 000390_v2016 91E0_vn TRUE 91E0_vn 70
#> # … with 110,475 more rows
```

Let's estimate the surface area per type, including uncertain occurrences of types and taking into account the cover percentage per polygon (`phab`):

```{r}
hms_pol %>%
  mutate(area = st_area(.)) %>%
  st_drop_geometry() %>%
  inner_join(hms_occ, by = "polygon_id") %>%
  # area of type within polygon:
  mutate(area_type = area * phab / 100) %>%
  group_by(type) %>%
  summarise(area = sum(area_type) %>% set_units("ha") %>% round(2))
#> # A tibble: 101 x 2
#>    type area
#>         [ha]
#>  1 1130 5678.87
#>  2 1140 2098.03
#>  3 1310_pol 17.39
#>  4 1310_zk 34.02
#>  5 1310_zv 10.62
#>  6 1320 2.41
#>  7 1330_da 117.61
#>  8 1330_hpr 138.12
#>  9 2110 27.14
#> 10 2120 462.54
#> # … with 91 more rows
```

#### read_habitatmap_terr()

`read_habitatmap_terr()` behaves in exactly the same way:

```{r}
hmt <- read_habitatmap_terr()
```

```{r}
hmt$habitatmap_terr_polygons
#> Simple feature collection with 78602 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22003.2 ymin: 153084.4 xmax: 258871.8 ymax: 243351.8
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 78,602 x 5
#>    polygon_id description_orig description source geom
#>  *
#>  1 130153_v20… 70% 9120_qb; 30%… 70% 9120_q… habita… (((150669.4 227248.6, 1506…
#>  2 130815_v20… 70% gh; 30% 9160 70% gh; 30… habita… (((258338.4 158696.1, 2583…
#>  3 137826_v20… 100% 6430,rbbhf 100% 6430_… habita… (((181587.2 234938, 181613…
#>  4 170624_v20… 100% rbbmr 100% rbbmr habita… (((145876.8 229686.8, 1457…
#>  5 203261_v20… 70% gh; 30% rbbmr 70% gh; 30… habita… (((117137.2 210307.9, 1171…
#>  6 204352_v20… 80% gh; 20% rbbsp 80% gh; 20… habita… (((116357.3 159278.3, 1163…
#>  7 204376_v20… 90% gh; 10% rbbsg 90% gh; 10… habita… (((116110.1 210545.5, 1161…
#>  8 205188_v20… 60% rbbsp; 40% gh 60% rbbsp;… habita… (((232114.7 161594.5, 2321…
#>  9 205291_v20… 70% gh; 30% rbbsp 70% gh; 30… habita… (((191253.5 160641.5, 1912…
#> 10 205756_v20… 70% gh; 30% rbbsf 70% gh; 30… habita… (((216258.8 156749, 216260…
#> # … with 78,592 more rows
```

```{r}
hmt$habitatmap_terr_types
#> # A tibble: 99,784 x 6
#>    polygon_id type certain code_orig phab source
#>
#>  1 000038_v2016 91E0_va TRUE 91E0_va 100 habitatmap_stdized
#>  2 000043_v2016 9130_end TRUE 9130_end 100 habitatmap_stdized
#>  3 000064_v2020 9130_end TRUE 9130_end 100 habitatmap_stdized
#>  4 000132_v2016 9130_end TRUE 9130_end 100 habitatmap_stdized
#>  5 000204_v2016 91E0_vn TRUE 91E0_vn 100 habitatmap_stdized
#>  6 000255_v2016 91E0_vc TRUE 91E0_vc 100 habitatmap_stdized
#>  7 000297_v2016 rbbsp TRUE rbbsp 10 habitatmap_stdized
#>  8 000311_v2016 9130_end TRUE 9130_end 70 habitatmap_stdized
#>  9 000311_v2016 rbbsp TRUE rbbsp 30 habitatmap_stdized
#> 10 000390_v2016 91E0_vn TRUE 91E0_vn 70 habitatmap_stdized
#> # … with 99,774 more rows
```

Compared to `habitatmap_stdized`, _purely_ aquatic or non-habitat/RIB polygons were omitted, and part of the type data was interpreted in a more specific way.
Further, while type `7220` is present in `habitatmap_terr`, the `read_habitatmap_terr()` function drops it by default, because the `habitatsprings` data source is recommended for that type.
This can be controlled by the `drop_7220` argument.
As a consequence, some type codes are completely absent from `habitatmap_terr_types`:

```{r}
hms_occ %>%
  distinct(type) %>%
  anti_join(
    hmt$habitatmap_terr_types %>% distinct(type),
    by = "type"
  ) %>%
  arrange(type)
#> # A tibble: 7 x 1
#>   type
#>
#> 1 2190
#> 2 6410
#> 3 6430
#> 4 6510
#> 5 7140
#> 6 7220
#> 7 9130
```

About 3% of all type occurrences received a new type code:

```{r}
hmt$habitatmap_terr_types %>%
  count(source) %>%
  mutate(pct = (n / sum(n) * 100) %>% round(0))
#> # A tibble: 2 x 3
#>   source n pct
#>
#> 1 habitatmap_stdized 96424 97
#> 2 habitatmap_stdized + interpretation 3360 3
```

#### read_watersurfaces_hab()

A similar story, this time for polygons that (may) contain aquatic types:

```{r}
wsh <- read_watersurfaces_hab()
```

```{r}
wsh$watersurfaces_polygons
#> Simple feature collection with 3233 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22546.57 ymin: 159273.1 xmax: 253896.9 ymax: 242960.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 3,233 x 5
#>    polygon_id polygon_id_ws polygon_id_habitatm… description_orig
#>  *
#>  1 ANTANT0082 ANTANT0082 596466_v2014 60% 3150; 20% rbbmr; 20% rbbsf
#>  2 ANTANT0234 ANTANT0234 633396_v2020 100% 3130_na
#>  3 ANTANT0251 ANTANT0251 113978_v2014 100% 3150
#>  4 ANTANT0253 ANTANT0253 111606_v2014 100% 3150
#>  5 ANTANT0297 ANTANT0297 409153_v2014+409153… 85% 3140; 15% 3150+85% 3140; 1…
#>  6 ANTANT0315 ANTANT0315 519082_v2018 100% 3140
#>  7 ANTANT0319 ANTANT0319 601958_v2014 100% 3150,gh
#>  8 ANTANT0381 ANTANT0381 644003_v2014 85% gh; 15% 3140
#>  9 ANTANT0383 ANTANT0383 631879_v2014+593522… 50% 3150; 40% rbbmr; 10% rbbsf…
#> 10 ANTANT0384 ANTANT0384 644003_v2014 85% gh; 15% 3140
#> # … with 3,223 more rows, and 1 more variable: geom
```

```{r}
wsh$watersurfaces_types
#> # A tibble: 3,669 x 4
#>    polygon_id type certain code_orig
#>
#>  1 ANTANT0082 3150 TRUE 3150
#>  2 ANTANT0234 3130_na TRUE 3130_na
#>  3 ANTANT0251 3150 TRUE 3150
#>  4 ANTANT0253 3150 TRUE 3150
#>  5 ANTANT0297 3140 TRUE 3140
#>  6 ANTANT0297 3150 TRUE 3150
#>  7 ANTANT0315 3140 TRUE 3140
#>  8 ANTANT0319 3150 FALSE 3150,gh
#>  9 ANTANT0381 3140 TRUE 3140
#> 10 ANTANT0383 3150 TRUE 3150
#> # … with 3,659 more rows
```

Let's compute some statistics of the standing water types (ignoring the `certain` variable):

```{r}
wsh$watersurfaces_polygons %>%
  mutate(area = st_area(.)) %>%
  st_drop_geometry() %>%
  inner_join(wsh$watersurfaces_types, by = "polygon_id") %>%
  group_by(type) %>%
  summarise(
    nr_watersurfaces = n_distinct(polygon_id),
    total_area = sum(area),
    area_min = min(area),
    area_Q1 = quantile(area, 0.25),
    area_Q2 = quantile(area, 0.5),
    area_Q3 = quantile(area, 0.75),
    max = max(area)
  ) %>%
  mutate_at(
    vars(matches("area|max")),
    function(x) {
      set_units(x, "a") %>% round(1)
    }
  )
#> # A tibble: 9 x 8
#>   type nr_watersurfaces total_area area_min area_Q1 area_Q2 area_Q3 max
#>                               [a]      [a]     [a]     [a]     [a] [a]
#> 1 2190_a 318 3605.6 0.0 0.9 2.1 5.5 317.5
#> 2 3110 6 1876.8 111.7 139.7 216.9 330.3 849.3
#> 3 3130 219 33022.9 0.6 13.0 41.0 160.0 2807.2
#> 4 3130_aom 1267 107732.6 0.1 2.3 9.7 44.1 7062.7
#> 5 3130_na 354 120174.2 1.0 44.7 122.5 277.9 13668.2
#> 6 3140 104 72272.0 0.8 13.3 55.3 418.5 13668.2
#> 7 3150 475 105019.1 0.4 6.6 17.3 83.5 13668.2
#> 8 3160 379 27936.9 0.1 6.1 17.4 59.3 4179.1
#> 9 rbbah 517 33716.3 0.1 4.2 10.9 47.0 3748.6
```

A strong point of `sf` objects is that measures derived from the geometry, such as the result of `st_area()`, carry explicit units.
Consequently, we can use tools like `set_units()` to convert units (e.g. surface area expressed in _are_ (a)).

Because the main type code `3130` mostly corresponds to `3130_aom` in the field, a further interpreted version can be generated with `read_watersurfaces_hab(interpreted = TRUE)`.
```{r}
read_watersurfaces_hab(interpreted = TRUE) %>%
  .$watersurfaces_types %>%
  filter(type == "3130") %>%
  nrow()
#> [1] 0
```

## Matching interpreted data sources with a data frame column of types

The `expand_types()` function helps you join a type column of your own data frame with the results of `read_habitatmap_terr()` or `read_watersurfaces_hab()`.
For both of these data sources, the following conversions of your data frame are supported in order to create optimal joins:

- adding extra _subtype_ rows when main type codes occur for which subtypes exist;
- adding extra _main type_ rows when certain subtypes of that main type are present (the conditions can be relaxed; see the function documentation). This is supported for `2330`, `6230` and `91E0`, for which it makes sense with the two data sources mentioned.

An example (`df` is our data frame):

```{r eval=TRUE}
df <- tribble(
  ~mycode, ~obs,
  "2130", 5,
  "2190", 45,
  "2330_bu", 8,
  "2330_dw", 8,
  "6410_mo", 78,
  "6410_ve", 4,
  "91E0_vn", 10
)
df
```

With the `type_var` argument you specify which variable of your data frame represents type codes:

```{r eval=TRUE}
df_exp <- expand_types(df, type_var = "mycode")
df_exp
```

More examples and features are explained in the documentation of `expand_types()`.

As expected, more rows of `habitatmap_terr` are retained when joining with `df_exp`:

```{r warning=FALSE}
hmt$habitatmap_terr_types %>%
  semi_join(df_exp, by = c(type = "mycode")) %>%
  nrow()
#> [1] 6634
```

By comparison, when joining with the original `df`:

```{r warning=FALSE}
hmt$habitatmap_terr_types %>%
  semi_join(df, by = c(type = "mycode")) %>%
  nrow()
#> [1] 4984
```

## References