Working with geospatial data sources of habitat (sub)types and RIBs

General note: the below vignette contains frozen output of 12 May 2021. This makes it possible to build the package with vignettes without access to the data sources.

library(n2khab)
library(sf)
library(dplyr)
library(units)

Intro

This vignette learns you:

  • the difference between various geospatial N2KHAB data sources on the occurrence of types in Flanders (types being habitat (sub)types or regionally important biotopes (RIBs); see read_types());
  • how to load these data sources into R;
  • how to optimally match an interpreted data source with a ‘types’ column in your own data frame.

Data sources

The below N2KHAB data sources are available. For practical information on data storage and locations, see vignette("v020_datastorage").

  • raw data sources:
    • habitatmap (Zenodo-link): geospatial polygons of BWK and Natura 2000 habitat types in the Flemish Region, originally published by the Research Institute for Nature and Forest (INBO) (De Saeger et al. 2020) and distributed by ‘Informatie Vlaanderen’.
    • habitatstreams (Zenodo-link): geospatial lines of the Natura 2000 habitat type 3260 that correspond with its presence in streaming water segments in the Flemish Region, originally published by INBO (Leyssen, Smeekens, and Denys 2020) and distributed by ‘Informatie Vlaanderen’.
    • habitatsprings (Zenodo-link): geospatial points hat correspond with the presence or absence of the Natura 2000 habitat type 7220 in springs and streaming water segments in the Flemish Region. The data source is produced, owned and administered by INBO.
    • habitatquarries (Zenodo-link): geospatial polygons that correspond with the presence or absence of the Natura 2000 habitat type 8310 in underground marl quarries in the Flemish Region (and border areas). The data source is produced, owned and administered by INBO.
  • processed data sources:
    • habitatmap_stdized (Zenodo-link): derived from habitatmap. This datasource only contains polygons with habitat (sub)type or RIB codes. All codes conform to the types reference list (see read_types()). The datasource is a GeoPackage with essentially two objects: 1) the geospatial polygons; 2) a long-formatted table with the associated types (multiple records can occur per polygon_id). For more details see read_habitatmap_stdized().
    • habitatmap_terr (Zenodo-link): the further interpreted, terrestrial part of habitatmap_stdized (‘terrestrial’ includes ‘semi-terrestrial’). Amongst other properties, it excludes polygons without terrestrial types, it excludes rows which most probably are no habitat or RIB at all, and several main type codes were translated to a corresponding subtype which they almost always represent. The datasource is a GeoPackage, organized in the same way as habitatmap_stdized. For more details see read_habitatmap_terr().
    • watersurfaces_hab (Zenodo-link): a combination of the environmental data source watersurfaces (Leyssen et al. (2020); see read_watersurfaces()) and the above data source habitatmap_stdized. It represents polygons (from watersurfaces or habitatmap_stdized) that completely or partly contain standing water habitat types or RIBs (i.e. 2190_a, ‘31..’ types or rbbah; types 3260 and 7220 are not included). The datasource is a GeoPackage, organized in the same way as habitatmap_stdized. For more details see read_watersurfaces_hab().

Using the read_...() functions

In the below R code, it is supposed that a n2khab_data folder is present in your working directory or a directory up to 10 levels higher, with the relevant data sources in the right place. See vignette("v020_datastorage") for the setup.

The advantage of using the read_...() functions is the uniform approach to read the above data sources in R, enhancing collaboration and speeding up your work.

  • Geospatial layers are represented as an sf object, in the coordinate reference system (CRS) ‘Belge 1972 / Belgian Lambert 72’ (EPSG-code 31370).
  • Dataframes are presented as tibbles. 1
  • English & tidyverse-styled names of variables stimulate you to always produce internationalized code, tables and figures.
  • The type variable (if present) is a factor with the same levels as the type variable of the types reference list (see read_types()).
  • Some variables may be discarded that should normally have no added value, e.g. values that duplicate already existing information.

Returning raw data sources

The functions do just basic preprocessing, in order to return (by default) an object that well reflects the structure of the raw data source and returns all its records.

habitatmap <- read_habitatmap()
habitatmap
#> Simple feature collection with 646589 features and 30 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 21991.38 ymin: 153058.3 xmax: 258871.8 ymax: 244027.3
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 646,589 x 31
#>    polygon_id  eval  eenh1 eenh2 eenh3 eenh4 eenh5 eenh6 eenh7 eenh8 v1    v2
#>  * <chr>       <fct> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 000098_v20… m     b     <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  2 000132_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  3 000135_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  4 000136_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  5 000142_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  6 000150_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  7 000297_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  8 000991_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#>  9 000999_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#> 10 001000_v20… m     bl    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#> # … with 646,579 more rows, and 19 more variables: v3 <chr>, source <chr>,
#> #   info <chr>, bwk_label <chr>, hab1 <chr>, phab1 <int>, hab2 <chr>,
#> #   phab2 <int>, hab3 <chr>, phab3 <int>, hab4 <chr>, phab4 <int>, hab5 <chr>,
#> #   phab5 <int>, source_hab <chr>, source_phab <chr>, hab_legend <fct>,
#> #   area_m2 <dbl>, geometry <MULTIPOLYGON [m]>

The habitatmap object is a very wide data frame which is not so easy to handle in R. For analytical work on habitat types and RIBs, you’re advised to use the tidied data source habitatmap_stdized; see below.

read_habitatmap(filter_hab = TRUE) will only retain the polygons that occur in habitatmap_stdized, i.e. those that contain habitat types or RIBs.

The data sources habitatstreams, habitatsprings and habitatquarries have a straightforward data structure. There was no need to generate derived data sources. The meaning of their columns is described in the function documentation. Not all function arguments are discussed in this vignette: do take the time to look at the documentation!

habitatstreams <- read_habitatstreams()
habitatstreams
#> Simple feature collection with 560 features and 3 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 33097.92 ymin: 157529.6 xmax: 254039 ymax: 243444.6
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 560 x 4
#>    river_name   source_id type                                          geometry
#>    <fct>        <fct>     <fct>                                 <LINESTRING [m]>
#>  1 Wolfputbeek  VMM       3260  (127857.1 167681.2, 127854.9 167684.5, 127844 1…
#>  2 Oude Kale    VMM       3260  (95737.01 196912.9, 95732.82 196912.4, 95710.38…
#>  3 Venloop      EcoInv    3260  (169352.7 209314.9, 169358.8 209290.5, 169326.2…
#>  4 Venloop      EcoInv    3260  (169633.6 209293.5, 169625 209289.2, 169594.4 2…
#>  5 Kleine Nete  EcoInv    3260  (181087.1 208607.2, 181088.6 208608.1, 181089 2…
#>  6 Kleine Nete  EcoInv    3260  (180037.4 208360.4, 180038.3 208377.5, 180038.3…
#>  7 Kleine Nete  EcoInv    3260  (180520 208595.7, 180540.5 208607.4, 180541.2 2…
#>  8 Kleine Nete  EcoInv    3260  (187379.9 209998.8, 187381.3 209998.5, 187381.6…
#>  9 Raamdonkseb… extrapol  3260  (183545.5 192409, 183541.9 192406.7, 183541.9 1…
#> 10 Kleine Nete  EcoInv    3260  (183516.4 208261.7, 183567.3 208279.2, 183567.3…
#> # … with 550 more rows

With read_habitatstreams(source_text = TRUE) a second object sources is returned with the meaning of the source_id codes:

read_habitatstreams(source_text = TRUE) %>%
  .$sources
#> # A tibble: 7 x 2
#>   source_id          source_text
#>   <fct>              <fct>
#> 1 VMM                "Gegevens afgeleid van macrofyteninventarisaties uitgevoer…
#> 2 EcoInv             "Tijdens ecologische inventarisatiestudies uitgevoerd in o…
#> 3 extrapol           "De conclusie van het nabijgelegen geïnventariseerde segme…
#> 4 Van Belleghem (20… "Macrofytengegevens afgeleid van Van Belleghem S., Bal K.,…
#> 5 Leyssen ea (2005)  "Macrofytengegevens afgeleid van Leyssen A., Adriaens P., …
#> 6 WrnBe              "Waarnemingen afkomstig van Waarnemingen.be, de website vo…
#> 7 INBO               "Tijdens veldbezoeken werd de aan- of afwezigheid van het …
habitatsprings <- read_habitatsprings()
habitatsprings
#> Simple feature collection with 104 features and 11 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 36407.84 ymin: 155249.7 xmax: 258371.1 ymax: 179732
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 104 x 12
#>    point_id name       system_type code_orig type  certain unit_id area_m2  year
#>       <dbl> <chr>      <fct>       <chr>     <fct> <lgl>     <dbl>   <dbl> <dbl>
#>  1        1 Steenputb… rivulet     7220      7220  TRUE          1      50  2014
#>  2        2 Steenputb… rivulet     7220      7220  TRUE          1      20  2014
#>  3        3 Duling     <NA>        7230      7230  TRUE         NA      NA  2014
#>  4        4 Kapittelb… <NA>        gh        <NA>  TRUE         NA      NA  2014
#>  5        5 Remersdaa… rivulet     7220      7220  TRUE          2     200  2014
#>  6        6 Remersdaa… rivulet     7220      7220  TRUE          2     500  2014
#>  7        7 Kesterbeek unknown     7220,gh   7220  FALSE        32      NA    NA
#>  8        8 Krindaal   rivulet     7220      7220  TRUE          3     800  2014
#>  9        9 Bois de B… rivulet     7220      7220  TRUE          4      50  2014
#> 10       10 KwintenHo… mire        7220      7220  TRUE          5      10  2014
#> # … with 94 more rows, and 3 more variables: in_sac <lgl>, source <chr>,
#> #   geometry <POINT [m]>

Do note that both the habitatsprings and habitatquarries data sources also contain records which are no habitat type. You can filter the habitat types by setting the filter_hab argument as TRUE (not shown).

habitatquarries <- read_habitatquarries()
habitatquarries
#> Simple feature collection with 45 features and 6 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 221427.3 ymin: 160393.5 xmax: 243211.1 ymax: 168965.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 45 x 7
#>    polygon_id unit_id name        code_orig type  extra_reference
#>         <int>   <int> <chr>       <chr>     <fct> <fct>
#>  1          4       4 Avergat - … gh        <NA>  Lahaye 2018
#>  2          6       6 Avergat - … 8310      8310  Lahaye 2018
#>  3          5       5 Avergat - … gh        <NA>  Lahaye 2018
#>  4         20      20 Coolen      8310      8310  Limburgs Landschap 2020; pers…
#>  5         21      21 Coolen      gh        <NA>  Limburgs Landschap 2020
#>  6         29      29 Groeve Lin… 8310      8310  <NA>
#>  7         31      31 Grote berg… 8310      8310  De Haan & Lahaye 2018
#>  8         37      37 Grote berg… 8310      8310  De Haan & Lahaye 2018
#>  9         24      24 Henisdael … 8310      8310  Dusar et al. 2007
#> 10         34      34 Henisdael … 8310      8310  Dusar et al. 2007
#> # … with 35 more rows, and 1 more variable: geom <POLYGON [m]>

The habitatquarries data source also includes the literature references. These can be added as a second data frame – in that case a list is returned:

habitatquarries2 <- read_habitatquarries(references = TRUE)
class(habitatquarries2)
#> [1] "list"
names(habitatquarries2)
#> [1] "habitatquarries"  "extra_references"
all.equal(habitatquarries, habitatquarries2$habitatquarries)
#> [1] TRUE
habitatquarries2$extra_references
#> # A tibble: 9 x 23
#>   category bibtexkey address author booktitle journal month note  number pages
#>   <chr>    <chr>     <chr>   <chr>  <chr>     <chr>   <chr> <chr> <chr>  <chr>
#> 1 BOOK     de_haan_… Brusse… De Ha… <NA>      <NA>    <NA>  <NA>  <NA>   <NA>
#> 2 INCOLLE… dusar_me… Genk    Dusar… Likona j… <NA>    <NA>  <NA>  <NA>   6-13
#> 3 BOOK     jenneken… Riemst  Jenne… <NA>      <NA>    <NA>  <NA>  <NA>   <NA>
#> 4 INCOLLE… lahaye_g… Riemst  Lahay… <NA>      <NA>    <NA>  <NA>  <NA>   12
#> 5 BOOK     verhoeve… Weert   Verho… <NA>      <NA>    <NA>  <NA>  1769   <NA>
#> 6 BOOK     walschot… <NA>    Walsc… <NA>      <NA>    <NA>  <NA>  <NA>   <NA>
#> 7 MISC     wikipedi… <NA>    {Wiki… <NA>      <NA>    jan   Page… <NA>   <NA>
#> 8 MISC     limburgs… <NA>    {Limb… <NA>      <NA>    apr   Libr… <NA>   <NA>
#> 9 ARTICLE  silverta… <NA>    Silve… <NA>      Natuur… <NA>  <NA>  12     334-…
#> # … with 13 more variables: publisher <chr>, series <chr>, title <chr>,
#> #   volume <chr>, year <dbl>, url <chr>, isbn <chr>, copyright <chr>,
#> #   abstract <chr>, language <chr>, urldate <chr>, issn <chr>, keywords <chr>

These extra references can also be printed to the R console in BibTeX format, when specifying bibtex = TRUE.

Returning processed data sources

The reading functions for the three processed data sources will always return a list:

  • the first element holds the geospatial information as an sf object. It has unique feature IDs that can be joined with rows of the second element;
  • the second element is a tibble that presents attributes of the features as tidy data.

Let’s see how this works!

read_habitatmap_stdized()

A tidy representation of the habitatmap data, restricted to the polygons that contain habitat types or RIBs, is returned by read_habitatmap_stdized():

hms <- read_habitatmap_stdized()
hms_pol <- hms$habitatmap_polygons
hms_pol
#> Simple feature collection with 87781 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22003.2 ymin: 153084.4 xmax: 258871.8 ymax: 243446.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 87,781 x 3
#>    polygon_id  description_orig                                             geom
#>  * <fct>       <chr>                                          <MULTIPOLYGON [m]>
#>  1 130153_v20… 70% 9120_qb; 30% … (((150669.4 227248.6, 150668.1 227242.9, 1505…
#>  2 130815_v20… 70% gh; 30% 9160   (((258338.4 158696.1, 258336.5 158693.5, 2583…
#>  3 137826_v20… 100% 6430,rbbhf    (((181587.2 234938, 181613.1 234933.1, 181646…
#>  4 170624_v20… 100% rbbmr         (((145876.8 229686.8, 145701.7 229680.6, 1456…
#>  5 203261_v20… 70% gh; 30% rbbmr  (((117137.2 210307.9, 117136.3 210288.3, 1171…
#>  6 204352_v20… 80% gh; 20% rbbsp  (((116357.3 159278.3, 116340.5 159259.8, 1163…
#>  7 204376_v20… 90% gh; 10% rbbsg  (((116110.1 210545.5, 116102.4 210541.1, 1160…
#>  8 205188_v20… 60% rbbsp; 40% gh  (((232114.7 161594.5, 232122.7 161590.5, 2321…
#>  9 205291_v20… 70% gh; 30% rbbsp  (((191253.5 160641.5, 191254.8 160636.1, 1912…
#> 10 205756_v20… 70% gh; 30% rbbsf  (((216258.8 156749, 216260.9 156750, 216287.7…
#> # … with 87,771 more rows
hms_occ <- hms$habitatmap_types
hms_occ
#> # A tibble: 110,485 x 5
#>    polygon_id   type     certain code_orig  phab
#>    <fct>        <fct>    <lgl>   <chr>     <int>
#>  1 000038_v2016 91E0_va  TRUE    91E0_va     100
#>  2 000043_v2016 9130_end TRUE    9130_end    100
#>  3 000064_v2020 9130_end TRUE    9130_end    100
#>  4 000132_v2016 9130_end TRUE    9130_end    100
#>  5 000204_v2016 91E0_vn  TRUE    91E0_vn     100
#>  6 000255_v2016 91E0_vc  TRUE    91E0_vc     100
#>  7 000297_v2016 rbbsp    TRUE    rbbsp        10
#>  8 000311_v2016 9130_end TRUE    9130_end     70
#>  9 000311_v2016 rbbsp    TRUE    rbbsp        30
#> 10 000390_v2016 91E0_vn  TRUE    91E0_vn      70
#> # … with 110,475 more rows

Let’s estimate the surface area per type, including uncertain occurrences of types and taking into account cover percentage per polygon (phab):

hms_pol %>%
  mutate(area = st_area(.)) %>%
  st_drop_geometry() %>%
  inner_join(hms_occ, by = "polygon_id") %>%
  # area of type within polygon:
  mutate(area_type = area * phab / 100) %>%
  group_by(type) %>%
  summarise(area = sum(area_type) %>%
    set_units("ha") %>%
    round(2))
#> # A tibble: 101 x 2
#>    type        area
#>    <fct>       [ha]
#>  1 1130     5678.87
#>  2 1140     2098.03
#>  3 1310_pol   17.39
#>  4 1310_zk    34.02
#>  5 1310_zv    10.62
#>  6 1320        2.41
#>  7 1330_da   117.61
#>  8 1330_hpr  138.12
#>  9 2110       27.14
#> 10 2120      462.54
#> # … with 91 more rows

read_habitatmap_terr()

read_habitatmap_terr() behaves exactly the same way:

hmt <- read_habitatmap_terr()
hmt$habitatmap_terr_polygons
#> Simple feature collection with 78602 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22003.2 ymin: 153084.4 xmax: 258871.8 ymax: 243351.8
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 78,602 x 5
#>    polygon_id  description_orig  description source                         geom
#>  * <fct>       <chr>             <chr>       <fct>            <MULTIPOLYGON [m]>
#>  1 130153_v20… 70% 9120_qb; 30%… 70% 9120_q… habita… (((150669.4 227248.6, 1506…
#>  2 130815_v20… 70% gh; 30% 9160  70% gh; 30… habita… (((258338.4 158696.1, 2583…
#>  3 137826_v20… 100% 6430,rbbhf   100% 6430_… habita… (((181587.2 234938, 181613…
#>  4 170624_v20… 100% rbbmr        100% rbbmr  habita… (((145876.8 229686.8, 1457…
#>  5 203261_v20… 70% gh; 30% rbbmr 70% gh; 30… habita… (((117137.2 210307.9, 1171…
#>  6 204352_v20… 80% gh; 20% rbbsp 80% gh; 20… habita… (((116357.3 159278.3, 1163…
#>  7 204376_v20… 90% gh; 10% rbbsg 90% gh; 10… habita… (((116110.1 210545.5, 1161…
#>  8 205188_v20… 60% rbbsp; 40% gh 60% rbbsp;… habita… (((232114.7 161594.5, 2321…
#>  9 205291_v20… 70% gh; 30% rbbsp 70% gh; 30… habita… (((191253.5 160641.5, 1912…
#> 10 205756_v20… 70% gh; 30% rbbsf 70% gh; 30… habita… (((216258.8 156749, 216260…
#> # … with 78,592 more rows
hmt$habitatmap_terr_types
#> # A tibble: 99,784 x 6
#>    polygon_id   type     certain code_orig  phab source
#>    <fct>        <fct>    <lgl>   <chr>     <int> <fct>
#>  1 000038_v2016 91E0_va  TRUE    91E0_va     100 habitatmap_stdized
#>  2 000043_v2016 9130_end TRUE    9130_end    100 habitatmap_stdized
#>  3 000064_v2020 9130_end TRUE    9130_end    100 habitatmap_stdized
#>  4 000132_v2016 9130_end TRUE    9130_end    100 habitatmap_stdized
#>  5 000204_v2016 91E0_vn  TRUE    91E0_vn     100 habitatmap_stdized
#>  6 000255_v2016 91E0_vc  TRUE    91E0_vc     100 habitatmap_stdized
#>  7 000297_v2016 rbbsp    TRUE    rbbsp        10 habitatmap_stdized
#>  8 000311_v2016 9130_end TRUE    9130_end     70 habitatmap_stdized
#>  9 000311_v2016 rbbsp    TRUE    rbbsp        30 habitatmap_stdized
#> 10 000390_v2016 91E0_vn  TRUE    91E0_vn      70 habitatmap_stdized
#> # … with 99,774 more rows

Compared to habitatmap_stdized, purely aquatic or non-habitat/RIB polygons were omitted, and a part of the type data were interpreted in a more specific way. Further, while type 7220 is present in habitatmap_terr, the read_habitatmap_terr() function drops it by default because the habitatsprings data source is recommended for that. This can be controlled by the drop_7220 argument.

As a consequence, some type codes are completely absent from habitatmap_terr_types:

hms_occ %>%
  distinct(type) %>%
  anti_join(
    hmt$habitatmap_terr_types %>%
      distinct(type),
    by = "type"
  ) %>%
  arrange(type)
#> # A tibble: 7 x 1
#>   type
#>   <fct>
#> 1 2190
#> 2 6410
#> 3 6430
#> 4 6510
#> 5 7140
#> 6 7220
#> 7 9130

About 3% of all type occurrences received a new type code:

hmt$habitatmap_terr_types %>%
  count(source) %>%
  mutate(pct = (n / sum(n) * 100) %>% round(0))
#> # A tibble: 2 x 3
#>   source                                  n   pct
#>   <fct>                               <int> <dbl>
#> 1 habitatmap_stdized                  96424    97
#> 2 habitatmap_stdized + interpretation  3360     3

read_watersurfaces_hab()

A similar story, this time for polygons that (could) have aquatic types:

wsh <- read_watersurfaces_hab()
wsh$watersurfaces_polygons
#> Simple feature collection with 3233 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 22546.57 ymin: 159273.1 xmax: 253896.9 ymax: 242960.1
#> Projected CRS: Belge 1972 / Belgian Lambert 72
#> # A tibble: 3,233 x 5
#>    polygon_id polygon_id_ws polygon_id_habitatm… description_orig
#>  * <fct>      <fct>         <fct>                <chr>
#>  1 ANTANT0082 ANTANT0082    596466_v2014         60% 3150; 20% rbbmr; 20% rbbsf
#>  2 ANTANT0234 ANTANT0234    633396_v2020         100% 3130_na
#>  3 ANTANT0251 ANTANT0251    113978_v2014         100% 3150
#>  4 ANTANT0253 ANTANT0253    111606_v2014         100% 3150
#>  5 ANTANT0297 ANTANT0297    409153_v2014+409153… 85% 3140; 15% 3150+85% 3140; 1…
#>  6 ANTANT0315 ANTANT0315    519082_v2018         100% 3140
#>  7 ANTANT0319 ANTANT0319    601958_v2014         100% 3150,gh
#>  8 ANTANT0381 ANTANT0381    644003_v2014         85% gh; 15% 3140
#>  9 ANTANT0383 ANTANT0383    631879_v2014+593522… 50% 3150; 40% rbbmr; 10% rbbsf…
#> 10 ANTANT0384 ANTANT0384    644003_v2014         85% gh; 15% 3140
#> # … with 3,223 more rows, and 1 more variable: geom <MULTIPOLYGON [m]>
wsh$watersurfaces_types
#> # A tibble: 3,669 x 4
#>    polygon_id type    certain code_orig
#>    <fct>      <fct>   <lgl>   <chr>
#>  1 ANTANT0082 3150    TRUE    3150
#>  2 ANTANT0234 3130_na TRUE    3130_na
#>  3 ANTANT0251 3150    TRUE    3150
#>  4 ANTANT0253 3150    TRUE    3150
#>  5 ANTANT0297 3140    TRUE    3140
#>  6 ANTANT0297 3150    TRUE    3150
#>  7 ANTANT0315 3140    TRUE    3140
#>  8 ANTANT0319 3150    FALSE   3150,gh
#>  9 ANTANT0381 3140    TRUE    3140
#> 10 ANTANT0383 3150    TRUE    3150
#> # … with 3,659 more rows

Let’s compute some statistics of standing water types (ignoring the value certain):

wsh$watersurfaces_polygons %>%
  mutate(area = st_area(.)) %>%
  st_drop_geometry() %>%
  inner_join(wsh$watersurfaces_types,
    by = "polygon_id"
  ) %>%
  group_by(type) %>%
  summarise(
    nr_watersurfaces = n_distinct(polygon_id),
    total_area = sum(area),
    area_min = min(area),
    area_Q1 = quantile(area, 0.25),
    area_Q2 = quantile(area, 0.5),
    area_Q3 = quantile(area, 0.75),
    max = max(area)
  ) %>%
  mutate_at(
    vars(matches("area|max")),
    function(x) {
      set_units(x, "a") %>% round(1)
    }
  )
#> # A tibble: 9 x 8
#>   type     nr_watersurfaces total_area area_min area_Q1 area_Q2 area_Q3     max
#>   <fct>               <int>        [a]      [a]     [a]     [a]     [a]     [a]
#> 1 2190_a                318     3605.6      0.0     0.9     2.1     5.5   317.5
#> 2 3110                    6     1876.8    111.7   139.7   216.9   330.3   849.3
#> 3 3130                  219    33022.9      0.6    13.0    41.0   160.0  2807.2
#> 4 3130_aom             1267   107732.6      0.1     2.3     9.7    44.1  7062.7
#> 5 3130_na               354   120174.2      1.0    44.7   122.5   277.9 13668.2
#> 6 3140                  104    72272.0      0.8    13.3    55.3   418.5 13668.2
#> 7 3150                  475   105019.1      0.4     6.6    17.3    83.5 13668.2
#> 8 3160                  379    27936.9      0.1     6.1    17.4    59.3  4179.1
#> 9 rbbah                 517    33716.3      0.1     4.2    10.9    47.0  3748.6

A strong point of sf objects is that the geometry has explicit units. Consequently we can make use of tools like set_units() to convert units (e.g. surface area as are (a)).

Because the main type code 3130 will mostly boil down to 3130_aom in the field, a further interpreted flavour can be generated with read_watersurfaces_hab(interpreted = TRUE).

read_watersurfaces_hab(interpreted = TRUE) %>%
  .$watersurfaces_types %>%
  filter(type == "3130") %>%
  nrow()
#> [1] 0

Matching interpreted data sources with a data frame column of types

The expand_types() function helps us to join a type column of your own data frame with the results of read_habitatmap_terr() or read_watersurfaces_hab().

For both of these datasets, the following conversions of your data frame are supported in order to create optimal joins:

  • adding extra subtype rows when main type codes occur for which subtypes exist;
  • adding extra main type rows when certain subtypes of that main type are present (the conditions can be relaxed; see the function documentation). This is supported for 2330, 6230 and 91E0, for which this makes sense with the two mentioned data sources.

An example (df is our data frame):

df <-
  tribble(
    ~mycode, ~obs,
    "2130", 5,
    "2190", 45,
    "2330_bu", 8,
    "2330_dw", 8,
    "6410_mo", 78,
    "6410_ve", 4,
    "91E0_vn", 10
  )
df
#> # A tibble: 7 × 2
#>   mycode    obs
#>   <chr>   <dbl>
#> 1 2130        5
#> 2 2190       45
#> 3 2330_bu     8
#> 4 2330_dw     8
#> 5 6410_mo    78
#> 6 6410_ve     4
#> 7 91E0_vn    10

With the type_var argument you specify which variable of your data frame represents type codes:

df_exp <-
  expand_types(df, type_var = "mycode")
df_exp
#> # A tibble: 13 × 2
#>    mycode        obs
#>    <chr>       <dbl>
#>  1 2130            5
#>  2 2190           45
#>  3 2330_bu         8
#>  4 2330_dw         8
#>  5 6410_mo        78
#>  6 6410_ve         4
#>  7 91E0_vn        10
#>  8 2130_had        5
#>  9 2130_hd         5
#> 10 2190_a         45
#> 11 2190_mp        45
#> 12 2190_overig    45
#> 13 2330            8

More examples and features are explained in the documentation of expand_types().

Obviously, more rows of habitatmap_terr will be retained by joining df_exp:

hmt$habitatmap_terr_types %>%
  semi_join(df_exp, by = c(type = "mycode")) %>%
  nrow()
#> [1] 6634

When joining with df:

hmt$habitatmap_terr_types %>%
  semi_join(df, by = c(type = "mycode")) %>%
  nrow()
#> [1] 4984

References

De Saeger, Steven, Robin Guelinckx, Patrik Oosterlynck, Adinda De Bruyn, Klaas Debusschere, Pieter Dhaluin, Rémar Erens, et al. 2020. Biologische Waarderingskaart en Natura 2000 Habitatkaart, uitgave 2020. Reports of the Research Institute for Nature and Forest. Brussels: Research Institute for Nature and Forest (INBO). https://doi.org/10.21436/inbor.18840851.
Leyssen, An, Kevin Scheers, Vincent Smeekens, Carine Wils, Jo Packet, Geert De Knijf, and Luc Denys. 2020. Watervlakken versie 1.1: polygonenkaart van stilstaand water in Vlaanderen: uitgave 2020. Reports of the Research Institute for Nature and Forest. Brussels: Research Institute for Nature and Forest (INBO). https://doi.org/10.21436/inbor.19088385.
Leyssen, An, Vincent Smeekens, and Luc Denys. 2020. Indicatieve situering van het Natura 2000 habitattype 3260: Submontane en laaglandrivieren met vegetaties behorend tot het Ranunculion fluitantis en het Callitricho-Batrachion. Uitgave 2020 (versie 1.7). Reports of the Research Institute for Nature and Forest. Brussels: Research Institute for Nature and Forest (INBO). https://doi.org/10.21436/inbor.18903609.

  1. A tibble is a data frame that makes working in the tidyverse a little easier.↩︎