Data Diaries - #6 Breeding and wintering birds on Humberside
Data is at the core of what we do as a Local Environmental Records Centre. This series shines a light on the breadth and variety of the records we receive and how we deal with them.
In 2024 NEYEDC received records from an ecological consultancy that had undertaken a large project on Humberside in 2022, parts of which extend into the NEYEDC area. In total, 12,114 bird records were sent to us, comprising two survey types (breeding birds and wintering birds), each split into two methodologies: points and lines. Breeding bird data is highly valuable, so preparing these records and inputting them into our database was a particular priority.
Records were submitted to us in four sheets: breeding bird points, breeding bird lines, wintering bird points, and wintering bird lines. The records contained all the standard fields we expect, plus some additional fields, including time of record (alongside date), BTO Code, and Location ID. Because the records spanned the whole of Humberside, we expected a portion of them to fall outside the NEYEDC area. The first step in preparation was therefore to map the records in GIS, inspecting each dataset and intersecting the points with the NEYEDC area to exclude anything outside our region. This also allowed us to complete our standard checks, ensuring there were no obvious errors in the transcription of spatial references. The result was four spreadsheets containing only records within the NEYEDC area, totalling 4,961 records.
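The spatial filter in this step was carried out in GIS. As a rough illustration of what the intersection does, the Python sketch below keeps only records whose easting and northing fall inside a boundary polygon, using the standard ray-casting point-in-polygon test. The function names, the `easting`/`northing` field names, and the square boundary are hypothetical placeholders, not the real NEYEDC boundary or our actual workflow.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_to_area(records, polygon):
    """Return only the records whose coordinates fall inside the polygon."""
    return [r for r in records
            if point_in_polygon(r["easting"], r["northing"], polygon)]


# Hypothetical square boundary in British National Grid metres (not the real area)
area = [(490000, 420000), (520000, 420000), (520000, 450000), (490000, 450000)]

# Invented example records
records = [
    {"species": "Skylark", "easting": 500000, "northing": 430000},
    {"species": "Curlew", "easting": 530000, "northing": 430000},
]

kept = filter_to_area(records, area)
print([r["species"] for r in kept])  # ['Skylark']
```

This sketch treats the boundary as a single simple polygon and does not define how points lying exactly on an edge are classified; GIS tools handle multi-part boundaries and such edge cases explicitly.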
Next, RStudio was used to identify any duplication between the spreadsheets. This was necessary because we were unsure whether records could have been duplicated across the two methodologies (points and lines). We therefore wrote R code that compared the points and lines spreadsheets for each survey type. It produced one output spreadsheet for breeding bird data and one for wintering bird data, each populated only with records whose exact date and time combination appeared in both the points and lines spreadsheets. This extraction confirmed that no records were duplicated across either pair of spreadsheets, which allowed us to continue with the next stages of preparation.
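Our duplicate check was written in R. The Python sketch below shows the same idea under assumed field names (`date`, `time`): build the set of (date, time) pairs in each sheet, intersect the two sets, and return every record whose pair appears in both. The function name and example records are invented for illustration.

```python
def shared_date_times(points, lines):
    """Return records from either sheet whose exact (date, time) pair
    also occurs in the other sheet, i.e. candidate cross-method duplicates."""
    point_keys = {(r["date"], r["time"]) for r in points}
    line_keys = {(r["date"], r["time"]) for r in lines}
    common = point_keys & line_keys
    return [r for r in points + lines if (r["date"], r["time"]) in common]


# Invented example records for one survey type
breeding_points = [
    {"species": "Skylark", "date": "2022-05-10", "time": "06:15"},
    {"species": "Cuckoo", "date": "2022-05-10", "time": "06:40"},
]
breeding_lines = [
    {"species": "Skylark", "date": "2022-05-11", "time": "07:05"},
]

print(shared_date_times(breeding_points, breeding_lines))  # []
```

An empty result, as in our case, means no record in one sheet shares an exact date and time with a record in the other.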
In this case, spatial references were given as eastings and northings, so these were converted into OS grid reference format, the standard for our database. This produced high-resolution 10-figure grid references, which locate records at the level of individual fields. Breeding and wintering bird data at this resolution is highly valuable in decision-making because records can be attributed to specific locations. Duplicate checking was then repeated in R within each of the four spreadsheets, and a small number of duplicate records were excluded. Next, the records were manually checked for instances where males and females (and occasionally juveniles) of a species were included in the same record, for instance a count of 2 with ‘male and female’ given in the ‘Sex of birds’ field. In these cases, the record was duplicated and the count split to produce a separate entry for the males, females, or juveniles. This keeps the records in our database cleaner and communicates sex and life-stage information more clearly to data users when records are output in reports.
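The conversion from eastings and northings to a 10-figure grid reference can be sketched in Python as below. It uses a commonly used method for deriving the two 100 km square letters of the British National Grid (a 5 × 5 lettering that omits I) and keeps five digits each of easting and northing, giving 1 m precision. A second helper illustrates the structure of a split mixed-sex record; our split was done by hand, so this is only a sketch, and the function names and the `count` and `sex_stage` fields are hypothetical.

```python
def to_os_grid_ref(easting, northing):
    """Convert a British National Grid easting/northing (metres) to a
    10-figure OS grid reference, e.g. 'TA 09000 29000'."""
    if not (0 <= easting < 700000 and 0 <= northing < 1300000):
        raise ValueError("coordinates fall outside the British National Grid")
    # Truncate to whole metres: a grid reference names the square's south-west corner
    e, n = int(easting), int(northing)
    e100k, n100k = e // 100000, n // 100000

    # Index (0-24) of each letter: the first picks the 500 km square,
    # the second the 100 km square within it
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid letters skip I, so shift any index past H up by one
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    letters = chr(ord("A") + l1) + chr(ord("A") + l2)

    # Five digits each of easting and northing within the 100 km square = 1 m precision
    return f"{letters} {e % 100000:05d} {n % 100000:05d}"


def split_by_sex(record, counts_by_class):
    """Turn one mixed record into one record per sex or life stage.
    counts_by_class (e.g. {"male": 1, "female": 1}) must sum to the original count."""
    if sum(counts_by_class.values()) != record["count"]:
        raise ValueError("split counts do not add up to the original count")
    return [{**record, "sex_stage": cls, "count": c}
            for cls, c in counts_by_class.items() if c > 0]


print(to_os_grid_ref(509000, 429000))  # TA 09000 29000
print(split_by_sex({"species": "Skylark", "count": 2}, {"male": 1, "female": 1}))
```

The coordinates and the Skylark record above are illustrative only. Checking that the split counts add back up to the original count guards against a split silently changing the number of birds recorded.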
Finally, the remaining columns were checked and edited as necessary. Information from some of the raw data columns (Notes, Location ID) was concatenated into a single Comments field. BTO Code was kept as its own field, ready to be imported as a custom field that appears to data users in our reports. Although this is not a standard field for all the bird data we receive, we retained it given the structured nature of these surveys and the methodologies they followed.
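A minimal sketch of the column tidy-up, assuming each record is a Python dictionary keyed by the original column names: Notes and Location ID are merged into Comments, and BTO Code is left untouched as its own field. The function name, the `name: value` format, and the `; ` separator are assumptions for illustration, not necessarily the format used in our database, and the example record (including its Location ID) is invented.

```python
def build_comments(record, source_fields=("Notes", "Location ID")):
    """Merge the given columns into one Comments field, dropping the originals.
    Other fields, including BTO Code, pass through unchanged."""
    parts = []
    for field in source_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            continue  # skip blank cells rather than writing empty labels
        parts.append(f"{field}: {str(value).strip()}")
    out = {k: v for k, v in record.items() if k not in source_fields}
    out["Comments"] = "; ".join(parts)
    return out


raw = {"Species": "Curlew", "Count": 3, "BTO Code": "CU",
       "Notes": "feeding in stubble", "Location ID": "VP03"}
print(build_comments(raw)["Comments"])  # Notes: feeding in stubble; Location ID: VP03
```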
This preparation resulted in four final datasets ready for import into our database: breeding bird lines (42 records), breeding bird points (220 records), wintering bird lines (42 records), and wintering bird points (4,675 records).
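The final counts can be reconciled against the 4,961 records left after spatial filtering. They sum to 4,979, a net increase of 18. Assuming the only steps that changed record numbers were the within-sheet duplicate removal (which lowers the count) and the splitting of mixed-sex records (which raises it), the splits added 18 more records than were removed as duplicates. A quick check in Python:

```python
# Final record counts per dataset, as reported above
final_counts = {
    "breeding bird lines": 42,
    "breeding bird points": 220,
    "wintering bird lines": 42,
    "wintering bird points": 4675,
}
after_spatial_filter = 4961  # records left after clipping to the NEYEDC area

total = sum(final_counts.values())
net_change = total - after_spatial_filter
print(total, net_change)  # 4979 18
```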
This dataset is notable for its size, formal survey methodology, record resolution, and recency. These qualities make the data highly valuable for local decision-making. The dataset also includes records of protected, priority, and red-listed birds such as Skylark, Cuckoo, Greenfinch, Dunlin, Black-tailed Godwit, and Curlew. Now added to our database, these records form part of an important evidence base for the Humberside region.