Data Diaries #5 - City Nature Challenge data in iNaturalist
Data is at the core of what we do as a Local Environmental Records Centre. This series shines a light on the breadth and variety of the records we receive and how we deal with them.
This year Hull took part in the City Nature Challenge for the second time. The City Nature Challenge is an annual 4-day bioblitz event in which hundreds of cities and regions across the world record the flora and fauna in their area. Hull first took part in 2023 and signed up again for the 2024 event, with NEYEDC leading both. This year’s event officially ended on 6th May. You can read more about our results here: https://www.neyedc.org.uk/updates-insights/2024/5/17/hull-city-nature-challenge-2024-the-results-are-in
Over the course of this year’s event, participants from around Hull collected over 7,000 observations of plants and animals within the Hull city boundary on the iNaturalist platform. Through submission to iNaturalist, records are verified in-app by community users and become available to be used for purposes of research, conservation, and decision-making. As such, this year’s event gave us access to thousands of biological records – a nice new dataset for NEYEDC to work on! This dataset is a great demonstration of how differently we deal with data sourced from citizen science events or online recording schemes.
The majority of the datasets or records we receive at NEYEDC are submitted to us directly – we receive a spreadsheet or GIS layer of records, usually prepared to some extent by an existing recorder or organisation with experience in biological recording, where records often have had some form of validation or verification methods already applied before they come to us.
In this way, the records from the City Nature Challenge (CNC) differed. To access them, we exported them from iNaturalist, one of many online recording systems that can be used to make and store biological records (others include iRecord and NBN). Once we had a copy of the ~7,000 records collected during the CNC, we could start to prepare them for inclusion in our database, undertaking many of our usual checks and validation procedures, but with some additional steps needed.
Whilst citizen science initiatives and online recording systems give us access to an amazing wealth of information, there are pitfalls and challenges to accessing data in this way. One of the first steps in preparing the CNC data was to remove any records that had a non-commercial license attached to them. Understanding licensing is an important part of our role as a records centre. Licensing lets us know what kinds of permissions data owners have applied to their data, and therefore indicates what that data can be used for. We must adhere to these licenses and as such, any records within iNaturalist that had a non-commercial (CC-BY-NC) license attached to them had to be removed from the dataset before moving to the next stage of preparation. Because our records are utilised in ecological data searches, a chargeable core service that helps fund our centre and inform the planning process, all data utilised in this manner must have either a CC-BY (Creative Commons Attribution) or CC0 (Creative Commons public domain dedication) license attached, unless we have prior permission to use the data. Of 7,048 records, 1,819 had a CC-BY-NC license attached, leaving us with 5,229 records.
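As a rough illustration of the license step, here is a minimal Python sketch. It assumes the iNaturalist export is a CSV with a column named `license`; the column name, the sample rows, and the `filter_by_license` helper are all hypothetical and not taken from NEYEDC's actual workflow. It also uses an allow-list (keep only CC-BY and CC0) rather than simply dropping CC-BY-NC, which is a stricter reading of the rule described above.

```python
import csv
import io

# Licenses assumed to permit use in a chargeable service, per the rule above.
ALLOWED_LICENSES = {"CC-BY", "CC0"}

def filter_by_license(rows, column="license"):
    """Keep only rows whose license value is in ALLOWED_LICENSES."""
    kept = []
    for row in rows:
        # A blank value (no license recorded) is treated as not usable.
        value = (row.get(column) or "").strip().upper()
        if value in ALLOWED_LICENSES:
            kept.append(row)
    return kept

# Tiny made-up sample standing in for an exported file.
sample_csv = """id,license
1,CC-BY
2,CC-BY-NC
3,CC0
4,
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
usable = filter_by_license(rows)
print([row["id"] for row in usable])
```

An allow-list fails safe: a record with an unexpected or missing license is excluded until someone checks it, rather than slipping through.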
Next, the quality of the records had to be assessed. iNaturalist works on a system whereby records (or observations) are made with a photograph attached and uploaded to the app. The user who makes the observation submits their own identification (if known) alongside their record, and the record is marked as ‘Needs ID’. Other users on iNaturalist can then provide their own ID for the record, either agreeing with the original recorder or submitting a different ID. Once at least two identifications have been made and more than two-thirds of identifiers agree, the record is marked as ‘Research Grade’ – in iNaturalist this would be considered a correct ID. These records proceed down a pipeline to the iRecord platform – in time – where they can be externally validated. ‘Casual’ is a tag applied to any records that are submitted to iNaturalist without a photo, video, or sound recording attached. As these can’t be reliably identified in any way, they always remain as ‘Casual’. Taking this into account, only ‘Research Grade’ records, of which there were 2,398 out of the 5,229 records from the previous stage, were carried through to the next stage of the preparation process.
(N.B. Whilst the above process considers the given identifications ‘research grade’, NEYEDC applies a ‘Considered Correct’ verification status for these records rather than ‘Correct’. The metadata for this dataset outlines the iNaturalist process for data users.)
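The quality step and the verification status can be sketched in the same way. This example assumes each exported record carries a `quality_grade` field whose values include `research`, `needs_id` and `casual`; the `verification_status` field and the `keep_research_grade` helper are illustrative names only, not NEYEDC's real schema.

```python
# Value assumed to mark 'Research Grade' records in the export.
RESEARCH_GRADE = "research"

def keep_research_grade(records):
    """Return copies of Research Grade records, tagged 'Considered Correct'."""
    kept = []
    for record in records:
        if record.get("quality_grade") == RESEARCH_GRADE:
            tagged = dict(record)  # copy so the input records are not modified
            tagged["verification_status"] = "Considered Correct"
            kept.append(tagged)
    return kept

# Made-up sample records.
records = [
    {"id": 1, "quality_grade": "research"},
    {"id": 2, "quality_grade": "needs_id"},
    {"id": 3, "quality_grade": "casual"},
]

verified = keep_research_grade(records)
print(verified)
```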
After the two above processes, the remaining records are both licensed for use and considered reliable enough to be entered into our database. Next, standard procedures are undertaken including formatting the iNaturalist output fields to match our own required fields/columns, ensuring each record has the minimum information required, and removing any information that cannot be shared. For example, because of the manner in which users submit observations to iNaturalist (usually using a smartphone with location assist), iNaturalist generates a guess for a location name based on the spatial reference of the record. Many of these generated locations included postcodes, which had to be removed prior to upload to our database. Unless necessary in a small number of cases (for example, records of Swifts relating to specific buildings), the inclusion of postcodes can be a breach of privacy, for example, if a member of the public has made a record in their garden but would not wish to share its exact location.
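Removing postcodes from generated location names is the kind of task a regular expression handles reasonably well. The sketch below is an assumption-laden example rather than our production code: the pattern is a simplified approximation of full UK postcodes (it does not validate them), the `strip_postcodes` helper is hypothetical, and the sample place names are invented. It deliberately ignores partial postcodes such as a lone outward code, because a looser pattern would also match road names like "A63".

```python
import re

# Simplified pattern for a full UK postcode, e.g. "HU6 7RX" or "HU67RX".
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}\b")

def strip_postcodes(place):
    """Remove full postcodes from a place name and tidy leftover punctuation."""
    cleaned = POSTCODE_RE.sub("", place)
    cleaned = re.sub(r"\s+,", ",", cleaned)    # drop spaces left before a comma
    cleaned = re.sub(r",\s*,", ",", cleaned)   # collapse an emptied list item
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse repeated spaces
    return cleaned.strip(" ,")

# Invented examples of auto-generated location names.
for place in [
    "Cottingham Road, Hull HU6 7RX, UK",
    "Hull, HU1 1AA, UK",
    "Pearson Park, Hull",
]:
    print(strip_postcodes(place))
```

In the small number of cases where a postcode is genuinely needed, such as Swift records tied to specific buildings, those records would need to bypass a step like this.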
The records are now formatted suitably to be added to our database. The upload process undertakes more verification on species names (some of which need to be checked and changed from the names provided in iNaturalist, which as an international platform sometimes uses names which don’t appear in the Natural History Museum dictionary), locations (checking they fall within the NEYEDC area and that the species reasonably occurs in that area), and dates. A very small number of records were eliminated at this stage because they were of domesticated or cultivated specimens.
The result is a cleaned and processed dataset of 2,392 records, all within the Hull city boundary. Traditionally we have received fewer records from Hull compared to other regions that NEYEDC covers, and as such this dataset is a valuable addition which, in conjunction with records from the 2023 event, vastly increases recording coverage across the city. The City Nature Challenge and the resulting data are an excellent illustration of the way in which data from citizen science events, with the right considerations and preparation, can go on to have a real impact locally.
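For anyone who wants to check the numbers, the record counts quoted in this post can be tallied with a few lines of Python. The counts come from the text above; the stage labels and the `stage_drops` helper are just illustrative.

```python
# Record counts quoted in this post, in processing order.
stages = [
    ("exported from iNaturalist", 7048),
    ("after license filter", 5229),
    ("Research Grade only", 2398),
    ("after upload checks", 2392),
]

def stage_drops(stages):
    """Return (stage name, records removed at that stage) for each step."""
    drops = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        drops.append((name, before - after))
    return drops

drops = stage_drops(stages)
retained = stages[-1][1] / stages[0][1]

for name, removed in drops:
    print(f"{name}: {removed} removed")
print(f"retained overall: {retained:.1%}")
```

The tally makes it clear that the quality-grade step removes the most records, followed by licensing, while the final upload checks remove only a handful.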