I have been reading and thinking about the relationship between long-term spatial data preservation and the short-term needs of day-to-day data security during the life of a research project. With research data being generated at ever faster rates and the life cycles of supporting technologies getting shorter, data preservation is destined to be a continual problem requiring new and smarter solutions every few years. Just dealing with the new data produced by Earth observation satellites, arriving at volumes of terabytes per day and potentially existing in several formats as it passes through complex processing stages, is enough to make this a serious problem. The sentiment expressed by Moss points to the aspirations researchers hold for their hard-won data:
“Scientists now want to keep everything, which they assume is digitally possible, in the belief that everything has value and can be retrieved and repurposed.” — Michael Moss, 2008
The question is: will the technology and the resources exist to meet this aspiration?
It is very easy for a researcher in a Higher Education institution to secure data in the short term, either through their own arrangements or by using the services of a central IT department. In that way data can be backed up to multiple locations and held on hardware that is up to date and covered by the manufacturer’s warranty. I am sure this professional approach is found in most (or all) HE institutions. It is still up to individuals to avail themselves of these services, but there are few obstacles standing in the way. Even storage costs are falling, and for a few pounds per gigabyte a university department can store data on professionally managed institutional servers.
In the hierarchy of data preservation, the next levels up become harder for a researcher to arrange. Consider keeping spatial data for third parties to discover and use for the next 5 years. Immediately there is the need for precise and comprehensive metadata. This has been addressed in several ways, and the development of specific standards via UK AGMAP has given anybody who looks an easy lead into useful metadata creation. For this longer-term data storage it may also be necessary to look outside your home institution to ensure suitable data curation and discoverability. But where do you put the data? It needs an accessible location that links the metadata and the data object and makes them discoverable by future researchers. The tools provided by GoGeo offer a solution: data can be described and even lodged within this service, so it becomes searchable and accessible to other researchers.
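To make the idea of “precise and comprehensive metadata” a little more concrete, here is a minimal sketch in Python of the sort of discovery-level record a researcher might write alongside a dataset. The field names are a simplified illustration loosely inspired by common ISO 19115 / UK GEMINI discovery elements; they are not the actual AGMAP schema, and the dataset, values and file names are entirely hypothetical.

```python
# Illustrative only: a simplified discovery-level metadata record kept next to
# the data object. Field names are hypothetical, not a real AGMAP/GEMINI schema.
import json
from datetime import date

metadata = {
    "title": "Example soil moisture survey, East Anglia",   # hypothetical dataset
    "abstract": ("Point measurements of soil moisture collected "
                 "during the 2010 field season."),
    "keywords": ["soil moisture", "field survey", "East Anglia"],
    "temporal_extent": {"start": "2010-04-01", "end": "2010-09-30"},
    "bounding_box": {"west": 0.05, "east": 1.75, "south": 52.0, "north": 52.9},
    "spatial_reference": "EPSG:27700",                        # British National Grid
    "lineage": "Collected with handheld GPS; cleaned and projected in a GIS.",
    "access_constraints": "Available for academic re-use on request.",
    "responsible_party": {"name": "A. Researcher",
                          "organisation": "Example University"},
    "data_format": "ESRI Shapefile",
    "metadata_date": date.today().isoformat(),
}

# Storing the record alongside the data keeps description and data object together,
# ready to be entered into a catalogue or discovery service such as GoGeo.
with open("soil_moisture_survey.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Even a lightweight record like this captures the context (what, where, when, who, and how to get it) that a future researcher would need in order to find and reuse the data.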
So there is an infrastructure for this stage of the data management process, but it now needs the data producer to step outside their daily routine and work on tasks not always considered core by a busy academic looking to their next paper. This 5-year time horizon is also significant in that the European INSPIRE Directive will be in force for its Annex III data by 2013. This means that many university-generated geospatial data sets will need to comply with the INSPIRE standards promoting interoperability across boundaries. Possibly more difficult to achieve will be dealing with older data, which will also have to meet INSPIRE standards by the next decade.
Once we look beyond the next few years and start to focus on spatial data of high quality or significance, things get really interesting and much more challenging. It is very easy to talk of data archiving and curation as if there were standard, easily accessed facilities in every library. The more I have read, the more I have realised that it is a far more fluid and developing science than I appreciated.
Who decides which data need professional curation, or which data we can afford to curate? These kinds of questions move the process beyond the researcher into the realms of professional librarians, data curators and government departments working to budgets and policies. All this leads to a further stream of questions: Can data be given to one institution to look after? Can we guarantee that any institution will be a permanent fixture? Will the metadata created during data collection still have sufficient context to be useful in 10, 50 or 100 years’ time? How will the increasing number of data objects be kept searchable and accessible? With hardware life cycles only being a few years, who will ensure the data are passed on to the next technology, and will that technology still support the data format? These questions only start to scratch the surface of the issues involved in designing future data curation methods and policies.
Let’s hope that the situation described in the quote below won’t apply to the early 21st century when we look back in 20 years’ time.
“In terms of preserving our digital cartographic heritage, the last quarter of the 20th century has some similarities to the dark ages. In many cases, only fragments or written descriptions of the digital maps exist. In other cases, the original data have disappeared or can no longer be accessed due to changes in technical procedures and tools.” — Markus Jobst, 2010
It is possible to take this timeline one stage further and start to consider which spatial data sets, being so important to major scientific discoveries or advances, should be preserved in the equivalent of a scientific museum holding the essential heritage of our scientific community.
Now where did I save that first human genome I was given for safe keeping in 2003? Ah well, not to worry, it wasn’t geospatial data anyway, well not unless the DNA donor had an address?! Oh, and it was a mapping project, so I had better find it…