Aug 01 2011

I have been reading and thinking about the relationship between long-term spatial data preservation and the short-term needs of day-to-day data security during the life of a research project. With research data being generated at faster and faster rates, and the life cycles of supporting technologies getting shorter, data preservation is destined to be a continual problem requiring new and smarter solutions every few years. Just dealing with the new data that Earth observation satellites can produce, at volumes of terabytes per day and in several formats as it passes through complex processing stages, is enough to take the issue well into the scope of being a serious problem. The sentiment expressed by Moss points to the aspiration of researchers for their hard-won data.

“Scientists now want to keep everything, which they assume is digitally possible, in the belief that everything has value and can be retrieved and repurposed.”  Michael Moss 2008

The question is: will the technology and the resources exist to meet this aspiration?

It is very easy for a researcher in a Higher Education institution to secure data in the short term, either through their own arrangements or by using the services of a central IT department. In that way data can be backed up to multiple locations and held on hardware that is up to date and covered by manufacturers' warranties. I am sure this professional approach is found in most (or all) HE institutions. It is still up to individuals to avail themselves of these services, but there are few obstacles standing in the way. Even storage costs are falling, and for a few pounds per gigabyte a university department can store data on professionally managed institutional servers.

In the hierarchy of data preservation, the next levels up become harder for a researcher to arrange. Consider keeping spatial data for third parties to discover and use for the next 5 years. Immediately there is the need for precise and comprehensive metadata. This has been addressed in several ways, and the development of specific standards via UK AGMAP has given anybody who looks an easy lead into useful metadata creation. For this longer-term data storage it may also be necessary to look outside your home institution to ensure suitable data curation and discoverability. But where do you put the data? It needs an accessible location that links the metadata and the data object and makes them discoverable by future researchers. The tools provided by GoGeo offer a solution: data can be described, and even lodged, within this service so that it becomes searchable and accessible to other researchers.
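To make the idea of a discovery metadata record concrete, here is a minimal sketch in Python. The field names below are only loosely modelled on common discovery metadata elements (title, abstract, bounding box, and so on); they are illustrative placeholders, not the exact UK AGMAP/GEMINI element names, and the dataset described is hypothetical.

```python
# A minimal, illustrative discovery-metadata record for a spatial
# dataset. Field names are loosely modelled on common discovery
# metadata elements; they are NOT the exact UK AGMAP/GEMINI names,
# and the dataset itself is a made-up example.
import json

record = {
    "title": "Dyfi estuary habitat survey 2010",   # hypothetical dataset
    "abstract": "Vector polygons of intertidal habitats in the Dyfi estuary.",
    "topic_category": "environment",
    "west_bounding_longitude": -4.10,
    "east_bounding_longitude": -3.90,
    "south_bounding_latitude": 52.50,
    "north_bounding_latitude": 52.60,
    "temporal_extent": {"start": "2010-05-01", "end": "2010-09-30"},
    "spatial_reference_system": "EPSG:27700",  # British National Grid
    "data_format": "ESRI Shapefile",
    "responsible_party": "IGES, Aberystwyth University",
    "use_constraints": "Academic use only",
}

# Serialising to JSON gives a machine-readable record that a
# catalogue service could index, so the dataset can be found by
# title, theme, place or time.
print(json.dumps(record, indent=2))
```

The point of the sketch is that even a record this small answers the "what, where, when, who" questions a future researcher needs before they can evaluate whether the data object itself is worth retrieving.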

So there is an infrastructure for this stage of the data management process, but it now needs the data producer to step outside their daily routine and work on tasks not always considered core for a busy academic looking to their next paper. This 5-year time horizon is also significant in that the European INSPIRE Directive will be in force for its Annex III type data by 2013. This means that many university-generated geospatial data sets will need to comply with the INSPIRE standards promoting interoperability across boundaries. Possibly more difficult to achieve will be dealing with older data, which will also have to meet INSPIRE standards by the next decade.

Once we look beyond the next few years and start to focus on spatial data of high quality or significance, things get really interesting and much more challenging. It is very easy to talk of data archiving and curation as if there were standard, easily accessed facilities in every library. The more I have read, the more I have realised that it is a far more fluid and developing science than I appreciated.

Who decides which data are in need of professional curation, or which data we can afford to curate? These kinds of questions move the process beyond the researcher into the realms of professional librarians or data curators and government departments working to budgets and policies. All this leads to a further stream of questions: Can data be given to one institution to look after? Can we guarantee that any institution will be a permanent fixture? Will the metadata that was created during data collection still have sufficient context to be useful in 10, 50 or 100 years' time? How will the increasing number of data objects be kept searchable and accessible? With hardware life cycles only being a few years, who will ensure the passing on of data to the next technology, and will that technology still support the data format? These questions only start to scratch the surface of the issues involved in designing future data curation methods and policies.

Let’s hope that the situation described in the quote below won’t apply to the early 21st century when we look back in 20 years’ time.

“In terms of preserving our digital cartographic heritage, the last quarter of the 20th century has some similarities to the dark ages. In many cases, only fragments or written descriptions of the digital maps exist. In other cases, the original data have disappeared or can no longer be accessed due to changes in technical procedures and tools.” Markus Jobst, 2010

It’s possible to take this timeline one stage further and to consider which spatial data sets are so important to major scientific discoveries or advancements that they should be preserved in the equivalent of a scientific museum, holding the essential heritage of our scientific community.

Now where did I save that first human genome I was given for safe keeping in 2003? Ah well, not to worry, it wasn’t geospatial data anyway. Well, not unless the DNA donor had an address?! Oh, and it was a mapping project, so I’d better find it…

Posted at 15:18 in asides | Don’t get your Back-Up over Data Preservation
Apr 18 2011

Information is now being collated on available data sets to incorporate into this project. We have identified a number of case study users from the Institute of Geography and Earth Sciences (IGES), Aberystwyth University, and from Forest Research in Wales, Forestry Commission, who have previously worked, or are currently working, on projects based in the Dyfi Biosphere.

As part of the process of gathering this information, users are being actively encouraged to create dataset metadata using the GeoDoc tool, found within the GoGeo area on the EDINA web site. This utility is used to create standards-compliant dataset metadata for upload into catalogues, e.g. GoGeo, so that the data can be discovered, evaluated and possibly reused. Note that you need UK Access Management credentials to use GeoDoc.

The users we have identified so far consist of academics, researchers and students within IGES at Aberystwyth University, and from the Centre for Catchment and Coastal Research (CCCR), a consortium of Aberystwyth University and Bangor University. Users will also include researchers from Forest Research in Wales, Forestry Commission, and staff from the Countryside Council for Wales (CCW). Within these bodies individuals have been identified, and we will develop these as user case studies. We are currently collating their data sets and identifying their relevant uses and needs.

In the following weeks we will collate and input data sets, some of which are complete whilst others are work in progress. These data sets will come from the individual user case studies, which will be something like the following:
• IGES Academic/Researcher
• IGES/CCCR Academic/Researcher
• IGES MSc Student
• IGES PhD Student
• IGES Digital Map Librarian
• Forest Research Researcher
• CCW Senior Reserve Warden for Dyfi Biosphere Area

A ‘shopping list’ of data sets that are either not currently available to these users (and which they would like access to) or are difficult to find will also be identified and collated. Already we have had requests for biogeochemical data sets from IGES/CCCR, and for remote sensing data sets from Forest Research. It is hoped that the Welsh Assembly Government may be able to help with some of these data and that, even if their use is restricted, we may be able to offer access using web services secured with Shibboleth (the software underlying the UK Access Management Federation).

So far, the academic/researcher evidence suggests that both academic staff and students would find the Web Map Service (WMS) “factory” application useful as a research and teaching tool. It has also been suggested by one of the academic users that an undergraduate module could be developed around the use of open geospatial standards. It was agreed that using the GeoDoc metadata input facility would generally improve data management practice for research projects.
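For readers unfamiliar with WMS, the sketch below shows the shape of the request a WMS client sends to fetch a map image. The parameter names come from the OGC WMS 1.3.0 standard; the endpoint URL, layer name and bounding box are hypothetical placeholders, not a real EDINA service.

```python
# Illustrative WMS GetMap request. Parameter names follow the OGC
# WMS 1.3.0 standard; the endpoint and layer are made-up examples.
from urllib.parse import urlencode

base_url = "https://example.ac.uk/wms"  # hypothetical endpoint

params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "dyfi_landcover",          # hypothetical layer name
    "STYLES": "",                        # default styling
    "CRS": "EPSG:27700",                 # British National Grid
    "BBOX": "260000,290000,280000,310000",  # easting/northing extent
    "WIDTH": "800",
    "HEIGHT": "800",
    "FORMAT": "image/png",
}

# The server would answer this URL with a rendered PNG of the
# requested layer, clipped to the bounding box.
getmap_url = base_url + "?" + urlencode(params)
print(getmap_url)
```

Because every conformant server understands the same parameters, a “factory” that publishes research data sets as WMS layers makes them usable from any standards-aware GIS client, which is what makes it attractive for both research and teaching.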

Any comments from the user case study individuals, or from other potential users, would be much appreciated to ensure that the relevant uses and needs of everyone involved in this project are identified. The information will feed into the development of the mapping application and the identification of future requirements.

Posted at 15:05 in User Reqs | Collation of data sets