
Co-ownership of agricultural research data – a difficult issue in the digital age

J.M. Scott

Agronomy and Soil Science, University of New England, Armidale, NSW.

ABSTRACT

Compiling large data sets derived from multiple sources and agencies is becoming easier, but there are difficulties regarding ownership of, and providing continuing access to, data. In spite of these difficulties, the benefits to be gained from enabling long-term data access are considerable. The challenge is to provide such access without sacrificing the rights of the owners of that data, especially regarding their publication rights. A model is proposed whereby multiple agencies and scientists could share in the copyright of a co-owned database which provides services by adding value to the data and ensuring its continuity beyond the end of a particular funded project. Such a structure would benefit the end users of the data and society through maximising the retention of Australia’s intellectual capital, whilst avoiding the need to 're-invent the wheel' should the data format become obsolete.

KEY WORDS

Relational database, copyright, data sharing, intellectual property.

INTRODUCTION

Recent advances in data capture, combined with the availability of powerful computers, have resulted in a rapid increase in the generation of experimental data. The availability of relational database technology on personal computers has provided most scientists with ample computer capacity to collect and store large data sets on desktop computers. When a group of such scientists work collaboratively over a number of years, the volume and complexity of data collected increases dramatically. Hence there is a compelling argument for more attention to be paid to data collection, storage and access methodologies in such projects. After a considerable investment has been made in a large relational database, questions inevitably arise such as: how can or should the database be managed to prevent obsolescence; how can the value of the data be maximised; and who should own the collection of data? Of course, concerns about data security, integrity and ownership are not only relevant to science - they are also being addressed by corporate institutions (4).

Many scientists are not concerned about 'owning' their data provided they are first able to publish it in the scientific press, and that they and their supporting agencies are acknowledged when any of the data are used or cited. Nevertheless, from a national point of view, data may have considerable value beyond the life of a particular project, especially if the data are available to be re-worked by others and/or when technological changes open up new opportunities.

If data are made available to anyone outside a particular research project, there is a need to ensure that the integrity of the data has been maintained, that any access granted to the data is permitted by those who contributed to or assembled the data as well as by those charged with administering access to them, and that the data are intelligible to the end user. This paper considers the opportunities that exist to add value to experimental data and to create a mechanism that allows them to be maintained indefinitely for future use. An example is provided of a series of relatively large relational databases covering 6 sites in southern Australia within the National Experiment of the Sustainable Grazing Systems (SGS) Key Program (1). This is a large project funded over 4 years by a wide range of research funders and involving many agencies and scientists collecting a wide array of field and laboratory data relating to the sustainability of grazed pasture systems.

Need for relational databases

In most biophysical research, data are collected in different formats and with many attributes such as treatment, replicate, time, methodology, etc. Many of these data items are related through 'one-to-many' relationships and hence are most suited to storage in an appropriate structure such as a relational database. When a relational database is implemented correctly, the raw data reside in only one place in the database – usually within one of many related tables. Storage of a data item only once also minimises the possibility of transcription errors. Databases also allow for the 'cascading' of any changes made to raw data as subsequent queries will always interrogate the most recent raw data. Many such tables of data are linked using relationships between key index fields. All the data from a large experiment over many years can be readily contained within a single database file, which can contain not only all the data but also any related textual qualifying information. This greatly facilitates efficient data storage, retrieval and interpretation.
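The 'store once, query many times' idea can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module rather than the software used in the project, and the table names, column names and values are hypothetical.

```python
import sqlite3

# Hypothetical schema: one plot has many measurements (a one-to-many relationship).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE plot (
    plot_id   INTEGER PRIMARY KEY,
    treatment TEXT NOT NULL,
    replicate INTEGER NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    plot_id        INTEGER NOT NULL REFERENCES plot(plot_id),
    measured_on    TEXT NOT NULL,
    variable       TEXT NOT NULL,
    value          REAL NOT NULL,
    method_note    TEXT  -- textual qualifying information kept with the data
);
""")

# Illustrative values only; each raw value is stored in exactly one row.
conn.execute("INSERT INTO plot VALUES (1, 'high fertility', 1)")
conn.executemany(
    "INSERT INTO measurement (plot_id, measured_on, variable, value, method_note) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1999-03-01", "herbage_mass", 1200.0, "illustrative"),
        (1, "1999-04-01", "herbage_mass", 1500.0, "illustrative"),
    ],
)


def mean_by_treatment(connection):
    """Join measurements to their plot via the key field and average by treatment."""
    rows = connection.execute("""
        SELECT p.treatment, AVG(m.value)
        FROM measurement AS m
        JOIN plot AS p ON p.plot_id = m.plot_id
        WHERE m.variable = 'herbage_mass'
        GROUP BY p.treatment
    """).fetchall()
    return dict(rows)


before = mean_by_treatment(conn)

# Correct one raw value in place; the same query then reflects the correction.
conn.execute("UPDATE measurement SET value = 1300.0 WHERE measured_on = '1999-03-01'")
after = mean_by_treatment(conn)
```

Because the query is re-run against the single stored copy of each value, a correction to the raw data flows through to every later summary without any derived table having to be edited.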

EXPERIENCE GAINED FROM THE SGS NATIONAL EXPERIMENT

Design of a large relational database

The relational database created for the SGS National Experiment using Microsoft Access is used as an example. This software enabled the aims of the project to be more readily achieved by providing:

  • ‘seamless’ links which can be made between the database and various spreadsheet, word processing and text formats,
  • data entry either directly into tables or forms in the database or imported from other files such as spreadsheets,
  • a facility within the relational database for quality checking of any data,
  • qualifying information associated with the data to ensure that the methodology and source of the data are properly attributed and to help avoid the data being misinterpreted by an end user, and
  • enhanced database functionality by creating a flexible graphing facility that enables the user to interrogate any data using ‘point-and-click’ commands. Graphs produced in this way can then be readily copied to other software programs for report writing, creation of presentations, etc.

Since the database structure is identical for each of the 6 experimental sites, across-site queries can, in principle, be readily executed. This is functionally important, as one of the main aims of the SGS Key Program was to develop principles from major ‘themes’ (water, nutrients, animals, etc.) that have general applicability over a range of sites.
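A minimal sketch of why an identical structure makes across-site queries straightforward is given below. It assumes one database per site, again uses Python's sqlite3 module rather than the project's own software, and the site names, table layout and values are hypothetical.

```python
import sqlite3

SITES = ["site_a", "site_b"]  # hypothetical site names

# Autocommit mode, so that ATTACH is never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
for site in SITES:
    # Each site gets its own database with exactly the same table structure.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")
    conn.execute(
        f"CREATE TABLE {site}.measurement (measured_on TEXT, variable TEXT, value REAL)"
    )

# Illustrative values only.
conn.execute("INSERT INTO site_a.measurement VALUES ('1999-03-01', 'soil_water', 20.0)")
conn.execute("INSERT INTO site_a.measurement VALUES ('1999-04-01', 'soil_water', 30.0)")
conn.execute("INSERT INTO site_b.measurement VALUES ('1999-03-01', 'soil_water', 40.0)")


def across_site_means(connection, sites, variable):
    """Run the same per-site query on every site and combine the results."""
    union = " UNION ALL ".join(
        f"SELECT '{s}' AS site, value FROM {s}.measurement WHERE variable = :v"
        for s in sites
    )
    sql = f"SELECT site, AVG(value) FROM ({union}) GROUP BY site ORDER BY site"
    return connection.execute(sql, {"v": variable}).fetchall()


means = across_site_means(conn, SITES, "soil_water")
```

The per-site query text is generated from a single template, which works only because every site shares the same table and column names. A site with a different structure would need its own hand-written query.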

Concerns about data sharing

The major concerns about data sharing for those involved in the collection and analysis of data from individual SGS sites include: lack of acknowledgment, use of the data for unintended purposes, misinterpretation of data, publication and use of the data without the knowledge of the data owner(s), and publication of the data in such a way as to prevent subsequent publication by the person who owns the data (P. Sanford, A. Ridley, G. Lodge, D. Kemp – pers. comm.). To overcome at least some of these concerns, the collective SGS research group developed a protocol and a publications committee to oversee all publications arising from this project (W. Mason – pers. comm.). Although the protocol is not legally binding, all those involved in the project have accepted it as a satisfactory solution to problems relating to publishing. Special mention is warranted of the rights of postgraduate students, who might be contributing to the database and yet need primary ownership of their data until the publication of their thesis.

Whilst it has been easy for most to recognise the power of interrogating data from across sites in order to consider regional sustainability issues, it has proved considerably more difficult to resolve who might 'own' or have custody of the collective data and how access to the data might be provided after the end of this project.

Use of data for across-site modelling within SGS

Modelling is an integral part of SGS and there is a need for data to be extracted from one or more sites to enable comparisons with model performance. So far, this has been possible because of the existence of the agreed publishing protocol and the goodwill among team members. Nevertheless, data have only been made available to modelling efforts within the project on an individual site basis. Ultimately, it may be desirable to provide access to the data across the sites and for use by models outside SGS. An acceptable mechanism is needed before such access can be provided.

GENERAL ISSUES OF COPYRIGHT, INTELLECTUAL PROPERTY AND ACCESS

Who currently owns the copyright?

For any data that are collected, the individual data points are not subject to copyright. However, copyright does cover the expression of the data (3). Currently, copyright ownership resides with individual scientists and their employing organisations, except in some cases where the copyright may be jointly owned with the funding agencies. Where a number of institutions or funding bodies may have some claim to joint ownership of the databases, it is difficult to develop a simple scheme to co-own copyright and coordinate data access into the indefinite future. Because value has been added to the original data, the relational database represents a unique compendium of data and interrogation tools that is more than the sum of its parts. This means that, in addition to the copyright that exists in the expression of the data owned by individual scientists and their agencies, copyright is also permissible over the combined database. The problem is that it is difficult to find a mechanism that enables such copyright to be jointly owned.

Does the relational database represent Intellectual Property?

In recent years, there has been an increasing focus on intellectual property by research bodies and especially the legal aspects of database ownership (2, 3). Today, most funding agencies and employing institutions insist that any potential intellectual property be identified and brought to their attention in order that it might be protected. However, in cases involving numerous research agencies and funding bodies, it is difficult to arrive at a common position on Intellectual Property.

Large data sets have direct value to end users, some aspects of which may not be anticipated during the life of a project. Beyond this, there is likely to be value in retaining the data indefinitely, if only to avoid the need for a repeated experimental investigation at some later date. Ultimately it is inefficient for society to allow research to be repeated merely because corporate memory of past research has lapsed. Large, expensive and complex national research projects are unlikely ever to be repeated, and hence one needs to ask: should the data be retained? If so, who should ‘own’ the data? And who should pay for the maintenance of the data and for providing access to them?

Funding of data access

In some cases, the collection of data has been conducted by publicly funded agencies, part of whose charter is to make the data freely available to the end user. However, in other cases, research institutions are funded to carry out the research over a limited time horizon, with no funding provided to ensure that the data are widely available beyond the time frame of the project. If data access is to be provided, mechanisms need to be put in place that protect data from unauthorised distribution and/or wilful or accidental damage. There also need to be mechanisms for funding any body which acts as the 'gatekeeper' or controller of access to the databases.

Features required in a co-owned database

It is important to recognise that data sharing arrangements normally only occur when there is substantial goodwill between participating scientists. Any breakdown of this goodwill, whether through deliberate or accidental means, will result in barriers to future collaboration and reduce the potential benefits of data sharing.

The issue of data quality is of special significance if data are to be relied upon by many end users over an extended period. Gaps or errors in data need to be detected as soon as possible so that error correction is possible. Also, because data are now commonly uploaded from data loggers in quantities that are difficult to check manually, some form of computer-assisted quality assurance is required to detect systematic errors or malfunctions in equipment.
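One simple form of computer-assisted checking is sketched below in Python. It is an illustration only: the function name, the logging interval, the valid range and the readings are all assumptions, and a real check would be tuned to each instrument.

```python
from datetime import datetime, timedelta


def check_logger_series(records, interval, valid_min, valid_max):
    """Flag gaps and out-of-range values in time-ordered (timestamp, value) records.

    Returns a list of (timestamp, issue) tuples, where issue is 'gap' when the
    step from the previous record exceeds the expected logging interval, and
    'out_of_range' when the value falls outside [valid_min, valid_max].
    """
    issues = []
    previous_time = None
    for timestamp, value in records:
        if previous_time is not None and timestamp - previous_time > interval:
            issues.append((timestamp, "gap"))
        if not (valid_min <= value <= valid_max):
            issues.append((timestamp, "out_of_range"))
        previous_time = timestamp
    return issues


# Hypothetical hourly temperature readings with one gap and one sentinel value.
start = datetime(1999, 3, 1, 0, 0)
hour = timedelta(hours=1)
readings = [
    (start, 12.5),
    (start + hour, 13.0),
    (start + 4 * hour, 12.8),    # two hourly records are missing before this one
    (start + 5 * hour, -999.0),  # a typical 'sensor failed' placeholder value
]

issues = check_logger_series(readings, interval=hour, valid_min=-10.0, valid_max=60.0)
```

Running such a check at upload time means a failed sensor or an interrupted logger is noticed while the equipment can still be repaired, rather than when the data are analysed.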

Scientific research requires ways of storing and distributing data to a wide array of potential users. In an environment where computer software and hardware change frequently, there is a high risk that, if not maintained, data will become obsolete as changes in software occur (2). Also, without maintenance, the likelihood of data being damaged, either deliberately or accidentally, increases over time. Another reason for continuing to add value to long-term data sets is that such enhancements will gradually increase the utility of the stored data to more end users. If the data are to be made available to end users who are not familiar with the original experiments, it is crucial that the interrogation tools and data quality are sufficiently clear and unambiguous that the data can be comprehended without intervention (2).

Although many scientists would prefer it if data access could be provided without the need for any legal agreements, some form of legal standing is required if the scientists' rights are to be protected. Also, if a fee is to be charged to enable continuing access to maintained data sets, then it is imperative that the data be legally owned by an entity that can administer and account for that charge.

If complex multi-party agreements are to be avoided, then a simple ownership entity is required. This could be achieved if copyright owners of the fundamental data sets assigned their copyright to a single entity, perhaps a company in which the contributors to the database own shares. Having a single entity that administers the data would greatly simplify the task of assembling the data and providing access to it.

Whilst it may be unlikely that the data or the relational database(s) itself will be of great financial value, provision of data access to any parties, other than the funding bodies and the research organisations contributing to the data, should be at a cost at least sufficient to enable maintenance of the database(s) into the future. Although full and open access may be provided to some parties, it needs to be acknowledged that there is a cost associated with maintaining and supplying data, regardless of the mode of delivery. The suggestion of Reichman and Uhlir (3) that ownership of data could impede scientific discovery is legitimate if the data are to be owned and controlled by a private corporation which aims at making profit from the sale of data. However, such a risk needs to be balanced by concern for loss of the data should no funded mechanism for retaining the data be found.

It is likely that pricing of data would vary according to the end user. If the data are requested by the agency that contributed it, then it may be provided at the lowest cost. However, once a project has ended, access to the data would need to be at a cost at least equivalent to the cost of maintaining supply and access to the data. For library uses of the data, a ‘fair use’ provision would be needed which would permit limited distribution for research and education purposes only. Where end-users of data are commercial clients who may gain financially from use of the data, then access to the data should be provided at commercial rates which would depend on the amount of data accessed relative to the entire database, the type of data and their cost of collection, and the value of the entire database.

These databases could increasingly be made available via the Internet with access privileges managed by a body authorised to manage this ‘gatekeeping’ function. Ultimately, vast data libraries may be managed as repositories of past research enabling future reworking of old data sets. As more of science becomes incorporated into complex models, the value of maintaining quality empirical data will increase. Historic data such as that collected from long-term trials will be of special value. Making such data sets more widely available will increase their utility to end-users and ultimately will enhance the quality of research outcomes.

ACKNOWLEDGMENTS AND DISCLAIMER

Support for the creation of the relational databases from Meat and Livestock Australia and its associated funding agencies, and from the scientists at the six sites of the SGS National Experiment, is gratefully acknowledged. Mr. Colin Lord, University of New England, undertook the successful implementation of the SGS database. The views expressed here are those of the author and are not claimed to represent the views of the funding agencies or scientists within the SGS Key Program.

REFERENCES

1. Mason, W. and Andrew, M. 1998. Sustainable grazing systems (SGS) – developing a national experiment. Proceedings 9th Australian Agronomy Conference, Wagga Wagga, pp. 314-317.

2. National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data. (National Academy Press, Washington, DC) (http://www.nap.edu). 235 pp.

3. Reichman, J.H. and Uhlir, P. 1999. Database Protection at the Crossroads: Recent Developments and Their Impact on Science and Technology. Berkeley Technology Law Journal 14, 793-838.

4. Verton, D. 1998. Managing corporate knowledge. Federal Computer Week. November 2, 1998, 30-31.
