The anonymisation of personal data is an essential aspect of good scientific practice. According to § 3, paragraph 6 of the BDSG (German Federal Data Protection Act), anonymisation refers to the practice of changing personal data in such a way that "individual characteristics cannot, or only by use of a disproportionate amount of time, cost and effort, be attributed to a particular natural person." Another way of protecting personal or sensitive data is pseudonymisation.
An archive is a system which makes the organised storage and retrieval of historical data, documents and objects possible. The way its contents are organised depends on the underlying policy. Archives can be provided as a service or set up and operated independently. For long-term preservation of 10 years or more, special archiving systems are required. A particular form of archive is the repository.
German copyright law regulates the use of literary, artistic and scientific works which fall under its specifications. If authors of such works do not grant users further usage rights by using a license such as Creative Commons, re-use is only possible within the restrictive limits of German copyright law.
Whether research data are subject to copyright or not depends on whether the threshold of originality is reached or whether the data fall under database protection law ("sui generis"). If in doubt as to whether either of these laws apply, it is best to consult a specialist attorney.
In order to ensure maximum reusability of scientific research data (which may be protected by copyright law), authors should consider granting more usage rights by choosing a less restrictive license. Licensed data are usually reused and cited more often, which can lead to better visibility and reputational gains for the data author, even beyond their own research community.
Creative Commons Licences
In order to ensure maximum reusability of scientific research data, which may be subject to copyright law, the additional allocation of usage rights by means of a suitable license should be considered. One way of determining the reuse conditions of published research data is to use a liberal licensing model such as the widely accepted Creative Commons (CC) model.
Data journals are publications with the main goal of providing access to data sets. In general, they aim to establish research data as an academic achievement in their own right and to facilitate their reuse. Moreover, they attempt to improve the transparency of academic methods and processes and the associated research results, support good data management practices and provide long-term access to data.
Data Management Plan
A data management plan systematically describes how research data are managed within research projects. It documents the storage, indexing, maintenance and processing of data. A data management plan is essential in order to make data interpretable and re-usable for third parties. It is therefore recommended to assign data management responsibilities before the start of a project. The following questions can serve as an orientation:
- Which data will be generated and used within the project?
- Which data have to be archived at the end of a project?
- Who is responsible for the indexing of meta data?
- For what period of time will the data be archived?
- Who will be able to use the data after the end of the project and under which licensing conditions?
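As a minimal sketch, the answers to these questions could also be captured in machine-readable form, which makes it easy to check a plan for completeness. All field names below are illustrative, not part of any standard:

```python
# Hypothetical sketch: a data management plan as a machine-readable record.
# Field names are illustrative, not a standard.
dmp = {
    "data_generated": ["survey responses (CSV)", "interview audio (WAV)"],
    "data_to_archive": ["anonymised survey responses (CSV)"],
    "metadata_responsible": "project data steward",
    "retention_period_years": 10,    # DFG recommends a minimum of 10 years
    "reuse_license": "CC BY 4.0",    # who may reuse the data, and how
}

REQUIRED = ("data_generated", "data_to_archive", "metadata_responsible",
            "retention_period_years", "reuse_license")

def missing_fields(plan, required=REQUIRED):
    """Return the required plan fields that are absent or empty."""
    return [f for f in required if not plan.get(f)]

print(missing_fields(dmp))  # an empty list means every question is answered
```

A check like this could run automatically at project start, so that responsibilities are assigned before data collection begins.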
Data Protection Law
The term data protection refers to technical and organisational measures to prevent the misuse of personal data. Misuse is defined as gathering, processing or using such data in an unauthorised way. Data protection is regulated by the European Data Protection Directive 95/46/EC, by the German Federal Data Protection Act as well as by the corresponding laws at state level, for example the Data Protection Act of Baden-Württemberg.
Personal data are gathered and used especially in medical and social science studies. It is mandatory to encode/encrypt data of this kind and store them in an especially secure location. Subsequent pseudonymisation and anonymisation can ensure that individuals cannot be identified, which can make a publication of such data possible.
A digital artefact is the end result of the process of digitisation, during which an analog object (a book, manuscript, picture, sculpture etc.) is transformed into digital values so that it can be stored electronically. As opposed to an analog object, a digital artefact can be distributed in the form of digital research data and machine-processed. Another advantage of working with digital artefacts is that further alteration or damage to sensitive analog objects can be avoided.
The DINI certificate is a widely recognised quality seal for repositories. It guarantees a high-quality service for authors, users and funders and indicates that open access standards, guidelines and best practices have been implemented. The 2013 version can also be used to certify that all services maintained by a hosting provider comply with certain minimum requirements from the criteria catalogue. These criteria are marked as DINI-ready for the hosting provider and do not have to be certified separately during the certification process.
The term FAIR Data (Findable, Accessible, Interoperable and Reusable) was coined by the FORCE11 community for sustainable research data management in 2016. It is the main goal of the FAIR data principles to promote professional management of research data in order to make them more findable, accessible, interoperable and reusable. The FAIR principles were adopted by the European Commission and integrated into the Horizon 2020 funding guidelines.
Good Scientific Practice
The rules of good scientific practice serve as an orientation for scientific research and academic workflows. In Germany such a set of rules can be found in recommendation 7 of the German Research Foundation (DFG). It stipulates that "primary data as the foundation for academic publications should be stored for a minimum of 10 years on a secure and stable medium at the institution where they were created". This is meant to ensure the reproducibility of research results. Publishing data also facilitates their reuse.
The aim of long-term preservation is to ensure access to archived data over a long period of time. The limited durability of storage media, technological change and safety risks complicate this task, which is why extensive and forward-thinking planning is necessary. In order to avoid data loss and ensure long-term data recall, a suitable archiving system (meta data, structure) has to be employed. During the planning stage, different aspects like IT infrastructure, hardware and software have to be considered. Additionally, societal developments should also be taken into account.
Meta data are independent data which contain structured information about other data and/or resources and their characteristics. Meta data are stored either independently of or together with the data they describe. An exact definition of meta data is difficult since the term is used in different contexts and distinctions can vary according to perspective.
Usually there is a distinction between discipline-specific and technical/administrative meta data. Whereas the latter are definitely considered to be meta data, the former might also be viewed as research data.
In order to raise the effectiveness of meta data, a standardisation of descriptions is necessary. By using meta data standards, meta data from different sources can be linked and processed together.
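As a small illustration of why standardised descriptions matter, the sketch below assumes two sources that both use element names from the widely used Dublin Core standard ("title", "creator", "date"); because the keys agree, one routine can process records from either source. The record contents are made up:

```python
# Sketch: meta data records from two different sources that share
# standardised Dublin Core element names, so they can be processed together.
# The record contents below are invented examples.
record_a = {"title": "Survey 2015", "creator": "Example, A.", "date": "2015"}
record_b = {"title": "Interviews", "creator": "Sample, B.", "date": "2016"}

def creators(records):
    """Extract the creator from every record; this works across sources
    only because all sources use the same standard key names."""
    return [r["creator"] for r in records]

print(creators([record_a, record_b]))
```

Had one source used "author" and the other "creator", the records could not be linked without an extra mapping step, which is exactly the cost that meta data standards avoid.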
The term open access refers to free and unimpeded access to digital scientific content. Users are usually given a wide range of usage rights and provided with easy modes of access. The copyright, however, generally remains in the hands of the author. Through open access scientific information can be widely disseminated, used and re-processed. As such it represents an important achievement of the open science movement.
When publishing scientific content, there are two open access options:
- Publishing the content in a genuine open access medium is referred to as the "golden path" of open access.
- Making content that has been published in a traditional, subscription-based medium additionally available in an open access repository (self-archiving) is called the "green path".
Persistent identification is the process of assigning a permanent, digital identifier consisting of numbers and/or alphanumerical characters to a data set (or any other digital object).
Frequently used identification systems are DOI (Digital Object Identifier) and URN (Uniform Resource Name). As opposed to location-based references such as URL addresses, a persistent identifier refers to the object itself rather than to its location on the internet. Even if the location of a persistently identified object changes, the identifier remains the same; all that needs to be changed is the URL recorded in the identification database. In this way it can be ensured that data sets are permanently findable, retrievable and citable.
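The resolution idea behind persistent identifiers can be sketched as a simple lookup table: the identifier stays fixed while the stored location may change. The DOI below is a made-up example, not a real registered identifier:

```python
# Sketch of persistent-identifier resolution: the identifier is stable,
# only the location stored in the resolver database changes.
# "10.1234/example.data.1" is an invented DOI used for illustration.
resolver = {"10.1234/example.data.1": "https://old-host.example.org/data/1"}

def resolve(doi):
    """Look up the current location of a persistently identified object."""
    return resolver[doi]

# The data set moves to a new host: the identifier in citations stays
# valid, only the resolver entry is updated.
resolver["10.1234/example.data.1"] = "https://new-host.example.org/data/1"
print(resolve("10.1234/example.data.1"))
```

Real systems such as DOI run this lookup as a global resolution service, but the principle is the same: citations carry the identifier, never the URL.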
German data protection law (BDSG) defines personal data as "information on personal characteristics or circumstances of a particular natural person (affected party)." Data are considered personal if they can be attributed to a particular natural person. Typical examples are the name, profession, height or nationality of a person. German data protection law also stipulates that information on ethnicity, political opinion, religious or philosophical affiliation, union membership, health and sexuality is especially sensitive and therefore subject to even stricter protection.
Policies and Guidelines
Policies establish certain rules for the handling and managing of research data for all employees of a research institution. They usually also determine which methods of research data management should be applied. In Germany most research data policies do not contain detailed regulations, but instead usually consist of a basic self-commitment to the principles of open access.
Primary research data are unprocessed and uncommented raw data which have not yet been associated with any meta data. They form the foundation of all scientific activity. The distinction between research data and primary research data usually only has theoretical merit because raw data are hardly ever published without any associated meta data. Digital artefacts, too, are generally not published by their holding institutions (such as scientific libraries) without background information such as their provenance.
As opposed to anonymisation, the technique of pseudonymisation simply substitutes letter and/or number codes for identifying characteristics such as names in order to impede, or ideally prevent, the identification of individuals (BDSG § 3, paragraph 6a). During the course of a scientific study, the reference list linking personal data to their codes should be kept separate from the actual study data. The data can be anonymised by deleting this reference list after the completion of the project, so that no individual person can be connected to the study results.
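The procedure described above can be sketched in a few lines: names are replaced by codes, the code-to-name reference list is held apart from the study data, and deleting that list anonymises the data. The code format and names are invented for illustration:

```python
import itertools

# Sketch of pseudonymisation: identifying names are replaced by codes
# ("P001", "P002", ... is an invented format), and the reference list
# linking codes to names is kept separate from the study data.
counter = itertools.count(1)
reference_list = {}  # code -> name; stored apart from the study data

def pseudonymise(name):
    """Replace a name by the next code and record the link separately."""
    code = f"P{next(counter):03d}"
    reference_list[code] = name
    return code

# Study data contain only codes and measurements, never names.
study_data = [(pseudonymise("Alice Example"), 42),
              (pseudonymise("Bob Sample"), 37)]
print(study_data)

# Deleting the reference list after the project anonymises the data:
# the codes can no longer be connected to any individual.
reference_list.clear()
```

Keeping the reference list on a separate, secured system during the study is what distinguishes this from mere relabelling within the same data set.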
A repository can be viewed as a particular kind of archive. In the digital age it refers to an administrated storage space for digital objects. Since repositories are generally publicly accessible, or at least accessible to a specific group of users, they are closely connected to the issue of open access.
Data that are a) created through scientific processes/research (for example through measurements, surveys, source work), b) the basis for scientific research (for example digital artefacts), or c) documenting the results of research, can be called research data.
This means that research data vary between projects and academic disciplines and therefore require different methods of processing and management, subsumed under the term research data management. A distinction is also made between primary data and meta data; the latter, however, do not strictly count as research data in many disciplines.
Research Data Management
The term research data management refers to the process of transforming, selecting and storing research data with the aim of making them accessible, re-usable and reproducible independently of the data author for a long period of time. To achieve that aim, systematic actions can be taken at all points in the data life cycle in order to maintain the scientific value of research data, ensure their accessibility for analysis by third parties and secure the chain of evidence.
Threshold of Originality
The threshold of originality is a measure of the degree to which a work incorporates the personal characteristics of its author. Whether a work reaches this threshold is a decisive criterion for its protection under German copyright law. An important aspect of the threshold of originality is that the work is a result of its author's creativity and personality rather than an outcome of external circumstances (purpose, functionality, objectivity etc.). This is why research data very rarely fall under German copyright law.