Logo


Data Management Plan of the H2020 Project Example Project


Action Number:

action number or funding programme name

Action Acronym:

Example Project add an acronym

Action Title:

Example Project

Creation Date:

xxxx-xx-xx

Modification Date:

xxxx-xx-xx

DMP version:

1.0


1    Introduction

Example Project is part of the Open Data Initiative (ODI) of the EU. To best profit from open data, it is necessary not only to store data but to make data Findable, Accessible, Interoperable, and Reusable (FAIR). We support open and FAIR data; however, we also consider the need to protect individual data sets.

This document outlines the principles of data management for Example Project. A Data Management Plan (DMP), created using responses to the EU DMP questionnaire, will detail key aspects such as data types, collection methods, storage, access, sharing, preservation, and reuse strategies, ensuring compliance with FAIR data principles.

The detailed DMP states how data will be handled during and after the project. The Example Project DMP is prepared according to the Horizon 2020 and Horizon Europe online manual. It will be updated, or its validity confirmed, several times during Example Project; at the very least, this will happen in month Example Month.

2    Data Management Plan EU Template

2.1    Data Summary

What is the purpose of the data collection/generation and its relation to the objectives of the project?

Example Project aims at Example Aim. Therefore, data collection, integration and visualization using the DataPLANT Annotated Research Context (ARC) structure, embedded in a standardized data management process, are essential: the data are used not only to understand principles, but stakeholders must also be informed about the provenance of the data and of the analyses performed on them. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section.
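To illustrate how such a standardized structure can be set up in practice, the following minimal sketch (in Python; the top-level folder names follow the publicly documented ARC layout, while the project, study and assay identifiers are placeholders) creates an empty ARC-like directory skeleton:

    # Minimal sketch: create an empty ARC-like folder skeleton for one study and one assay.
    # Top-level folder names follow the ARC specification; all identifiers are placeholders.
    from pathlib import Path

    def create_arc_skeleton(root: str, study: str, assay: str) -> None:
        base = Path(root)
        for sub in ["studies", "assays", "workflows", "runs"]:
            (base / sub).mkdir(parents=True, exist_ok=True)
        (base / "studies" / study / "resources").mkdir(parents=True, exist_ok=True)
        (base / "assays" / assay / "dataset").mkdir(parents=True, exist_ok=True)
        # placeholder for the ISA investigation file kept at the ARC root
        (base / "isa.investigation.xlsx").touch()

    create_arc_skeleton("ExampleProjectARC", "Study_ExampleTopic", "Assay_RNASeq")

In practice, the DataPLANT tools named later in this document (e.g. ARCitect or ARCcommander) would be used to create and maintain ARCs; the sketch only illustrates the intended layout.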

What types and formats of data will the project generate/collect?

Example Project will collect and/or generate the following types of raw data: genetic data, genomic data, pangenomic data, cloned DNA data, transcriptomic data, spatial transcriptomic data, RNA-Seq data, single-cell RNA-Seq data, metabolomic data, proteomic data, phenotypic data, targeted assays (e.g. glucose and fructose content), image datasets, modelling data, computational code, Other data type, and other types of data related to Example Topic. In addition, the raw data will be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis steps. These pipelines will be tracked in the DataPLANT ARC, and care will be taken to document and archive these resources (including the analytical pipelines) as well, relying on the expertise of the DataPLANT consortium. Example Project will use the following data formats: FASTQ, FAST5, FASTA, BCL, SAM/BAM, VCF/BCF, CRAM, GBK, EMBL, GFF/GTF, mzML, MGF, mzIdentML, mzQuantML, pepXML, RAW, imzML, CDF, XLSX, TXT, CSV/TSV/PSV, PDF, JSON, XML/HTML, and other data formats.
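Raw files in these formats can be checked programmatically before they enter an analysis pipeline. The following minimal sketch (Python with Biopython, which is an assumed tool choice; the file name is a placeholder) counts the reads in a FASTQ file and reports their mean length:

    # Minimal sketch: summarize a FASTQ file using Biopython (pip install biopython).
    # The file name is a placeholder for a real raw-data file.
    import gzip
    from Bio import SeqIO

    def fastq_summary(path: str) -> tuple[int, float]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            lengths = [len(rec.seq) for rec in SeqIO.parse(handle, "fastq")]
        return len(lengths), (sum(lengths) / len(lengths) if lengths else 0.0)

    n_reads, mean_len = fastq_summary("example_sample_R1.fastq.gz")
    print(f"{n_reads} reads, mean length {mean_len:.1f} bp")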

Will you re-use any existing data and how?

The project builds on existing data sets; for instance, without a proper genomic reference it is very difficult to analyze NGS data sets. For genetic, genomic, pangenomic, cloned DNA, transcriptomic, spatial transcriptomic, RNA-Seq, single-cell RNA-Seq, metabolomic, proteomic, phenotypic, targeted assay (e.g. glucose and fructose content), image, modelling, computational code, and Other data type data on Example Topic, existing data sets of the partners (partner name), as well as additional characterizations and background knowledge from prior publications, will be used. Genomic references can be obtained from reference databases for genomes and sequences, such as the National Center for Biotechnology Information (NCBI, US), the European Bioinformatics Institute (EBI, EU), and the DNA Data Bank of Japan (DDBJ, JP).
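Public reference data can be retrieved directly from these databases. The following minimal sketch (Python with Biopython; the accession is only an example and the e-mail address is a placeholder contact required by NCBI) downloads a public reference sequence via the NCBI Entrez utilities:

    # Minimal sketch: fetch a public reference sequence from NCBI via Biopython's Entrez module.
    # The accession and e-mail address are example placeholders.
    from Bio import Entrez, SeqIO

    Entrez.email = "data.officer@example-project.eu"  # placeholder; NCBI requires a contact e-mail
    handle = Entrez.efetch(db="nucleotide", id="NC_000932", rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    print(record.id, record.description, len(record.seq))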

What is the origin of the data?

Public data will be extracted as described in the previous paragraph. For Example Project, specific data sets will be generated by the consortium partners.

Data of different types or representing different domains will be generated using unique approaches. For example:

     
  • Genetic data will be generated from targeted crosses and breeding experiments, and will include recombination frequencies and positions of genetic markers. This data will be used to associate quantitative trait loci with physical genomic markers/variants.
  • Genomic data will be created from sequencing data, which will be processed to identify genes, regulatory elements, transposable elements, and physical markers such as SNPs, microsatellites and structural variants.
  • Pangenomic data will be collected by sequencing the genomes of multiple individuals within a clade. The sequences are then assembled and aligned to create a comprehensive gene reference.
  • The origin and assembly of cloned DNA will include (a) the source of the original vector sequence, with the gene reference added where available, and the source of the insert DNA (e.g., amplification by PCR from a given sample, or obtained from an existing library), (b) the cloning strategy (e.g., restriction endonuclease digest/ligation, PCR, TOPO cloning, Gibson assembly, LR recombination), and (c) verified DNA sequence data of the final recombinant vector.
  • Methods of transcriptomics data collection will be selected from microarrays, quantitative PCR, Northern blotting, RNA immunoprecipitation, and fluorescence in situ hybridization. RNA-Seq data will be collected using separate methods (see below).
  • Spatial transcriptomics data will be collected using methods that spatially map RNA molecules to their precise tissue locations, ensuring the preservation of RNA data alongside comprehensive metadata about its origin.
  • RNA sequencing data will be generated using short-read or long-read platforms, either in-house or outsourced to academic facilities or commercial services, and the raw data will be processed using established bioinformatics pipelines.
  • Single-cell RNA-Seq data will be collected by isolating single cells, extracting and barcoding RNA, preparing sequencing libraries, and generating high-quality transcriptomic data using platforms such as Illumina, with meticulous metadata recording.
  • Metabolomic data will be generated by coupled chromatography and mass spectrometry using targeted or untargeted approaches.
  • Proteomic data will be generated using coupled chromatography and mass spectrometry for the analysis of protein abundance and protein identification, as well as additional techniques for structural analysis, the identification of post-translational modifications and the characterization of protein interactions.
  • Phenotypic data will be generated using phenotyping platforms and annotated using corresponding ontologies, covering number/size of organs such as leaves, flowers and buds, size of the whole plant, stem/root architecture (number of lateral branches/roots etc.), organ structures/morphologies, and quantitative metrics such as color, turgor and health/nutrition indicators, among others.
  • Targeted assay data (e.g. glucose and fructose concentrations or production/utilization rates) will be generated using specific equipment and methods that are fully documented in the laboratory notebook.
  • Image data will be generated by equipment such as cameras, scanners, and microscopes combined with software. Original images, which contain metadata such as EXIF photo information, will be archived.
  • Model data will be generated using software simulations. The complete workflow, including the environment, runtime, parameters, and results, will be documented and archived (a minimal sketch of such a run record follows below).
  • Computer code will be produced by programmers.
  • Other data type

Data from previous projects such as Previous Project Name will be considered.
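Where model or computational data are generated, the run context mentioned above can be captured alongside the results. The following minimal sketch (Python; all file names and parameter values are illustrative placeholders) writes the parameters, runtime and environment of a simulation run to a JSON file that can be archived with the data:

    # Minimal sketch: record environment, parameters and runtime of a simulation run
    # alongside its results, so the provenance can be archived with the data.
    import json, platform, sys, time

    params = {"model": "example_carbon_flow", "time_step_s": 60, "replicates": 3}  # placeholders
    start = time.time()
    results = {"objective": 0.42}  # placeholder for the real simulation output
    run_record = {
        "parameters": params,
        "results": results,
        "runtime_seconds": round(time.time() - start, 3),
        "environment": {"python": sys.version.split()[0], "platform": platform.platform()},
    }
    with open("model_run_record.json", "w") as fh:
        json.dump(run_record, fh, indent=2)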

What is the expected size of the data?

We expect to generate raw data in the range of ??? GB. The size of the derived data will be about ??? GB.

To whom might it be useful ('data utility')?

The data will initially benefit Example Project partners, but will also be made available to selected stakeholders closely involved in the project, and then to the scientific community working on Example Topic. Industry, politicians and students can also use the data for different purposes. In addition, the general public interested in Example Topic can use the data after publication. The data will be disseminated according to Example Project's dissemination and communication plan, which aligns with the DataPLANT platform or other means.

2.2    FAIR data

Making data findable, including provisions for metadata

Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?

All datasets will be associated with unique identifiers and will be annotated with metadata. We will use the Investigation, Study, Assay (ISA) specification for metadata creation. Example Project will rely on community standards plus additional recommendations applicable in plant science. The following metadata/minimum information standards will be used to collect metadata: MIxS (Minimum Information about any (X) Sequence), MigsEu (Minimum Information about a Genome Sequence: Eukaryote), MigsOrg (Minimum Information about a Genome Sequence: Organelle), MIMS (Minimum Information about a Metagenome or Environmental Sequence), MIMARKSSpecimen (Minimal Information about a Marker Specimen: Specimen), MIMARKSSurvey (Minimal Information about a Marker Specimen: Survey), MISAG (Minimum Information about a Single Amplified Genome), MIMAG (Minimum Information about a Metagenome-Assembled Genome), MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment), MIAME (Minimum Information About a Microarray Experiment), REMBI (Recommended Metadata for Biological Images), MIAMET (Minimum Information About a METabolomics experiment), MIAPE (Minimum Information About a Proteomics Experiment), MIMix (Minimum Information required for reporting a Molecular Interaction Experiment), and MIAPPE (Minimum Information about Plant Phenotyping Experiment). The MetaboLights submission-compliant standards are used for metabolomic data (note: some metabolomics partners do not consider MetaboLights an accepted standard; please verify this before relying on it). Unlike cross-domain minimal sets such as Dublin Core, which mostly define the submitter and the general type of data, these domain-specific standards enable reuse by other researchers by defining the properties of the plant material (see the preceding section). However, Example Project also implements minimal cross-domain annotations such as Dublin Core, Darwin Core, Schema.org, Bioschemas, and MARC 21. The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). Other standards such as other standards are also adhered to. The metadata standards will thus allow the integration of data across projects and safeguard the reuse of established and tested protocols. Additionally, we will use ontology terms to enrich the data sets, relying on free and open ontologies. Additional ontology terms might be created and canonized during Example Project.
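For illustration, a minimal sketch of an ISA-style investigation/study/assay hierarchy is shown below (Python; the field names and identifiers are illustrative placeholders and do not reproduce the normative ISA-JSON schema):

    # Minimal sketch: an ISA-style investigation/study/assay hierarchy serialized as JSON.
    # Field names and identifiers are illustrative placeholders only.
    import json

    investigation = {
        "identifier": "ExampleProject_I1",
        "title": "Example Project investigation",
        "studies": [{
            "identifier": "S1",
            "title": "Example Topic field trial",
            "assays": [{
                "identifier": "A1",
                "measurement_type": "transcription profiling",
                "technology_type": "RNA-Seq",
            }],
        }],
    }
    print(json.dumps(investigation, indent=2))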

What naming conventions do you follow?

Will search keywords be provided that optimize possibilities for re-use?

Keywords about the experiment and the general consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and its underlying ontologies. Here, DataPLANT complements existing ontologies with standardized DataPLANT ontologies, which are extended where an ontology does not yet cover the required variables.

Do you provide clear version numbers?

To maintain data integrity and facilitate reanalysis, data sets will be allocated version numbers where this is useful (raw data, for example, is considered immutable: it must not be changed and will not receive a version number). This is automatically supported by the Git-based ARC infrastructure of DataPLANT.
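For example, a curated release of derived data in an ARC could be tagged as follows (a minimal sketch in Python; it assumes the ARC is a local Git repository with an "origin" remote, and the path and tag name are placeholders):

    # Minimal sketch: tag a derived-data release in the Git repository backing an ARC.
    # Repository path and tag name are placeholders; raw data itself is never rewritten.
    import subprocess

    arc_dir = "ExampleProjectARC"  # placeholder path to the local ARC clone
    subprocess.run(["git", "-C", arc_dir, "tag", "-a", "data-v1.0",
                    "-m", "First curated release of derived data"], check=True)
    subprocess.run(["git", "-C", arc_dir, "push", "origin", "data-v1.0"], check=True)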

What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.

As described under "Making data findable" above, all datasets will be associated with unique identifiers and annotated with metadata following the Investigation, Study, Assay (ISA) specification and the listed minimum information standards (MIxS, MigsEu, MigsOrg, MIMS, MIMARKSSpecimen, MIMARKSSurvey, MISAG, MIMAG, MINSEQE, MIAME, REMBI, MIAMET, MIAPE, MIMix and MIAPPE), complemented by cross-domain annotations (Dublin Core, Darwin Core, Schema.org, Bioschemas, MARC 21) and by terms from free and open ontologies. Where a suitable standard or ontology term does not yet exist, additional terms will be created, canonized during Example Project and contributed to the DataPLANT ontologies.

Making data openly accessible

Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), explain why, clearly separating legal and contractual reasons from voluntary restrictions.

By default, all data sets from Example Project will be shared with the community and made openly available. However, before the data are released, all partners will be given the opportunity to check for potential IP issues (according to the consortium agreement and background IP rights). This applies in particular to data pertaining to industry. IP protection will be prioritized for datasets that offer potential for exploitation.
Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.

How will the data be made accessible (e.g. by deposition in a repository)?

Data will be made available via the Example Project platform using a user-friendly front end that allows data visualization. In addition, wherever possible, data will be stored in international discipline-specific repositories that use specialized technologies and preserve data for more than 10 years:

For genetic or genomic data: NCBI-GenBank, EBI-ENA, NCBI-SRA (Sequence Read Archive), EBI-ArrayExpress, NCBI-GEO (Gene Expression Omnibus).

For transcriptomic data: NCBI-SRA (Sequence Read Archive), NCBI-GEO (Gene Expression Omnibus), EBI-ArrayExpress.

For image data: EBI-BioImage Archive, IDR (Image Data Resource).

For metabolomic data: EBI-MetaboLights, Metabolomics Workbench, IntAct (molecular interactions).

For proteomics data: EBI-PRIDE (PRoteomics IDEntifications Database), PDB (Protein Data Bank), ChEBI (Chemical Entities of Biological Interest).

For phenotypic data: e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository). Other repositories will also be used to store data, and the data will be processed there as well.

Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, allocated a digital object identifier (DOI). Whole datasets will also be wrapped into an ARC with allocated DOIs. The ARC and the converters provided by DataPLANT will ensure that upload into the endpoint repositories is fast and easy.
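For illustration, the kind of minimal metadata prepared before a DOI is requested for a complete dataset or ARC release could look as follows (a sketch in Python; the field names follow the general DataCite style but are illustrative placeholders, not a validated submission):

    # Minimal sketch: a DataCite-style metadata stub drafted before a DOI is requested.
    # All values are placeholders for a real dataset or ARC release.
    import json

    doi_metadata = {
        "titles": [{"title": "Example Project phenotypic dataset, release 1.0"}],
        "creators": [{"name": "Example Project consortium"}],
        "publisher": "Example repository",
        "publicationYear": 2024,
        "resourceType": "Dataset",
    }
    with open("doi_metadata_draft.json", "w") as fh:
        json.dump(doi_metadata, fh, indent=2)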

What methods or software tools are needed to access the data?

Example Project relies on the tool(s) Proprietary Software. No specialized software will be needed to access the data; usually a modern browser is sufficient, and access will be possible through web interfaces. For data processing after obtaining raw data, typical open-source software can be used. DataPLANT offers open-source data curation tools such as the ARC management tool ARCitect, the command line tool ARCcommander, the DataPLANT Biological Ontology (DPBO), the metadata annotation tool Swate, the Metadata Quiz and the DataPLAN DMP generator.

Is documentation about the software needed to access the data included?

DataPLANT resources are well described, and their setup is documented in a guide on the GitHub project pages. All external software documentation will be duplicated locally and stored alongside the software.

Is it possible to include the relevant software (e.g. in open-source code)?

As stated above, Example Project will use publicly available, well-documented and certified open-source software, except for Proprietary Software, which is proprietary.

Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories that support open access, where possible.

As noted above, specialized repositories will be used for common data types. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, allocated a digital object identifier (DOI). Whole datasets will also be wrapped into an ARC with allocated DOIs.

Have you explored appropriate arrangements with the identified repository?

Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible; therefore, special arrangements are neither necessary nor useful. Catch-all repositories are not required. For DataPLANT, this has been agreed upon. (Note: if no data management platform such as DataPLANT is used, an appropriate repository must be found to store or archive the data after publication.)

If there are restrictions on use, how will access be provided?

There are no restrictions beyond the IP screening described above, which is in line with European open data policies.

Is there a need for a data access committee?

There is no need for a data access committee.

Are there well described conditions for access (i.e. a machine-readable license)?

Yes, where possible; for example, CC REL (Creative Commons Rights Expression Language) will be used for data not submitted to specialized repositories such as ENA.
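For illustration, a machine-readable license statement could be attached to a dataset as follows (a minimal sketch in Python using the rdflib library, which is an assumed tool choice; the dataset URI is a placeholder):

    # Minimal sketch: attach a machine-readable CC license statement to a dataset with rdflib.
    # The dataset URI is a placeholder; cc: is the Creative Commons REL vocabulary namespace.
    from rdflib import Graph, Namespace, URIRef

    CC = Namespace("http://creativecommons.org/ns#")
    g = Graph()
    g.bind("cc", CC)
    dataset = URIRef("https://example-project.eu/datasets/phenotypes-v1")  # placeholder
    g.add((dataset, CC.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
    print(g.serialize(format="turtle"))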

How will the identity of the person accessing the data be ascertained?

If data is shared only within the consortium, for example because it is not yet finished or is under IP checks, it is hosted internally and a username and password are required (see also our GDPR rules). Once data is made public in the final EU or US repositories, completely anonymous access is normally allowed; this is the case for ENA as well, and both are in line with GDPR requirements. Currently, data management relies on the Annotated Research Context (ARC). It is password protected, so authentication is required before any data can be obtained or samples generated.

Making data interoperable

Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organizations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?

What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?

As mentioned above, we will use the ISA specification for metadata creation. The following metadata standards will also be used: MIAPPE, MIxS, MigsEu, MigsOrg, MIMS, MIMARKSSpecimen, MIMARKSSurvey, MISAG, MIMAG, MINSEQE, MIAME, REMBI, MIAMET, MIAPE and MIMix. The MetaboLights submission-compliant standards are used for metabolomic data (note: some metabolomics partners do not consider MetaboLights an accepted standard; please verify this before relying on it). Unlike cross-domain minimal sets such as Dublin Core, which mostly define the submitter and the general type of data, these domain-specific standards enable reuse by other researchers by defining the properties of the plant material (see the preceding section). However, Example Project also implements minimal cross-domain annotations such as Dublin Core, Darwin Core, Schema.org, Bioschemas, and MARC 21. The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). Other standards such as other standards are also adhered to. The metadata standards will thus allow the integration of data across projects and safeguard the reuse of established and tested protocols. Additionally, we will use ontology terms to enrich the data sets, relying on free and open ontologies. Additional ontology terms might be created and canonized during Example Project.

Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?

Common and open ontologies will be used; in particular, open biomedical ontologies will be used where they are mature. As stated for the previous question, ontologies and controlled vocabularies might sometimes have to be extended. Here, Example Project will build on the DataPLANT Biological Ontology (DPBO) developed in DataPLANT. Ontology databases such as the OBO Foundry will be used to publish ontologies. The DPBO is also published on GitHub: https://github.com/nfdi4plants/nfdi4plants_ontology.
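For illustration, ontology terms can be looked up programmatically, for example via the EBI Ontology Lookup Service (a minimal sketch in Python; the endpoint, parameters and the use of the Plant Ontology are assumptions that should be checked against the current OLS documentation):

    # Minimal sketch: query the EBI Ontology Lookup Service (OLS) for a term.
    # Endpoint, parameters and ontology ('po' = Plant Ontology) are assumptions to verify.
    import requests

    resp = requests.get(
        "https://www.ebi.ac.uk/ols4/api/search",
        params={"q": "leaf area", "ontology": "po"},
        timeout=30,
    )
    resp.raise_for_status()
    for doc in resp.json().get("response", {}).get("docs", [])[:5]:
        print(doc.get("obo_id"), doc.get("label"))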

In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

Common and open ontologies will be used, so this question does not apply.

Increase data reuse (by clarifying licences)

How will the data be licensed to permit the widest re-use possible?

Open licenses, such as Creative Commons (CC), will be used whenever possible.

When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

Some raw data is made public as soon as it is collected and processed. Relevant processed datasets are made public when the research findings are published. At the end of the project, all data without an embargo period will be published. Data subject to an embargo period is not publicly accessible until the end of that period, but is made available upon request, allowing controlled sharing while ensuring responsible use. IP issues will be checked before publication. All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements such as those started in Fort Lauderdale and set forth by the Toronto International Data Release Workshop. This will be implemented as soon as IP-related checks are complete.

Are the data produced and/or used in the project usable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.

There will be no restrictions once the data are made public.

How long is it intended that the data remains re-usable?

The data will be made available for many years, and ideally indefinitely, after the end of the project.

Data submitted to repositories (as detailed above) would be subject to local data storage regulation.

Are data quality assurance processes described?

The data will be checked and curated using a documented data collection protocol, personnel training, data cleaning, data analysis, and quality control measures. Furthermore, data will be screened for quality control (QC) problems using automatic procedures as well as manual curation. All data quality assurance processes, including the data collection protocol, data cleaning procedures, data analysis techniques, and quality control measures, will be documented; this documentation will be kept for future reference and made available to stakeholders upon request. PhD students and lab professionals will be responsible for first-hand quality control. Afterwards, the data will be checked and annotated by Example data officer name. FastQC will be run on the base-called sequencing data. Before publication, the data will be checked again.
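For illustration, the automated part of this QC could be started as follows (a minimal sketch in Python; it assumes the FastQC command-line tool is installed and on the PATH, and the folder names are placeholders):

    # Minimal sketch: run FastQC on all raw FASTQ files in a folder as a first automated QC step.
    # Assumes FastQC is installed and on the PATH; folder names are placeholders.
    import subprocess
    from pathlib import Path

    raw_dir = Path("raw_data")
    qc_dir = Path("qc_reports")
    qc_dir.mkdir(exist_ok=True)
    fastq_files = [str(p) for p in raw_dir.glob("*.fastq.gz")]
    if fastq_files:
        subprocess.run(["fastqc", "--outdir", str(qc_dir), *fastq_files], check=True)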

2.3    Allocation of resources

What are the costs for making data FAIR in your project?

Example Project will bear the costs of data curation, ARC consistency checks, and data maintenance/security before transfer to public repositories; subsequent costs are then borne by the operators of these repositories.
Costs for storage after publication are incurred by the end-point repositories (e.g. ENA) and are covered by the operating budgets of these repositories rather than charged to Example Project or its members.

How will these be covered? Note that costs related to open access to research data are eligible as part of the Horizon 2020 or Horizon Europe grant (if compliant with the Grant Agreement conditions).

The data-related costs of Example Project are covered by the project funding. Pre-existing structures, tools, and knowledge established in the DataPLANT consortium will also be used.

Who will be responsible for data management in your project?

Example data officer name will be responsible as data officer. The data officer (or partner name) decides on the preservation of data not submitted to end-point subject-area repositories or to ARCs in DataPLANT after the project ends. This will be in line with EU and institute policies, and data sharing will follow EU and international standards.

Are the resources for long term preservation discussed (costs and potential value, who decides and how/what data will be kept and for how long)?

The data officer or partner name will ultimately decide on the strategy to preserve data that are not submitted to end-point subject-area repositories or to ARCs in DataPLANT when the project ends. This will be in line with EU guidelines and institute policies, and data sharing will follow EU and international standards.

2.4    Data security

What provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data)?

Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data will use secure platforms with automatic backups and off-site secure copies. As ARCs are stored in the PLANTDataHUB of DataPLANT, data security is enforced there; this comprises secure storage, and usernames and passwords are generally transferred via separate, secure channels.
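One simple additional safeguard for recovery is to record checksums so that backup copies can be verified after transfer or restoration. The following minimal sketch (Python; the file and folder names are placeholders, and this is an illustration rather than a prescribed project procedure) writes a SHA-256 manifest for a data folder:

    # Minimal sketch: record SHA-256 checksums so backup copies can be verified later.
    # Folder and manifest names are placeholders.
    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    with open("checksums.sha256", "w") as manifest:
        for f in sorted(Path("raw_data").glob("*")):
            if f.is_file():
                manifest.write(f"{sha256(f)}  {f.name}\n")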

Is the data safely stored in certified repositories for long term preservation and curation?

In addition to the project-related sharing platform, data will be stored in international discipline-specific repositories that use specialized technologies and preserve data for more than 10 years: NCBI-GenBank, EBI-ENA, NCBI-SRA, NCBI-GEO, EBI-ArrayExpress, EBI-BioImage Archive, IDR, EBI-MetaboLights, Metabolomics Workbench, IntAct, EBI-PRIDE, PDB, ChEBI, and e!DAL-PGP. Other repositories will also be used to store data, and the data will be processed there as well.

2.5    Ethical aspects

Are there any ethical or legal issues that can have an impact on data sharing? These can also be discussed in the context of an ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA).

At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since this is plant data, there is no need for an ethics committee to deal with the data, although we will diligently follow the Nagoya Protocol on access and benefit sharing. (Note: please ensure that you complete any necessary due diligence. It is currently unclear whether the Nagoya Protocol will also encompass sequence information. Regardless, if you use material from a country other than your own (or that of your partner) and conduct physical or biochemical characterization (e.g., metabolites, proteome, RNA-Seq), this may constitute an action relevant under the Nagoya Protocol. Exceptions might include materials from countries such as the US (not a party) or Ireland (which has not signed; still contact them), though other laws could apply.)

Is informed consent for data sharing and long term preservation included in questionnaires dealing with personal data?

The only personal data that will potentially be stored are the submitter's name and affiliation in the metadata. In addition, personal data will be collected for dissemination and communication activities using specific methods and procedures developed by the Example Project partners to adhere to data protection rules. (Note: you need to inform the persons concerned and should obtain written consent before storing e-mail addresses, names, or even pseudonyms such as Twitter handles.)

2.6    Other issues

Do you make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones?

Yes, Example Project will use common Research Data Management (RDM) infrastructures developed by the NFDI in Germany, INRAE in France, and the EOSC (European Open Science Cloud).

3     Annexes

3.1     Abbreviations

ARC Annotated Research Context
CC Creative Commons
CC REL Creative Commons Rights Expression Language
ChEBI Chemical Entities of Biological Interest
DDBJ DNA Data Bank of Japan
DMP Data Management Plan
DoA Description of Action
DOI Digital Object Identifier
EBI European Bioinformatics Institute
e!DAL-PGP Plant Genomics & Phenomics Research Data Repository
ENA European Nucleotide Archive
EU European Union
FAIR Findable, Accessible, Interoperable, Reusable
GDPR General Data Protection Regulation (of the EU)
GEO Gene Expression Omnibus
IDR Image Data Resource
IP Intellectual Property
ISO International Organization for Standardization
MIAME Minimum Information About a Microarray Experiment
MIAMET Minimum Information About a METabolomics experiment
MIAPE Minimum Information About a Proteomics Experiment
MIAPPE Minimum Information about Plant Phenotyping Experiment
MigsEu Minimum Information about a Genome Sequence: Eukaryote
MigsOrg Minimum Information about a Genome Sequence: Organelle
MIMAG Minimum Information about Metagenome-Assembled Genome
MIMARKSSpecimen Minimal Information about a Marker Specimen: Specimen
MIMARKSSurvey Minimal Information about a Marker Specimen: Survey
MIMIX The Minimum Information required for reporting a Molecular Interaction Experiment
MIMS Minimum Information about a Metagenome or Environmental Sequence
MINSEQE Minimum Information about a high-throughput SEQuencing Experiment
MISAG Minimum Information about a Single Amplified Genome
MIxS Minimum Information about any (X) Sequence
NCBI National Center for Biotechnology Information
NFDI National Research Data Infrastructure (of Germany)
NGS Next Generation Sequencing
PRIDE PRoteomics IDEntifications Database
PDB Protein Data Bank
RDM Research Data Management
REMBI Recommended Metadata for Biological Images
RNASeq Ribonucleic Acid Sequencing
SOP Standard Operating Procedures
SRA Sequence Read Archive
SWATE Swate Workflow Annotation Tool for Excel
ONP Oxford Nanopore
qRT PCR quantitative real time polymerase chain reaction
WP Work Package
