Data Management

This section describes tools that assist in managing data across facilities.

Globus

Globus is commonly used to manage data within and between compute facilities and users’ home institutions. Details on using Globus at the compute facilities are available on the Data Transfer page.

DataFed

Figure: Cross-facility federation of repositories.

DataFed is a scientific data management and collaboration system that simplifies tasks such as organizing, searching for, sharing, discovering, and reusing data, especially across facilities and organizations. It does this by “federating” data repositories into a single, cohesive, and uniform platform. DataFed was designed to enhance productivity and reproducibility in scientific research, keeping in mind the increasingly global and multidisciplinary nature of research involving “big data”. It supports the early lifecycle stages of “working” scientific data and eases the burden of capturing, organizing, and sharing potentially large volumes of heterogeneous scientific data. DataFed provides a FAIR-principled (Findable, Accessible, Interoperable, Reusable) environment in which scientific data can be precisely controlled and refined in preparation for eventual data publishing. DataFed can be accessed through a web portal or through command-line and Python application programming interfaces.

The DataFed documentation website covers prerequisites, installation and configuration, and user guides for the command-line and Python application programming interfaces. It also provides Jupyter notebooks and corresponding YouTube videos from hands-on tutorials.