Research data has seen explosive growth with ever-increasing instrument resolution and the advent of “big data” methods such as deep learning. To share these large datasets, Einstein has implemented several different tools which apply to different scenarios.
This document outlines the use of Globus, a tool for easily moving large files and automating file transfers.
Below are instructions for:
Globus enables users to exchange files through a simple file explorer interface. It is designed to handle large files efficiently but is useful for data of any size. Globus consists of a browser-based file manager, with GridFTP and other tools working in the background to move data and schedule transfers.
Globus transfers the data securely and directly from the source to the destination, monitors the progress, and validates that the data has been copied correctly. Once the transfer has been initiated, you do not need to watch over the process. If the network is interrupted, Globus resumes the transfer where it left off.
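To illustrate what "validates that the data has been copied correctly" means in practice, the sketch below compares checksums of a source file and its copy. It is only an illustration of the general idea, not Globus's own implementation: the choice of SHA-256 and the function names file_sha256 and copies_match are assumptions made for this example.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so that very large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, destination):
    """Return True when both files have identical SHA-256 digests."""
    return file_sha256(source) == file_sha256(destination)
```

Globus performs this kind of check for you, so a manual comparison like this is only useful as a spot check on files you can read locally at both ends.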
Globus organizes data into collections. Collections are pre-defined data sources that can be at any institution that uses Globus. To transfer data using Globus, you must have permission to access both the source and destination collections, and they must be set up ahead of time. For example, you may have one collection defined on data.einstein and another on your personal or lab workstation. You can then use Globus to move data between them without mounting data.einstein on your workstation.
For detailed information about Globus see Globus Documentation.
Everyone with either an einsteinmed.edu or a montefiore.org email address can use Globus.
The Globus web application, https://app.globus.org, uses the Einstein campus Active Directory to identify allowed users. Once you are on the app, you can see any collections that you have permission to access. To create a collection on data.einstein, create a service request with Information Technology at Einstein.
To get started, navigate to https://app.globus.org, where you will be prompted to log in. Select Albert Einstein College of Med from the drop-down menu:
You will then be prompted to enter your credentials:
You will then be presented with the File Manager page:
Click on the Search area of the Collection menu and type in the name of your collection. Select the two desired collections, and Globus will list their contents in the file browser. You can then select files and folders to be transferred from one area to the other. Note that either collection can act as the source or the destination.
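The same kind of transfer can also be scripted, which is one way to automate recurring transfers. The following is a minimal sketch, not an Einstein-supported procedure. It assumes the third-party globus-sdk Python package is installed and that you already hold a Globus access token with the Transfer scope; obtaining that token is outside this sketch. The helper names plan_items and submit_transfer, and every collection ID and path shown, are hypothetical.

```python
import posixpath


def plan_items(source_dir, dest_dir, names):
    """Pair each file name with its full source and destination path."""
    return [
        (posixpath.join(source_dir, name), posixpath.join(dest_dir, name))
        for name in names
    ]


def submit_transfer(access_token, source_collection, dest_collection, items,
                    label="Scripted transfer"):
    """Submit one Globus transfer task and return its task ID.

    globus_sdk is imported here rather than at the top so that the
    planning helper above can be used without the package installed.
    """
    import globus_sdk

    client = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(access_token)
    )
    task = globus_sdk.TransferData(
        source_endpoint=source_collection,
        destination_endpoint=dest_collection,
        label=label,
        # Copy a file only if its checksum differs at the destination.
        sync_level="checksum",
    )
    for source_path, dest_path in items:
        task.add_item(source_path, dest_path)
    return client.submit_transfer(task)["task_id"]
```

A call might look like submit_transfer(token, "SOURCE-COLLECTION-UUID", "DEST-COLLECTION-UUID", plan_items("/lab/raw", "/~/incoming", ["scan01.tif"])), where the UUIDs are placeholders for the collection IDs shown in the web application. Depending on how a collection is configured, Globus may require additional consent before a scripted transfer is accepted; see the Globus Documentation for details.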
Data collections are central to Globus. You can set up your own personal collection by installing Globus Connect Personal on your system. This establishes your system as a Globus endpoint, and you can select a specific area of your local disk for sharing. Note that only people to whom you give permission can access your data. In the example of moving data between data.einstein and your workstation, you are the only person who needs permission on both systems.
If a collaborator has given you permission to see their collection, you can access it through the file browser.
To create a collection on a computer for which you do not have admin privileges, such as data.einstein, create a service request with Information Technology at Einstein.
The primary resource available online is the HPC website:
Here you will find information on the cluster, training videos, and documentation on using the cluster and data storage.
For user support go to:
Our intent is that you will use this document as a guide. While we work diligently to prepare accurate documentation, the steps we have outlined and the screenshots we have provided above will not precisely replicate each person’s experience, because technology evolves and varies with device hardware, software version, and device customizations or configurations. If this guide is inadequate for your needs, please open a support request with us and we will help you.