The movement of large-scale (tens of terabytes and larger) data sets between
high-performance computing (HPC) facilities is an increasingly critical
capability. A growing number of scientific collaborations rely on HPC
facilities for tasks that either require large-scale data sets as input or
produce large-scale data sets as output. To enable the transfer of these data
sets as needed by the scientific community, HPC facilities must design and
deploy data transfer capabilities that allow users to perform data placement
at scale.
This paper describes the Petascale DTN Project, an effort undertaken by four
HPC facilities, which achieved routine data transfer rates of over
1 PB/week between the facilities. We describe the design and configuration
of the Data Transfer Node (DTN) clusters used for large-scale data transfers at
these facilities, the software tools used, and the performance tuning that
enabled this capability.
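
To put the 1 PB/week figure in network terms, the short sketch below converts
a weekly data volume into the average sustained rate it implies. It assumes
decimal units (1 PB = 10^15 bytes) and a perfectly steady transfer over the
whole week; the function name is illustrative and is not taken from the
project's tooling.

```python
# Convert a weekly transfer volume into the average sustained rate it implies.
# Assumption: decimal units, 1 PB = 10**15 bytes; the transfer is spread
# evenly across the full week (real transfers are burstier than this).

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def sustained_gbps(petabytes_per_week: float) -> float:
    """Return the average rate in gigabits per second (hypothetical helper)."""
    bits = petabytes_per_week * 1e15 * 8
    return bits / SECONDS_PER_WEEK / 1e9


print(f"{sustained_gbps(1.0):.2f} Gb/s")  # prints "13.23 Gb/s"
```

Under these assumptions, 1 PB/week corresponds to an average of roughly
13 Gb/s sustained end to end, which includes storage reads and writes at both
facilities as well as the network path.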
