.. _gwdatafind-htcondor:

##############################
Using GWDataFind with HTCondor
##############################

`HTCondor `__ is a specialised workload management
system for compute-intensive processing.
HTCondor is used to specify discrete work units (jobs) you want completed
that are then distributed across the available resources with sophisticated
scheduling, prioritisation, monitoring, and reporting capabilities.
The LIGO Scientific Collaboration and its partners use HTCondor to
process huge volumes of data for scientific analysis.

=============================================
Configuring HTCondor job data with GWDataFind
=============================================

The most common use case of combining GWDataFind with HTCondor is to query
for the URIs of input data files as part of planning a job or workflow.
For large analyses, the URIs returned by GWDataFind are commonly split into
logical chunks of one or a few files each, and each HTCondor job
processes only the data files in its own chunk.
The chunks are processed in parallel, and the results are combined in a
subsequent analysis stage.
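
As a minimal sketch of the chunking step, assuming a fixed number of files
per job, a helper like the hypothetical ``chunk_urls`` below splits a list of
URLs into order-preserving chunks.
The example URLs are placeholders standing in for the output of
``gwdatafind.find_urls``; real analyses may prefer to chunk by GPS time instead.

.. code-block:: python

   from itertools import islice
   from typing import Iterable, Iterator, List


   def chunk_urls(urls: Iterable[str], files_per_job: int) -> Iterator[List[str]]:
       """Yield successive lists of at most ``files_per_job`` URLs (hypothetical helper)."""
       if files_per_job < 1:
           raise ValueError("files_per_job must be at least 1")
       iterator = iter(urls)
       # islice pulls the next (up to) files_per_job items; an empty list ends the loop
       while chunk := list(islice(iterator, files_per_job)):
           yield chunk


   # placeholder URLs, one per 4096-second frame file
   urls = [f"osdf:///example/L-TEST-{1000000000 + 4096 * i}-4096.gwf" for i in range(5)]
   chunks = list(chunk_urls(urls, 2))  # three chunks, holding 2, 2 and 1 URLs

Each chunk then becomes the input list for one HTCondor job.
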

Best practice is to list each data file needed by a job in the
`transfer_input_files `__
submit command.
Each entry in the comma-separated ``transfer_input_files`` list can be a file
path or a URI; HTCondor then transfers each file into the (temporary) working
directory of the job.
The process started on the compute node can then see each input file
as a local file in the current working directory tree.
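
To predict the local name of each input, a sketch like the hypothetical
``local_name`` helper below can be used, assuming HTCondor names each
transferred file after the last component of its path or URL path.
Parsing with ``urllib.parse.urlparse`` first means that any query string or
fragment on a URL is not mistaken for part of the file name.

.. code-block:: python

   import posixpath
   from urllib.parse import urlparse


   def local_name(url_or_path: str) -> str:
       """Return the expected file name of an input in the job's working directory."""
       # urlparse splits off scheme, host, query and fragment;
       # for a plain file path the whole string ends up in ``.path``
       path = urlparse(url_or_path).path
       return posixpath.basename(path)


   url = ("osdf:///gwdata/O2/strain.4k/frame.v1/L1/1186988032/"
          "L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf")
   print(local_name(url))  # L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
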

.. admonition:: Pelican and OSDF
   :class: tip
   :name: _gwdatafind-htcondor-pelican

   The LIGO Scientific Collaboration and its partners use
   `the Open Science Data Federation (OSDF) `__
   for data distribution.
   Depending on the GWDataFind server you communicate with, you may be able
   to query directly for OSDF URIs to pass to HTCondor.


-----
Rules
-----

The basic requirements for using GWDataFind URLs with HTCondor are:

1. Pass *absolute* URLs or paths to ``transfer_input_files``, either
   directly in each job's submit description or via a macro variable
   for each DAGMan node.
2. Pass *relative* paths (normally just the file's base name) to
   the executable, either directly or via a cache file.
3. Include the disk space required to store the data files in the
   ``request_disk`` command for the job. If you're not sure how big
   the files will be, a conservative overestimate is usually fine.
4. If access to the files requires an authorisation token, include that
   token in the job configuration.
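
Rule 1 mentions passing inputs via a macro variable for each DAGMan node.
As a minimal sketch, assuming the URL chunks have already been computed,
the hypothetical ``make_dag`` helper below writes one DAG node per chunk and
uses DAGMan ``VARS`` lines to hand each node its absolute URLs (for
``transfer_input_files``, rule 1) and relative file names (for the executable,
rule 2).
The macro names ``input_urls`` and ``input_names`` and the example URLs are
illustrative assumptions; the submit template reuses ``/bin/head`` and
``request_disk`` from Example 1.

.. code-block:: python

   from os.path import basename
   from typing import List

   # Submit description shared by every node; $(input_urls) and $(input_names)
   # are macros that DAGMan fills in from each node's VARS line.
   SUBMIT_TEMPLATE = """\
   universe = vanilla
   executable = /bin/head
   arguments = -c4 $(input_names)
   request_disk = 10GB
   should_transfer_files = YES
   transfer_input_files = $(input_urls)
   queue
   """


   def make_dag(chunks: List[List[str]], submitfile: str = "job.submit") -> str:
       """Return DAG text with one node per chunk of input URLs (hypothetical helper)."""
       lines = []
       for i, chunk in enumerate(chunks):
           node = f"chunk{i}"
           urls = ",".join(chunk)  # absolute URLs for transfer_input_files
           names = " ".join(basename(url) for url in chunk)  # relative names for the executable
           lines.append(f"JOB {node} {submitfile}")
           lines.append(f'VARS {node} input_urls="{urls}" input_names="{names}"')
       return "\n".join(lines) + "\n"


   chunks = [
       ["osdf:///example/L-TEST-1000000000-4096.gwf"],
       ["osdf:///example/L-TEST-1000004096-4096.gwf"],
   ]
   dag = make_dag(chunks)

Writing ``SUBMIT_TEMPLATE`` to ``job.submit`` and ``dag`` to a ``.dag`` file
would give a workflow that can be submitted with ``condor_submit_dag``.
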

------------------------------
Example 1: Explicit file paths
------------------------------

To configure a single job where the executable takes explicit file paths
as arguments, consider the following example:

.. code-block:: python
   :name: gwdatafind-htcondor-file-transfer-explicit
   :caption: Passing input files to HTCondor (explicit)

   from os.path import basename

   from gwdatafind import find_urls

   # find input data OSDF URIs for GW170817
   urls = find_urls(
       "L",
       "L1_GWOSC_O2_4KHZ_R1",
       1187008880,
       1187008884,
       host="datafind.gwosc.org",
       urltype="osdf",
   )
   filenames = map(basename, urls)

   # write condor file transfer instructions for the job
   with open("job.submit", 'w') as submit_file:
       print(f"""
   universe = vanilla
   executable = /bin/head
   arguments = -c4 {' '.join(filenames)}
   log = job.log
   error = job.err
   output = job.out
   request_cpus = 1
   request_disk = 10GB
   request_memory = 100MB
   should_transfer_files = YES
   transfer_input_files = {','.join(urls)}
   queue
   """, file=submit_file)

This produces a ``job.submit`` file that looks something like this:

.. code-block:: ini
   :name: gwdatafind-htcondor-file-transfer-explicit-submit
   :caption: ``job.submit``

   universe = vanilla
   executable = /bin/head
   arguments = -c4 L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
   log = job.log
   error = job.err
   output = job.out
   request_cpus = 1
   request_disk = 10GB
   request_memory = 100MB
   should_transfer_files = YES
   transfer_input_files = osdf:///gwdata/O2/strain.4k/frame.v1/L1/1186988032/L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
   queue

.. admonition:: Directory structure on the execute machine
   :class: note

   The simple example above transfers files into the top-level job
   directory, and assumes that the process spawned by the job doesn't
   change directory or expect the data to exist in a subdirectory.
   If the executable doesn't run from the base directory, or changes
   directory *before* reading the data, ensure that the file paths
   (or the paths in a local cache file) are written from the point of
   view of the executable at the moment it attempts to read the data.
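
As a sketch, assuming the executable reads its inputs after changing into a
subdirectory of the job's scratch directory (called ``work`` here purely for
illustration), Python's ``os.path.relpath`` can express each transferred
file's location relative to that subdirectory.
The ``path_for_executable`` helper is hypothetical.

.. code-block:: python

   import os.path


   def path_for_executable(filename: str, exec_cwd: str) -> str:
       """Return the path to a transferred input as seen from ``exec_cwd``.

       ``exec_cwd`` is the directory, relative to the job's scratch directory,
       that the executable is in when it reads the data.
       """
       # transferred inputs sit at the top of the scratch directory ("."),
       # so express their location relative to the executable's directory
       return os.path.relpath(os.path.join(".", filename), start=exec_cwd)


   print(path_for_executable("L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf", "work"))
   # ../L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
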

-----------------------------
Example 2: Using a cache file
-----------------------------

A common pattern is for an executable to read a file that lists the paths
of the data files to be used for the job.
GWDataFind includes a `gwdatafind.io.Cache` object that simplifies translating
lists of URLs into various common cache formats.
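
``Cache`` handles this translation for you; purely to illustrate what a
LAL-format entry contains, the hypothetical ``lal_cache_line`` helper below
builds one by hand.
It assumes the usual ``<observatory>-<description>-<GPS start>-<duration>.<extension>``
file-naming convention and writes the five space-separated LAL columns
(observatory, description, GPS start, duration, path).
It is not part of the GWDataFind API.

.. code-block:: python

   import os.path
   import re

   # <observatory>-<description>-<GPS start>-<duration>.<extension>
   FILENAME = re.compile(
       r"^(?P<obs>[A-Z0-9]+)-(?P<tag>[^-]+)-(?P<start>\d+)-(?P<duration>\d+)\.\w+$"
   )


   def lal_cache_line(path: str) -> str:
       """Format one LAL cache entry for ``path`` (hypothetical helper)."""
       name = os.path.basename(path)
       match = FILENAME.match(name)
       if match is None:
           raise ValueError(f"cannot parse file name {name!r}")
       # the path is written exactly as given, so a bare base name stays relative
       return " ".join(
           (match["obs"], match["tag"], match["start"], match["duration"], path)
       )


   print(lal_cache_line("L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf"))
   # L L1_GWOSC_O2_4KHZ_R1 1187008512 4096 L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
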
Consider the following example:

.. code-block:: python
   :name: gwdatafind-htcondor-file-transfer-cache
   :caption: Passing input files to HTCondor with a cache file

   from os.path import basename

   from gwdatafind import find_urls
   from gwdatafind.io import Cache

   # find input data OSDF URIs for GW170817
   urls = find_urls(
       "L",
       "L1_GWOSC_O2_4KHZ_R1",
       1187008880,
       1187008884,
       host="datafind.gwosc.org",
       urltype="osdf",
   )

   # create a cache containing just the basenames of each file, as seen
   # from the job running on the HTCondor Execute Point (compute node)
   cache = Cache(map(basename, urls))
   cachefile = "cache.txt"

   # write the cache in LAL format (by default) to be used by the job
   cache.write(cachefile)

   # write condor file transfer instructions for the job
   with open("job.submit", 'w') as submit_file:
       print(f"""
   universe = vanilla
   executable = /bin/science
   arguments = {cachefile}
   ... other instructions ...
   transfer_input_files = {','.join(urls)},{cachefile}
   queue
   """, file=submit_file)

This example will result in a local cache file that looks like this:

.. code-block:: text
   :name: gwdatafind-htcondor-file-transfer-local-cache
   :caption: ``cache.txt``

   L L1_GWOSC_O2_4KHZ_R1 1187008512 4096 L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf

The job submit file should then include the following:

.. code-block:: ini
   :name: gwdatafind-htcondor-file-transfer-local-cache-submit
   :caption: ``job.submit``

   should_transfer_files = YES
   transfer_input_files = osdf:///gwdata/O2/strain.4k/frame.v1/L1/1186988032/L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf,cache.txt

.. admonition:: Include the cache file in ``transfer_input_files``
   :class: important

   For jobs that use a cache file, it is critical to include the cache
   file itself in the ``transfer_input_files`` list, otherwise it won't
   be available to the executable.
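
On the Execute Point, the executable (``/bin/science`` above is a placeholder)
has to read the cache back.
A minimal sketch of that job-side step is shown below, assuming LAL format
with the file path in the last column.
It fails early with a clear ``FileNotFoundError`` if the cache itself, or any
file it lists, is missing from the job directory, which is the symptom to
expect when something was left out of ``transfer_input_files``.
``read_lal_cache`` and ``check_inputs`` are hypothetical helpers.

.. code-block:: python

   from pathlib import Path
   from typing import List


   def read_lal_cache(cachefile: str) -> List[str]:
       """Return the file paths listed in the last column of a LAL-format cache."""
       # read_text() raises FileNotFoundError if the cache was not transferred
       lines = Path(cachefile).read_text().splitlines()
       # each non-blank line is "OBS DESCRIPTION GPSSTART DURATION PATH"
       return [line.split()[-1] for line in lines if line.strip()]


   def check_inputs(paths: List[str]) -> None:
       """Raise FileNotFoundError if any listed input is absent."""
       missing = [path for path in paths if not Path(path).is_file()]
       if missing:
           raise FileNotFoundError(f"input files not found: {', '.join(missing)}")

Inside the job, calling ``check_inputs(read_lal_cache("cache.txt"))`` before
any processing would surface a missing transfer immediately.
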