.. _gwdatafind-htcondor:

##############################
Using GWDataFind with HTCondor
##############################

`HTCondor `__ is a specialised workload management system for
compute-intensive processing.
HTCondor is used to specify discrete work units (jobs) that are then
distributed across the available resources with sophisticated scheduling,
prioritisation, monitoring, and reporting capabilities.
The LIGO Scientific Collaboration and its partners use HTCondor to run
large volumes of scientific analysis.

=============================================
Configuring HTCondor job data with GWDataFind
=============================================

The most common use case of combining GWDataFind with HTCondor is to query
for the URIs of input data files as part of planning a job or workflow.
For large analyses, the URIs returned by GWDataFind are commonly split into
logical chunks of one or a few files each, where each HTCondor job
processes only the data files from its own chunk.
Other chunks are processed in parallel, with results combined in a
subsequent analysis stage.

The best-practice way to use input data files with HTCondor is to specify
each data file needed by a job as part of the
`transfer_input_files `__ submit command.
Each argument passed to ``transfer_input_files`` can be a file path or URI;
HTCondor then transfers each file into the (temporary) working directory
of the job.
The process that is started on the compute node can then see each of the
input files as a local file in the current working directory tree.

.. admonition:: Pelican and OSDF
    :class: tip
    :name: _gwdatafind-htcondor-pelican

    The LIGO Scientific Collaboration (and partners) leverage
    `the Open Science Data Federation (OSDF) `__ for data distribution.
    Depending on the GWDataFind Server you communicate with, you may be
    able to directly query for OSDF URIs to pass to HTCondor.

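The chunking of URIs described in this section can be sketched in a few
lines of Python. This is only an illustration: ``chunk_urls`` is a
hypothetical helper (not part of GWDataFind), and the example URLs are
placeholders standing in for the output of ``gwdatafind.find_urls``.

```python
def chunk_urls(urls, chunk_size):
    """Split URLs into consecutive chunks of at most ``chunk_size`` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    urls = list(urls)
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


# hypothetical URLs standing in for the output of gwdatafind.find_urls()
example_urls = [
    f"osdf:///example/path/X-EXAMPLE-{start}-4096.gwf"
    for start in range(1000000000, 1000000000 + 5 * 4096, 4096)
]

for index, chunk in enumerate(chunk_urls(example_urls, 2)):
    # each chunk would supply the transfer_input_files value of one job
    print(index, ",".join(chunk))
```

Each chunk can then be turned into its own job, with the combination of
results left to a later stage of the workflow.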
-----
Rules
-----

The basic requirements for using GWDataFind URLs with HTCondor are:

1.  Pass *absolute* URLs or paths to ``transfer_input_files`` for each job,
    or via a macro variable for each DAGMan node.

2.  Pass *relative* paths (normally just a file (base)name) to the
    executable, either directly or via a cache file.

3.  Include the disk space required to store the data files in the
    ``request_disk`` command for the job.
    If you're not sure how big the files will be, it's probably OK to give
    a conservative overestimate.

4.  If access to the files requires an authorisation token, include that
    in the job configuration.

------------------------------
Example 1: Explicit file paths
------------------------------

To configure a single job where the executable takes explicit file paths
as arguments, consider the following example:

.. code-block:: python
    :name: gwdatafind-htcondor-file-transfer-explicit
    :caption: Passing input files to HTCondor (explicit)

    from os.path import basename
    from gwdatafind import find_urls

    # find input data OSDF URIs for GW170817
    urls = find_urls(
        "L",
        "L1_GWOSC_O2_4KHZ_R1",
        1187008880,
        1187008884,
        host="datafind.gwosc.org",
        urltype="osdf",
    )
    # the executable sees each file by its basename in the job directory
    filenames = map(basename, urls)

    # write condor file transfer instructions for the job
    with open("job.submit", 'w') as submit_file:
        print(f"""
    universe = vanilla
    executable = /bin/head
    arguments = -c4 {' '.join(filenames)}
    log = job.log
    error = job.err
    output = job.out
    request_cpus = 1
    request_disk = 10GB
    request_memory = 100MB
    should_transfer_files = YES
    transfer_input_files = {','.join(urls)}
    queue
    """, file=submit_file)

This will lead to a ``job.submit`` file that looks something like this:

.. 
code-block:: ini
    :name: gwdatafind-htcondor-file-transfer-explicit-submit
    :caption: ``job.submit``

    universe = vanilla
    executable = /bin/head
    arguments = -c4 L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
    log = job.log
    error = job.err
    output = job.out
    request_cpus = 1
    request_disk = 10GB
    request_memory = 100MB
    should_transfer_files = YES
    transfer_input_files = osdf:///gwdata/O2/strain.4k/frame.v1/L1/1186988032/L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf
    queue

.. admonition:: Directory structure on the execute machine
    :class: note

    The simple example above demonstrates how to transfer files into the
    top-level job directory, assuming that the process spawned by the job
    doesn't attempt to change directories or expect data to exist in a
    subdirectory.

    If the executable doesn't run from the base directory, or changes
    directory *before* reading the data, ensure that the file paths passed
    to it (directly or in a local cache file) are written from the
    point of view of the executable at the moment it attempts to read
    the data.

-----------------------------
Example 2: Using a cache file
-----------------------------

A common pattern is for an executable to read a file that lists the paths
of the data files to be used for the job.
GWDataFind includes a `gwdatafind.io.Cache` object that simplifies
translating lists of URLs into various common cache formats.

Consider the following example:

.. 
code-block:: python
    :name: gwdatafind-htcondor-file-transfer-cache
    :caption: Passing input files to HTCondor with a cache file

    from os.path import basename
    from gwdatafind import find_urls
    from gwdatafind.io import Cache

    # find input data OSDF URIs for GW170817
    urls = find_urls(
        "L",
        "L1_GWOSC_O2_4KHZ_R1",
        1187008880,
        1187008884,
        host="datafind.gwosc.org",
        urltype="osdf",
    )

    # create a cache containing just the basenames of each file, as seen
    # from the job running on the HTCondor Execute Point (compute node)
    cache = Cache(map(basename, urls))
    cachefile = "cache.txt"

    # write the cache in LAL format (by default) to be used by the job
    cache.write(cachefile)

    # write condor file transfer instructions for the job
    with open("job.submit", 'w') as submit_file:
        print(f"""
    universe = vanilla
    executable = /bin/science
    arguments = {cachefile}
    ... other instructions ...
    transfer_input_files = {','.join(urls)},{cachefile}
    queue
    """, file=submit_file)

This example will result in a local cache file that looks like this:

.. code-block:: text
    :name: gwdatafind-htcondor-file-transfer-local-cache
    :caption: ``cache.txt``

    L L1_GWOSC_O2_4KHZ_R1 1187008512 4096 L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf

The job submit file should then include the following:

.. code-block:: ini
    :name: gwdatafind-htcondor-file-transfer-local-cache-submit
    :caption: ``job.submit``

    should_transfer_files = YES
    transfer_input_files = osdf:///gwdata/O2/strain.4k/frame.v1/L1/1186988032/L-L1_GWOSC_O2_4KHZ_R1-1187008512-4096.gwf,cache.txt

.. admonition:: Include the cache file in ``transfer_input_files``
    :class: important

    For jobs that use a cache file, it is critical to include the cache
    file itself in the ``transfer_input_files`` list, otherwise it won't
    be available to the executable.
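
The rules above also allow the URLs to be passed via a macro variable for
each DAGMan node. A minimal sketch of that pattern follows, under stated
assumptions: the helper ``dag_lines``, the node names, the submit file name
``chunk.submit``, and the macro names ``input_files`` and ``input_names``
are all hypothetical, and the URLs are placeholders standing in for the
output of ``gwdatafind.find_urls``.

```python
from os.path import basename


def dag_lines(chunks, submit_file="chunk.submit"):
    """Return DAG file lines with one node per chunk of URLs."""
    lines = []
    for index, chunk in enumerate(chunks):
        node = f"chunk{index}"
        # absolute URLs for HTCondor to transfer (rule 1)
        input_files = ",".join(chunk)
        # relative names for the executable's arguments (rule 2)
        input_names = " ".join(basename(url) for url in chunk)
        lines.append(f"JOB {node} {submit_file}")
        lines.append(
            f'VARS {node} input_files="{input_files}" '
            f'input_names="{input_names}"'
        )
    return lines


# hypothetical chunks standing in for grouped find_urls() output
chunks = [
    ["osdf:///example/path/X-EXAMPLE-1000000000-4096.gwf"],
    ["osdf:///example/path/X-EXAMPLE-1000004096-4096.gwf"],
]
print("\n".join(dag_lines(chunks)))
```

In this sketch the shared submit file would be expected to reference the
macros, for example ``transfer_input_files = $(input_files)`` and
``arguments = $(input_names)``, so that each node transfers and reads only
the files from its own chunk.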