The SKA Data Challenge 2

These pages document the software developed by the HI-FRIENDS team to analyse SKA simulated data as part of the second SKA Science Data Challenge (SDC2). The SDC2 is a source finding and source characterisation data challenge to find and characterise the neutral hydrogen content of galaxies across a sky area of 20 square degrees.

The HI-FRIENDS solution to the SDC2

HI-FRIENDS is a team participating in the SDC2. The team has implemented a scientific workflow to process the 1 TB SKA simulated data cube and produce a catalog with the properties of the detected sources. The workflow, the required configuration and deployment files, and this documentation are kept under version control with git and hosted on GitHub to facilitate re-execution. The software and parameters are optimized for this challenge, although the workflow can also be used to analyse other radio data cubes because the software can deal with cubes from other observatories. The workflow is intended for SKA community members or any astronomer interested in our approach to HI source finding and characterization. This documentation aims to help these scientists understand, re-use and verify the published scientific workflow.

The HI-FRIENDS GitHub repository contains the workflow, developed by the HI-FRIENDS team, used to find and characterize the HI sources in the data cube of the SKA Data Challenge 2. The workflow was executed on the SKA Regional Centre Prototype cluster at the IAA-CSIC (Spain).

Workflow general description

The details of our approach are described in the Methodology section. The workflow is managed and executed using the snakemake workflow management system. It uses spectral-cube, which builds on the dask parallelization tool and the astropy suite, to divide the large cube into smaller pieces. On each of the subcubes, we execute SoFiA-2 to mask the subcube, find sources and characterize their properties. Finally, the individual catalogs are cleaned and concatenated into a single catalog, and duplicates from the overlapping regions are eliminated. The catalog is then filtered based on the physical properties of the sources to exclude some outliers. Some diagnostic plots are produced using a Jupyter notebook. Specific details on how the workflow works can be found in the Workflow section. The workflow is general purpose, but the results from its execution on the SDC2 data cube are summarized in the SDC2 HI-FRIENDS results section.
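The splitting and duplicate-removal steps described above can be illustrated with a minimal sketch in plain Python. This is not the code used in the workflow, which relies on spectral-cube and SoFiA-2. The function names, the one-axis split, and the per-axis pixel tolerance used to match duplicates are all assumptions made for illustration only.

```python
from math import ceil


def split_axis(length, n_pieces, overlap):
    """Split one cube axis of `length` pixels into `n_pieces` ranges.

    Each range is widened by `overlap` pixels on both sides (clipped to
    the axis limits), so that a source lying on a boundary is fully
    contained in at least one subcube. Hypothetical helper.
    """
    step = ceil(length / n_pieces)
    ranges = []
    for i in range(n_pieces):
        start = max(0, i * step - overlap)
        stop = min(length, (i + 1) * step + overlap)
        ranges.append((start, stop))
    return ranges


def drop_duplicates(sources, tol):
    """Remove repeated detections from a concatenated catalog.

    `sources` is a list of (x, y, z) positions in full-cube pixel
    coordinates. Two entries are treated as the same source when they
    agree within `tol` pixels on every axis (an assumed criterion);
    the first occurrence is kept.
    """
    kept = []
    for src in sources:
        is_duplicate = any(
            all(abs(a - b) <= tol for a, b in zip(src, other))
            for other in kept
        )
        if not is_duplicate:
            kept.append(src)
    return kept


if __name__ == "__main__":
    # Pixel ranges of four overlapping pieces along a 100-pixel axis.
    print(split_axis(100, 4, 5))

    # The first two entries stand for one source detected in two
    # overlapping subcubes; only one of them survives.
    catalog = [(10.0, 10.0, 5.0), (10.2, 9.9, 5.1), (50.0, 50.0, 20.0)]
    print(drop_duplicates(catalog, tol=0.5))
```

The overlap should be at least as large as the biggest source expected, otherwise a source straddling a boundary could be truncated in every subcube that contains it.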


  • Mohammad Akhlaghi - Instituto de Astrofísica de Canarias

  • Antonio Alberdi - Instituto de Astrofísica de Andalucía, CSIC

  • John Cannon - Macalester College

  • Laura Darriba - Instituto de Astrofísica de Andalucía, CSIC

  • José Francisco Gómez - Instituto de Astrofísica de Andalucía, CSIC

  • Julián Garrido - Instituto de Astrofísica de Andalucía, CSIC

  • Josh Gósza - South African Radio Astronomy Observatory

  • Diego Herranz - Instituto de Física de Cantabria

  • Michael G. Jones - The University of Arizona

  • Peter Kamphuis - Ruhr University Bochum

  • Dane Kleiner - Italian National Institute for Astrophysics

  • Isabel Márquez - Instituto de Astrofísica de Andalucía, CSIC

  • Javier Moldón - Instituto de Astrofísica de Andalucía, CSIC

  • Mamta Pandey-Pommier - Centre de Recherche Astrophysique de Lyon, Observatoire de Lyon

  • Manuel Parra - Instituto de Astrofísica de Andalucía, CSIC

  • José Sabater - University of Edinburgh

  • Susana Sánchez - Instituto de Astrofísica de Andalucía, CSIC

  • Amidou Sorgho - Instituto de Astrofísica de Andalucía, CSIC

  • Lourdes Verdes-Montenegro - Instituto de Astrofísica de Andalucía, CSIC