SuperTAD is an open-source command-line TAD detection package written in C++. It accepts either raw or normalized Hi-C contact maps as input. Given an input matrix, SuperTAD offers two modes, both of which find the optimal coding tree for the input. If the user supplies an integer parameter *h*, it constructs the optimal coding tree of height at most *h*, referred to as SuperTAD(*h*); otherwise, it constructs the optimal tree over all possible heights. Given the optimal tree, we also provide a filter that prunes non-TAD nodes from the tree.

The analysis of simulated data shows that SuperTAD achieves higher accuracy and robustness under high noise ratios and large size variance. Under a two-layer constraint, our experiments show that SuperTAD(2) finds structures with lower structure entropy than deDoc. A comparison with seven other methods shows that SuperTAD yields significant enrichment of structural proteins around predicted boundaries and of histone modifications within TADs, and that its results are highly consistent across different resolutions of the same Hi-C matrix. This indicates that SuperTAD has the potential to identify the essential structure of Hi-C data.

Given the same input matrix, the two modes behave as follows. SuperTAD (the first mode) requires no user-defined parameters and determines the height of the coding tree by self-learning. SuperTAD(*h*) (the second mode) takes a manually selected *h* as its only parameter and finds the optimal coding tree under that height constraint. In both modes, many candidate coding trees with different numbers of leaves *k* are created, and the optimal coding tree is selected by determining the most appropriate *k*. For SuperTAD, optional node filtering prunes false-positive TADs from the optimal binary coding tree; the result after pruning is referred to as SuperTAD(F).
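To make "structure entropy under a two-layer coding tree" more concrete, here is a minimal Python sketch of the general two-level structure entropy of a weighted graph for a given partition of bins into contiguous blocks. It is not SuperTAD's implementation, and it only scores a partition; it does not search for the optimal tree. The function name, the half-open `(start, end)` block representation, and the choice to ignore diagonal self-contacts are all assumptions made for illustration.

```python
import math


def two_level_structure_entropy(matrix, blocks):
    """Structure entropy of a weighted graph under a height-2 coding tree.

    matrix: symmetric list of lists with non-negative contact weights.
    blocks: list of half-open (start, end) index ranges that partition the bins.
    Diagonal entries (self-contacts) are ignored; this is an assumption of the sketch.
    """
    n = len(matrix)
    # Weighted degree of every bin, excluding the diagonal.
    degree = [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    total_volume = sum(degree)
    entropy = 0.0
    for start, end in blocks:
        members = range(start, end)
        volume = sum(degree[i] for i in members)
        if volume == 0:
            continue  # an isolated block contributes nothing
        # Weight of contacts inside the block, counted once from each endpoint.
        internal = sum(matrix[i][j] for i in members for j in members if i != j)
        cut = volume - internal  # weight of contacts leaving the block
        if cut > 0:
            entropy -= (cut / total_volume) * math.log2(volume / total_volume)
        for i in members:
            if degree[i] > 0:
                entropy -= (degree[i] / total_volume) * math.log2(degree[i] / volume)
    return entropy


# Toy contact map: bins 0-1 interact only with each other, as do bins 2-3.
toy = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(two_level_structure_entropy(toy, [(0, 2), (2, 4)]))  # two blocks
print(two_level_structure_entropy(toy, [(0, 4)]))          # one block
```

On this toy matrix, the partition that matches the two interacting pairs gives a lower value than the single-block partition, which is the sense in which a lower structure entropy indicates a better-fitting structure.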

Use git:

`git clone https://github.com/deepomicslab/SuperTAD SuperTAD`

or download the source archive:

`wget https://supertad.deepomics.org/home/download_src -O SuperTAD.tar.gz`

`tar -xzvf SuperTAD.tar.gz`

then build:

`cd ./SuperTAD`

`mkdir build`

`cd build`

`cmake ..`

`make`

COMMANDS:

`binary`

: The first mode. Requires no user-defined parameters and runs node filtering by default

e.g. `./SuperTAD binary <input Hi-C matrix> [-option values]`

OPTIONS:

`--no-filter`

: If given, do not filter TADs after TAD detection

`multi`

: The second mode. Takes a parameter *h* that determines the number of layers

e.g. `./SuperTAD multi <input Hi-C matrix> -h <height> [-option values]`

OPTIONS:

`-h <int>`

: The height of the coding tree, default: 2

SHARED OPTIONS for `binary` and `multi` COMMANDS:

`-K <int>`

: The number of leaves in the coding tree, default: nan (determined by the algorithm)

`--chrom1 <string>`

: chrom1 label, default: chr1

`--chrom2 <string>`

: chrom2 label, default: the same as chrom1

`--chrom1-start <int>`

: start pos on chrom1, default: 0

`--chrom2-start <int>`

: start pos on chrom2, default: the same as --chrom1-start

`-r/--resolution <int>`

: bin resolution, default: 10000

`filter`

: The node filter for the optimal coding tree

e.g. `./SuperTAD filter <input Hi-C matrix> -i <original result>`

OPTIONS:

`-i <string>`

: The list of TAD candidates

`compare`

: Computes the overlapping ratio, a symmetric metric that assesses the agreement between two results

e.g. `./SuperTAD compare <result1> <result2>`

GLOBAL OPTIONS:

`-w <string>`

: Working directory path, default: the directory where the input Hi-C matrix is located

`-v/--verbose`

: Print verbose output

SuperTAD currently supports only Hi-C contact matrices as input. Two example Hi-C matrices from *Rao et al., Cell 2014*, together with their results, are provided in `./data`.

The binary mode's result before filtering is stored in `*.binary.original.tsv`;

The binary mode's result after filtering, or the filter mode's result, is stored in `*.binary.filter.tsv`;

The multi mode's result is stored in `*.multi.tsv`.

All of the TAD results use the eight-column format, which records the bin indexes of detected boundaries and the genomic start and end coordinates.

An example output is shown below (resolution=1kb):

chr1 1 0 1000 chr1 44 43000 44000

chr1 9 8000 9000 chr1 16 15000 16000

chr1 17 16000 17000 chr1 44 43000 44000

...

The columns are:

1st-the chromosome of the left boundary

2nd-the bin index identified as the left boundary (start bin)

3rd-the start coordinate of the start bin, in bp

4th-the end coordinate of the start bin, in bp

5th-the chromosome of the right boundary

6th-the bin index identified as the right boundary (end bin)

7th-the start coordinate of the end bin, in bp

8th-the end coordinate of the end bin, in bp
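The eight-column layout above can be read with a few lines of Python. This is a minimal sketch and not part of SuperTAD: the names `TadRecord`, `parse_tad_line`, `read_tads`, and `tad_span_bp` are hypothetical, and splitting on any whitespace (so that both tab- and space-separated files work) is an assumption.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TadRecord(NamedTuple):
    """One row of a SuperTAD result file, in the documented column order."""
    left_chrom: str
    left_bin: int
    left_bin_start: int
    left_bin_end: int
    right_chrom: str
    right_bin: int
    right_bin_start: int
    right_bin_end: int


def parse_tad_line(line: str) -> TadRecord:
    """Parse one eight-column line; raise ValueError if the shape is wrong."""
    fields = line.split()  # assumption: columns are whitespace-separated
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return TadRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4], int(fields[5]), int(fields[6]), int(fields[7]),
    )


def read_tads(lines: Iterable[str]) -> List[TadRecord]:
    """Parse every non-empty line of a result file."""
    return [parse_tad_line(line) for line in lines if line.strip()]


def tad_span_bp(record: TadRecord) -> Tuple[int, int]:
    """Genomic span of a TAD: start of the start bin to end of the end bin."""
    return record.left_bin_start, record.right_bin_end


example_rows = [
    "chr1 1 0 1000 chr1 44 43000 44000",
    "chr1 9 8000 9000 chr1 16 15000 16000",
    "chr1 17 16000 17000 chr1 44 43000 44000",
]
for tad in read_tads(example_rows):
    print(tad.left_chrom, tad_span_bp(tad))
```

To read a real result file, pass an open file handle to `read_tads`, for example `read_tads(open(path))`.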

An example result and its input Hi-C contact map are shown on the left; the coding tree formed from the example result is shown on the right.

`./build/SuperTAD binary ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt --chrom1 chr19 -r 25000 --chrom1-start 30000000`

This command runs binary mode (SuperTAD) on the contact map of GM12878, chr19 at 25 kb resolution and saves all TADs to `example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.original.tsv`.

As `--no-filter` is not given, the mode runs node filtering by default and saves the filtered TADs to `example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.filter.tsv`.

`./build/SuperTAD multi ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt -h 2 --chrom1 chr19 -r 25000 --chrom1-start 30000000`

This command runs multi-nary mode (SuperTAD(*h*)) on the contact map of GM12878, chr19 at 25 kb resolution and saves all TADs to `example_sub_GM12878_chr19_KR25kb_matrix.txt.multi.tsv`.
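When running many matrices, the two invocations above can be assembled from a script. The sketch below is not shipped with SuperTAD; it only builds argument lists from the options documented in this README. The helper names are hypothetical, and `expected_outputs` assumes the default working directory, so that result files sit next to the input matrix with the documented suffixes appended to its name.

```python
import subprocess


def build_command(executable, mode, matrix, chrom=None, resolution=None,
                  chrom_start=None, height=None, no_filter=False):
    """Build the argument list for `binary` or `multi` mode."""
    if mode not in ("binary", "multi"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = [executable, mode, matrix]
    if mode == "multi" and height is not None:
        cmd += ["-h", str(height)]
    if mode == "binary" and no_filter:
        cmd.append("--no-filter")
    if chrom is not None:
        cmd += ["--chrom1", chrom]
    if resolution is not None:
        cmd += ["-r", str(resolution)]
    if chrom_start is not None:
        cmd += ["--chrom1-start", str(chrom_start)]
    return cmd


def expected_outputs(mode, matrix, no_filter=False):
    """Result paths, assuming the default working directory (see `-w`)."""
    if mode == "multi":
        return [matrix + ".multi.tsv"]
    outputs = [matrix + ".binary.original.tsv"]
    if not no_filter:
        outputs.append(matrix + ".binary.filter.tsv")
    return outputs


def run(cmd):
    """Run SuperTAD and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


matrix = "./data/example_sub_GM12878_chr19_KR25kb_matrix.txt"
cmd = build_command("./build/SuperTAD", "multi", matrix, chrom="chr19",
                    resolution=25000, chrom_start=30000000, height=2)
print(" ".join(cmd))
print(expected_outputs("multi", matrix))
# Call run(cmd) once ./build/SuperTAD has been built.
```

Passing the arguments as a list avoids shell quoting problems when matrix paths contain spaces.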

`./build/SuperTAD filter ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt -i ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.original.tsv`

This command independently runs node filtering on the TADs in the result given by `-i` and saves the selected TADs to `*.binary.filter.tsv`.

`./build/SuperTAD compare ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt.multi.tsv ./data/example_sub_IMR90_chr19_KR25kb_matrix.txt.multi.tsv`

This command computes the overlapping ratio between the two results.
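This README does not give the formula behind the overlapping ratio, so the sketch below does not reproduce `compare`. As a stand-in, it shows one simple symmetric agreement score for two result files: the Jaccard index over exactly matching (chromosome, start bin, end bin) triples. The function names are hypothetical, and the score should not be expected to equal SuperTAD's output.

```python
def load_bin_intervals(lines):
    """Collect (chromosome, start bin, end bin) triples from eight-column rows."""
    intervals = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        intervals.add((fields[0], int(fields[1]), int(fields[5])))
    return intervals


def jaccard_agreement(first, second):
    """Shared TADs divided by all distinct TADs; symmetric in its arguments."""
    if not first and not second:
        return 1.0  # two empty results agree trivially
    return len(first & second) / len(first | second)


result_a = load_bin_intervals([
    "chr1 1 0 1000 chr1 44 43000 44000",
    "chr1 9 8000 9000 chr1 16 15000 16000",
    "chr1 17 16000 17000 chr1 44 43000 44000",
])
result_b = load_bin_intervals([
    "chr1 1 0 1000 chr1 44 43000 44000",
    "chr1 9 8000 9000 chr1 16 15000 16000",
    "chr1 20 19000 20000 chr1 30 29000 30000",
])
print(jaccard_agreement(result_a, result_b))
```

Exact matching is strict; a tolerance of a few bins around each boundary would be a natural relaxation, at the cost of a more involved matching step.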