SuperTAD is an open-source command-line TAD detection package written in C++. It takes either raw or normalized Hi-C contact maps as input. Given an input matrix, SuperTAD provides two modes, both of which find the optimal coding tree for the input. If the user supplies an integer parameter h, SuperTAD constructs the optimal coding tree of height at most h, denoted SuperTAD(h); otherwise, it constructs the optimal tree among all possible heights. Given the optimal tree, we also provide a filter that prunes the non-TAD nodes from the tree.
Analysis of simulated data shows that SuperTAD achieves higher accuracy and robustness under high noise ratios and high variance in TAD sizes. Under a two-layer constraint, our experiments show that SuperTAD(2) finds structures with lower structure entropy than deDoc. A comparison with seven other methods shows that SuperTAD has a significant enrichment of structural proteins around predicted boundaries and of histone modifications within TADs, and displays high consistency between different resolutions of the same Hi-C matrix, which suggests that SuperTAD has the potential to identify the essential structure of Hi-C data.
With the same input matrix, SuperTAD provides two modes. SuperTAD (the first mode) does not require any user-defined parameter and determines the height of the coding tree on its own. SuperTAD(h) (the second mode) takes the manually selected h as its only parameter and finds the optimal coding tree of height at most h. In both modes, many candidate coding trees with different numbers of leaves k are created, and the optimal coding tree is selected by determining the most appropriate k. For SuperTAD, optional node filtering is performed to prune false-positive TADs from the optimal binary coding tree. The result after pruning is referred to as SuperTAD(F).
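To make the objective more concrete, the sketch below computes the two-level structure entropy of a contact matrix under a given partition into contiguous domains, using the standard definition H = -sum over non-root tree nodes a of (g_a / vol(G)) * log2(vol(a) / vol(parent(a))), where g_a is the total contact weight leaving node a. This is an illustration only, not SuperTAD's C++ implementation: the function name, the half-open (start, end) domain representation, and the choice to ignore the matrix diagonal are assumptions made for this example.

```python
import math

def structure_entropy_2level(matrix, domains):
    """Two-level structure entropy of a symmetric contact matrix.

    matrix  : list of lists of non-negative weights (symmetric).
    domains : list of half-open (start, end) bin ranges covering all bins.
    The diagonal is ignored (an assumption made for this sketch).
    """
    n = len(matrix)
    # Weighted degree of each bin, excluding self-contacts.
    deg = [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    vol_g = sum(deg)
    entropy = 0.0
    for start, end in domains:
        vol = sum(deg[start:end])
        if vol == 0:
            continue
        # Sum over ordered pairs inside the domain (each internal edge counted twice).
        internal = sum(matrix[i][j]
                       for i in range(start, end)
                       for j in range(start, end) if i != j)
        cut = vol - internal  # contact weight leaving the domain
        if cut > 0:
            entropy -= cut / vol_g * math.log2(vol / vol_g)
        # Leaf terms: each bin's own degree is its cut weight.
        for i in range(start, end):
            if deg[i] > 0:
                entropy -= deg[i] / vol_g * math.log2(deg[i] / vol)
    return entropy

# Toy 4-bin matrix with two strongly connected pairs and one weak link.
toy = [
    [0, 5, 0, 0],
    [5, 0, 1, 0],
    [0, 1, 0, 5],
    [0, 0, 5, 0],
]
good = structure_entropy_2level(toy, [(0, 2), (2, 4)])
bad = structure_entropy_2level(toy, [(0, 1), (1, 3), (3, 4)])
print(f"matching partition: {good:.3f}, mismatched partition: {bad:.3f}")
```

On this toy matrix the partition that follows the two dense blocks yields the lower entropy, which is the sense in which a lower-entropy coding tree is preferred.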
git clone https://github.com/deepomicslab/SuperTAD SuperTAD
or download the source archive:
wget https://supertad.deepomics.org/home/download_src -O SuperTAD.tar.gz
tar -xzvf SuperTAD.tar.gz
binary: The first mode requires no user-defined parameters and runs node filtering by default
./SuperTAD binary <input Hi-C matrix> [-option values]
--no-filter: If given, do not filter TADs after TAD detection
multi: The second mode requires a parameter h to determine the number of layers
./SuperTAD multi <input Hi-C matrix> -h <height> [-option values]
-h <int>: The height of coding tree, default: 2
SHARED OPTIONS for
-K <int>: The number of leaves in the coding tree, default: nan (determined by the algorithm)
--chrom1 <string>: chrom1 label, default: chr1
--chrom2 <string>: chrom2 label, default: the same as chrom1
--chrom1-start <int>: start pos on chrom1, default: 0
--chrom2-start <int>: start pos on chrom2, default: the same as --chrom1-start
-r/--resolution <int>: bin resolution, default: 10000
filter: Node filtering for the optimal coding tree
./SuperTAD filter <input Hi-C matrix> -i <original result>
-i <string>: The list of TAD candidates
compare: Compute the overlapping ratio, a symmetric metric that assesses the agreement between two results
./SuperTAD compare <result1> <result2>
-w <string>: Working directory path, default: the directory where the input Hi-C matrix is located
-v/--verbose: Print verbose output
SuperTAD currently supports only Hi-C contact matrices as input. Two example Hi-C matrices from Rao et al., Cell 2014, together with their results, are provided in
The binary mode's result before filtering is stored in
The binary mode's result after filtering or the filter mode's result is stored in
The multi mode's result is stored in
All of the TAD results use the eight-column format, which records the bin indexes of detected boundaries and the genomic start and end coordinates.
An example output is shown below (resolution=1kb):
chr1 1 0 1000 chr1 44 43000 44000
chr1 9 8000 9000 chr1 16 15000 16000
chr1 17 16000 17000 chr1 44 43000 44000
The columns are:
1st-the chromosome of the left boundary
2nd-the bin index identified as the left boundary (start bin)
3rd-the start coordinate of the start bin, in bp
4th-the end coordinate of the start bin, in bp
5th-the chromosome of the right boundary
6th-the bin index identified as the right boundary (end bin)
7th-the start coordinate of the end bin, in bp
8th-the end coordinate of the end bin, in bp
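For downstream analysis, the eight-column format described above can be read with a few lines of Python. The sketch below is not part of SuperTAD; the Tad class, its field names, and parse_tad_line are hypothetical names chosen for this example, and fields are split on any whitespace so that both tab- and space-separated files are accepted.

```python
from typing import NamedTuple

class Tad(NamedTuple):
    """One row of the eight-column TAD result (field names are illustrative)."""
    left_chrom: str
    left_bin: int
    left_start: int
    left_end: int
    right_chrom: str
    right_bin: int
    right_start: int
    right_end: int

    @property
    def length_bp(self) -> int:
        # Span from the start of the start bin to the end of the end bin.
        return self.right_end - self.left_start

def parse_tad_line(line: str) -> Tad:
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return Tad(fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
               fields[4], int(fields[5]), int(fields[6]), int(fields[7]))

# The example output shown above (resolution = 1kb).
example = """\
chr1 1 0 1000 chr1 44 43000 44000
chr1 9 8000 9000 chr1 16 15000 16000
chr1 17 16000 17000 chr1 44 43000 44000
"""
tads = [parse_tad_line(row) for row in example.splitlines() if row.strip()]
for tad in tads:
    print(tad.left_chrom, tad.left_bin, tad.right_bin, tad.length_bp)
```

To read a result file instead of the inline example, the same function can be applied to each non-empty line of the file.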
One example result and its input Hi-C contact map are shown on the left; the coding tree formed from the example result is shown on the right.
./build/SuperTAD binary ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt --chrom1 chr19 -r 25000 --chrom1-start 30000000
This command runs binary mode (SuperTAD) on the GM12878 chr19 contact map at 25kb resolution and saves all TADs to example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.original.tsv.
Because --no-filter is not given, the mode runs node filtering by default and saves the filtered TADs to example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.filter.tsv.
./build/SuperTAD multi ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt -h 2 --chrom1 chr19 -r 25000 --chrom1-start 30000000
This command runs multi-nary mode (SuperTAD(h)) on the GM12878 chr19 contact map at 25kb resolution and saves all TADs to example_sub_GM12878_chr19_KR25kb_matrix.txt.multi.tsv.
./build/SuperTAD filter ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt -i ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt.binary.original.tsv
This command independently runs node filtering on the TADs in the result given by -i and saves the selected TADs to *.binary.filter.tsv.
./build/SuperTAD compare ./data/example_sub_GM12878_chr19_KR25kb_matrix.txt.multi.tsv ./data/example_sub_IMR90_chr19_KR25kb_matrix.txt.multi.tsv
This command computes the overlapping ratio between the two results.
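When scripting around these commands, it can help to derive the result file names from the input path. The sketch below follows the naming seen in the examples above (the input file name with .binary.original.tsv, .binary.filter.tsv, or .multi.tsv appended). The function expected_outputs and the dictionary keys are hypothetical, and it assumes that results are written to the working directory given by -w, which defaults to the directory of the input matrix.

```python
from pathlib import Path

# Suffixes taken from the example commands; the keys are illustrative labels.
SUFFIXES = {
    "binary_original": ".binary.original.tsv",
    "binary_filter": ".binary.filter.tsv",
    "multi": ".multi.tsv",
}

def expected_outputs(matrix_path, workdir=None):
    """Return the result paths expected for a given input matrix.

    workdir mirrors the -w option; when omitted, the directory of the
    input matrix is used (assumed here to be where results are written).
    """
    matrix = Path(matrix_path)
    base = Path(workdir) if workdir is not None else matrix.parent
    return {key: base / (matrix.name + suffix) for key, suffix in SUFFIXES.items()}

outputs = expected_outputs("./data/example_sub_GM12878_chr19_KR25kb_matrix.txt")
for key, path in outputs.items():
    print(f"{key}: {path}")
```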