Landscape of cohesin-mediated chromatin loops in the human genome

Table of Contents

Cell lines

The cell types and lines in this study were either obtained from cell repositories or established or differentiated in the Snyder and Dalton laboratories at Stanford University and the University of Georgia, respectively (Supplementary Tables 1, 2). All tissue culture was done according to the manufacturer’s recommendations. One of the commercially available cell lines, K1 (thyroid, papillary carcinoma), is on the list of commonly misidentified cell lines (ICLAC). The relevant cell line (CVCL_9918) was also derived from a thyroid papillary carcinoma. In the event of misidentification, the conclusions of our study would not be affected because both cell lines represent papillary thyroid carcinoma.

ChIA-PET experiments

We performed ChIA-PET experiments with modifications to previously published protocols2,22. These modifications have also been independently described17,23. We used Illumina’s Nextera tagmentation to generate sequencing libraries. In brief, cells were crosslinked and subjected to nuclear lysis followed by chromatin shearing (no restriction enzyme was used). Immunoprecipitation was performed overnight at 4 °C with antibodies against the cohesin subunit RAD21 (Abcam Anti-RAD21 antibody (ab992) https://www.encodeproject.org/antibodies/ENCAB529YRC/). The immuno-complexes were pulled down with Protein-G dynabeads (Life Technologies #10003D, New York). Biotinylated linkers were ligated to the enriched fragments, followed by proximity ligation overnight at 16 °C.

Crosslinking was reversed at 65 °C with the use of Proteinase K followed by DNA purification. We used Illumina Nextera Transposase to add sequencing adapters to ChIA-PET libraries. Biotinylated fragments were enriched by pull-down with Streptavidin Dynabeads (M-280; Lifetechnologies #11205D, New York). The final libraries were sequenced on an Illumina HiSeq 2000.

ChIP–seq experiments

Chromatin immunoprecipitation followed by massively parallel sequencing was carried out as previously described32. Cells were crosslinked with formaldehyde at a final concentration of 1% for 10 min at room temperature. The reaction was quenched with glycine at a final concentration of 125 mM and nuclear lysates were sonicated using a Branson 250 Sonifier (power setting 2, 100% duty cycle for 7 × 30-s intervals). Clarified lysates corresponding to 20 million cells were treated with 1–5 μg of antibody against H3K27ac (Abcam #4729; https://www.encodeproject.org/antibodies/ENCAB000BSK/) coupled to Protein G Dynabeads (Life Technologies #10003D). The protein–DNA complexes were washed with RIPA buffer and eluted in 1% SDS TE at 65 °C. Following cross-link reversal and purification, the ChIP DNA sequencing libraries were generated according to Illumina DNA TruSeq DNA Sample Preparation Kit Instructions (Illumina Part # FC-121-2001). Pooled libraries were sequenced on an Illumina Hi-Seq 4000. To generate high-quality data sets, we used the same antibodies as in our previous studies9,32 which have been validated according to ENCODE standards27.

RNA-seq experiments

RNA samples were extracted using the Qiagen All-Prep kit, following the manufacturer’s instructions. Libraries were prepared from total RNA using the TruSeq Stranded Total RNA Library Prep Kit, following the manufacturer’s instructions. All libraries were sequenced on the Illumina Hiseq 4000

ATAC–seq experiments

ATAC–seq was carried out as previously described68 and sequencing was carried out on an Illumina HiSeq 2000 with 2 × 100 paired-end sequencing.

ChIA-PET processing pipeline

ChIA-PET data were generated in replicate for all 24 cell lines; all libraries were sequenced to an average depth of 214 ± 5.5 (mean ± s.d.) million paired-end reads (referred to as paired-end tags or PETs) (Supplementary Table 2). Data were processed in a similar way to the workflow used in the Mango toolkit28, as follows.

Trim adaptor sequences

Illumina Nextera adaptor sequences (CTGTCTCTTATA and TATAAGAGACAG) were trimmed from all PETs using cutadapt in paired-end mode (version 1.11; non-default parameters: -q 15 -O 4 -m 20).

Trim linker sequence

All PETs were scanned to identify and remove the linker sequence (GTTGGATAAG), as well as any sequences downstream of the linker sequence. PETs less than 20 bp in length after linker removal were discarded.

Align paired-end sequences

Each set of paired-end reads was aligned to the hg19 genome separately using bowtie (version 0.12.8; non-default parameters: -n 2 -l 50 -k 1 –mapq 40 –best -m 1). Paired-end reads that mapped to multiple locations were discarded.

Remove duplicate paired-end sequences

PETs that mapped to identical locations were filtered to retain only a single PET.

Generate a set of unified peak calls

For each sample, the two sets of uniquely mapped paired-end reads were merged and peaks were called using MACS269 (version 2.1.1.20160309; parameters: -g hs -f BED -q 0.01). Peak calls across all samples were combined and then extended by 500 bp in either direction. Overlapping peaks were merged to form a single interval that spanned all overlapping peaks, after which peaks in ENCODE-defined blacklist regions were filtered. In total, we obtained 286,620 RAD21 peaks (Supplementary Table 3). These merged peak regions were used as our ‘anchor regions’ for all subsequent analysis.

Generate a set of linked paired peaks

For all pairs of peaks that were >10,000 bp and <5,000,000 bp apart on chr1-22 and chrX, the total number of PETs that linked each pair was tabulated. For samples with >2,250,000 unique PETs, the total number of PETs was down-sampled to 2,250,00 before any further analysis.

Our final data set consisted of a matrix, Mi,j, in which each row (i) represents a single paired-peak, and each column (j) represents a single sample. Element mi,j indicates the number of PETs linking the two anchor regions. We normalized the data by standardizing each row in Mi,j, and then quantile-normalizing the columns. The range of values in each column was then re-scaled to between 0 and 1000.

Generating the pan-cell line loop-call data set

Unique PET data (that is, data from ‘Remove duplicate paired-end sequences’ in the ChIA-PET processing pipeline above) from all cell lines and all replicates were pooled together. Next, we tabulated the number of PETs that connected all pairs of anchor regions >10 kb and <5 Mb apart in our unified peak set (Supplementary Table 3). Finally, the Mango scoring methodology28 was used to assign each peak pair a P value; Mango uses a Bayesian scoring methodology to determine the expected number of PETs connecting any two regions on the basis of the distance between the two regions and the local ChIP-efficiency. We used a threshold of P < 2.3 × 10−9 to arrive at our pan-cell line loop set (Supplementary Table 4). We used a relatively stringent cutoff due to the large number of PETs being analysed. At this cutoff our FDR was 2.7 × 10−6 using the Benjamini–Hochberg procedure and 0.013 using the Bonferroni approach. For all subsequent analysis described below, we used the FDR estimate from the Benjamini–Hochberg procedure.

RNA-seq processing

RNA-seq data were generated in replicate for 23 out of 24 cell lines (Supplementary Table 2); we obtained on average 66 ± 18 million paired-end reads per sample (mean ± s.d.). For samples with >60 million reads, FASTQ files were down-sampled to 60 million reads before further analysis. Transcript abundances were quantified using kallisto70 (version 0.43.0; non-default parameters:–bias). Transcript sequences (that is, target sequences) were obtained from Gencode (release 25; lifted to GRCh37 coordinates). Duplicate transcripts were removed, as well as transcripts not classified as ‘protein_coding’ or ‘lncRNA’, yielding a final list of 93,430 transcripts. For all analyses, we considered only 69,598 transcripts with a maximum abundance of >1 transcripts per million (TPM) across all 23 cell lines. To produce gene-level estimates of expression, we summed the TPM values for all transcripts that belonged to the same gene. For all analyses, we considered only 22,197 genes with a maximum abundance of >1 TPM across all 23 cell lines. For GM12878 cells, we used data from a previous study32. To normalize RNA-seq data, we first standardized (that is, z-score scaled) TPM values for each transcript or gene across all cell lines and then quantile-normalized all transcript or gene abundance levels between samples.

To visualize RNA-seq data as signal tracks, down-sampled FASTQ files were aligned to the hg19 genome using HiSat2 (version 2.0.5; non-default parameters: -X 1000–fr–no-mixed–no-discordant)71, after which genome-wide coverage tracks were produced using bedtools (bedtools genomecov -bga -split -ibam). Coverage values were scaled by a constant factor (109/total number of reads) to account for differences in sequencing depth.

H3K27ac ChIP–seq data processing

ChIP–seq data were generated in replicate for 22 out of 24 cell lines (Supplementary Table 2); we obtained on average 43 ± 9 million paired-end reads per sample (mean ± s.d.). Illumina TruSeq adaptor sequences were trimmed using cutadapt in paired-end mode (non-default parameters: -q 15 -O 4 -m 20). Reads were aligned to hg19 using bowtie (version 0.12.8; non-default parameters: -m 1–fr–chunkmbs 500 -n 2 -l 50–mapq 40 –best) after which duplicate reads were removed using Picard MarkDuplicates. Finally, peaks were called using MACS269 (non-default parameters: -q 0.01). Peaks across all samples were combined and overlapping peaks were merged to form a single interval spanning all overlapping peaks. Peaks seen in fewer than two samples, peaks that overlapped ENCODE blacklisted regions (https://sites.google.com/site/anshulkundaje/projects/blacklists), and peaks on chrM and chrY were removed from further consideration. The final list of ‘enhancer’ regions consists of 288,711 peaks (Supplementary Table 5).

Genome-wide signal tracks for each sample were generated in two stages: (i) assess ChIP–seq quality and obtain the predominant fragment length using phantompeakqualtools (https://code.google.com/archive/p/phantompeakqualtools/); (ii) use align2rawsignal (https://code.google.com/archive/p/align2rawsignal/wikis/Method.wiki) to generate signal track (parameters: –n = 5, -k = epanechnikov, -l = [fragment length from step (i)], -w = 150, -f = 0). Finally, for each cell line, we extracted the signal in each of 288,711 peaks using bwtools72 (bwtools extract bed) and calculated the average value for each peak. The final data set consists of a matrix Mi,j, in which each row (i) represents a single peak and each column (j) represents a single sample. We normalized the data by standardizing each row in Mi,j, and then quantile-normalizing the columns. These normalized data were used for all downstream analyses.

Identifying super-enhancers

To call super-enhancers in each cell line we used the ROSE pipeline73,74 (default parameters).

ATAC–seq data processing

ATAC–seq data were generated in 18/24 cell lines; we obtained on average 13 ± 7 million paired-end reads. Adaptor sequences were trimmed using cutadapt in paired-end mode (non-default parameters: -q 15 -O 5 -m 30). Reads were aligned to hg19 using bowtie (version 0.12.8; non-default parameters: -X 2000, -m 1) after which duplicate reads were removed using Picard MarkDuplicates. Genome-wide signal tracks for each sample were generated using align2rawsignal (https://code.google.com/archive/p/align2rawsignal/wikis/Method.wiki)

Overlap between cohesion-mediated chromatin loops and high-resolution Hi-C loops, contact domains and TADs

We obtained the coordinates for Hi-C loops from seven cell lines (including GM12878) and contact domains in GM1287812 to calculate the overlap with our pan-cell line loops (Fig. 1d). We also obtained the coordinates for TADs across 21 human tissues and cell types19 and compared the size of these TADs to our pan-cell line loops (Fig. 1c).

Assessing CTCF motif orientation

A list of CTCF motif positions and orientations was downloaded from the ENCODE project53. We used the CTCF_known1 motif for all analysis; this motif most closely matched the one used in a previous analysis12. Next, for all loops that contained exactly one instance of the CTCF motif at both ends (that is, in both anchor regions), we calculated the percentage of loops that had each of four possible orientations (+/−, −/+, +/+, and −/−). This result was relatively robust to the choice of threshold used to define the pan-cell line loop set (FDR<10−5: 69%, FDR<10−4: 68%, FDR<0.01: 66%, FDR<0.05: 64%).

Characterizing ‘hub’ anchor regions

Promoter regions were defined as a 500-bp region immediately upstream of a gene; gene coordinates were taken from Gencode Release 25. Enhancer regions were defined as the set of 288,711 H3K27ac peaks defined from our ChIP-seq data set (see ‘H3K27ac ChIP-seq data processing’ for more information). All anchor regions were binned by the number of interactions they had in the ‘merged loop-call’ data set (Supplementary Table 4). We assessed whether anchor regions in a particular bin were enriched for overlap with functional elements such as enhancers, promoters, or contact domain boundaries (taken from a previous publication12) using Fisher’s exact test. For each bin, we tabulated the number of anchor regions that overlapped or did not overlap a functional element; we then tabulated the number of anchor regions in all other bins that overlapped or did not overlap a given functional element. These four values were used to populate a 2 × 2 contingency table and to compute a significance of enrichment. To test the robustness of our results with respect to the threshold used to define the set of merged loop-calls, we repeated this analysis using an FDR<1% (summary statistics for the fold-enrichment and P values can be found in Supplementary Table 9).

Qualitatively, we observe very similar results to Fig. 3a—regions with many interactions are enriched for enhancers and contact domain boundaries, whereas promoters tend to overlap regions with fewer interactions.

PCA

We performed PCA on the matrix of normalized interaction frequencies of 85,294 loops by 48 samples using the prcomp function in R (default options). The 85,294 loops were derived from the set of pan-cell line loops (Supplementary Table 4) after filtering for interactions that had >4 PETs in at least one sample. We repeated the analysis using the entire set of pan-cell line loops at various FDR cutoffs and observed high correlation in PC1 and PC2 values (FDR < 10−5: rPC1 = 0.996, rPC2 = 0.995; FDR < 0.05: rPC1 = 0.983, rPC2 = 0.981). We also observed similar results when using different PET cutoffs to filter loops (>2 PETs: rPC1 = 0.999, rPC2 = 0.997; >10 PETs: rPC1 = 0.993, rPC2 = 0.985).

Testing for similarity in interaction profiles between similar cell types

For a pair of samples, we calculated the Spearman rank correlation between the raw PET counts across the set of pan-cell line loops identified (124,830 loops) for which there were at least four PETs in at least one sample (85,294 loops). For Fig. 2c, we plotted the distribution of correlation coefficients for the following groups: ‘all’ (all pairs of samples excluding replicates); ‘same germline layer’ (the assignment of individual cell lines to germline layers is provided in Supplementary Table 1; note that replicate pairs are included in this grouping); ‘same tissue’ (the assignment of individual cell lines to tissue is provided in Supplementary Table 1; note that replicate pairs are included in this grouping); ‘biological replicates’ (replicate samples); and ‘isogenic cell types’ (these include cell lines derived from a single male individual (MSLCL, MSFIB, and MSiPS); note that replicate pairs are included in this grouping).

Differences in the distribution of correlation coefficients were assessed using a two-sided Wilcoxon rank-sum test. P values were corrected for multiple hypothesis testing using the Bonferroni approach. We repeated the analysis including replicate pairs in the ‘all’ distribution and observed similar results (Pall vs isogenic cell types = 0.4, Pall vs biological replicates = 4.38 × 10−15, Pall vs same tissue = 2.52 × 10−38, Pall vs same germline layer = 1.23 × 10−25). The results were also robust to the particular PET threshold used (we examined thresholds of 1–20 PETs in at least one sample; Extended Fig. 2e). Finally, qualitatively similar results were observed when we used normalized PET interaction frequencies instead of raw PET counts (Pall vs isogenic cell types = 0.79, Pall vs biological replicates = 2.6 × 10−15, Pall vs same tissue = 9.2 × 10−27, Pall vs same germline layer = 3.2 × 10−9).

Assessing the effect of technical confounders on loop interaction frequency

For each ChIA-PET sample, we recorded the following potential confounding variables: batch (the set of samples which were processed at the same time and pooled together for sequencing); normalized strand cross-correlation coefficient (NSC; a metric of ChIP efficiency/quality27); number of peaks called; and number of uniquely mapped PETs between 10 kb and 5 Mb.

We tested for an association between principal components 1–10 (see ‘PCA’) and each covariate described above using a linear model (PC ~ technical_variable) and assessed significance using the ANOVA implementation in R. P values were corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure. At an FDR <10%, we detected no significant associations. Thus, we chose not to correct for any of these technical confounders when testing for variable loops (see below).

Identifying variable loops

We began with the set of 124,830 merged loop calls and filtered loops to include only those that had ≥4 PETs in at least one sample yielding 85,294 loops. Next, we estimated the mean to variance relationship in the data using the voom method75 and used the inverse variance weights in the subsequent analysis. To assess loops that exhibited significant variability across cell types, while accounting for technical variables observed between replicates from the same cell type, we used a linear mixed effects model as previously described76. For each of the 85,294 loops, we modelled the log(normalized interaction frequency) as a function of the cell line (treated as a random effect) using the ‘lmer’ function from the lme4 R package. We then compared the mixed effects model to a simple linear model that lacked the random effect component; a P value was then calculated using a log-likelihood ratio test. P values were corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure.

We tested two alternate approaches and found significant overlap with the approach described above.

Linear model

For each loop, we fitted a linear model [log(normalized interaction frequency) ~ cell type] and assessed its significance using the ANOVA implementation in R. P values were corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure. At an FDR <10%, we found 21,353 loops; 20,926 of these were also found using the approach described above (98%; 2.34 fold-enriched compared to hypergeometric expectation)

Non-parametric approach

For each loop, we tested for differences in the normalized interaction frequency using a Kruskal–Wallis test. As a non-parametric approach is likely to be under-powered, we rank ordered all interactions according to P values and examined the overlap for the top 35,698 interactions (that is, the same number as found using the mixed effects linear model). A total of 23,117 overlapping hits were found (64% of the set found using the mixed effects linear model; 1.55 fold-enriched compared to hypergeometric expectation).

Defining a set of non-variable loops (static loops)

To compare various attributes of our differential loops, we defined two sets of invariant or static loops as follows.

Static (null set) 1

For each of the 85,294 loops we tested for differential interaction frequency, we computed an ad hoc metric as follows:

$$rmSrmcrmormrrme_rmsrmtrmarmtrmirmc=frac1rmrrmermlrmarmtrmirmvrme,rmermnrmtrmrrmormprmytimes rmmrmermarmn,rmPrmErmT,rmfrmrrmermqrmurmermnrmcrmy$$

in which relative entropy is defined as follows:

$$rmRelative,rmentropy=sum _jf_jlog _2fracf_jq_j$$

j sums across all samples (that is, cell lines) and fj represents the fractional PET count in sample j (that is, the ratio of the number of PETs in sample j divided by the total number of PETs for this particular loop). qj represents the fractional PET count under a null model assuming an equal number of PETs in each sample. In essence, a high static score would indicate a strongly interacting loop with uniform interaction frequencies across all cell lines. All loops were ranked in descending order by their static score and we selected the same number of high-scoring interactions as differential interactions identified (FDR <10%).

Static (null set) 2

From the set of 85,294 loops tested for differential activity, we selected a set of interactions found to not have differential activity (FDR >50%), but matched for the following properties to the set of differential interactions (FDR <10%): number of loops; distribution of loop sizes; and distribution of P values assigned by Mango (from the merged loop data set).

The last criterion helps to ensure that the static set of interactions is roughly comparable in quality to the differential interaction set.

Defining housekeeping and cell-type-specific genes

For all 22,197 genes, we computed a relative entropy score as defined in ‘Defining a set of nonvariable loops (static loops)’ above. We then removed genes with low expression (minimum expression across all samples had to be >1 TPM). Genes in the top and bottom 10% as ranked by the relative entropy score were designated as ‘cell-type-specific’ and ‘housekeeping’ genes, respectively. Finally, we assessed whether variable or non-variable loops were enriched for housekeeping genes or cell-type-specific genes as follows. For the set of variable or non-variable loops (both null set 1 and null set 2), we tabulated the number that contained or overlapped more than one housekeeping or cell-type-specific gene. Similarly, we tabulated the number of variable or non-variable loops that contained or overlapped no genes in either the housekeeping or cell type-specific set. Enrichment was assessed using a two-sided Fisher’s exact test.

Chromatin state analysis with cell-type-specific loop ends

Chromatin state calls using a 15-state model for 12 cell lines were obtained from the Roadmap Epigenomics Mapping Consortium39 (Supplementary Table 1). We merged chromatin states calls into eight categories as follows: (1) TSS: 1_TssA, 2_TssAFlnk; (2) BIV: 10_TssBiv, 11_BivFlnk; (3) TX: 3_TxFlnk, 4_Tx, 5_TxWk; (4) REPRESS: 13_ReprPC, 14_ReprPCWk; (5) REPEAT: 8_ZNF/Rpts; (6) ENH: 12_EnhBiv, 6_EnhG, 7_Enh; (7) HET: 9_Het; and (8) QUIES: 15_Quies.

 Next, for each cell line, we identified a set of loops that were present only in the cell line of interest (CellLinequery) and not in all other cell lines (CellLineothers) as follows: 1. Calculate a t-statistic based on the comparison of interaction frequencies (raw PET count) for all samples in CellLinequery and CellLineothers. 2. Rank order each vector of t-statistics in descending order. 3. Define the set of cell-type-specific loops as the top 10% of loops identified in Step 2.

To assess the enrichment of various chromatin states at cell-type specific loop ends, we generated a 2 × 2 contingency table populated with the following four values: 1. Number of loop-ends that participated in a cell-type-specific interaction that overlapped a chromatin element. 2. Number of loop-ends that participated in a cell-type-specific interaction that did not overlap a particular chromatin element. 3. Number of loop-ends that did not participate in a cell-type-specific interaction that overlapped a particular chromatin element. 4. Number of loop-ends that did not participate in a cell-type-specific interaction that did not overlap a particular chromatin element.

Significance was assessed using the Fisher’s exact test. P values were corrected for multiple hypothesis testing (12 cell lines × 8 chromatin states) using the Benjamini–Hochberg procedure. We repeated our analysis using different rank thresholds to define the set of cell-type specific interactions by repeating this analysis using different thresholds (5%, and 15%) and assessed the robustness of our results, by comparing the overlap in enriched/under-enriched chromatin states. At a 5% rank threshold cutoff, eight cell lines had perfect agreement (H1-hESC, NCI-H1437, H9-hESC, HepG2, K562, LX, MSiPS, MSFIB). Three agreed for 7/8 chromatin states (HPAEC, GM12878, NP) and one agreed for only 5/8 (Jurkat). At a 15% rank threshold cutoff: Eight cell lines had perfect agreement (HPAEC, NCI-H1437, H9-hESC, HepG2, K562, LX, MSIPS, NP). Four agreed for 7/8 chromatin states (MSFIB, Jurkat, HepG2, and H1-hESC).

In cases of disagreement, except for H1-hESC, the typical change in result was the BIVALENT state going from over-enriched to no enrichment. For H1-hESC, the REPEAT state went from under-enriched to no-enrichment. Nevertheless, the vast majority of results were similar across all thresholds.

To assess whether cell-type-specific loops were enriched for TSS–TSS, TSS–ENH, or ENH–ENH, we first identified cell-type-specific loops, genes, and enhancer peaks as described above. To have adequate numbers, we defined the set of cell-type specific genes as the top 20% of genes identified using the procedure above.

Next, we counted the number of cell-type-specific loops whose ends overlapped one of the three chromatin state combinations described above. Similarly, we counted the number of non-cell-type-specific loops whose ends overlapped one of the three chromatin state combinations described above. An enrichment test was then performed using Fisher’s exact test.

Testing for an association between gene expression level and number of linked enhancers

For each cell line, we identified a set of (i) cell-type specific loops (that is, high interaction frequency in cell line of interest and not in others), (ii) enhancers, and (iii) genes (that is, high normalized expression levels in cell line of interest and not in others) using the procedure outlined above (see ‘Chromatin state analysis with cell-type-specific loop ends’). Next, for each gene that was expressed in a single cell type of interest, we tabulated the number of cell-type-specific enhancers that were linked to its promoter. To generate Fig. 3f we aggregated results across all cell lines. To test for differences in the distribution of normalized expression levels between numbers of linked enhancers, we used the Wilcoxon rank-sum test. We repeated the analysis using different cutoffs to define cell-type specific loops, including 1% (P1 vs 2 = 0.008, P1 vs 3+ = 0.81, P2 vs 3+ = 0.37), and 15% (P1 vs 2 = 5.3 × 10−11, P1 vs 3+ = 2.5 × 10−13, P2 vs 3+ = 2.4 × 10−3). We also tested different cutoffs to define genes with cell-type-specific expression including 10% (P1 vs 2 = 1.4 × 10−5, P1 vs 3+ = 5.1 × 10−4, P2 vs 3+ = 0.47) and 17.5% (P1 vs 2 = 3.6 × 10−9, P1 vs 3+ = 2.7 × 10−7, P2 vs 3+ = 0.011). Lastly, we tested different cutoffs to define cell-type-specific enhancers including 5% (P1 vs 2 = 1.6 × 10−8, P1 vs 3+ = 3.1 × 10−6, P2 vs 3+ = 0.18) and 25% (P1 vs 2 = 1.6 × 10−19, P1 vs 3+ = 5.0 × 10−13, P2 vs 3+ = 0.049).

Loop architecture in disease-associated genes

We downloaded the lists of disease-associated genes from ClinVar47, the GWAS catalogue46 and haploinsufficient genes45. The set of housekeeping genes was defined as above (‘Defining housekeeping and cell-type-specific genes’). For each list of genes, we tested the association of the gene being part of the specific category (ClinVar, GWAS or haploinsufficient) and having at least X loops connected to its promoter where X was a number from 1 to 10. We repeated the same test, filtering the loops for only enhancer loops (with a H3K27ac signal at the other end), and cell-type-specific enhancer loops (a H3K27ac mark in a given cell type). P values were calculated using Fisher’s exact test and corrected for multiple testing using the Benjamini–Hochberg approach. A list of all enrichments and P values is provided in Supplementary Table 9.

Mapping genes to loops

To integrate gene expression and histone data, we generated a map of genes to loops as follows: ‘All’ (a gene was assigned to any loop within 1 kb of its start or end coordinates, as defined in Gencode version 25 lifted to hg19, or if the ORF overlapped partially with the loop); ‘Promoter’ (a gene was assigned to any loop for which its TSS was within 1 kb of either anchor region); ‘Contained’ (a gene was assigned to any loop it was entirely contained within (that is, start and end coordinates of the gene fell entirely within a loop) and its promoter was more than 1 kb from either anchor region); and ‘Promoter–enhancer’ (one loop end overlaps a promoter, the other end overlaps an H3K27ac peak).

Linking gene expression changes to changes in loop interaction frequency

For each loop, we correlated the normalized interaction frequencies across all cell types (Spearman rank correlation; n = 23 cell types with RNA-seq and ChIA-PET data) with the normalized gene expression levels across all cell types. If a loop mapped to multiple genes, we computed all possible loop–gene correlations. As a control, we shuffled the mapping between loops and genes, while maintaining the total number of genes mapped to a single loop, and re-examined the correlation between loop interaction frequency and gene expression values. This procedure was repeated 100 times and we recorded the mean correlation coefficient for each loop–gene pairing.

In Fig. 4c, we have restricted our analysis to the set of variable loops (FDR < 10%) and plotted the distribution of actual versus randomized correlation coefficients (absolute value) for all loop-gene pairs (n = 90,657). We compared the distribution of actual correlation coefficients to ‘null’ correlation coefficients using the Mann-Whitney U test (P < 2.2 × 10−16). We repeated the analysis using the set of all loops tested for variable interaction frequencies (n = 251,678 loop-gene pairs) and observed significant results (P = 2.2 × 10−16), albeit with a lower mean correlation (0.17 versus 0.19 for the set of variable loops).

To assess what effect the mapping between loop and gene might have, we compared the distribution of correlation coefficients (absolute value) for all loop-gene pairings for all four maps described above (All, Promoter, Promoter–enhancer and Contained). Significance was assessed using a two-sided t-test and P values were adjusted for multiple hypothesis testing using the Bonferroni approach. We performed three versions of this analysis: (i) using all loops tested for variability (n = 85,294) and all histone peaks (= 288,711) (PAll vs Contained = 6.5 × 10−212, PAll vs Promoter = 2.1 × 10−260, PAll vs Promoter-enhancer = 1.9 × 10−268, PPromoter vs Promoter-enhancer = 1.0), (ii) using all loops tested for variability and histone peaks with variable activity. Variability in H3K27ac was assessed using the procedure outlined in ‘Identifying variable loops’. We set a threshold of FDR < 1% to define the set of variable histone peaks (PAll vs Contained = 6.5 × 10−212, PAll vs Promoter = 2.1 × 10−260, PAll vs Promoter-enhancer = 0, PPromoter vs Promoter-enhancer = 4.9 × 10−20). (iii) using all variable loops (FDR < 10%) and all histone peak with variability activity (PAll vs Contained = 2.2 × 10−119, PAll vs Promoter = 1.9 × 10−141, PAll vs Promoter-enhancer = 3.4 × 10−13, PPromoter vs Promoter-enhancer = 2.7 × 10−26). Taken together, these analysis indicate a stronger link between loop interaction frequency and gene expression when the loop is making direct contact with the gene’s promoter or when linking and enhancer to the promoter. Subsetting either loops or enhancers based on variability does not appear to improve the results.

Finally, we analysed if there was an enrichment for positive loop-gene correlation coefficients for the four maps described above. We tabulated the number of positive and negative coefficients for actual and randomized loop-gene pairs and assessed significance using Fisher’s exact test.

Identifying group-specific loops

All analysis was performed on the set of loops tested for variability (n = 85,294). For each group (blood, embryonic, and solid-tissue-derived), we identified a set of loops that were present only in their member cell lines (Groupquery) and that did not differ between the other two groups (Groupother1, Groupother2) as follows: 1. Compute three sets of t-statistics based on the following three pairwise comparisons: interaction frequencies (normalized interaction frequency) for all cell lines in Groupquery versus Groupother1 (t1), interaction frequencies for all cell lines in Groupquery versus Groupother2 (t2), and interaction frequencies for all cell lines in Groupother1 versus Groupother2 (t3). 2. Rank order each vector of t-statistics in descending order. 3. Define three sets of loops (T1, T2, T3) such that their respective t-statistics are in the top 10% of t1, t2, and t3, respectively. 4. Define the final set of group-specific loops as ((T_1cap T_2)-T_3).

In this way, we specifically identified loops with a high interaction frequency in the group of interest compared to the other two groups and no difference between the other two groups.

Annotating different DUEs

We used bioconductor´s package DEXSeq77 to identify DUEs. In brief, we flattened the Gencode (release 25; lifted to GRCh37 coordinates) file with parameters ‘-r no’ and used a modified script to extract counts with subRead (parameters -f -O -s 2 -p -T 40) as described in the vignette78. We classified the RNA-seq libraries either according to the three clusters identified with the PCA as described above, or by cell line (n = 22). Next we normalized for library size and dispersion, tested for DUEs, and estimated the exon log2-fold changes between (a) solid vs blood and stem cell-like vs blood, or (b) by cell type vs the median exon abundance. In this way, we identified (a) 95,137 and (b) 39,832 DUEs (FDR = 10%).

Defining intragenic loops

As a way to identify intragenic loops that go from promoters to gene bodies, we followed the methods described previously51. Starting from the Gencode annotation (release 25; lifted to GRCh37 coordinates), we only kept protein-coding genes with at least one middle exon. We also removed all exons that overlapped previously defined CAGE peaks79. Based on visual inspection, we defined the promoter window as ±1 kb from the TSS and the upstream window as −5 kb from the 5′ exon boundary. We then identified intragenic loops as those loops for which one anchor fell in the promoter and the second in the upstream window of the same gene. In this way, we identified 1,372 loops within 1,074 genes. From this set, we identified exon–loop pairs (real pairs) by associating an exon with an anchor of an intragenic loop within 5 kb of their 5′ boundaries.

Correlation of exon and loop anchors

We kept unique exon–loop pairs and correlated the normalized counts of exon and anchor strength across the 22 cell lines. As a control, we permuted all exons 100 times, creating new exon–loop pairs. We also accounted for gene expression by correlating all other exons within the same ‘looping’ gene and removed any exons within 20 kb of the centre of the anchor (all pairs). Then we performed a Pearson correlation for all complete observations and depicted only the DUEs across the 22 cell lines. For the scatterplot, we used the three-group classification specified above and we tested for correlation between real pairs and all pairs of the DUEs.

TF enrichment analysis

We obtained the genomic coordinates for motif matches for 598 TFs from a previously published study53. For each TF, we tabulated the following four numbers: (i) the number of group-specific loop-ends overlapping a motif location, (ii) the number of group-specific loop-ends not overlapping a motif location, (iii) the number of non group-specific loop-ends overlapping a motif location, and (iv) the number of non group-specific loop-ends not overlapping a motif location. We assessed the significance of enrichment using a two-sided Fisher’s exact test. In cases in which any of values (1)–(4) were less than 5, we excluded this TF from further analysis. P values were corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure. We repeated the analysis using different rank thresholds used to define the set of group-specific loops. Using a 5% threshold, we observed high correlation of fold-enrichment values (rBlood = 0.89, rEmbryonic = 0.88). Moreover, out of the 120 significant TF enrichments for the blood-specific loops (FDR < 0.1), 74 were significant at this new threshold (3.74 fold-enrichment, P = 5.5 × 10−41 via hypergeometric test). For the 89 significant TF enrichment (FDR < 0.1) from embryonic-specific loops, 39 were significant at this new threshold (5.6 fold-enrichment, P = 1.1 × 10−28 via hypergeometric test). Using a 20% threshold, we again observed high correlation of fold-enrichment values (rBlood = 0.83, rEmbryonic = 0.86). Moreover, out of the 120 significant TF enrichments (FDR < 0.1) for blood-specific loops, 95 were significant at this new threshold (2.72 fold-enrichment, P = 1.85 × 10−38 via hypergeometric test). For the 89 significant TF enrichments (FDR < 0.1) for embryonic-specific loops, 75 were significant at this new threshold (3.06 fold-enrichment, P = 1.5 × 10−34 via hypergeometric test).

Transcription factor footprinting in ATAC–seq data

ATAC–seq data were processed (Methods) for signal tracks. Motifs for each TF were intersected with the loop annotations and ATAC–seq data were averaged across all motif instances using a custom Python script. Averaged signal was compared between blood-specific, embryonic-specific, and all loops, and the relevant ratios were computed and plotted for a given TF.

GO biological process enrichment of group-specific loops

Using the procedure outlined in ‘Identifying group-specific loops’ above, we defined 3,384 blood-specific loops, 2,894 embryonic-specific loops, and 2,215 ‘misc’-specific loops. For each loop, we defined its ‘coordinates’ as the midpoint of loop end 1 to the midpoint of loop end 2. All three sets of loop-coordinates (blood, embryonic, and misc.) were examined for GO enrichment using the GREAT66 web tool with default options (version 3.0) (Supplementary Table 6).

GWAS analysis

To test for enrichment of GWAS variants in our peak sets, we used all GWAS data sets in the GRASP database61 (n = 178). The GWAS SNPs were pruned to contain no variants in linkage disequilibrium by keeping the most significant P value where there were multiple linked variants for the same trait. We only kept GWAS with at least 1,000 SNPs after pruning in the analysis for sufficient quality to calculate an enrichment (n = 86). The set of pruned SNPs was then expanded to all linked variants with European r2 ≥ 0.8 for all further analysis.

We performed a rank-based enrichment of GWAS variants in each set of group-specific loops. We segmented each GWAS study into bins that represented decreasing tiers of significance. We set a minimum bin size of 50 and filled the first bin with the 50 most significantly associated variants for each study. We then filled the next bins with 2 × 50, 4 × 50 and 8 × 50 variants and then segmented the remaining variants into bins at the four quartiles of the remaining P value distribution. We used the pruned set of SNPs to set the bin thresholds. We then computed the rank fold change enrichment of peaks across the segmented GWAS80. For each bin we computed the fraction of GWAS variants that were less than or equal to the bin’s P value threshold that overlapped the loop regions. We calculated the fold change enrichment by dividing this fraction by the fraction of all GWAS variants of any significance level that overlapped our regions. Baseline enrichment is 1, which indicates no change from the base rate of overlap of all significant and non-significant variants in the study. An enrichment less than 1 means the most significant variants are depleted relative to the baseline and any value greater than 1 indicates that significant variants are enriched. To compute the significance of these enrichments, we permuted the P value associated with each GWAS SNP in the study 200 times and re-computed the enrichment relative to baseline. The empirical P value indicates the number of permuted studies for which the true study has a greater enrichment for the most significant bin of GWAS hits.

To compare the enrichment of each given GWAS study between sets of regions, we computed the total number of pruned genome-wide significant (P < 10 × 10−8) SNPs that overlapped each set of peaks and the total number that did not. An overlap was counted if any SNP in LD with the pruned SNP overlapped the regions of interest. This is important as we do not know which is the causal SNP. We then used Fisher’s exact test to statistically compare the rate of overlap between the two studies and to determine whether a set of regions was statistically enriched relative to another (Supplementary Table 7).

LD score regression

Partitioned LD score regression (LDSC) is a method to determine whether there is an enrichment of GWAS effect sizes in a given portion of the genome62. We used LDSC to test whether our loop anchors, called loops, and DNase peaks within called loops that changed between cell types were associated with GWAS signal of complex traits. Using publicly available summary statistics of GWAS for complex traits63, we ran LDSC with the standard 1000G Phase III derived LD scores and weights, correcting for the baseline annotations (which contain the union of H3K27ac marked regions in the genome, H3K4me3 marked regions, and so on62 and the full set of Rad21-bound looped regions genome-wide. Regression coefficients were estimated using the overlap-annot option to partition effects across overlapping regions62 and with frequency files derived from 1000G Phase III Europeans and filtered for SNPs with minor allele counts of at least five. The following command was used: ldsc.py–h2< input summary statistics>–ref-ld-chr <1000G_EUR_Phase3_baseline>,<tested anchor regions>,<all rad21 peaks>–w-ld-chr < weights_hm3_no_hla>–overlap-annot–out < output estimates>–frqfile-chr <1000G.mac5eur>. Results were parsed for the enrichment of the tested anchor region and the reported statistics are taken directly from the command output.

Correction for super-enhancers and cell type effects in LDSC

Super-enhancers are associated with increased chromatin looping and also with GWAS enrichment, so we wanted to test whether our signal was due to a super-enhancer signal. As such, we excluded called super-enhancers from any cell type from the tested anchor and loop annotations and re-ran the enrichment. In addition, after filtering out anchors from any loops that overlapped with super-enhancers we still see enrichment for the same traits (Extended Data Fig. 7, Supplementary Table 8). To assess whether the signal we observed might be just attributable to active chromatin in the cell types of interest, we added in all ten cell-type group annotations as covariates to the regression, along with the Roadmap control signal for per-mark accounting as previously described67 (Extended Data Fig. 7, Supplementary Table 8). The resulting regression was: ldsc.py–h2 input.path–ref-ld-chr <1000G_EUR_Phase3_baseline>,<tested anchor regions>,<all rad21 peaks>,<roadmap control>,<cell_type_group 1>,<cell_type_group 2>,<cell_type_group 3>,<cell_type_group 4>,<cell_type_group 5>,<cell_type_group 6>,<cell_type_group 7>,<cell_type_group 8>,<cell_type_group 9>,<cell_type_group 10>–w-ld-chr < weights_hm3_no_hla>–overlap-annot–out < output estimates>–frqfile-chr <1000G.mac5eur>.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.