STARR‐seq and UMI‐STARR‐seq: Assessing Enhancer Activities for Genome‐Wide‐, High‐, and Low‐Complexity Candidate Libraries

Abstract The identification of transcriptional enhancers and the quantitative assessment of enhancer activities are essential to understanding how regulatory information for gene expression is encoded in animal and human genomes. Further, they are key to understanding how sequence variants affect enhancer function. STARR‐seq enables the direct and quantitative assessment of enhancer activity for millions of candidate sequences of arbitrary length and origin in parallel, allowing the screening of entire genomes and the establishment of genome‐wide enhancer activity maps. In STARR‐seq, the candidate sequences are cloned downstream of the core promoter into a reporter gene's transcription unit (i.e., the 3′ UTR). Candidates that function as active enhancers lead to the transcription of reporter mRNAs that harbor the candidates’ sequences. This direct coupling of enhancer sequence and enhancer activity in cis enables the straightforward and efficient cloning of complex candidate libraries and the assessment of enhancer activities of millions of candidates in parallel by quantifying the reporter mRNAs by deep sequencing. This article describes how to create focused and genome‐wide human STARR‐seq libraries and how to perform STARR‐seq screens in mammalian cells, and also describes a novel STARR‐seq variant (UMI‐STARR‐seq) that allows the accurate counting of reporter mRNAs for STARR‐seq libraries of low complexity. © 2019 The Authors.
Basic Protocol 1: STARR‐seq plasmid library cloning
Basic Protocol 2: Mammalian STARR‐seq screening protocol
Alternate Protocol: UMI‐STARR‐seq screening protocol, unique molecular identifier integration
Support Protocol: Transfection of human cells using the MaxCyte STX scalable transfection system

To identify active enhancers and quantify their activity, a functional readout for enhancer activity is required (Catarino & Stark, 2018). Because enhancers retain their activity outside of their endogenous sequence context (Catarino & Stark, 2018; Shlyueva, Stampfel, et al., 2014), the gold standard to assess enhancer activities has been reporter gene assays (e.g., luciferase assays). These assays directly test the ability of a candidate sequence to drive reporter gene transcription, which is quantified by the abundance of the resulting reporter proteins (e.g., luciferase via chemiluminescence). However, these classical reporter-gene assays are limited in throughput, as candidates have to be tested one by one. To overcome this limitation, a variety of massively parallel reporter assays (MPRAs) have been developed over the past years that couple a candidate sequence to unique DNA sequences that serve as molecular barcodes. This allows the investigator to directly read out reporter transcript abundance by deep sequencing in experiments that assess many candidates in parallel (Inoue & Ahituv, 2015; Santiago-Algarra, Dao, Pradel, España, & Spicuglia, 2017).
To identify enhancers and quantitatively measure their strength on a genome-wide scale, we developed STARR-seq in Drosophila melanogaster cells and initially demonstrated its applicability to focused libraries in human cells (Arnold et al., 2013). In STARR-seq, a comprehensive library with candidate DNA fragments of arbitrary origin and length is cloned into the 3′ UTR of a reporter gene, rendering the candidate sequence part of the reporter-gene transcription unit. Consequently, if a candidate exhibits enhancer activity, it will activate reporter gene transcription, thereby also transcribing itself as part of the reporter transcript. The abundance of the resulting reporter mRNAs directly reports on each candidate's enhancer activity, and is quantified by deep sequencing. In STARR-seq, each candidate therefore serves as its own barcode, and this direct coupling of candidate sequence and activity allows the parallel screening of highly complex candidate libraries (Arnold et al., 2013; Fig. 1, left).
STARR-seq has since been applied in Drosophila melanogaster cell lines (e.g., Shlyueva, Brunner, Stark, & Basler, 2017; Shlyueva, Stelzer, et al., 2014). Furthermore, it has led to the identification of enhancer-core promoter specificity as the basis of two fundamentally different transcriptional regulatory programs (Zabidi et al., 2015). It has also been successfully used by our laboratory and others for the parallel assessment of medium-size and large candidate pools in human cell lines (Johnson et al., 2018; Vanhille et al., 2015; Wang et al., 2018), revealing insights into stem cell enhancers (Barakat et al., 2018), acquisition of resistance to treatment in cancer cells due to transcriptional network shifts (Rathert et al., 2015), active p53-target enhancers (Verfaillie et al., 2016), and glucocorticoid receptor-dependent enhancers (Vockley et al., 2016).
We recently adapted the STARR-seq protocol for genome-wide screening in human cell lines (Muerdter et al., 2018), which required a solution to two problems that apply to all episomal reporter assays in mammalian cells involving plasmid transfection: the activation of a type I interferon response upon plasmid transfection (Chen, Sun, & Chen, 2016; Muerdter et al., 2018; Nejepinska, Malik, Wagner, & Svoboda, 2014) and the aberrant transcription initiation from the bacterial origin of replication (ORI) on reporter plasmids (Lemp, Hiraoka, Kasahara, & Logg, 2012; Muerdter et al., 2018). The adapted protocol allows genome-wide enhancer-activity screens in human cells and has enabled the identification of genome-wide sets of constitutively active and interferon-inducible enhancers in HeLa cells (Muerdter et al., 2018).
Here, we provide a step-by-step procedure to perform STARR-seq in human cell lines, including the cloning of medium-size (or "focused") libraries derived from bacterial artificial chromosomes (BACs) and highly complex genome-wide libraries (Basic Protocol 1), as well as transfection of human cells, RNA processing, and reporter transcript amplification for deep sequencing (Basic Protocol 2; Fig. 1, left). Additionally, we provide advice on critical steps and present a novel protocol variant, UMI-STARR-seq (Alternate Protocol), which uses unique molecular identifiers (UMIs) to enable the quantification of reporter transcripts for candidate libraries of low complexity (e.g., pools of individual candidates or synthesized DNA oligo pools; Fig. 1, right).

Considerations when Establishing STARR-seq
To establish STARR-seq, we recommend using a focused candidate library from BAC DNA. The selected BACs need to include known enhancers that are active in the cell line of interest to serve as positive controls. Their detection by STARR-seq indicates successful performance. Due to their lower complexity, focused libraries also make it possible to perform STARR-seq on a much smaller scale (at least 10-fold), using far fewer cells than genome-wide libraries. This facilitates the establishment and optimization of the STARR-seq protocol for the cell line of interest. We advise investigators to always perform a focused STARR-seq screen prior to scaling up to genome-wide STARR-seq screens.

Choice of STARR-seq Screening Plasmid
For human STARR-seq, we recommend using the mammalian STARR-seq plasmid (hSTARR-seq_ORI vector; Addgene #99296) from Muerdter et al. (2018). This plasmid uses the bacterial ORI as core promoter [see Background Information and Muerdter et al. (2018) for details]. Compared to the first-generation human STARR-seq plasmid (SCP1 as core promoter; pSTARR-seq_human; Addgene #71509), this plasmid provides the strongly improved signal-to-noise levels required for genome-wide screens.
Neumayr et al.

IMPORTANT NOTE:
The human and the fly STARR-seq plasmids contain different introns and therefore require different primers for the junction PCR (Basic Protocol 2, steps 127 to 130). These primers span the exon-exon-splice junction and do not align to the STARR-seq plasmid sequence.

STARR-seq Plasmid Library
The candidate fragments in STARR-seq libraries can be of arbitrary length, limited only by cloning efficiency. We frequently use lengths from 150 to 1500 bp and recommend 1200 to 1500 bp for human STARR-seq. To avoid library distortion during the PCR amplification steps and deep sequencing, we recommend keeping the length range of the candidates within an individual STARR-seq library rather narrow (within approximately 300 bp). The candidate sequences can be obtained from arbitrary sources of DNA (Arnold et al., 2013), including BAC and genomic DNA, DNA fragments enriched for regions of interest (e.g., via fragment capture, ChIP, or similar), and synthesized DNA oligo pools. When performing STARR-seq with libraries of very low complexity (e.g., individual candidates, synthesized oligo pools, or pools of pre-selected or pre-enriched candidates), we recommend using the UMI-STARR-seq screening protocol (Alternate Protocol), as low-complexity libraries are more susceptible to PCR amplification biases and other distortions. To prevent such biases and distortions, the UMI-STARR-seq protocol employs a unique molecular identifier (UMI), which allows the counting of individual reporter transcript molecules, enabling their precise quantification.
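To illustrate why UMIs help with low-complexity libraries, the following Python sketch counts distinct UMIs per candidate instead of raw reads. It assumes that reads have already been assigned to a candidate and that each read carries a UMI string; the function name, the input format, and the exact-match collapsing of UMIs are illustrative assumptions and not the analysis pipeline of this protocol.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per candidate.

    `reads` is an iterable of (candidate_id, umi) tuples (hypothetical format).
    Reads that share both candidate and UMI are treated as PCR duplicates
    and contribute only once.
    """
    umis_per_candidate = defaultdict(set)
    for candidate_id, umi in reads:
        umis_per_candidate[candidate_id].add(umi)
    return {cand: len(umis) for cand, umis in umis_per_candidate.items()}


# Made-up example reads: three reads for enhancer_A, two of them duplicates.
reads = [
    ("enhancer_A", "ACGTACGTAC"),
    ("enhancer_A", "ACGTACGTAC"),  # PCR duplicate of the read above
    ("enhancer_A", "TTGCAAGGCT"),
    ("enhancer_B", "GGATCCATGA"),
]
umi_counts = count_umis(reads)
print(umi_counts)
```

With raw read counts, enhancer_A would appear three times; after collapsing identical UMIs it is counted as two reporter transcript molecules.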

Choice of Cell Line and Transfection Method
If possible, use a cell line that is highly transfectable and easy to culture (both adherent and suspension cell lines are suitable for STARR-seq). The transfection efficiency may vary depending on cell line, transfection method, and protocol. Before performing STARR-seq, establish a transfection protocol for the cell line of interest that yields high transfection efficiency (recommended >60%) and cell viability. Determine the transfection efficiency by transfecting a control plasmid that expresses GFP (or any other fluorescent protein) under the control of a strong promoter (e.g., CMV) and quantifying the fraction of GFP-positive cells by FACS analysis; higher transfection efficiency usually results in stronger STARR-seq signals. We recommend electroporation to achieve high transfection efficiencies, and regularly perform STARR-seq using plasmid library electroporation in HeLa S3 (use INF inhibitors; see below) and HCT116 cells. Chemical transfection (Barakat et al., 2018; Vockley et al., 2016) as well as lentiviral or adeno-associated virus (AAV) transfection (infection) should also be applicable (Inoue et al., 2017; Maricque, Dougherty, & Cohen, 2017; Nguyen et al., 2016; Shen et al., 2016) with appropriately adjusted STARR-seq plasmids.
Note that low transfection efficiency or cell viability may limit the performance of STARR-seq. For cell lines that are difficult to transfect by any method, an increase in cell number used for STARR-seq could compensate for the low transfection efficiency.
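As a rough planning aid for the compensation mentioned above, the sketch below scales a reference cell number by the ratio of transfection efficiencies. The linear scaling (keeping the number of transfected cells constant) is an assumption made for illustration, as is the function name; the reference values in the example (4 × 10⁸ cells at 80% efficiency) are taken from the genome-wide screen conditions given in Basic Protocol 2. The required cell number still needs to be determined experimentally for each cell line.

```python
def scaled_cell_number(reference_cells, reference_efficiency_pct, measured_efficiency_pct):
    """Scale the cell number so the count of transfected cells stays constant.

    Assumes (hypothetically) that screen quality depends linearly on the
    number of transfected cells. Efficiencies are integer percentages.
    """
    if not 0 < measured_efficiency_pct <= 100:
        raise ValueError("measured efficiency must be in (0, 100] percent")
    numerator = reference_cells * reference_efficiency_pct
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-numerator // measured_efficiency_pct)


# Example: a cell line that only reaches 40% transfection efficiency.
cells_needed = scaled_cell_number(400_000_000, 80, 40)
print(cells_needed)
```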

Interferon Response in Mammalian Cells Post Plasmid Transfection
The transfection of plasmid DNA triggers a type I interferon (INF-I) response in many mammalian and human cell lines via the cGAS/STING pathway (Muerdter et al., 2018; Paludan & Bowie, 2013), making it necessary to evaluate the cell line of interest prior to performing STARR-seq [see Muerdter et al. (2018)]. An active interferon response would lead to false-positive and false-negative enhancer activities (see Background Information for details). To prevent the mounting of an interferon response, inhibitors for the key kinases of the cGAS/STING pathway (PKR and TBK1; see Materials) should be used.

Current Protocols in Molecular Biology
For more detailed background information regarding the use of the ORI as core promoter, different cell lines eliciting a type I interferon response, or alternative STARR-seq strategies, see Muerdter et al. (2018).

STARR-seq PLASMID LIBRARY CLONING
Due to the unique location of the candidate fragments within the reporter gene transcription unit of the STARR-seq screening plasmid, each candidate serves as its own barcode, which enables the straightforward and simple cloning of candidate libraries. The transcription unit consists of a core promoter followed by an intron and a reporter gene (ORF, a truncated form of GFP). The candidate fragments are cloned between the ORF and the poly(A) site, i.e., into the 3′ UTR of the reporter gene (Fig. 1, top). For human STARR-seq, we recommend using the second-generation human STARR-seq screening plasmid from Muerdter et al. (2018) (hSTARR-seq_ORI vector; Addgene #99296). Compared to the first-generation human STARR-seq screening plasmid (SCP1 as core promoter; pSTARR-seq_human; Addgene #71509), this plasmid exhibits strongly improved signal-to-noise levels, as it uses the bacterial ORI as core promoter [see Background Information and Muerdter et al. (2018) for details].
To enable highly efficient library cloning and avoid the cleavage of candidates, we clone the STARR-seq plasmid library by recombination (Gibson assembly is also possible). This requires that the candidate fragments be flanked by constant sequences that match the plasmid backbone sequences around the insertion site. To ease cloning and the final deep sequencing on the Illumina platforms, we first ligate the Illumina adapters (here we use the NEBNext hairpin adapters) to randomly sheared and size-selected BAC or genomic DNA or pre-enriched candidate pools. As the Illumina adapter is a Y-shaped adapter (double-stranded on one end), candidates will be ligated in both orientations and will, in the resulting library, be present in both orientations (roughly equimolar). In principle, any source of DNA is applicable (Arnold et al., 2013). When using synthesized DNA oligo pools, we recommend including the adapter sequences in the DNA oligos. This allows the PCR amplification of the DNA oligo pool and the directional cloning of all candidates (note that the adapter sequences have to be included directionally, i.e., the P5 adapter sequence at the 5′ end and the reverse-complement P7 adapter sequence at the 3′ end).
Also, the length of the candidate fragments can be of a wide range, essentially limited only by cloning efficiencies. We have used fragments between 150 and 1500 bp, and we recommend 1200 to 1500 bp for human STARR-seq. Note, however, that an individual library should only contain inserts of a limited size range of 100 to 300 bp around the desired fragment length, as the various amplification and deep sequencing steps can distort the length distribution. Note that the source of DNA, i.e., the input material for the library insert, is the major determinant of library complexity.
The adapter-ligated DNA is then PCR amplified using primers that bind to the Illumina adapters (constant part of the insert) and add homology arms for directional cloning into the STARR-seq screening plasmid. Importantly, the candidate fragments cloned via Y-linker adapters will become directional at this PCR step, such that candidates cloned in either direction can be unambiguously identified by deep sequencing. Next, the library insert is cloned by recombination into the STARR-seq screening plasmid using the homology arms added to the Illumina adapters during library insert amplification. As the cloning efficiency strongly impacts library complexity, it is necessary to ensure the highest possible efficiency.
Third, the library cloning reactions are column purified and transformed into competent bacteria, which are subsequently grown in liquid cultures for the large-scale amplification of the STARR-seq plasmid library. A high transformation efficiency is critical for the complexity and quality of the library (especially for genome-wide libraries). Therefore, transformation should be performed by electroporation using electrocompetent bacteria with the highest efficiency.
This protocol describes how to clone a medium-sized focused library from BAC-derived DNA fragments and a highly complex genome-wide library.

Sonication and size-selection of human BAC and genomic DNA
12. Sonicate DNA to a target size of 1200 to 1500 bp.
Sonicate 10 µg BAC DNA or 50 µg genomic DNA in total (you will recover ~10% DNA after sonication and size selection).
Prepare two samples of sonicated DNA for a focused library or ten samples for a genome-wide library.
14. Collect the sonicated DNA on ice.
15. Pool all sonicated DNA samples and add the corresponding volume of 6× DNA loading dye.
16. Load 12 µl of a 1-kb DNA ladder in both outermost wells of a 1% agarose gel (well size: 60 µl). See Current Protocols article Voytas (2000) for essential agarose gel electrophoresis protocols.
27. Measure the concentration of the sonicated, size-selected, and purified DNA pool (see Current Protocols article: Gallagher & Desjardins, 2006).

Generation of focused and genome-wide library inserts (Illumina adapter ligation)
28. Perform adapter ligation using the NEBNext Ultra™ II DNA Library Prep Kit for Illumina.
49. Analyze the test PCR by gel electrophoresis (Voytas, 2000).

Purification and size-selection of the amplified library insert with AMPureXP beads
52. Pool four PCR reactions in a 1.5-ml DNA LoBind tube for a focused library or pool ten PCR reactions per tube for a genome-wide library.
Clean up 3 × 10 PCR reactions for genome-wide libraries.
Do not clean up more than 10 PCR reactions in one tube.
57. Re-apply the eluate to the column and elute again.
For genome-wide library inserts, pool the elutions from all three columns.

Gel purification of the digested STARR-seq screening plasmid
62. Add 10 µl of 6× DNA loading dye to 50 µl restriction digest reaction.

Pool all purified library cloning reactions
This only applies to a genome-wide library (5 × 12.5 µl).

Transformation of electrocompetent MegaX DH10B bacteria (Invitrogen)
NOTE: Transform bacteria in the afternoon, grow them overnight, and harvest the next morning. Avoid growing them for more than 14 to 16 hr.
Grow 12 to 24 L for a genome-wide library and 4 L for a focused library. The scale of the liquid culture depends on the demand for the library. We did not observe any influence of the scale of the liquid culture on transformation efficiency.
91. Immediately add 1 ml of pre-warmed (37°C) recovery medium to the cuvette and transfer resuspended bacteria to a 14-ml polypropylene round-bottom tube.
The recovery medium comes with the MegaX DH10B™ T1R electrocompetent bacteria.
It is critical to add the pre-warmed medium immediately. Efficiency drops within seconds.
Recovering the transformed bacteria in 14-ml round bottom tubes increases efficiency, compared to recovery in 1.5 ml tubes.
94. Prepare a dilution series of the pooled bacterial culture and the transformation control (pUC19). Determine the transformation efficiency of the MegaX DH10B bacteria using the colony counts of pUC19 (also see manufacturer's instructions). Efficiency should be >3 × 10¹⁰.
More than 50 colonies should be obtained from the 1:5000 dilution of the library transformed bacteria.
96. Add equal volumes of the pooled bacteria culture from step 93 to 2 L pre-warmed (37°C) LB medium containing 100 µg/ml ampicillin in 5-L Erlenmeyer flasks.
OD600 should be between 2 and 2.6 when harvesting the bacteria culture.
99. Decant the supernatant, leaving behind ~10 ml medium for resuspension.
100. Resuspend the bacteria pellets in the residual medium by vortexing and pipetting in the centrifugation bottles.
101. Pool all resuspended bacteria in one centrifugation bottle.
102. Rinse centrifugation bottles subsequently with 10 to 15 ml LB medium to collect remaining bacteria.
103. Add to bacterial pool and resuspend bacteria by vortexing until homogenous.
Determine the tare weight before and note on the tube.
107. Determine the weight of the bacteria pellets in the 50-ml conical tubes, taking the tare weight into account.
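The colony-count arithmetic behind the transformation efficiency check in step 94 can be sketched as follows. This is a generic calculation of colony-forming units (cfu) per µg of control DNA; the function name and all example values (colony count, dilution factor, plated volume, amount of pUC19) are hypothetical and should be replaced by the values from the manufacturer's instructions and the actual plating scheme.

```python
def transformation_efficiency(colonies, dilution_factor, recovery_volume_ul,
                              plated_volume_ul, dna_pg):
    """Return the transformation efficiency in cfu per µg DNA.

    colonies           -- colonies counted on the plate
    dilution_factor    -- e.g. 100 for a 1:100 dilution before plating
    recovery_volume_ul -- total volume of the recovered transformation
    plated_volume_ul   -- volume of the dilution spread on the plate
    dna_pg             -- amount of control plasmid transformed, in pg
    """
    total_cfu = colonies * dilution_factor * recovery_volume_ul / plated_volume_ul
    return total_cfu * 1_000_000 / dna_pg  # 1 µg = 1,000,000 pg


# Hypothetical example: 160 colonies from 100 µl of a 1:100 dilution,
# 1 ml recovery volume, 5 pg control plasmid.
efficiency = transformation_efficiency(160, 100, 1000, 100, 5)
passes_threshold = efficiency > 3e10  # threshold stated in step 94
print(f"{efficiency:.2e} cfu/µg, passes: {passes_threshold}")
```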

Deep sequencing of the STARR-seq plasmid library
The quality and complexity of the STARR-seq plasmid library need to be assessed by deep sequencing.

NOTE:
The STARR-seq plasmid library is used as input for STARR-seq analysis (see Statistical Analyses).

MAMMALIAN STARR-seq SCREENING PROTOCOL
STARR-seq is a plasmid-based enhancer-activity assay that allows the identification of enhancers on a genome-wide scale and the quantitative assessment of the enhancers' activities by deep sequencing of reporter transcripts. As for all ectopic assays based on reporter transcript quantification, the STARR-seq candidate library has to be transfected into the cells of interest, and the reporter transcripts have to be harvested, processed, and sequenced. This protocol describes these steps in human cells for focused and genome-wide candidate libraries using the hSTARR-seq_ORI vector (Muerdter et al., 2018). Note that the ORI is used as core promoter; see Background Information for details. Library cloning is covered in Basic Protocol 1 (also see Fig. 1, top).
The first step in STARR-seq is the transfection of the STARR-seq plasmid library (from Basic Protocol 1) into the cells of interest. The transfected cells are then incubated to allow the transcription of the reporter mRNAs. For human cells, a 6-hr incubation resulted in the strongest STARR-seq signal over background (incubation time may vary depending on cell line and transfection method used). After harvesting the cells, total RNA is isolated and the reporter transcripts are enriched as part of the cellular mRNA by oligo(dT)-based mRNA isolation, followed by Turbo DNase digestion to remove residual plasmids. cDNA synthesis is performed using a reporter-transcript-specific reverse-transcription (RT) primer, and the reporter cDNAs are then selectively amplified by a nested two-step PCR amplification strategy. To ensure that only reporter cDNA and not residual library plasmids are amplified, we use a primer for the first PCR step (junction PCR) that specifically binds across the splice junction of the reporter transcript. The second PCR step specifically amplifies the candidate sequences flanked by Illumina adapters (Fig. 1).
Gene-specific RT primer (GSP): CTCATCAATGTATCTTATCATGTCTG
Junction forward (fw) primer: TCGTGAGGCACTGGGCAG*G*T*G*T*C
Junction reverse (rv) primer: CTTATCATGTCTGCTCGA*A*G*C
* = phosphorothioate bond (protection of primer from 3′ to 5′ exonuclease activity of proofreading DNA polymerase; especially important for junction fw primer that specifically binds across the splice junction of the reporter cDNA/transcript)
Illumina-compatible i5 and i7 index primers
SuperScript® III reverse transcriptase (supplied with 5× first-strand buffer and 0.

Transfection of human cells with the STARR-seq plasmid library
Establishing transfection conditions for human cells
1. Select a suitable transfection method for the cell line of interest that yields high transfection efficiencies and makes it possible to scale to the required throughput.
We recommend using electroporation for transfection due to scalability and efficiency (see Support Protocol). Other transient transfection methods like chemical transfection have been successfully used for STARR-seq (Barakat et al., 2018; Vockley et al., 2016). Viral delivery has been used for other reporter assays (Inoue et al., 2017; Maricque et al., 2017; Nguyen et al., 2016; Shen et al., 2016); however, not for STARR-seq. We anticipate that viral infection should be compatible with STARR-seq as well. This would require the construction of a virus-compatible STARR-seq plasmid.

Test and optimize the transfection efficiency of the cell line of interest by transfecting control plasmids that strongly express a fluorescent protein (e.g., GFP).
We recommend using pIRES-EGFP.
The transfection efficiency should be as high as possible, ideally >60% for genome-wide screens to account for the high library complexity and to ensure high-quality screens (high library coverage).

Determine if the cell line of interest is generating a type-I interferon (INF) response upon plasmid transfection.
Consult the literature (e.g., Muerdter et al., 2018; supplementary Fig. 2) or test the upregulation of INF-responsive genes such as IFIT2, MX1, OAS3, or ISG15 after plasmid transfection (24 and 48 hr post transfection) using qPCR (see Muerdter et al., 2018).

Transfection of human cells
We provide two variants of the transfection protocol: (1) a general protocol for transfection of human cells with the STARR-seq plasmid library (see below) and (2) an optimized transfection protocol for human cells using the electroporation device "MaxCyte STX scalable transfection system" (Support Protocol).

Use 4 × 10⁸ cells per replicate for a genome-wide screen (five to six square plates) or 8 × 10⁷ cells for a focused screen (one to two square plates). Note that these cell numbers were determined in HCT116 and HeLa S3 cells using electroporation (transfection efficiency >80%, viability >90%). The number of cells may vary, depending on transfection method, cell line, transfection efficiency, cell viability, and STARR-seq plasmid library complexity. The number of cells needs to be determined prior to transfection of the STARR-seq plasmid library for every cell line.
Perform at least two independent replicates per screen. To monitor the transfection efficiency for each STARR-seq screen, always include a transfection control (e.g., transfect a small number of cells using a GFP expressing plasmid, e.g., pIRES-EGFP). Measure efficiency by FACS (Robinson et al., 2019).

Harvest cells after 24 hr for transfection.
6. Remove the growth medium completely.
8. Add 8 ml of 1× trypsin per square plate (cells need to be covered completely) and incubate at 37°C for ~5 min.
Steps 8 and 9 are only needed for adherent cells.
9. Add 12 ml growth medium to stop the action of trypsin, resuspending the cells by pipetting up and down thoroughly.

10. Pool cells from all square plates in a T-175 or a T-225 flask, depending on total volume, and mix thoroughly.
11. Count the pooled cells using an automated cell counter.

Transfer the required number of cells to 50-ml conical tubes.
Use cell count from step 11 to calculate the volume of cell suspension needed.
14. Remove medium and resuspend each cell pellet in 5 ml transfection buffer, then pool all resuspended pellets in a 50 ml conical tube.

The transfection buffer depends on transfection method/protocol.
15. Spin down cells 5 min at 125 × g, room temperature.

16. Resuspend cells in appropriate transfection buffer.
The transfection buffer as well as the volume depend on transfection method/protocol.

17. Transfect cells with the STARR-seq plasmid library using the previously determined transfection method according to manufacturer's instructions and previously determined conditions.
Transfect 20 µg STARR-seq plasmid library per 1 × 10⁷ cells.

18. For cells with an interferon (INF) response, add C16 and BX-795 inhibitors to the transfected cells at a final concentration of 1 µM per inhibitor (add 40 µl of 1 mM stock solution to 40 ml growth medium).
19. Incubate at 37°C for 6 hr.

Using electroporation, we found 6 hr to be ideal for STARR-seq in several human cell types (determined by the signal over background ratio). The optimal incubation time of STARR-seq may depend on the transfection method and cell line, and should be determined individually.
If using lipofection, the incubation time should be extended to 12 to 24 hr. The optimal time frame needs to be determined experimentally.
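The two quantities given in the steps above, the plasmid amount (20 µg per 1 × 10⁷ cells) and the inhibitor dilution (1 µM final concentration from a 1 mM stock), can be checked with a short calculation. The following Python sketch is only a convenience for scaling these numbers; the function names are hypothetical, and the cell numbers in the example are the ones stated for genome-wide and focused screens.

```python
def plasmid_mass_ug(n_cells, ug_per_1e7_cells=20):
    """Plasmid library mass (µg) for a given cell number."""
    return n_cells * ug_per_1e7_cells / 10**7


def stock_volume_ul(final_conc_um, final_volume_ml, stock_conc_mm):
    """Stock volume (µl) from C1 * V1 = C2 * V2.

    µM * ml gives nmol; a 1 mM stock contains 1 nmol per µl,
    so the result is in µl.
    """
    return final_conc_um * final_volume_ml / stock_conc_mm


genome_wide_dna = plasmid_mass_ug(4 * 10**8)   # genome-wide screen
focused_dna = plasmid_mass_ug(8 * 10**7)       # focused screen
inhibitor_ul = stock_volume_ul(final_conc_um=1, final_volume_ml=40, stock_conc_mm=1)

print(genome_wide_dna, focused_dna, inhibitor_ul)
```

The inhibitor result reproduces the 40 µl of 1 mM stock per 40 ml growth medium given in the protocol.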

Harvesting cells transfected with STARR-seq plasmid library
20. Harvest cells to ensure cell lysis 6 hr post transfection when using electroporation.
If using adherent cells: 30% to 40% of the cells are not yet attached after 5 to 6 hr post transfection. Therefore, non-attached cells from the growth medium also need to be harvested.
If performing a genome-wide screen, start harvesting of cells after ~5 hr to ensure cell lysis after 6 hr.
21. Transfer 50 ml growth medium (from one square plate) to two 50-ml conical tubes.

The growth medium contains transfected cells that have not attached to the culture flask yet.
22. Wash the cells with 10 ml of 1 × PBS.
23. Transfer the PBS wash to the 50-ml Falcon tubes from step 21, for maximum recovery.
24. Add 8 ml 1× trypsin per square plate and incubate at 37°C for 4 to 5 min.
25. During trypsin incubation, spin down growth medium from step 21 for 5 min at 125 to 300 × g, room temperature, and aspirate supernatant.
26. Add 12 ml serum-containing growth medium per square plate to stop the action of trypsin, resuspending the cells by pipetting up and down thoroughly.
27. Combine this cell suspension with the cell pellet from step 21.
28. Spin down the cells 5 min at 125 to 300 × g, room temperature.

Remove PBS by aspiration.
Leave 200 to 400 µl of 1× PBS (~1 to 2 mm) covering the cell pellet in the 50-ml conical tube.

Total RNA isolation
Perform total RNA isolation using the Qiagen RNeasy Maxi Kit. Read the manufacturer's instructions before use. Clean and wipe fume hood and all equipment with RNase Zap (or similar) to ensure an RNase-free working environment.
Prepare the following reagents freshly before starting (the indicated volumes refer to one sample). Calculate the total volume of the reagents by multiplying the total number of samples plus one (e.g., 4 samples +1 = 5) with the reagent volume per sample.
RLT buffer (15 ml
32. Resuspend the cell pellet in the 50-ml conical tube by vortexing at medium speed.

This is important to ensure complete and efficient disruption/lysis of cells.
33. Lyse the cells by adding 15 ml RLT buffer (+β-ME) to the resuspended cell pellet.
First add ~5 ml RLT (+β-ME) dropwise while vortexing the cells, then add the remaining RLT (+β-ME), close the tube, and vortex until the solution is homogeneous (5 to 10 s).

34. Homogenize the cells using the Qiagen TissueRuptor with a disposable probe at full speed.
Homogenize each cell pellet/sample for 4.5 min while constantly moving the disposable probe within the tube. Note that shorter homogenization times lead to reduced RNA yield.
Repeat step 34 for all samples. To avoid cross-contamination of samples, use a fresh disposable probe for each sample (meaning a different STARR-seq plasmid library or cell line).
37. Centrifuge 5 min at >3200 × g, 25°C, and discard flow-through.
38. Repeat steps 36 and 37 with the remaining 15 ml lysate and discard flow-through.
40. Add 10 ml RPE buffer prepared as described in the text above step 32, centrifuge 2 min at >3200 × g, 25°C, and discard the flow-through.
41. Add 10 ml RPE buffer prepared as above, then centrifuge for 10 min at >3200 × g, 25°C.
The extended centrifugation time (10 min
46. Pool elution fractions to obtain a final concentration of ≤750 ng/µl and a volume matching full or half milliliters (to facilitate handling during mRNA isolation in subsequent steps).
750 ng/µl is the maximum concentration you can use in the next step due to the binding capacity of the Oligo(dT)25 beads.
If processing more than one sample in parallel, adjust all samples to the same volume, to facilitate handling during mRNA isolation.
Samples can be stored at −80°C; save 5 µl for gel analysis to determine RNA integrity.
The maximum volume of the sample for 5-ml tubes is 2 ml, for 15-ml tubes, 5 ml.
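The volume adjustment in step 46 combines two constraints: the concentration must not exceed 750 ng/µl, and the final volume should be a multiple of 0.5 ml. A minimal Python sketch of this calculation is shown below; the function name and the example RNA yield (2000 µg) are hypothetical, and the sketch assumes that the pooled eluate can be brought to the target volume (e.g., with RNase-free water).

```python
import math

MAX_CONC_NG_PER_UL = 750   # limit stated in step 46
VOLUME_STEP_UL = 500       # full or half milliliters


def pooled_volume_ul(total_rna_ug):
    """Smallest volume (µl, multiple of 0.5 ml) giving <= 750 ng/µl."""
    total_rna_ng = total_rna_ug * 1000
    min_volume_ul = total_rna_ng / MAX_CONC_NG_PER_UL
    return math.ceil(min_volume_ul / VOLUME_STEP_UL) * VOLUME_STEP_UL


total_rna_ug = 2000  # hypothetical yield from one sample
final_volume_ul = pooled_volume_ul(total_rna_ug)
final_conc_ng_per_ul = total_rna_ug * 1000 / final_volume_ul
print(final_volume_ul, round(final_conc_ng_per_ul, 1))
```

When several samples are processed in parallel, the largest of the calculated volumes could be used for all samples so that they share the same volume.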

Optional: Spike-in control for library normalization
Spike-in controls are highly recommended if different conditions are examined using STARR-seq or global changes in enhancer activities are expected (e.g., the induction or inactivation of many or all enhancers). Spike-in controls are used to normalize different STARR-seq screens to perform comparative analyses.
Total RNA from a previous screen of individual enhancers or a focused library can be used for spike-in controls; the candidates for the spike-in library must be different from the candidates for the current screen (e.g., use a different BAC region or DNA from a different closely related species). This ensures that the STARR-seq and spike-in transcripts are processed simultaneously throughout all steps of the protocol. We prepare spike-in controls beforehand and keep aliquots of total RNA. "Spike-in (total) RNA" is added to the total RNA of the STARR-seq screens.
We recommend two variants of spike-in controls: a. Pool of individual enhancers (steps 47 to 55). b. Genomic regions covered by BACs containing multiple enhancers (steps 56 to 62).

Pool of individual enhancers
47. Identify putative enhancers from closely related species that can be mapped unambiguously to only the respective reference genome to avoid cross mapping between STARR-seq and spike-in reads.

Avoid repeated freeze-thaw cycles.
Store spike-in total RNA at −80°C.
54. Add an appropriate amount (ratio) of spike-in to focused or genome-wide STARR-seq screens.
The ratio of spike-in to STARR-seq total RNA depends on the size, i.e., complexity of the STARR-seq plasmid library (ratios successfully tested: focused library: 1:10,000 or genome-wide: 1:1000).
55. Add spike-in total RNA to STARR-seq total RNA after its concentration has been adjusted prior to mRNA isolation (step 46).
The ratio of spike-in to STARR-seq total RNA needs to be experimentally determined by testing different ratios (e.g., 1:100, 1:1000, 1:10,000).
61. Process a fraction of the spike-in control screen to confirm successful performance of STARR-seq using the spike-in control library. Make sure that the control enhancers are detected after deep sequencing.
62. Test the ratio of spike-in to STARR-seq total RNA (see step 55).
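As a worked example of the spike-in ratios mentioned above, the following sketch computes how much spike-in total RNA to add for a given ratio. It assumes that the ratio is expressed by mass (spike-in total RNA to STARR-seq total RNA), which the text does not state explicitly; the function and dictionary names are hypothetical.

```python
# Ratios reported as successfully tested (spike-in : STARR-seq total RNA).
TESTED_RATIO_DENOMINATORS = {
    "focused": 10_000,     # 1:10,000
    "genome-wide": 1_000,  # 1:1000
}


def spike_in_amount_ug(starr_total_rna_ug: float, ratio_denominator: int) -> float:
    """Mass of spike-in total RNA for a 1:ratio_denominator ratio.
    Assumption: the ratio refers to RNA mass."""
    if starr_total_rna_ug <= 0 or ratio_denominator <= 0:
        raise ValueError("inputs must be positive")
    return starr_total_rna_ug / ratio_denominator


# Example: 1000 µg of total RNA from a genome-wide screen
amount = spike_in_amount_ug(1000.0, TESTED_RATIO_DENOMINATORS["genome-wide"])
```

Because the optimal ratio has to be determined experimentally, the same function can be called with each candidate ratio (e.g., 100, 1000, 10,000) when planning the test.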
Prepare before starting:
2× binding buffer (quantity 2.5× the starting volume of beads)
Washing buffer (quantity 2× the starting volume of beads)
Storage buffer (quantity 5× the starting volume of beads)
Reconditioning buffer (quantity 3× the starting volume of beads).
All buffers need to be at room temperature before use.
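The buffer quantities listed above scale with the starting volume of beads, which in turn depends on the volume of total RNA (2 vol of beads per 1 vol of total RNA, as specified in the bead washing procedure). A minimal sketch of this arithmetic, with hypothetical names, is:

```python
# Multiples of the starting bead volume, taken from the list above.
BUFFER_FACTORS = {
    "2x binding buffer": 2.5,
    "washing buffer": 2.0,
    "storage buffer": 5.0,
    "reconditioning buffer": 3.0,
}
BEAD_VOL_PER_RNA_VOL = 2.0  # 2 vol of beads for 1 vol of total RNA


def bead_prep_volumes_ml(total_rna_ml: float) -> dict:
    """Return the bead volume and each buffer volume (ml) needed
    for a given volume of total RNA. Hypothetical helper."""
    if total_rna_ml <= 0:
        raise ValueError("total RNA volume must be positive")
    beads_ml = BEAD_VOL_PER_RNA_VOL * total_rna_ml
    volumes = {"beads": beads_ml}
    for buffer_name, factor in BUFFER_FACTORS.items():
        volumes[buffer_name] = factor * beads_ml
    return volumes


volumes = bead_prep_volumes_ml(1.5)  # e.g., 1.5 ml of total RNA
```

The values are minimum quantities; preparing a small excess of each buffer is a common precaution but is not specified in the protocol.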

Prepare RNA for binding to Dynabeads Oligo (dT) 25
63. Heat total RNA to 65°C in a heating block.
Incubate the RNA for 12 min when using 5-ml or 15-ml tubes (genome-wide screen); incubate for 7 min when using 1.5-ml tubes (focused screen).
64. Place total RNA on ice immediately for 5 min (5-ml or 15-ml tubes) or 3 min (1.5-ml tubes).
65. Incubate total RNA at room temperature for 1 min.

Washing procedure for Dynabeads Oligo (dT) 25
This step prepares the beads for binding to the mRNA. Resuspend Dynabeads Oligo (dT) 25 beads thoroughly by vortexing before use.
66. Transfer the beads to an appropriate tube.

Use 2 vol of Dynabeads Oligo (dT) 25 for 1 vol of total RNA.
67. Place the tube on a suitable magnetic separator and incubate until the solution is clear.
68. Remove the storage buffer completely from the beads.
69. Wash beads twice with 1 vol (volume refers to starting volume of beads) of 2× binding buffer. Vortex thoroughly, then place on magnetic separator for 1.5 min (until solution is clear).
70. Resuspend beads in ½ vol of 2× binding buffer (referring to starting volume of beads).

Oligo-dT purification of mRNA using Dynabeads Oligo (dT) 25
71. Add 1 vol of total RNA (from step 65) to 1 vol of beads (resuspended in 2× binding buffer) to obtain a final concentration of 1× binding buffer.

Mix gently but thoroughly by carefully vortexing or pipetting up and down.
72. Incubate at room temperature for 10 min under constant rotation using a rolling shaker. To collect all beads from the lid, gently quick-spin tubes at <100 × g for ~5 s.
73. Place the tube on the magnetic separator for 2 min and completely remove the supernatant.
74. Wash the beads twice carefully using 1 vol washing buffer (volume corresponds to the starting volume of the beads). To resuspend the beads, gently invert the tube. Place the tube on the magnetic separator for 2 min and completely remove the supernatant.
IMPORTANT NOTE: Do not vortex while mRNA is bound to the beads!
75. Remove the remaining washing buffer completely before elution. Spin tubes at <100 × g for 5-10 s. Place on the magnetic separator and remove the remaining liquid completely.
This is important to avoid contamination of the eluate with washing buffer.
77. Immediately transfer the tubes to the magnetic separator and incubate for >1 min.
The immediate transfer to the magnet is essential to avoid re-annealing of mRNA to Oligo-dT beads.
78. Transfer the eluted mRNA (supernatant) to a new RNase-free 1.5-ml LoBind tube.
100. Resuspend the beads completely by vortexing thoroughly and incubate at 37°C on a thermomixer for 3 min while shaking (300 rpm).
101. Transfer the beads/tubes immediately onto the magnetic separator and incubate for 1 min.
Carefully tilt the magnet to allow the beads to migrate upwards on the tube wall.
102. Transfer mRNA (supernatant) to a new RNase-free 1.5-ml LoBind tube and measure the mRNA concentration.
Pool all eluted mRNA of the same screen/sample. Save 200 ng of pooled mRNA for gel analysis.

Reverse transcription (RT)/first-strand cDNA synthesis
Using a reporter transcript-specific RT primer (GSP) allows processing of up to 4 to 5 µg mRNA per RT reaction. To determine the number of RT reactions, divide the total amount of mRNA (in µg) by 5 and round this number up to the nearest multiple of 5 (e.g., 4 becomes 5; 8 becomes 10). This is the total number of RT reactions per sample.
103. Prepare RT master mix I (MMI). For one reaction:
X µl mRNA (divide total volume of mRNA by total number of RT reactions)
1 µl dNTP mix (10 µM)
1 µl GSP (2 µM) (gene-specific RT primer; CTCATCAATGTATCTTATCATGTCTG)
Make up to 13 µl with RNase-free H2O.
Process five RT reactions per tube (65 µl total volume).

Perform one minus-RT control (replace SSIII with H2O).
104. Distribute 65 µl of RT master mix I per well of a PCR strip (corresponding to 5 reactions/well).
105. Incubate at 65°C for 5 min in a thermal cycler, then transfer to ice for 1 min.
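The rule for determining the number of RT reactions can be sketched as below. This is an illustration under two assumptions: that the quantity divided by 5 is the total mRNA amount in µg (consistent with the stated capacity of up to 5 µg mRNA per reaction), and that rounding to a multiple of 5 reflects processing five reactions per tube. All function names are hypothetical.

```python
import math

MAX_MRNA_PER_RT_UG = 5.0      # up to 5 µg mRNA per RT reaction
REACTIONS_PER_TUBE = 5        # five RT reactions are processed per tube
MMI_VOLUME_UL = 13.0          # master mix I volume per reaction
DNTP_UL = 1.0
GSP_UL = 1.0


def number_of_rt_reactions(total_mrna_ug: float) -> int:
    """Divide the mRNA amount by 5, then round up to a multiple of 5."""
    if total_mrna_ug <= 0:
        raise ValueError("mRNA amount must be positive")
    raw = total_mrna_ug / MAX_MRNA_PER_RT_UG
    return math.ceil(raw / REACTIONS_PER_TUBE) * REACTIONS_PER_TUBE


def mmi_volumes_ul(total_mrna_volume_ul: float, n_reactions: int) -> dict:
    """Per-reaction mRNA ("X") and water volumes for master mix I."""
    mrna_ul = total_mrna_volume_ul / n_reactions
    water_ul = MMI_VOLUME_UL - DNTP_UL - GSP_UL - mrna_ul
    if water_ul < 0:
        raise ValueError("mRNA volume exceeds 11 µl per reaction")
    return {"mRNA": mrna_ul, "dNTP mix": DNTP_UL, "GSP": GSP_UL, "H2O": water_ul}


n = number_of_rt_reactions(40.0)   # 40 µg mRNA: 40 / 5 = 8, rounded up to 10
mix = mmi_volumes_ul(100.0, n)     # 100 µl of pooled mRNA split over 10 reactions
```

If the per-reaction mRNA volume exceeds the 11 µl left in a 13-µl mix, the sketch raises an error; how to handle that case (e.g., by adding reactions) is not covered by the text.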

Junction PCR (jPCR)
The number of junction PCR (jPCR) reactions corresponds to the number of RT reactions.
Each cleaned-up RT reaction/cDNA sample (4 to 5 µg mRNA) serves as template for one jPCR reaction.
The jPCR specifically enriches reporter transcripts and disfavors the candidate library plasmids, as the forward primer binds only to the spliced intron of the reporter transcript.
The complexity of the reporter transcript pool correlates linearly with the amount of cDNA amplified.
For maximum complexity/coverage, amplify all of the cDNA during the junction PCR.
CAUTION: When processing more than one STARR-seq screen in parallel, work extremely carefully to avoid cross-contamination of screens. Ideally, separate screens physically (e.g., work on different benches, use different pipets, regularly change gloves).
146. Amplify the minus RT reaction with nine cycles only.
149. Determine the optimal number of PCR cycles by band intensity of the PCR product after five and nine PCR cycles on the 1% agarose gel (see Fig. 2).
For the example in Figure 2, a faint band/smear is observed around 1.2 to 1.5 kb after five cycles and overamplification after nine cycles (in this case use five cycles to amplify the jPCR products).
167. Store sample at −20°C.

UMI-STARR-seq SCREENING PROTOCOL: UNIQUE MOLECULAR IDENTIFIER INTEGRATION
This protocol is an addition to the standard STARR-seq protocol (Basic Protocol 2) and explains how to introduce unique molecular identifiers (UMIs) into the reporter transcripts prior to amplification (Fig. 1, right). This allows the counting of individual reporter transcripts, which is preferable when performing STARR-seq with low-complexity candidate libraries that are prone to PCR-amplification biases, especially when individual candidates or defined synthetic oligos are screened. Note that a spike-in control might be needed for normalization (see Basic Protocol 2, steps 47 to 62, spike-in control for library normalization). As this protocol is a variant of Basic Protocol 2, only steps that deviate are described (Fig. 1, right). Please follow Basic Protocol 2 until RNaseA treatment (step 112).
During this protocol, indexing of different STARR-seq screens is only possible using the i5 index, as the i7 index is replaced with the UMI, making it possible to count individual reporter transcripts. It is important to note that the UMI will be read during deep sequencing on an Illumina platform as Index 1 (i7) read. This requires that the Index 1 (i7) read be read for 11 cycles (instead of the standard 8 cycles) to account for the UMI length (10 nucleotides).
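To illustrate how UMIs allow counting of individual reporter transcripts, the following sketch collapses reads that share both the candidate and the UMI sequence. It is a simplified illustration, not the analysis pipeline used by the authors: it assumes that each read has already been assigned to a candidate (e.g., by mapping) and paired with its 10-nt UMI from the Index 1 read, and it uses exact UMI matching without correcting sequencing errors. The candidate names and function name are hypothetical.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs per candidate.

    reads: iterable of (candidate_id, umi) tuples.
    Returns a dict mapping candidate_id to the number of distinct UMIs,
    i.e., an estimate of the number of individual reporter transcripts.
    """
    umis_per_candidate = defaultdict(set)
    for candidate_id, umi in reads:
        umis_per_candidate[candidate_id].add(umi)
    return {cand: len(umis) for cand, umis in umis_per_candidate.items()}


# Toy example: the first two reads are PCR duplicates of one transcript.
reads = [
    ("enhancer_A", "ACGTACGTAC"),
    ("enhancer_A", "ACGTACGTAC"),
    ("enhancer_A", "TTGCAAGGCC"),
    ("enhancer_B", "GGGTTTAAAC"),
]
counts = count_unique_umis(reads)
```

In this toy example, three reads for enhancer_A collapse to two transcripts, which is the kind of PCR-duplicate removal that read counts alone cannot provide for low-complexity libraries.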

Additional Materials (also see Basic Protocol 2)
Primers:
Second strand primer: GTCGTGAGGCACTGGGCA*G
P7-UMI-primer: CAAGCAGAAGACGGCATACGAGATNNNNNNNNNNGTGACTGGAGTTCAGACGTGT*G
P7-junction rev primer: CAAGCAGAAGACGGCATACG*A
P7-seqReady rev: CAAGCAGAAGACGGCATACGAGA*T
* = phosphorothioate bond (protection of primer from 3′ to 5′ exonuclease activity of proofreading DNA polymerase; especially important for junction fw primer that specifically binds across the splice junction of the reporter transcripts/cDNA)

Degradation of RNA
Ensure an RNase-free working environment and RNase-free reagents throughout Basic Protocol 2 and the Alternate Protocol to prevent RNA degradation, which would result in the loss of STARR-seq reporter mRNAs.

PCR amplification biases
To prevent PCR amplification biases, it is critical to use the KAPA HiFi DNA Polymerase (KAPA 2× HiFi HotStart Ready Mix; KAPA Biosystems cat. no. KK2601) for all PCR amplification steps throughout Basic Protocols 1 and 2 as well as the Alternate Protocol. PCR amplification biases would otherwise negatively impact library cloning and STARR-seq.

Discrimination between STARR-seq plasmid and reporter transcript/cDNA
To discriminate between STARR-seq library plasmids and reporter transcript cDNAs (Basic Protocol 2, steps 127 to 130), we included an intron in the STARR-seq screening plasmid. We ensure that only spliced reporter cDNAs are amplified during junction PCR by using a primer that exclusively binds to the spliced reporter cDNA across the exon-exon splice junction but not the intron-containing plasmid DNA. This makes it possible to specifically enrich reporter cDNAs but not residual library plasmids.
Note that the junction PCR is indispensable, as STARR-seq plasmids can still be detected in the RNA sample even after Turbo DNase treatment (for junction PCR, see Basic Protocol 2, steps 127 to 130).

Number of PCR cycles for final amplification (sequencing-ready PCR)
After the selective amplification of reporter cDNA in a first PCR step (junction PCR, see Basic Protocol 2, steps 127 to 130), the candidates are further amplified for deep sequencing (sequencing-ready PCR). The extent of amplification during this second step is critical, and the optimal number of PCR cycles needs to be experimentally determined for every STARR-seq screen (test PCR, Basic Protocol 2, steps 144 to 149). It is important to avoid overamplification while at the same time obtaining enough material for deep sequencing (200 to 1000 ng); we typically achieve this with five to eight PCR cycles. An unexpectedly high number of PCR cycles (>12) determined during the test PCR may indicate low complexity of the STARR-seq cDNA pool and failure of STARR-seq. The major cause for this scenario is poor transfection performance.

Troubleshooting
See Table 2 for troubleshooting suggestions.

Statistical Analyses
Data processing and statistical analyses are very similar to those of other deep-sequencing-based approaches that enrich for specific sequences (peaks) compared to overall genomic coverage, e.g., ChIP-seq, ATAC-seq, DNase-seq, and MNase-seq.

Data Visualization
Map single or paired-end reads to the respective reference genome (recommended tool: "Bowtie 1") and create position-specific read-coverage tracks (bigwig format), which can be visualized using the UCSC genome browser in the context of annotated genes and published datasets (e.g., from ENCODE). Visually inspect the respective tracks (see Understanding Results for details). Confirm successful performance of STARR-seq by the presence of positive control peaks (if included in library).
To call STARR-seq peaks over input, use MACS2 or any other peak-calling algorithm that is commonly used for ChIP-seq or ATAC-seq. The STARR-seq sample corresponds to the reporter-transcript-derived sequencing reads as obtained via Basic Protocol 2, and the input is generated by deep sequencing of the STARR-seq plasmid library. For this, the STARR-seq plasmid library is amplified using the Illumina i5 and i7 index primers before transfection (Basic Protocol 1, steps 112 to 126). We found no difference between sequencing the transfected DNA and the original library and conclude that it is not necessary to extract plasmid DNA from transfected cells for normalization (Arnold et al., 2013). The peak height (STARR-seq signal at peak summits over input) corresponds to enhancer activity.
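The notion of "STARR-seq signal over input" can be illustrated with a simple depth-normalized ratio. This sketch is an assumption-laden illustration only: it is not the statistic that MACS2 or any other peak caller computes, and the library-size normalization and the pseudocount are choices made here for the example. The function name and parameters are hypothetical.

```python
def enrichment_over_input(
    starr_count: float,
    input_count: float,
    starr_total_reads: float,
    input_total_reads: float,
    pseudocount: float = 1.0,
) -> float:
    """Depth-normalized STARR-seq signal over input at one position.

    Counts are reads (or fragments) covering the peak summit; totals are
    the numbers of mapped reads in each library. The pseudocount avoids
    division by zero at positions without input coverage.
    """
    if starr_total_reads <= 0 or input_total_reads <= 0:
        raise ValueError("library sizes must be positive")
    numerator = (starr_count + pseudocount) * input_total_reads
    denominator = (input_count + pseudocount) * starr_total_reads
    return numerator / denominator


# Example: 99 STARR-seq reads and 9 input reads at a summit,
# with both libraries sequenced to 1 million mapped reads.
fold = enrichment_over_input(99, 9, 1_000_000, 1_000_000)
```

Peak calling itself should be left to a dedicated tool; a ratio like this is only useful for inspecting or ranking summits that have already been called.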

Understanding Results
Evaluate the read coverage profile for the plasmid library (input). The read coverage profile should appear smooth if coverage is sufficiently high.
Peaks (enhancers) should have a bell shape rather than a rectangular appearance; the latter indicates sparse stochastic signals that could stem from a low-complexity input library or poor STARR-seq performance dominated by PCR-amplification artifacts. This becomes especially important if all reads (instead of position-collapsed reads) are used for analysis, to discriminate between peaks that stem from enhancer activity and peaks created by PCR duplications. Only "peaks" from low-complexity libraries cloned from individual enhancers or synthesized DNA oligo pools should appear rectangular, as every position is usually covered by only one fragment. Quantification of such low-complexity libraries should be done using UMI-STARR-seq.

STARR-seq screening protocol
Seed cells 24 to 48 hr before transfection to reach 80% confluency on the day of transfection. For focused (BAC) screens, use 8 × 10⁷ cells; for a genome-wide screen, use at least 4 × 10⁸ cells. The time from starting the transfection of the STARR-seq library into cells until isolation of total RNA from the transfected cells is 6 hr.
Recommended: Start with transfection of cells in the morning and start harvesting cells after 5 hr (for a genome-wide screen, harvesting cells takes ~1 hr) to lyse cells after 6 hr. After total RNA isolation, the samples can be frozen at −80°C. The STARR-seq screening protocol (Basic Protocol 2) can be conducted in 1½ days [if using the UMI-STARR-seq protocol (Alternate Protocol), 2 days]. We recommend processing mRNA to cDNA on a single day to reduce the risk of RNA degradation. cDNA processing and library amplification take around ½ day.

STARR-seq library cloning protocol
The STARR-seq library cloning protocol (Basic Protocol 1) takes approximately 1½ days. On day 1, the STARR-seq reporter plasmid is digested and cleaned up, the candidate library insert is generated and amplified, and the library cloning reaction is performed, cleaned up, and transformed into highly competent bacteria. The electroporated bacteria are inoculated into liquid cultures and grown overnight. On the second day, the bacteria are harvested, and the STARR-seq plasmid library is prepared from bacterial pellets.