Insertion Pool Sequencing for Insertional Mutant Analysis in Complex Host‐Microbe Interactions

Abstract Insertional mutant libraries of microorganisms can be applied in negative depletion screens to decipher gene functions. Because of underrepresentation in colonized tissue, one major bottleneck is analysis of species that colonize hosts. To overcome this, we developed insertion pool sequencing (iPool‐Seq). iPool‐Seq allows direct analysis of colonized tissue due to high specificity for insertional mutant cassettes. Here, we describe detailed protocols for infection as well as genomic DNA extraction to study the interaction between the corn smut fungus Ustilago maydis and its host maize. In addition, we provide protocols for library preparation and bioinformatic data analysis that are applicable to any host‐microbe interaction system. © 2019 The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.


INTRODUCTION
Insertional mutagenesis has been used in fungi, including baker's yeast and Cryptococcus neoformans, to decipher gene functions (Giaever et al., 2002; Idnurm, Reedy, Nussbaum, & Heitman, 2004; Winzeler et al., 1999). Negative depletion screening in combination with insertional mutagenesis libraries is a powerful approach to decipher gene functions.
U. maydis is a smut fungus that colonizes and overcomes the immunity of the crop plant maize (Kamper et al., 2006). Many molecular and genetic tools are available for U. maydis, and therefore, the fungus is an important model organism in the field of plant-microbe interactions (Lanver et al., 2017). Especially useful for generation of an insertional mutagenesis library is the availability of a solopathogenic U. maydis strain that is haploid and capable of infecting (Kamper et al., 2006). Many plant pathogens, including U. maydis, rely on the versatile repertoire of effector genes that mediate and shape the interaction with the host plant. The majority of predicted U. maydis effector genes are unstudied, and it is unclear if and to what extent they contribute to virulence (Kamper et al., 2006). To gain insights about the effector repertoire of U. maydis, we generated an insertional mutagenesis library and employed it in a negative depletion screen during infection of maize to elucidate novel virulence factors in the fungus.
Transformation of insertional cassettes via homologous recombination is well established in U. maydis and has been used successfully to delete clusters of predicted effector genes (Kamper et al., 2006). We created an insertional mutant library for U. maydis via homologous recombination of a selectable marker conferring resistance to hygromycin. Next, we established the iPool-Seq workflow based on this library, allowing for controlled insertional mutagenesis at loci of predicted effectors that are likely to contribute to the virulence of U. maydis. All newly generated U. maydis mutants were verified for deletion of the targeted effector genes via PCR on cultures and on extracted genomic DNA (gDNA). Eventually, the library comprised three independent replicates for each insertional mutant, with 195 putative virulence factor mutants for U. maydis in total. We used this library to conduct a negative depletion screen by infection of the host plant maize and subsequent analysis of the input and output compositions. The input, i.e., the gDNA of the mutant pool before the infection, and the output, i.e., the gDNA of the infected host material, were prepared, deep-sequenced, and bioinformatically analyzed.
iPool-Seq was performed on the entire library of 195 insertional mutants. The method was highly selective for reads from U. maydis insertional mutant loci, yielding >75% informative reads for input and output samples. Moreover, we identified 28 reproducibly and significantly depleted mutants from next-generation sequencing (NGS) reads after infection in the maize cultivar Early Golden Bantam (EGB). Several of the mutants that we found with iPool-Seq were previously shown to be virulence factors. For instance, mutants of the U. maydis effectors Pep1, ApB73, and, more recently, Cce1 displayed severe virulence defects based on classical disease ratings (Doehlemann et al., 2009; Seitner, Uhse, Gallei, & Djamei, 2018; Stirnberg & Djamei, 2016). The confirmation of these known virulence factors by iPool-Seq indicates that iPool-Seq yields reliable results for depleted mutants. To further strengthen this finding, we tested three potential virulence factors identified by iPool-Seq and were able to show a virulence phenotype for these mutants based on classical disease ratings in comparison to the wildtype solopathogenic strain SG200.
The protocols described here aim to make the full potential of iPool-Seq accessible to the larger scientific community. The iPool-Seq workflow, from infection to sequence analysis, is divided into four parts (Fig. 1): Basic Protocol 1 describes the process of infection of the host plant maize with insertional mutant pools. Basic Protocol 2 describes the extraction of gDNA from input samples before infection and from output samples after infection. Both Basic Protocol 1 and Basic Protocol 2 were established for the U. maydis-Z. mays pathosystem and might require adaptation when applied to other host-microbial interaction systems. In contrast, Basic Protocols 3 and 4 are applicable to any host-microbial interaction system: Basic Protocol 3 describes the NGS library preparation in detail, and Basic Protocol 4 details bioinformatic analysis of the sequencing results for the input and output libraries, with the goal of detecting changes in the virulence of particular insertional mutants compared to a reference set of neutral controls.

Figure 1
Overview of the iPool-Seq pipeline. The pipeline contains four parts, which can be finished sequentially in ~20 days. gDNA, genomic DNA; NGS, next-generation sequencing; UMI, unique molecular identifier.

BASIC PROTOCOL 1 U. MAYDIS INSERTIONAL MUTANT POOL INFECTION IN MAIZE
For generation of a negatively depleted output, the insertional mutant library of U. maydis must be raised, pooled, and infected into its host maize. The following protocol describes the processes of infection and of harvest of the infected maize tissue.

NOTE: Repeat the procedure for a total of three biological replicates.
NOTE: All reagents, consumables, and equipment coming into contact with living U. maydis axenic culture cells must be sterile. Working in a laminar flow hood is recommended, if possible.

Potting of maize
1. Distribute soil in 12-cm-diameter round pots and water pots sufficiently. Seed five EGB maize seeds per pot for a total of >100 seeds per insertional mutant pool replicate. Treat each pot with 100 ml nematode solution for pest control.
5. Inoculate 5 ml YepsLight per 15-ml Falcon tube with pre-culture at a 1:3000 ratio to form infection main cultures. Grow overnight (for 15 hr) at 28°C in a rotator at 20 rpm.
6. Measure optical density at 600 nm in a photometer/plate reader for each individual mutant strain infection main culture. Adjust the amount of culture to achieve an optical density between 0.6 and 1 for each strain and pool equal volumes of cultures of all mutant strains in a 500-ml centrifuge tube.
7. Centrifuge 10 min at 2000 × g and discard supernatant by decanting, ensuring removal of all supernatant. Resuspend pellet in double-distilled water to an optical density at 600 nm of 1 by pipetting up and down.
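The resuspension volume needed in step 7 follows from a simple proportionality. The Python sketch below is a minimal illustration, assuming that the optical density scales linearly with cell density in the measured range and that no cells are lost during centrifugation; the function name and the example values are hypothetical and not part of the protocol.

```python
def resuspension_volume_ml(pooled_volume_ml, pooled_od600, target_od600=1.0):
    """Volume of water in which to resuspend the pellet to reach target_od600.

    Assumes OD600 is proportional to cell density and that the pellet
    contains all cells of the pooled culture (simplifying assumptions).
    """
    if pooled_volume_ml <= 0 or pooled_od600 <= 0 or target_od600 <= 0:
        raise ValueError("volumes and optical densities must be positive")
    return pooled_volume_ml * pooled_od600 / target_od600


# Hypothetical example: 200 ml of pooled culture measured at OD600 = 0.8
volume = resuspension_volume_ml(200.0, 0.8)
print(f"Resuspend pellet in {volume:.0f} ml double-distilled water")
```

In practice, the optical density of the resuspended pool should still be measured and fine-tuned, because the linear approximation becomes less reliable for dense cultures.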

Infection of U. maydis insertional mutant library in maize seedlings
8. Using a 1-ml syringe and a 0.45-mm-diameter needle, inject ~250 µl pooled infection culture into 7-day-old EGB maize seedlings (see step 2) that display three juvenile leaves. Make sure to infect maize seedlings in the center of the leaf whirl by piercing the stem halfway. Infect a total of >100 maize plants with the pool.
9. For the input control, centrifuge 10 ml pooled infection culture in a 15-ml Falcon tube for 1 min at 10,000 × g. Discard supernatant and store pellet at −70°C until isolation of gDNA from the input sample (see Basic Protocol 2).

Harvest of infected maize tissue
10. Grow infected maize seedlings from step 8 for another 7 days in a phytochamber with a 14-hr/10-hr light/dark cycle at 28°C/20°C with a total light intensity of 183.21 µmol m⁻² s⁻¹.
11. Harvest infected second and third maize leaves at a 1-cm distance from the infection site with scissors, making sure to restrict the harvested material to infected tissue. Wash harvested tissue in 0.05% Tween-20 in double-distilled water in a 1-L beaker by stirring on a magnetic stirrer with a magnetic stir bar at 200 rpm for 5 min. Subsequently, wash leaves twice in double-distilled water. Air-dry wet leaves at room temperature before cryopreservation (see steps 12 to 14).
The wash steps facilitate removal of any remaining dead insertional mutants located on the leaf epidermis.
12. Crush dry infected maize tissue in liquid nitrogen with a mortar and pestle.
14. Using a metal spatula, transfer milled maize powder into a 50-ml Falcon tube and store at −70°C until gDNA extraction from the output sample (see Basic Protocol 2).

BASIC PROTOCOL 2 gDNA EXTRACTION FROM MUTANT POOL BEFORE AND AFTER INFECTION OF MAIZE
The starting material for the iPool-Seq Illumina library preparation is gDNA (Basic Protocol 1). Firstly, gDNA from the input library is required to analyze the composition of the initial insertional mutants. Secondly, the gDNA of the infected material is required to obtain insights about the insertional mutant pool composition after infection. U. maydis gDNA extraction is based on a protocol established in yeast that was adapted for iPool-Seq (Hoffman & Winston, 1987).
CAUTION: Store the ethanol and isopropanol at 5°C to 30°C in a safety cabinet for flammable liquids.
2. Vortex mixture at 1500 rpm for 15 min at room temperature in a VXR basic Vibrax or equivalent device.
5. Add 400 µl of upper, aqueous layer from step 3 to the tube from step 4 and mix vigorously by vortexing for 30 sec.
Stopping point: Store the mixture overnight at −20°C.
9. Remove supernatant carefully by pipetting, briefly spin down tube, and remove residual supernatant.
11. Incubate for 15 min at 55°C on a ThermoMixer C with agitation at 800 rpm and with an open lid.
This step allows for evaporation of residual ethanol.
gDNA extraction from output
13. Weigh 1 g homogenized infected maize tissue (output; see Basic Protocol 1, step 14) and transfer into a 7-ml Precellys tube.
15. Vortex mixture at 5000 rpm for 30 sec at room temperature in a Precellys Evolution bead mill. Transfer mixture into 5-ml Eppendorf tubes.

17. In the meantime, prepare a 5-ml Eppendorf tube containing 2.2 ml isopropanol.
18. Add 1.5 ml of upper, aqueous layer from step 16 to the tube from step 17 and mix vigorously by vortexing for 30 sec.
Stopping point: Store the mixture overnight at −20°C.
22. Remove supernatant carefully, briefly spin down tube, and remove residual supernatant.
24. Incubate for 15 min at 55°C on a ThermoMixer C with agitation at 800 rpm and with an open lid.
As in step 11, this step allows for evaporation of residual ethanol.

BASIC PROTOCOL 3 ILLUMINA SEQUENCING LIBRARY PREPARATION USING gDNA FROM INSERTIONAL MUTANT POOLS
The purified gDNA from the insertional mutant library input and output (Basic Protocol 2) is further processed via an iPool-Seq library preparation protocol to obtain Illumina sequencing-compatible libraries. The protocol is optimized for specific enrichment of insertion mutant flanks and high double-stranded DNA (dsDNA) yields, enabling library preparation directly from infected host tissue.

NOTE:
Check the dsDNA concentration after each step to ensure successful preparation. We recommend quantification via a fluorescence assay, e.g., PicoGreen.

Gel electrophoresis chamber or fragment analyzer
1.5-ml DNA LoBind tubes (Eppendorf)
Magnetic stand
Rotator
Additional reagents and equipment for gel electrophoresis (see Current Protocols article; Gallagher, 2012)

Tn5 fragmentation of input and output gDNA
1. Combine the following in a PCR tube to 100 µl total volume and mix well by pipetting up and down:
• 25 µl of 100 µM Adapter P1
• 25 µl of 100 µM Adapter P2
• 50 µl reassociation buffer.
Perform primer annealing in a thermocycler starting from 90°C, with a 1°C decrement per minute.
2. Combine the following in a PCR tube to 100 µl total volume and mix well by pipetting up and down:
• 25 µl purified Tn5 transposase
• 25 µl annealed adapters (see step 1)
• 50 µl of 100% glycerol.
3. Incubate for 30 min at 37°C in a thermocycler.
4. Combine the following in separate PCR tubes and mix well by pipetting up and down:
• 250 ng input or output gDNA
• Tn5 transposase loaded with adapters (see step 3) to a final concentration of 150 ng/μl
• 4 µl of 5× TAPS buffer
• Nuclease-free water to 20 µl.

Illumina Rd1 (e): 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′
Illumina Rd2 (e): 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′
a Depending on the library of insertional mutants that is screened, these sequences may potentially need to be adjusted.
b The 12 Ns in Adapter P1 constitute the UMI and should be random.
c The 6 Ns in PCR2-R constitute the (single-index) library multiplexing barcode.
d With PCR1-R and PCR2-R as listed, enrichment of insertion mutant flanks is specific for the sequence 5′-CCAGATGTCCTGTGGTATCCTGTGGCG-3′, and read2 will start with the underlined part of PCR2-R.
e Illumina Rd1 and Rd2 are the standard Illumina TruSeq sequencing primers.

Specific PCR of mutant cassette genome junctions
9. Combine the following in a PCR tube and mix well by pipetting up and down.
10. Run the following PCR program in a thermocycler:
14. Repeat washing step three additional times.
15. Resuspend beads in 2× B&W buffer, matching the volume of the clean input and output PCR1 eluates.
16. Pool eluates and beads and allow for enrichment of biotinylated PCR amplicons by rotation at room temperature for 15 min.
17. Place tubes in a magnetic stand for 1 min and discard supernatant.
18. Wash beads with 200 µl of 1× B&W buffer while the tubes remain in the magnetic stand.
19. Repeat washing step three additional times. Resuspend beads in 34 µl nuclease-free water and proceed with nested PCR (see steps 20 and 21).
Stopping point: Store the enriched PCR fragments at −20°C until nested PCR.
Nested PCR of enriched fragments
20. Combine the following in PCR tubes to 50 µl total volume and mix well by pipetting up and down:
• 34 µl nuclease-free water containing beads (see step 19)
• 10 µl of 5× Phusion HF Buffer
• 1 µl of 10 mM dNTPs
• 2 µl of 5 µM PCR2-F
• 2 µl of 5 µM PCR2-R
• 1 µl Phusion High-Fidelity DNA Polymerase.
21. Run the following PCR program in a thermocycler:

BASIC PROTOCOL 4 BIOINFORMATIC ANALYSIS
This protocol describes how insertion pool data are analyzed, using the pipeline that we developed (available at http://www.cibiv.at/software/ipoolseq-pipeline), to find insertional mutants with significantly increased or decreased virulence. Virulence is measured as the abundance of a deletional mutant in the post-infection output pool relative to the pre-infection input and is compared to the virulence of a reference set of known neutral mutants to find significant deviations from neutral behavior.

NOTE:
In the following, commands intended to be entered on a Unix-style terminal, either directly on a Linux machine or via SSH, are printed in a monospaced font. Outside of such commands, filenames are printed monospaced and underlined. Placeholders for names of experiments or libraries that have to be supplied by the user are printed in italic.
NOTE: The iPool-Seq pipeline is based on the workflow engine "snakemake" (Köster & Rahmann, 2012). It is thus not strictly necessary to proceed step by step; in particular, jumping to step 13 directly after adding all required data in step 6 will cause all intermediate steps to be executed automatically. However, given that it is good practice to check the results of key intermediate steps (like mapping and KO assignment) for validity before proceeding, we recommend following the steps outlined below and also performing the checks and validations suggested in the Troubleshooting section for each individual step.

Materials
Workstation running Linux or Windows 10 with Windows Subsystem for Linux (WSL), with 64-bit CPU and 8 GB or more of RAM and with free disk space ≥5 times size of raw sequencing data
iPool-Seq analysis pipeline (http://www.cibiv.at/software/ipoolseq-pipeline)
Reference genome of U. maydis in FASTA format
FASTA file containing sequences at 5′ end (named "5p") and 3′ end (named "3p") of knockout (KO) cassette
List of deletional mutants of U. maydis as GFF2 file listing KO cassette insertion positions
Single BAM file containing unmapped sequencing reads or two separate compressed FASTQ files (one for read1 and one for read2) for each sequenced library (prepared according to Basic Protocol 3)
Web browser
PDF viewer

NOTE: For the FASTA file describing the KO cassette, the sequences must reflect the part of read2 that overlaps with the cassette, i.e., start with the underlined part of PCR2-R (Table 1).
NOTE: For the list of deletional mutants, each entry must carry at least two tags: a unique "Name" and a flag "Neutral" with value 0 or 1 that decides whether a particular deletion strain is included in the reference set of assumed neutral deletions. See the (included) list of deletional mutants used by Uhse et al. (2018), cfg/Uhse_et_al.2018/knockouts.gff, for an example.

Trimming and UMI extraction (per library)
7. To trim the technical sequences (Fig. 2A) Table 2. During this step, the TRUmiCount algorithm also produces the accompanying PDF report data/your_design/your_lib.count.pdf.

Adjusting the TRUmiCount phantom rejection threshold (per library)
11. To see if adjustment of TRUmiCount's read-count threshold is necessary, check read-count distribution plot in the TRUmiCount report data/your_design/your_lib.count.pdf.

Optional: If there is a clear overabundance of observed vs. predicted UMIs for read counts slightly larger than the threshold, increase threshold. If the predicted and observed numbers of UMIs agree well for read counts below the threshold, decrease threshold (Fig. 2C). To set a library-specific threshold T for library your_lib, add the following lines to the "trumicount" block in cfg/config.yaml (be sure to match the indentation of the existing lines in that block):

    - file: 'data/your_design/your_lib.*'
      opts: '--threshold T'

IMPORTANT NOTE: After changing the setting, remove data/your_design/your_lib.count.tab and re-run step 10.

Finding differentially virulent KOs (per replicate)
13. To compute the log fold changes of KO virulence for your_replicate and p-values for how significantly these log fold changes deviate from zero (i.e., no change compared to the neutral reference set), do
    snakemake data/your_design/your_replicate.dv.tab
The differential virulence analysis is based on the KO abundances and loss correction factors from the tables your_replicate-in.count.tab and your_replicate-out.count.tab, both created in the folder data/your_design during step 10.
Step 13 also produces an accompanying HTML report in data/your_design/your_replicate.dv.html.
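When several replicates are analyzed, the target paths of step 13 can be generated programmatically. The following Python sketch only builds the command line and does not run it; the helper names, the design name, and the replicate names are hypothetical, and it relies on snakemake accepting several target files in one invocation.

```python
import shlex


def dv_targets(design, replicates):
    """Return the differential virulence table path for each replicate."""
    return [f"data/{design}/{rep}.dv.tab" for rep in replicates]


def snakemake_command(design, replicates):
    """Build (but do not execute) a snakemake call requesting all targets."""
    return ["snakemake", *dv_targets(design, replicates)]


# Hypothetical design with three replicates named r1, r2, and r3
cmd = snakemake_command("your_design", ["r1", "r2", "r3"])
print(shlex.join(cmd))
```

The printed command can be pasted into the terminal from within the pipeline directory, or passed to subprocess.run once the intermediate results have been checked.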

Downstream analysis
14. To produce plots and to combine data from multiple replicates, load output table data/your_design/your_replicate.dv.tab from step 13 into data analysis software of your choice. See Table 3 for a description of the columns of data/your_design/your_replicate.dv.tab, which contains the results of the differential virulence analysis step.

B&W buffer, 1×
Add sterile deionized water and autoclave at 121°C. Store ≤4 weeks at room temperature.

COMMENTARY

Background Information
The analysis of insertional mutant libraries is well established for bacteria (Gawronski et al., 2009; Goodman et al., 2009; Langridge et al., 2009; van Opijnen et al., 2009) but has not been applied extensively to eukaryotic microorganisms. Genome-wide insertional mutant libraries were generated successfully for baker's yeast (Saccharomyces cerevisiae) by homologous recombination (Winzeler et al., 1999) and, more recently, by transposition (Michel et al., 2017) and for the rice pathogenic fungus Magnaporthe oryzae by Agrobacterium tumefaciens-mediated transformation (Jeon et al., 2007). However, tools that allow for efficient high-throughput analysis of a negative depletion screen in the context of a host, for instance in the case of M. oryzae and rice (Jeon et al., 2007), were not available until recently. Therefore, we developed iPool-Seq, which, due to its high selectivity and sensitivity, allows for analysis of pooled infections of insertional mutants directly from the infected host tissue. iPool-Seq provides a powerful tool for scientists who want to analyze mutant pool composition after colonization of the host, without any biases that could arise due to separation of the output fraction.

Outlook
We previously demonstrated (Uhse et al., 2018) that iPool-Seq offers an elegant possibility to analyze a U. maydis insertional mutant pool after colonization of its host maize. We further suggest that iPool-Seq could be applied to genome-wide insertional mutant pools generated by high-throughput techniques, e.g., transposon-mediated mutagenesis. Due to its high selectivity and sensitivity, iPool-Seq enables analysis of large insertional mutant pools directly from the colonized host tissue and obviates the need for separation of the host tissue and colonizing microbes. We propose that iPool-Seq not only is suitable for analysis of mutant pools of microorganisms in the context of a plant host but also may be applied to animal-microbe or microbe-microbe interaction systems.

Trimming and UMI extraction
During this step, all non-genomic sequences (with UMI and KO cassette overlap) are removed from the raw sequencing read pairs, as produced by (paired-end) sequencing (Fig. 2A), so as to not interfere with the mapping process. The UMIs are instead stored as part of the pairs' read names to ensure that the UMIs are passed alongside the reads through the following processing steps. Reads that do not overlap with the KO cassette or that do not contain a UMI are removed.
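The logic of this step can be sketched in a few lines of Python. This is a simplified illustration and not the pipeline's implementation: it assumes that the 12-nt UMI occupies the start of read1 and that read2 starts with the cassette overlap, whereas the actual read layout is the one shown in Fig. 2A. The function name, the "_UMI" naming suffix, and the example sequences are hypothetical.

```python
UMI_LENGTH = 12  # the 12 Ns in Adapter P1


def trim_pair(name, read1, read2, cassette_prefix, umi_length=UMI_LENGTH):
    """Return (new_name, trimmed_read1, trimmed_read2), or None if discarded.

    The UMI is cut from read1 and appended to the read name; the cassette
    overlap is cut from read2. Pairs without a complete UMI or without
    cassette overlap are dropped.
    """
    umi = read1[:umi_length]
    if len(umi) < umi_length or "N" in umi:
        return None  # no usable UMI
    if not read2.startswith(cassette_prefix):
        return None  # read2 does not overlap the KO cassette
    return (f"{name}_{umi}", read1[umi_length:], read2[len(cassette_prefix):])


# Hypothetical read pair; "CCAG" stands in for the real cassette overlap
print(trim_pair("pair1", "ACGTACGTACGTGGGGTTTT", "CCAGAAAACCCC", "CCAG"))
```

Keeping the UMI in the read name means that downstream tools can carry it along without needing a dedicated field in the alignment records.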

Mapping and assignment to insertional KOs
During this step, the trimmed reads are mapped to the reference genome using NextGenMap (Sedlazeck, Rescheneder, & von Haeseler, 2013) and assigned to the individual insertional KOs (Fig. 2B). In short, proper read pairs (pairs with both mates mapped and correctly oriented) are assigned to a specific flank (5′ or 3′) of a KO cassette insertion (i.e., a KO strain) if one read starts close (±10 bp) to the respective end of the cassette and extends away from the cassette. Singleton reads (reads whose mate could not be mapped) must map ≤1000 bp away from the respective end of the cassette and extend toward the cassette. Reads that cannot be assigned unambiguously are not assigned at all.
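The assignment rules can be illustrated with a short Python sketch. It is a simplified stand-in for the pipeline's logic under several assumptions: all coordinates lie on one chromosome, the 5′ flank is the side with lower genomic coordinates, and each read is reduced to its start position and the direction in which it extends (+1 toward higher, −1 toward lower coordinates). All names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Knockout(NamedTuple):
    name: str
    start: int  # first base replaced by the cassette
    end: int    # last base replaced by the cassette


def flank_matches(ko, read_start, direction, proper_pair,
                  fuzz=10, max_fragment=1000):
    """Return the flanks ("5p"/"3p") of ko that are compatible with the read."""
    flanks = []
    if proper_pair:
        # Read starts near the cassette end and extends away from it.
        if direction == -1 and abs(read_start - ko.start) <= fuzz:
            flanks.append("5p")
        if direction == +1 and abs(read_start - ko.end) <= fuzz:
            flanks.append("3p")
    else:
        # Singleton: maps within max_fragment and extends toward the cassette.
        if direction == +1 and 0 <= ko.start - read_start <= max_fragment:
            flanks.append("5p")
        if direction == -1 and 0 <= read_start - ko.end <= max_fragment:
            flanks.append("3p")
    return flanks


def assign(read_start, direction, proper_pair, knockouts):
    """Return "name:flank" for a unique match, otherwise None."""
    hits = [f"{ko.name}:{flank}"
            for ko in knockouts
            for flank in flank_matches(ko, read_start, direction, proper_pair)]
    return hits[0] if len(hits) == 1 else None


kos = [Knockout("koA", 5000, 6000), Knockout("koB", 9000, 9800)]
print(assign(4998, -1, True, kos))   # proper pair next to the 5' end of koA
print(assign(9900, -1, False, kos))  # singleton downstream of koB
```

Returning None for zero or multiple hits mirrors the rule that ambiguous reads are left unassigned.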

Determining KO abundances (counting genomes)
This step consists of error correction, phantom removal, and loss estimation for the UMIs detected for a particular combination of flank and KO strain. The UMIs are first corrected for sequencing errors with UMI-Tools (Smith, Heger, & Sudbery, 2017), which merges similar UMIs found within the same flank of a KO cassette insertion. The merged UMIs are then processed further with TRUmiCount, which filters based on per-UMI read counts to remove additional phantom UMIs (mostly amplification artifacts) and then, for each flank of each KO cassette insertion, estimates and corrects for the percentage of lost (i.e., unobserved) true UMIs. This ensures that the estimated KO abundances are unaffected by PCR amplification bias. The output comprises, per combination of KO and flank, the filtered UMI count (number of observed genomes), the estimated loss, and the loss-corrected genome count (Table 2).
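The relationship between the three output quantities can be sketched as follows. This Python example does not reproduce TRUmiCount: the loss is taken as a given number (TRUmiCount estimates it from a model of the amplification process), and treating UMIs with a read count equal to the threshold as retained is an assumption. The function name and the example values are hypothetical.

```python
def corrected_genome_count(umi_read_counts, threshold, loss):
    """Filter UMIs by read count and correct the count for lost UMIs.

    umi_read_counts: reads per error-corrected UMI for one KO flank.
    threshold: UMIs with fewer reads are treated as phantoms (assumed rule).
    loss: estimated fraction of true UMIs that went unobserved (0 <= loss < 1).
    Returns (filtered UMI count, loss-corrected genome count).
    """
    if not 0 <= loss < 1:
        raise ValueError("loss must lie in [0, 1)")
    n_observed = sum(1 for count in umi_read_counts if count >= threshold)
    return n_observed, n_observed / (1.0 - loss)


# Hypothetical flank with seven UMIs, threshold 2, and 20% estimated loss
observed, corrected = corrected_genome_count([1, 1, 2, 5, 7, 9, 12], 2, 0.2)
print(observed, round(corrected, 2))
```

The sketch also shows why a higher threshold costs precision but not accuracy, provided the loss estimate is adjusted accordingly: fewer UMIs are counted, and the correction factor grows to compensate.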

Generating insertion libraries suitable for iPool-Seq
There are different techniques available for generation of insertion libraries. We based the generation of the U. maydis insertional mutant library on homologous recombination. Insertional mutagenesis via homologous recombination has the advantage of not being prone to multiple insertions per individual, which is more likely to happen with untargeted, random insertional mutagenesis approaches, like Agrobacterium-mediated transformation (Michielse, Hooykaas, van den Hondel, & Ram, 2005). However, homologous recombination is more laborious than high-throughput methods. Moreover, the choice of the fungal model system is critical, and advantages and drawbacks of systems should be evaluated prior to insertional mutant library generation. Here, we use U. maydis, a fungal model that is genetically accessible to homologous recombination and well suited for the application of iPool-Seq in the context of a host infection due to the availability of a solo-pathogenic strain (Bölker, Genin, Lehmler, & Kahmann, 1995).
For the final analysis (Basic Protocol 4), an internal reference set of mutants with unaffected virulence is essential to find mutants that are comparatively depleted or enriched. The reference set can be defined either by known unaffected mutants that have been published before or by individual mutants identified via infection tests.

Mutant pool infection
The iPool-Seq protocol provides a high specificity for the inserted sequences and thus can be applied directly to infected host material (Basic Protocol 1). However, the number of pathogens that infect an individual host can constitute a bottleneck. We suggest overcoming this bottleneck by increasing the number of infected host plants or by reducing the complexity of the insertion mutant library.

Sequencing
We recommend sequencing the finished library (Basic Protocol 3) on an Illumina MiSeq platform and aiming for 2 to 3 million reads per library. However, it is also possible to use a different platform and to sequence more deeply to improve individual mutant UMI counts.
For complex libraries, it is possible to increase the amount of gDNA for the infected output sample and to proportionally increase the sequencing depth, which will likely yield a higher coverage of individual mutants. For a given mutant library and amount of extracted gDNA, the average number of reads per UMI (found in the TRUmiCount report) is an indicator of whether increasing the sequencing depth would be beneficial. For libraries with <1 read per UMI on average, deeper sequencing can be expected to improve the accuracy of abundance measurements; after that, the benefit drops gradually, and more than ~10 reads per UMI will not provide any additional benefit.

Trimming and UMI extraction
The optional FastQC (Andrews, 2010) reports created for the first and second reads after trimming offer a first quality check of library preparation and sequencing (Basic Protocol 4). Aspects to check for are as follows: (a) under "Basic Statistics," that most (>90%) reads survived the trimming step; (b) under "Per base sequence quality" and "Per base N content," that the sequenced bases are high quality and do not contain many Ns; and (c) under "Adapter Content," that trimming indeed removed all sequencing adapters from the reads. If many reads are lost during the trimming step, they either were contaminants or did not match the sequence pattern that the library preparation should produce. In this case, we recommend blasting a few random reads to check for contamination and to manually compare their sequence composition to the expected pattern ( Fig. 2A) and to the sequences in data/your_design/cassette.fa. Should most reads survive but show either a strong drop-off of base qualities toward the end or many Ns, it may be necessary to include an additional quality-based trimming step in the Trimmomatic (Bolger, Lohse, & Usadel, 2014) command in cfg/config.yaml (see the Trimmomatic manual for details). If there are still adapter sequences detected in the trimmed reads, add any custom adapter sequences that differ from the adapters mentioned in Basic Protocol 3 to cfg/Uhse_et_al.2018.adapters.fa (again, see the Trimmomatic manual for details).

Mapping and assignment to insertional KOs
We recommend checking the results of the mapping and KO assignment process visually for a few libraries and KO cassette insertions in a genomic viewer like the Integrative Genomics Viewer (IGV; Robinson et al., 2011). You should find most of the reads mapped to the two flanks (3′ and 5′) of KO cassette insertions, carrying the name and flank of the insertional KO in the form "name:flank" in the XT tag, and not extending more than a few base pairs into the regions replaced by the KO cassette. You may find some spurious reads mapped to arbitrary locations in the genome; these will later be ignored by the pipeline and thus are not a cause for concern (unless overly abundant).
If reads are mapped correctly but not assigned to the correct KO cassette insertion, check that the genomic coordinates in your GFF2 files listing the KO cassette insertions are correct. If a substantial fraction of the sequenced fragments are longer than 1000 bp or if many reads extend more than a few base pairs into the regions replaced by the KO cassette due to mapping imprecisions, adjust the "mapping fuzzyness" or "max fragment length" parameter of the "knockout_assignment" step in cfg/config.yaml.

Adjusting the TRUmiCount phantom rejection threshold
For optimal separation of phantom UMIs (i.e., amplification artifacts) from true UMIs by the TRUmiCount algorithm, it can be necessary to adjust the automatically chosen read-count threshold T (Fig. 2C). This is true in particular at higher sequencing depths, where phantom UMIs can make up a large proportion of UMIs (but not of reads, due to the phantoms' lower read counts). Setting the threshold too low (Fig. 2C, left) will cause phantom UMIs to be mistaken for true UMIs, which has the potential to distort the results. It will also distort TRUmiCount's parameter estimates (the PCR efficiency in particular), resulting in a bad fit of model and data. Choosing a threshold that is too high (Fig. 2C, right), in contrast, will cause more true UMIs (i.e., UMIs that reflect actual genomes in the sequenced pool) to be filtered out. However, given that TRUmiCount estimates and corrects for this loss, the net effect is only a reduced precision of the measured abundances (due to the lower absolute genome counts), and not an introduction of systematic biases. When choosing a threshold value, too high is thus preferable over too low.

Understanding Results
The results contain several types of information. Firstly, the KO abundance tables (produced in Basic Protocol 4, step 10) list the number of genomes per mutant found in the input and output libraries, i.e., they contain information about the absolute abundances of the individual KOs. In particular, these tables also show which mutants are not detected at all in either input or output (i.e., that show zero detected genomes), which is possibly due to very slow growth or a lack of viability. Gradual changes in a mutant's virulence are detected by comparison of the mutant's input and output abundances to a reference set of neutral mutants and summarized in the differential virulence report (produced in Basic Protocol 4, step 13).

Finding differentially virulent KOs (statistical analysis of abundances)
Given that KO abundances in the input can be spread across multiple orders of magnitude, the dominant factor that determines the abundance of a KO in the output pool is its abundance in the input pool; the effects due to different genotypes that we want to detect are typically subordinate to that. The iPool-Seq pipeline accounts for this by assuming that a KO's loss-corrected abundance $A_{out}$ in the output pool depends (linearly) on its loss-corrected abundance $A_{in}$ in the input pool, in addition to depending on the virulence factor $v$ relative to neutral KOs ($v = 1$ for neutral KOs). To account for differences in genome capture efficiency and total genome count between the two libraries, the pipeline also includes a scaling factor $\lambda$ (which is replicate-specific, but not KO-specific). The loss-corrected input and output abundances of a KO are computed from the (3′ and 5′ summed) filtered UMI counts $N_{in}$ and $N_{out}$, which are corrected for unobserved UMIs by dividing by $1 - \ell_{in}$ and $1 - \ell_{out}$, where $\ell_{in}$ and $\ell_{out}$ are the (3′ and 5′ averaged) loss estimates computed by TRUmiCount, i.e., $A_{in} = N_{in} / (1 - \ell_{in})$ and $A_{out} = N_{out} / (1 - \ell_{out})$. With $A_{out} = \lambda \, v \, A_{in}$, the pipeline thus computes the log2 fold change of a KO's virulence as

$$\log_2 v = \log_2 \frac{A_{out}}{\lambda \, A_{in}}$$

To test for significant deviations of $v$ from neutrality (i.e., $v = 1$), the pipeline assumes a negative binomial model for $N_{out}$ given $N_{in}$, which accounts for the uncertainty in $A_{out}$ and $A_{in}$ due to sampling noise, as well as for an additional amount of dispersion $d$ due to random fluctuations of the virulence of neutral KOs. The two parameters $\lambda$ and $d$ are estimated by fitting the model to the set of neutral KOs. Parameter $\lambda$ measures the total size (i.e., the loss-corrected number of UMIs) of the output library relative to the input library, and $d$ the squared coefficient of variation of output abundances due to biological noise (i.e., due to differences in the growth of neutral mutants).
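The point estimate of the log2 fold change can be illustrated with a small Python sketch. It follows the linear relationship between output and input abundance described above, but it is not the pipeline's statistical procedure: here the scaling factor is approximated by the ratio of summed neutral abundances, whereas the pipeline estimates it together with the dispersion by fitting the negative binomial model. Function names and numbers are hypothetical.

```python
import math


def loss_corrected(n_umis, loss):
    """Loss-corrected abundance from a filtered UMI count and a loss estimate."""
    return n_umis / (1.0 - loss)


def estimate_lambda(neutral_pairs):
    """Crude scaling factor: summed output over summed input of neutral KOs.

    neutral_pairs: list of (A_in, A_out) for the neutral reference set.
    This ratio is an assumption made for illustration only.
    """
    total_in = sum(a_in for a_in, _ in neutral_pairs)
    total_out = sum(a_out for _, a_out in neutral_pairs)
    return total_out / total_in


def log2_fold_change(a_in, a_out, lam):
    """log2 of the virulence factor v, where A_out = lam * v * A_in."""
    if a_in <= 0 or a_out <= 0:
        raise ValueError("abundances must be positive for a finite log2 value")
    return math.log2(a_out / (lam * a_in))


# Hypothetical neutral set in which the output is half the size of the input
neutral = [(100.0, 50.0), (200.0, 100.0), (400.0, 200.0)]
lam = estimate_lambda(neutral)
print(log2_fold_change(100.0, 12.5, lam))
```

In this example the mutant is recovered at a quarter of the abundance expected for a neutral KO, which corresponds to a log2 fold change of −2. KOs with zero detected genomes in the output need separate treatment and are excluded from the sketch.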

Interpretation of the differential virulence report
The differential virulence report created in Basic Protocol 4, step 13, contains diagnostic statistics and plots that serve as quality checks and as verification that the assumptions of the statistical model are fulfilled to a reasonable degree.
The "Quality Control / Read and UMI count statistics" provide an overview of how many usable reads and UMIs remain after each step of the analysis pipeline. Discarding up to about one third of reads during the course of the analysis should be considered normal; if considerably fewer reads than that remain after the "TRUmiCount" step, the steps that remove the largest percentages of reads should be checked carefully for problems. For the number of UMIs, at high sequencing depths (on average ~10 reads per UMI or more), it is normal for the removed percentage to be considerably higher because of TRUmiCount filtering of UMIs with a low read count.
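As an illustration of this check, the following Python sketch computes the fraction of reads retained after each step and identifies the step that removes the largest share. The step names other than "TRUmiCount" and all counts are hypothetical, and the two-thirds threshold simply restates the rule of thumb given above.

```python
def read_retention(step_counts):
    """Summarize read retention along a pipeline.

    step_counts is an ordered list of (step_name, reads_remaining) pairs whose
    first entry is the raw read count. Returns one tuple per later step:
    (step_name, fraction of raw reads remaining, fraction removed by this step).
    """
    raw = step_counts[0][1]
    previous = raw
    report = []
    for name, count in step_counts[1:]:
        report.append((name, count / raw, (previous - count) / previous))
        previous = count
    return report


# Hypothetical counts; only the "TRUmiCount" step name comes from the report.
counts = [("raw", 1000), ("trimming", 900), ("mapping", 800), ("TRUmiCount", 700)]
report = read_retention(counts)

final_fraction = report[-1][1]
worst_step = max(report, key=lambda row: row[2])[0]

if final_fraction < 2 / 3:
    print(f"Only {final_fraction:.0%} of reads remain; inspect step '{worst_step}' first")
else:
    print(f"{final_fraction:.0%} of reads remain, within the expected range")
```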
The precision of the KO abundances determined for the input and output pools is reflected by the correlation (found under "Quality Control / Correlation of 3′ and 5′ Flank Abundances") of the abundances computed for each flank of the KO cassette insertions. After loss correction by TRUmiCount, a correlation of ~0.9 or more can be expected.
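The flank comparison can be reproduced outside the report with a short Python sketch. It computes a plain Pearson correlation between per-KO abundances from the 3′ and 5′ flanks; the report may use a different correlation measure or scale, and the abundance values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical loss-corrected abundances per KO, one value per flank.
flank_3p = [120, 45, 300, 10]
flank_5p = [110, 50, 280, 14]

r = pearson(flank_3p, flank_5p)
if r < 0.9:
    print(f"Flank correlation {r:.3f} is below the expected ~0.9")
else:
    print(f"Flank correlation {r:.3f} is in the expected range")
```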
For neutral KOs, our statistical model also assumes a linear relationship between pre-infection input abundance and post-infection output abundance, which can be verified in the input vs. output plots and correlations (found under "Quality Control / Correlation of Input and Output Abundances"). There is no generally applicable lower bound for the expected correlation of input and output because the expected correlation depends on the percentage of non-neutral KOs among the KOs in the experiment and on how much faster or slower these KOs proliferate. More important than the correlation is the qualitative behavior seen in the input vs. output plots: the relationship should be linear across the full range of observed input abundances, without any plateau effect for highly abundant KOs. If such a plateau effect is observed, it is likely that the carrying capacity of the host plants has been reached, and either the number of mutants that each plant is infected with should be reduced or the statistical model should be modified to account for the carrying capacity of the host plant.
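Visual inspection of the plots can be complemented by a rough numeric heuristic, sketched below in Python. A strictly linear relationship between input and output abundance corresponds to a slope of 1 on a log-log scale, so a fitted slope well below 1 hints at saturation for abundant KOs. This heuristic, the function name, and the example values are assumptions of the sketch and are not part of the iPool-Seq pipeline.

```python
import math


def loglog_slope(inputs, outputs):
    """Least-squares slope of log(output) against log(input).

    All abundances must be positive. A slope near 1 is consistent with a
    linear input-output relationship; a clearly smaller slope suggests that
    output abundance levels off for highly abundant KOs.
    """
    if len(inputs) != len(outputs) or len(inputs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    xs = [math.log(a) for a in inputs]
    ys = [math.log(b) for b in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input abundances must not all be equal")
    return sxy / sxx


# Hypothetical neutral KOs: proportional output vs. output that saturates.
linear_slope = loglog_slope([10, 100, 1000], [5, 50, 500])
plateau_slope = loglog_slope([10, 100, 1000], [5, 50, 60])

print(f"linear case slope:  {linear_slope:.2f}")
print(f"plateau case slope: {plateau_slope:.2f}")
```

The slope summarizes the whole abundance range in one number, so it can miss a plateau confined to a few very abundant KOs; the plots remain the primary check.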

Time Considerations
The iPool-Seq protocol can be finished in ~3 weeks for the U. maydis-Zea mays pathosystem (Fig. 1). For other pathosystems, we recommend first determining critical parameters and generating an insertional mutant library for infection. In comparison to U. maydis, variations in time considerations for other pathogens will mainly depend on the infection protocol.