How does transformation provide genetic variation?
Importance: The human microbiome is rich with thousands of diverse bacterial species. One mechanism driving this diversity is horizontal gene transfer by natural transformation, whereby naturally competent bacteria take up environmental DNA and incorporate new genes into their genomes.
Competence is theorized to accelerate evolution; however, attempts to test this theory have proved difficult. And how does this DNA become part of a bacterium's genome? Natural transformation, as its name implies, is a natural mechanism used by some bacterial cells to take up DNA from the environment.
This environmental DNA was, at one point, located in other bacteria. For instance, when bacteria die and disintegrate, their chromosomal DNA is released. Fragments of this DNA remain in the environment and are freely available to other living cells, including other bacteria.
These naturally occurring DNA fragments can enter a living bacterium through its cell membrane after making contact with that membrane. If the DNA is double-stranded, one of the strands will pass across the cell membrane into the cell, and the other strand will be degraded (hydrolyzed). Parts of the newly introduced single-stranded DNA molecule may then recombine with similar regions on the bacterial chromosome and become incorporated into the bacterium's genome.
In contrast, during artificial transformation, DNA uptake by bacterial host cells occurs under certain laboratory conditions. In the lab, scientists often introduce foreign DNA into bacterial cells via transformation in order to study specific genes and their functions. Typically, these researchers use E. In addition, transformation can be induced by electroporation, a process in which the bacterial host cells are subjected to an electric field that allows molecules to pass more easily across the membrane.
Heat shock is another way that transformation can occur, wherein host cells are exposed to extreme temperatures that also cause the cell membrane to temporarily allow molecules of foreign DNA into the cell. Within the lab environment, bacteria are also commonly transformed with sequences of DNA called plasmid vectors.
These naturally occurring DNA molecules are circular, and they can replicate inside a bacterium independently of the bacterial chromosome (which can also be circular). Plasmid vectors can be used to clone, transfer, and manipulate genes. Often, these plasmids carry a gene for antibiotic resistance, which means that researchers can select for cells that are resistant to a given antibiotic in order to determine whether a bacterium has been successfully transformed.
Conjugation is a process by which one bacterium transfers genetic material to another bacterium through direct contact.
The reference sequences for Rd and NP were compared using the Mauve whole-genome alignment software [42].
The complete genome sequences were aligned twice, once with Rd as the query and once with NP as the query. The few identified SNVs that were inconsistent between the two independent whole-genome alignments were excluded.
The two resulting files provided positions of each SNV in each genome, ordered against one or the other reference. About 10 million paired-end sequences of 42 bases were obtained from each library on individual lanes of an Illumina GA2 flow cell (Table S1). Raw data were processed using Illumina Pipeline Version 1.
While this generates some spurious mapping artifacts, it ensures that reads will map to both references when possible, even where there is high divergence. A combination of two criteria was used to identify differences between sequence reads and their references and to flag positions with ambiguous base identity.
The first method used the SamTools version 1. Reference positions missing from the SamTools consensus were treated as unmapped positions presumably within or near deletions. The second method used direct calculation of the frequency of each base at each reference position.
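The direct base-frequency calculation described above can be illustrated with a minimal Python sketch. This is not the authors' code; the function names, the input format (a string of read bases covering one reference position), and the 0.9 threshold are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def base_frequencies(bases):
    """Return the frequency of A, C, G and T among the read bases at one position."""
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads cover this position
    return {base: counts[base] / total for base in "ACGT"}

def classify_position(ref_base, bases, min_major=0.9):
    """Label one reference position from its read bases.

    min_major is a hypothetical threshold, not a value taken from the paper.
    """
    freqs = base_frequencies(bases)
    if not freqs:
        return "unmapped"
    major_base = max(freqs, key=freqs.get)
    if freqs[major_base] < min_major:
        return "ambiguous"  # no single base clearly dominates
    return "reference" if major_base == ref_base.upper() else "variant"
```

A position with no covering reads is reported as unmapped here, which mirrors how the text treats reference positions missing from the consensus.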
Parsed pileup files were subsequently analyzed using custom scripts written in the R statistical programming language [61].
Control self-alignments.
Control reciprocal alignments. Differences between our donor and recipient strains were identified from the reciprocal alignments of Rd-RR reads to the NP reference genome and of NP-NN reads to the Rd reference genome.
SNV positions were considered cross-validated if both the reciprocal alignments and the whole-genome alignment identified the same SNV. Ambiguous positions prone to read-mapping artifacts in the reciprocal alignments were also flagged using the same criteria as above. Transformant sequence reads were analyzed as above.
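The cross-validation rule above amounts to keeping only SNVs on which two independent sources agree. A rough sketch follows, assuming each source is represented as a dictionary from reference position to a (reference base, variant base) pair. That representation, the function name, and the choice to drop flagged ambiguous positions are assumptions, not details from the paper.

```python
def cross_validate(wga_snvs, read_snvs, ambiguous=frozenset()):
    """Keep SNVs reported identically by the whole-genome alignment and by
    the read-based reciprocal alignment, skipping flagged positions.

    wga_snvs, read_snvs: dict mapping position -> (ref_base, alt_base).
    ambiguous: positions flagged as prone to mapping artifacts (hypothetical input).
    """
    return {
        pos: alleles
        for pos, alleles in wga_snvs.items()
        if read_snvs.get(pos) == alleles and pos not in ambiguous
    }

# Toy example with made-up positions
wga = {10: ("A", "G"), 20: ("C", "T"), 30: ("G", "A")}
reads = {10: ("A", "G"), 20: ("C", "A"), 40: ("T", "C")}
validated = cross_validate(wga, reads)
```

In the toy data only position 10 survives: position 20 disagrees on the variant base, and positions 30 and 40 are each seen by one source only.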
Recombination events were identified in the transformed clones by classifying the positions of cross-validated SNVs as donor, recipient, or ambiguous. Individual donor segment breakpoints were defined by the positions of their outermost donor-specific alleles. Donor segments were then manually inspected using the Integrative Genomics Viewer [63] to validate the donor segment breakpoint locations.
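The segment definition above (breakpoints at the outermost donor-specific alleles) can be sketched as a single pass over classified SNV positions. The input format and the decision to let ambiguous positions neither extend nor break a segment are assumptions for this illustration; the paper does not spell out that detail here.

```python
def donor_segments(calls):
    """Group classified SNVs into donor segments.

    calls: iterable of (position, label) with label in
    {"donor", "recipient", "ambiguous"}.
    Returns a list of (first_donor_position, last_donor_position) tuples.
    """
    segments = []
    start = end = None
    for pos, label in sorted(calls):
        if label == "donor":
            if start is None:
                start = pos  # leftmost donor-specific allele
            end = pos        # rightmost donor-specific allele so far
        elif label == "recipient" and start is not None:
            segments.append((start, end))
            start = end = None
        # "ambiguous" positions are skipped (assumption)
    if start is not None:
        segments.append((start, end))
    return segments

# Toy example with made-up positions
calls = [(100, "recipient"), (200, "donor"), (250, "ambiguous"),
         (300, "donor"), (400, "recipient"), (500, "donor")]
segments = donor_segments(calls)
```

Because the breakpoints sit on the outermost donor alleles, each reported interval is a minimum estimate: the true crossover lies somewhere between that allele and the nearest recipient allele.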
For the pooled sample of four transformed clones (RRRR), donor-specific allele frequencies were determined at each cross-validated SNV position. Positions that were unmapped by reads in the reciprocal alignments but mapped in self-alignments were used as markers of indel differences and other structural variation between donor and recipient, and the donor segment intervals were examined for read coverage at positions unmapped by either reciprocal alignment. Indel differences flanking the observed donor segments were also tabulated.
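For the pooled sample, the donor-specific allele frequency at a SNV position is simply the share of reads carrying the donor base. The sketch below assumes the four clones contribute roughly equal numbers of reads, so a frequency near 0.25 would suggest one clone carries the donor allele. Both that assumption and the function names are illustrative, not taken from the paper.

```python
def donor_allele_frequency(bases, donor_base, recipient_base):
    """Fraction of donor-allele reads among reads showing either allele."""
    bases = bases.upper()
    donor = bases.count(donor_base.upper())
    recipient = bases.count(recipient_base.upper())
    total = donor + recipient
    if total == 0:
        return None  # position not informative in this pool
    return donor / total

def estimated_carriers(freq, pool_size=4):
    """Nearest whole number of clones consistent with the observed frequency,
    assuming equal representation of each clone in the pool."""
    return round(freq * pool_size)

# Toy example: 3 donor reads (G) and 9 recipient reads (A)
freq = donor_allele_frequency("GGGAAAAAAAAA", "G", "A")
carriers = estimated_carriers(freq)
```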
GenomeMatcher [65] was used to view annotated sequence alignments at transforming and flanking structural variation to identify affected loci.
Read depth varies consistently across the genome. (A) shows a histogram of read depth per mapped Rd genome position. (D) shows variation in read depth for Rd-RR reads mapped to the Rd reference genome along a representative interval. Read depths were first normalized to the median read depth to account for differences in sequence yields. The genome-wide correlation between read depths for these two samples was 0.
Ambiguous positions. Plot of the non-reference variant frequency at positions classified as ambiguous for the indicated set of sequence reads aligned to the two references: (A) Rd, and (B) NP. Data are tabulated in Table 2 and Table 3. The arrow indicates the bp interval expanded in Figure S3. Note the high variant frequency of ambiguous positions in the two transformed clones at intervals containing donor segments when using the Rd reference genome (labeled with roman numerals as in Figure 2).
Examples of two kinds of artifacts. The bps shown are indicated by the arrow in Figure S2B. Grey bars show positions with no detected variants i. Blue lollipops show the non-reference variant frequency at positions classified as matching the reference; when the lollipop falls on the limit-of-detection line, a single non-reference variant was observed.
Density histograms of read depth per position (A and B) and non-reference variant frequency per position (C and D) when mapping control sequence reads to the NP reference. Also shown in B and D is the percent of mapped positions where no non-reference variants were detected and the percent of unmapped positions.
Unmapped positions in reciprocal alignments mark structural variation. Note that the scale compresses the individual positions horizontally and so exaggerates the total fraction of unmapped positions (Table 2 and Table 3).
False-positive positions. SNVs detected in self-alignments were first accounted for.
Zooms of the five intervals (I to V) containing donor-specific alleles in the transformants, as in Figure 4, plotted against the Rd reference genome. The lower schematic shows each interval as in Figure 5B.
Transformation of and near structural variation. In each plot, the top two rows show donor- and recipient-specific SNVs in blue and red, respectively. Light blue and pink bars that span the plot show donor- and recipient-specific structural variation, respectively. The purple diamond shows the position of the Nov R allele. Positions where the green line touches the x-axis were unmapped by Nov1 reads.
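The median normalization of read depth mentioned in the read-depth caption above can be sketched as follows. The function names are hypothetical, and the Pearson correlation is written out by hand only to keep the example dependency-free; the source does not say which correlation statistic the authors used.

```python
from math import sqrt
from statistics import mean, median

def normalize_depth(depths):
    """Divide each per-position read depth by the sample's median depth."""
    m = median(depths)
    if m == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / m for d in depths]

def pearson(x, y):
    """Pearson correlation between two equal-length depth profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example: two samples with a tenfold difference in sequence yield
sample_a = normalize_depth([10, 20, 30])
sample_b = normalize_depth([100, 200, 300])
r = pearson(sample_a, sample_b)
```

Normalizing puts samples with different sequence yields on a common scale for plotting. The Pearson correlation itself is unaffected by this rescaling.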
We thank Lexi Mithel and Jae Yun Lee for technical assistance and members of the Redfield lab for comments on the manuscript. The authors have declared that no competing interests exist. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
National Center for Biotechnology Information , U. PLoS Pathog. Published online Jul Hall , 2 and Rosemary J. Ira M. Rosemary J. David S. Guttman, Editor. Received Feb 26; Accepted May Copyright Mell et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Figure S2: Ambiguous positions.
Figure S3: Examples of two kinds of artifacts.
Figure S4: Density histograms of read depth per position (A and B) and non-reference variant frequency per position (C and D) when mapping control sequence reads to the NP reference.
Figure S5: Unmapped positions in reciprocal alignments mark structural variation.
Figure S6: False-positive positions.
Figure S7: Zooms of the five intervals (I to V) containing donor-specific alleles in the transformants, as in Figure 4, plotted against the Rd reference genome.
Figure S8: Transformation of and near structural variation.
Table S1: Summary of sequencing results.
Table S3: Read depth in pileups on NP.
Table S5: Non-reference variants in reads mapped to NP.
Table S Donor segments in four transformants using NP reference coordinates.
Table S Transformation of insertions and deletions within donor segments.
Table S Indel and other rearrangements flanking donor segments.
Abstract
Many bacteria are able to efficiently bind and take up double-stranded DNA fragments, and the resulting natural transformation shapes bacterial genomes, transmits antibiotic resistance, and allows escape from immune surveillance.
Author Summary
The ability of bacteria to acquire genetic information from their relatives, called natural competence, poses a major health risk, since recombination between pathogenic bacterial lineages can help bacteria develop resistance to antibiotics and adapt to host defenses.
Introduction
For many bacteria, natural transformation is the dominant mode of genetic transfer between close relatives.
Figure 1.
Results
Preliminary genetic analysis
Competent cultures of H.
Figure 2. Natural transformation of H.
Table 1 Strains used.
Table 2 Summary of read mapping to Rd KW The Rd KW20 reference is 1,, bp.
Table 3 Summary of read mapping to NP. The NP reference is 1,, bp, respectively.
Control analyses of donor and recipient re-sequencing data
Before transformant data sets were analyzed to identify recombination events, the control sequence reads were used to (1) identify differences between the published reference genome sequences and the genomes of the donor and recipient strains we used; (2) confirm the reliability of single-nucleotide variants (SNVs) for distinguishing between donor and recipient sequences; and (3) identify positions that were systematically error-prone, ambiguous, or unmapped in the alignment of reads to references.
Figure 3.
The increasing availability and decreasing cost of multiplexed sequencing methods will partially circumvent this problem in the future. This is 7. The 16 donor segments had a mean length of 8. Although transformation might be expected to preferentially occur at regions with low sequence divergence, the regions participating in recombination had divergences typical of the whole genome 2. However, this variation did not appear to affect recombination, since all donor segments contained regions of both high and low divergence, and there were no obvious correlations between recombination breakpoints and extremes of divergence.
Donor segments are shown as horizontal blue bars. Dark and light green lines indicate 1 kb and bp window sizes, respectively (both with a step size of bp).
The adjacent locations of many donor segments (Figure 5) likely resulted from disruption of longer transforming DNA fragments rather than independent events.
For example the 6 donor segments in Nov1 were found in 2 clusters of 22 and 24 kb. The longest is the 5. When the 16 donor segments were treated as 10 clusters, the mean recombination tract length was Recombination not only brought thousands of donor-specific SNVs into the transformant genomes but introduced several donor-specific insertions and deletions Table S11 resulting in some donor segments being different lengths than the recipient segments they replaced Figure 5 , Tables S9 and S In particular, strain Nov1 received two large donor-specific insertions 1.
These were confirmed by read depth analysis along the two reference sequences Figure S8. The top drawings illustrate the inferred joint molecule intermediates that yielded the recombination products illustrated in the bottom drawings.
In A the 1. Thin dark lines show aligned sequence blue and red for donor and recipient sequences, respectively. Thick pale lines show unaligned indel differences between the genomes. For B , black arrows show putative cut sites by a mismatch correction endonuclease. On the other hand, indels and other structural variants between the donor and recipient chromosomes appear to have blocked progression of strand exchange in several instances Figure 5 and Table S Of the 32 donor segment breakpoints, 12 are within 5 kb of indel or other structural variation; 6 of these are within 3 of the 6 apparent disruptions described above and thus are likely sites of restoration repair.
Indeed, one structural variant gave different outcomes in different recombinants: the 2. Figure 7B illustrates another example of putative restoration repair at an insertional deletion difference between the donor and recipient, as indicated by the interruption between Segments D and E by the recipient insertion allele along with 26 flanking recipient SNVs Figure S8.
The plummeting cost of deep sequencing allowed us to characterize the genome-wide consequences of natural transformation, but the ability of this analysis to account for artifacts depended on our high-coverage control sequencing of the donor and recipient genomes. Aligning these control reads to the two reference genome sequences revealed many positions prone to ambiguity or false-positive SNV calls.
In the absence of these controls, such artifacts would have mistakenly been interpreted as recombination-induced mutations, since mapping reads to divergent references generated these erroneous variants, while mapping reads to highly similar references did not.
The frequency of these artifacts depends not only on nucleotide divergence, but also on the spectrum of structural variation and the complexity of the genome. Analysis of such high-coverage control datasets will be essential for reference-guided assembly approaches that use data with lower coverage, such as that obtained using inexpensive multiplexing methods.
The broad spectrum of sequence differences between donor and recipient used in these experiments is typical of the natural genetic variation between H. However, most of the DNA in respiratory mucosa is from human cells and, although bacterial DNA is known to be abundant in biofilms, its fragment sizes and composition in mucus are not known.
The short DNA fragments also present in mucus may be taken up more efficiently than long fragments, since H. Competent cells incubated with short donor DNAs might acquire more donor segments, but short fragments will also be more severely affected by the exonucleolytic degradation that accompanies translocation into the cytoplasm.
The lengths of the donor segments we found in H. The difference suggests that population genetic models for measuring recombination in nature will require incorporating species-specific estimates of the distribution of recombination tract lengths [15]. The lengths of donor segments found in recombinant chromosomes may underestimate the original lengths of DNA fragments participating in uptake and recombination, because clustering of donor segments suggests that longer incoming DNA fragments are often disrupted before transformation is complete.
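Treating nearby donor segments as fragments of one longer incoming DNA molecule, as discussed above, can be sketched by merging segments separated by less than some maximum gap. The gap threshold is a hypothetical parameter; the paper's own criterion for assigning segments to clusters is not stated in this text.

```python
def cluster_segments(segments, max_gap):
    """Merge donor segments whose separation is at most max_gap bases.

    segments: list of (start, end) positions, inclusive.
    max_gap: hypothetical threshold chosen by the analyst.
    """
    clusters = []
    for start, end in sorted(segments):
        if clusters and start - clusters[-1][1] <= max_gap:
            prev_start, prev_end = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end))
        else:
            clusters.append((start, end))
    return clusters

def mean_length(intervals):
    """Mean length of inclusive (start, end) intervals."""
    return sum(end - start + 1 for start, end in intervals) / len(intervals)

# Toy example with made-up coordinates
segs = [(100, 200), (250, 400), (5000, 6000)]
clusters = cluster_segments(segs, max_gap=100)
```

Cluster lengths computed this way span the recipient sequence between merged segments, so they estimate the extent of the original fragment rather than the amount of donor DNA retained.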
Similar clustering of donor segments was seen when recombination at a single locus was examined in Helicobacter pylori [33]-[34]. The clustering of H. More probable explanations are that (1) cytosolic or translocation endonucleases degrade incoming DNAs prior to strand exchange, or (2) sequence heterology blocks progression of strand exchange, with the heterologous sequences trimmed away by nucleases.
Intracellular cleavage of incoming DNA by restriction enzymes has been proposed for competent Helicobacter pylori [49] , but this is problematic because, in both H. Although McKane and Milkman have shown that restriction can create clustered recombination tracts in E. The effect of restriction in H. Similar accumulation might be a transformation-limiting factor for many species that normally live in mixed-species biofilms, whenever environmental DNA encounters restriction enzymes derived from other strains or species.
We found no evidence that recombination preferentially occurred in regions of lower nucleotide divergence than the genome-wide average. Instead, sequence divergence varied on a scale much shorter than the donor segments, with most segments spanning local regions of both high and low divergence Figure 6. This suggests that, although strand exchange may initiate between regions of high sequence identity, it readily extends into and through regions with many mismatches.
Measuring the effect of divergence on recombination break points and interruptions will require sequencing many more recombinants.
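The comparison between local divergence and donor segments discussed above rests on a sliding-window divergence profile. A minimal sketch is given below, assuming divergence is measured as SNVs per base within each window; the 1 kb window matches the figure description, but the step size shown is a placeholder because the source value is not legible.

```python
from bisect import bisect_left

def windowed_divergence(snv_positions, genome_length, window=1000, step=100):
    """SNV density in sliding windows along the genome.

    snv_positions: 0-based positions of SNVs between donor and recipient.
    window: window size in bases.
    step: step size in bases (placeholder value, not from the paper).
    Returns a list of (window_start, snvs_per_base).
    """
    positions = sorted(snv_positions)
    profile = []
    for start in range(0, genome_length - window + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        profile.append((start, (hi - lo) / window))
    return profile

# Toy example with made-up SNV positions
profile = windowed_divergence([5, 15, 25, 1500], genome_length=2000,
                              window=1000, step=500)
```

Overlaying donor segment coordinates on such a profile is one way to check whether segments span both high- and low-divergence windows.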
Effects of structural variation on recombination were evident even with this small sample size, as heterologous sequences were much more common at donor-segment breakpoints than expected from their abundance in the recombining genomes, e. This is consistent with previous genetic experiments showing that insertions and deletions transform at much lower rates than do substitutions [22] and may be due to inhibition of strand exchange or to subsequent excision of heteroduplex from recombination intermediates by a mismatch correction mechanism.
However, at other sites the donor versions of structural variation were acquired as parts of longer donor segments, showing that such accessory loci can indeed readily move by natural transformation.
Other factors could have influenced the transformation events we observed: 1 H. We do not know the extent of heteroduplex correction at these or other independently transforming sites, nor how recombination tracts are distributed between the two strands of the originally transformed chromosome. Although the relatively short putative restoration repair events observed in this study might suggest that heteroduplex correction acts only on parts of larger heteroduplex recombination products, other repair events might have completely removed shorter segments of donor DNA.
Because clones chosen for sequencing had acquired one of two antibiotic resistance alleles from the donor, we were able to examine overlapping recombination events at each of these loci, detecting striking differences at the Nov R locus. On the other hand, the 11 kb overlap between the unselected donor segments A and G was unexpected given the transformation frequencies of single markers Figures 4C and 5 , and a sufficiently large dataset might identify a transformation hotspot, as has recently been found in Neisseria meningitidis [55].
The overlapping sequences do not have any obvious distinguishing features: divergence between donor and recipient is typical, no virulence genes have been annotated, and density of USSs is slightly lower than the genome average.
In addition to the selected antibiotic resistance alleles, the recombination events characterized here had the potential to significantly change the cell's biology, both by introducing new genes and by creating new genetic combinations by homologous recombination both between and within genes (alleles).
Each recombinant clone also acquired donor-specific versions of 20-50 shared genes, and these may have altered phenotype both directly and because of new interactions with recipient alleles at unrecombined loci. Recombination breakpoints that were not at structural variation usually fell within genes (Figure 5) and, because of the high level of sequence variation, these are likely to have created novel recombinant alleles, potentially with substantial changes to function. The results presented above considered only four recombinant clones, but continuing advances in DNA sequencing technology and bioinformatics methods will allow characterization of many more recombinants under a variety of experimental conditions and using different donor DNAs.
This will help bridge experimental studies of transformation with the population genomic approaches used to detect recombination between bacterial lineages in nature. The comprehensive identification of donor segments in a large set of experimentally transformed clones will also provide a novel resource for the genetic mapping of phenotypes that differ between the donor and recipient strains, such as their dramatic natural variation in transformability [56], as well as natural variation in pathogenesis-related traits like serum-resistance [57]-[58].
Standard protocols were used for growth and manipulation of H. The H. For the Nov R allele of gyrB, a 2. For the Nal R allele of gyrA, a 2. Experiments were performed in triplicate from frozen aliquots of competent cultures prepared on three separate occasions. Cells from defrosted aliquots were pelleted and resuspended in fresh MIV before transformation.