Dynamic Methylation of an L1 Transduction Family during Reprogramming and Neurodifferentiation

The retrotransposon LINE-1 (L1) is a significant source of endogenous mutagenesis in humans. In each individual genome, a few retrotransposition-competent L1s (RC-L1s) can generate new heritable L1 insertions in the early embryo, primordial germ line, and germ cells.

Here, we identified a reprogramming-associated de novo L1 insertion in a cultivated hiPSC line. This insertion was traced to a hot donor RC-L1 that was part of an extended and recently active transduction family. We then measured locus-specific DNA methylation among de novo, donor, and transduction family L1 promoters, as well as the L1-Ta subfamily genome-wide, at multiple points of neurodifferentiation. These experiments elucidate the dynamic temporal profile of epigenetic L1 repression applied to new and extant L1 insertions during neurogenesis.

RESULTS
A de novo L1 insertion arising during reprogramming. To study endogenous retrotransposition during neurogenesis, we obtained two hiPSC lines (hiPSC-CRL1502 and hiPSC-CRL2429) generated via delivery of defined reprogramming factors to healthy human dermal fibroblasts (58,76). We then differentiated each hiPSC line toward a neuronal phenotype for 156 days in culture (Fig. 1A) and applied retrotransposon capture sequencing (RC-seq) (58,69,77) to genomic DNA sampled from the parental fibroblasts (time point 0 [T0]), hiPSCs (T1), and several time points of differentiation (T2 to T6) (Table 1). Two earlier passages of each hiPSC line were also analyzed by RC-seq to better distinguish L1 insertions arising during reprogramming or cell cultivation (Table 1). Cells from each point of neurodifferentiation were characterized by immunocytochemistry (Fig. 1A) and included neural epithelium (T2), neural rosettes denoting immature neurons (T3), and three stages of prolonged neuronal maturation (T4 to T6). Endogenous L1 insertions detected by RC-seq and absent from the reference genome were annotated as either polymorphic (previously published or present at T0) or de novo (only present at T1 or later in one time course). Two potential de novo L1 insertions were identified (see Table S1 in the supplemental material). We then performed insertion site-specific PCR validation for each event (Fig. 1B and Table 2) and found that one insertion, on chromosome 1 (Chr1), was de novo in hiPSC-CRL2429 cells at time point T1, was carried through neurodifferentiation (Fig. 1C), and was absent from hiPSC-CRL1502 (Fig. 1C). PCR indicated that the other putative de novo event was polymorphic because it was found in the matched parental fibroblast population (Table 2 and Table S1).
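The annotation rule described above (polymorphic versus de novo) can be sketched in a few lines. This is a minimal illustration only, not the RC-seq pipeline itself: the function name, the input layout (a mapping from cell line to the set of time points with a detection), and the "unresolved" fallback are assumptions introduced here. As the text notes, candidate de novo calls still require PCR validation.

```python
from typing import Dict, Set


def annotate_insertion(detections: Dict[str, Set[str]],
                       previously_published: bool = False) -> str:
    """Classify a nonreference L1 insertion from its detection pattern.

    detections maps a cell line name to the set of time point labels
    ("T0" ... "T6") at which the insertion was detected (hypothetical layout).
    """
    # Polymorphic: previously published, or already present in parental fibroblasts (T0).
    if previously_published or any("T0" in tps for tps in detections.values()):
        return "polymorphic"
    # De novo candidate: seen only at T1 or later, and in a single time course.
    lines_with_signal = [line for line, tps in detections.items() if tps]
    if len(lines_with_signal) == 1:
        return "de novo"
    # Anything else (no detections, or several time courses) is left undecided here.
    return "unresolved"


if __name__ == "__main__":
    candidate = {"hiPSC-CRL2429": {"T1", "T2", "T3"}, "hiPSC-CRL1502": set()}
    print(annotate_insertion(candidate))
```

A call flagged "de novo" by such a rule is only a candidate: an insertion present in a small fraction of fibroblasts could be missed at T0 by sequencing and later shown to be polymorphic by PCR.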
We then cloned and capillary sequenced the entire de novo L1 insertion (Fig. 1D) and manually inspected the integration site for hallmarks of TPRT (8,9,16,17). The L1 was full length, belonged to the L1-Ta subfamily, carried 5' and 3' transductions, was flanked by 16-nucleotide (nt) TSDs, inserted at a degenerate L1 endonuclease motif (5'-TT/AAAG), and terminated with a 33-nt poly(A) tract. The 5' and 3' transductions were 10 nt and 44 nt in length, respectively, and the 3' transduction was preceded by an internal 17-nt poly(A) tract (Fig. 1D). These features were consistent with endogenous retrotransposition mediated via TPRT and, as confirmed by insertion site-specific PCR, showed that the de novo L1 insertion represented a bona fide retrotransposition event occurring during reprogramming, or very early in hiPSC-CRL2429 cultivation.

FIG 1 (A) [...] were reprogrammed to obtain hiPSCs (T1), which were then sampled at 5 points (T2 to T6) of neuronal differentiation in extended cell culture. Immunocytochemistry was used to characterize expression of marker genes (OCT4, NANOG, PAX6, TUJ1, CUX1, and GFAP) and histone 3 phosphorylation (PH3), as associated with various stages of neural cell maturation, with Hoechst staining of DNA. (B) L1 insertion PCR validation strategies. Green and blue arrows, respectively, represent primers targeting the 5' and 3' genomic flanks of an L1 insertion (rectangle). Black arrows represent primers specific to the L1 sequence. Combinations of these primers are used to generate the following amplicons (arranged top to bottom): 5' L1-genome junction, 3' L1-genome junction, L1 insertion (filled site), and empty site. (C) PCR validation results for a de novo L1 insertion detected in cell line hiPSC-CRL2429. An empty/filled PCR was also performed with cell line hiPSC-CRL1502 as a negative control. Red and black arrowheads indicate the expected filled and empty site band sizes, respectively. NTC, nontemplate control. (D) De novo L1 insertion sequence structure. In addition to TSDs (triangles), the full-length L1-Ta insertion was flanked by 5' (orange) and 3' (purple) transductions. (E) The same experiments as described for panel C except that they were performed for the donor L1 responsible for the de novo L1 insertion (left) and its lineage progenitor L1 (right), using CRL2429 fibroblast genomic DNA.
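One of the TPRT hallmarks mentioned above, the target site duplication (TSD), can be located computationally once both junctions are sequenced. The sketch below is an illustration under stated assumptions rather than the procedure used in the study, where integration sites were inspected manually. It assumes the 5' flank ends with one TSD copy (immediately before the L1) and the 3' flank begins with the other (immediately after the poly(A) tract). The function name and example sequences are hypothetical.

```python
def find_tsd(flank_5p: str, flank_3p: str, min_len: int = 2) -> str:
    """Return the longest sequence that is both a suffix of the 5' flank
    and a prefix of the 3' flank, or "" if none reaches min_len.

    Exact matching only; real junctions may need mismatch tolerance.
    """
    a = flank_5p.upper()
    b = flank_3p.upper()
    # Try the longest possible duplication first and shrink until a match is found.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return a[-k:]
    return ""


if __name__ == "__main__":
    # Hypothetical flanks, not the sequences reported for the Chr1 insertion.
    upstream = "GGCATTAAAGTC"    # genomic sequence ending at the 5' L1-genome junction
    downstream = "AAAGTCTTGACC"  # genomic sequence starting after the poly(A) tract
    tsd = find_tsd(upstream, downstream)
    print(tsd, len(tsd))
```

For the insertion described here, a check of this kind would be expected to recover a 16-nt duplication, given correctly trimmed junction sequences.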
An extended human RC-L1 transduction family. The de novo L1 insertion was the first endogenous L1 insertion found in hiPSCs to carry both 5' and 3' transductions. These transductions uniquely indicated a donor L1 sequence on chromosome 3 that was heterozygous in the hiPSC-CRL2429 parental fibroblast population (Fig. 1E). The donor L1 was absent from the reference genome and was polymorphic in humans; it was previously shown to mobilize efficiently in vitro (35). To identify any other germ line L1 insertions closely related to the donor L1, we aligned the 3' transduced sequence to the reference genome and to the annotated 3' L1-genome junction sequences of polymorphic L1s carried by hiPSC-CRL2429 or hiPSC-CRL1502 (Table S1) or those annotated by previous studies (52,58,69,70,77,78). We further annotated this list with results obtained by previous studies of L1 mobilization in the germ line, tumors, and cancer cell lines (3,35,37,49,77,79-87). From this analysis, we reconstructed an extended L1 transduction family comprising 14 members (Table 3), including a plausible founder, or lineage progenitor (44), element for the family, which was homozygous in hiPSC-CRL2429 and located on chromosome 11 (Fig. 1E).
To further characterize the transduction family, we analyzed the complete internal sequence of eight of its members found in either hiPSC-CRL1502 or hiPSC-CRL2429, including the de novo L1 insertion. A consensus sequence was obtained for the lineage progenitor, donor, and de novo L1s, as well as for another L1 nonreference (Non-ref) element, named Non-ref_Chr3_p24.3, via capillary sequencing of multiple full-length amplicons derived from independent PCRs (Fig. 2). Internal and flanking sequences for four additional reference (Ref) elements (Ref_Chr7_q21.3, Ref_Chr1_p31.1a, Ref_Chr1_p31.1b, and Ref_Chr9_p23) were obtained from the reference genome assembly. The 5' and 3' L1-genome junctions of the remaining six nonreference elements (Non-ref_Chr3_p12.2_a, Non-ref_Chr3_p12.2_b, Non-ref_ChrX_p11.4, Non-ref_Chr17_q12, Non-ref_Chr1_p22.2, and Non-ref_Chr4_q12) were provided by previous studies (Table 3). Notably, the homozygous lineage progenitor L1 had two allelic variants in hiPSC-CRL2429 cells, which were distinguished by four single nucleotide variants. Allele 1 contained a nonsynonymous change (D523H) in the ORF2p RT domain, which was not found in allele 2. Further analysis of the remaining family members relative to the sequence of L1.3 (88) indicated that each contained internal single nucleotide variants common to both progenitor element alleles, in addition to shared 3' transduced sequences (Fig. 2). The de novo and donor elements were identical in their L1 sequences, and the 5' transduced sequence carried by the de novo insertion exactly matched the 10 nt directly upstream of the donor element. Surprisingly, in addition to the de novo L1 insertion, two other elements, Ref_Chr1_p31.1_a and Non-ref_ChrX_p11.4, each carried both 5' and 3' transductions, enabling us to unambiguously identify their respective donor L1 sequences (the lineage progenitor and Ref_Chr7_q21.3, respectively), which were also members of the transduction family (Fig. 2).
Interestingly, the 539-nt 5' transduction carried by Ref_Chr1_p31.1a was preceded by a single untemplated guanine, suggesting that the template mRNA was capped (18,89), and utilized a transcription start site in the 5' long terminal repeat (LTR7Y) sequence of a human endogenous retrovirus type H (HERV-H) provirus integrated ~126 kb upstream of the lineage progenitor L1 (Fig. 2). This mRNA template incorporated two exons upstream of the lineage progenitor L1, which were spliced together and to the L1 via sites strongly resembling consensus mammalian splice donor and acceptor sequences (Fig. 3). Another element, Non-ref_Chr3_p24.3, incorporated a nonsense mutation predicted to truncate ORF2 prior to its RT domain. In sum, these experiments characterized relationships among members of a transduction family, which, in many cases, remain potentially capable of retrotransposition in the germ line, in tumors (37,49), and, as shown here, in hiPSCs.

Transduction family mobilization in vitro.
To assess the retrotransposition competence of several members of the transduction family, we employed an engineered L1 retrotransposition reporter assay (8) in cultured HeLa cells. Briefly, in this assay, an L1 sequence is cloned into a vector containing an antibiotic resistance cassette oriented antisense to the L1 copy. The resistance gene is interrupted by an intron oriented in sense to the L1, so antibiotic resistance arises only after splicing and retrotransposition of the reporter cassette (8,90) (Fig. 4A). Through this approach, we tested the following elements: a known hot RC-L1 (L1.3) as a positive control (88,91), an RT mutant L1 (L1.3 RT−) as a negative control (6), both detected alleles of the lineage progenitor L1, the donor L1 (identical in sequence to the de novo L1), and Non-ref_Chr3_p24.3, which contained an ORF2 stop codon in its RT domain (Fig. 2). Each element was tested in triplicate experiments under the control of its native L1 promoter (Fig. 4B).
Among the tested elements, the lineage progenitor L1 allele 2 exhibited the highest retrotransposition activity, at 135% of L1.3 (Fig. 4B). Consistent with the progenitor L1 allele 1 carrying two nonsynonymous mutations in ORF2 not found in allele 2, resulting in Q159H and D523H amino acid changes (Fig. 2), we found that allele 1 retrotransposed at ~74% of the efficiency observed for allele 2 and at an efficiency similar to that of L1.3 (Fig. 4B). Each progenitor L1 allele jumped at >10% of the efficiency of L1.3 and therefore met the definition of a hot RC-L1 (35). Notably, an allele of the progenitor L1 had previously been tested, albeit in an osteosarcoma cell line and with a different reporter system, and was found to present much more limited mobilization potential in vitro (3). The most likely explanation for this difference is that the prior study tested an allele of the progenitor L1 not assayed here. This result further highlights the impact of allelic variation upon the retrotransposition efficiency of a given genomic RC-L1 copy (38,39).
The donor L1 was sequenced from a line (hiPSC-CRL2429) established from a Caucasian individual. Apart from a single nucleotide mutation in its 3' UTR, this L1 was identical to one identified in a Japanese individual by a previous study, which reported its retrotransposition efficiency as 101% of L1.3 in the same reporter assay (35). Here, the donor L1 jumped at 117% of L1.3, corroborating the prior experimental results and confirming that retrotransposition-competent alleles of this L1 exist in multiple human populations. Finally, L1.3 RT− and Non-ref_Chr3_p24.3 did not retrotranspose, consistent with disabled ORF2 RT activity in each case (Fig. 4B). Overall, these results demonstrate that the de novo L1, its donor sequence, and the progenitor element of the transduction family were all hot RC-L1s in vitro.
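The percentages quoted above are retrotransposition efficiencies expressed relative to L1.3, and the Methods state that G418-resistant colony counts were normalized by the percentage of EGFP-positive cells. The following is a minimal sketch of that arithmetic, assuming a simple division by transfection efficiency. The function names and the colony counts in the example are hypothetical, and only the 10% threshold for a hot RC-L1 comes from the text.

```python
def normalized_colonies(colony_count: float, egfp_positive_fraction: float) -> float:
    """Scale a G418-resistant colony count by transfection efficiency (0 < fraction <= 1)."""
    if not 0.0 < egfp_positive_fraction <= 1.0:
        raise ValueError("EGFP-positive fraction must be in (0, 1]")
    return colony_count / egfp_positive_fraction


def percent_of_reference(element: float, reference: float) -> float:
    """Express a normalized colony count as a percentage of the reference (here L1.3)."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * element / reference


def is_hot(percent_of_l13: float, threshold: float = 10.0) -> bool:
    """Apply the >10% of L1.3 definition of a hot RC-L1 used in the text."""
    return percent_of_l13 > threshold


if __name__ == "__main__":
    # Hypothetical raw counts and transfection efficiencies, for illustration only.
    l13 = normalized_colonies(100, 0.50)
    progenitor_allele_2 = normalized_colonies(135, 0.50)
    pct = percent_of_reference(progenitor_allele_2, l13)
    print(pct, is_hot(pct))
```

The reported figures are mutually consistent under this arithmetic: allele 1 at roughly 74% of allele 2, with allele 2 at 135% of L1.3, gives about 0.74 × 135 ≈ 100% of L1.3.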
L1 promoter methylation is dynamic during neurodifferentiation. Full-length L1 mRNA transcription is a prerequisite for L1 retrotransposition in cis and is directed by an internal promoter located in the L1 5' UTR (25). DNA methylation of an adjacent CpG island mediates repression of the L1 promoter (26,31). Genome-wide, the L1-Ta subfamily is thought to be broadly hypomethylated in pluripotent cells and then methylated during differentiation, including in mature neurons (40,49,58,61,63,67). However, the temporal methylation patterns for the L1-Ta subfamily and individual L1-Ta promoters during the various stages of neurodifferentiation have not yet been resolved. It is also unknown how quickly methylation is established upon new L1 insertions that arise in pluripotent cells. To address these questions, we applied a multiplexed L1 locus-specific bisulfite sequencing approach (52,78) (Fig. 5A and Table 2) to assess DNA methylation among the de novo, donor, and progenitor L1 5' UTR sequences, as well as the L1-Ta subfamily genome-wide. This analysis was performed for both hiPSC lines and their parental fibroblasts and derivative neuronal cell populations, as surveyed by RC-seq, with the exception of the de novo L1, which was present only in hiPSC-CRL2429 (Fig. 5B and 6).
Considering general trends observed in both hiPSC lines, the L1-Ta subfamily and individual L1 promoters were most methylated in fibroblasts and differentiated neurons and least methylated in hiPSCs and the earliest stages of neurodifferentiation (Fig. 5B and 6A). For example, 66.6%, 31.1%, and 61.0% of CpG dinucleotides surveyed in the donor L1 were methylated, on average, in hiPSC-CRL2429 fibroblasts, hiPSCs, and mature neurons, respectively. Among the two hiPSC lines, the highly significant (P < 0.0001, paired t test with Bonferroni correction) reductions in methylation observed for the donor L1 during hiPSC derivation (25.0% on average) far exceeded that seen for the lineage progenitor (12.5%) and L1-Ta subfamily (2.9%) (Fig. 5C and 6B). The lineage progenitor L1 was significantly (P < 0.001, paired t test) more methylated than the donor L1 at all time points in each hiPSC line, with the L1-Ta subfamily being methylated to a level between that of the lineage progenitor L1 and donor L1 at most time points (Fig. 5C and 6B). Notably, we observed a significant (P < 0.001, paired t test with Bonferroni correction) reduction in methylation (23.1% average decrease) for all amplicons at T5 in hiPSC-CRL2429, followed by a significant (P < 0.01) increase in methylation at T6 (20.1% average increase) (Fig. 5C). This trend was also observed at T5 for hiPSC-CRL1502, except for the donor L1 (Fig. 6B). The reasons for this pattern are presently unclear (see Discussion). Overall, these results demonstrate that DNA methylation is far more dynamic during reprogramming and differentiation for a donor L1 that can mobilize during or shortly after reprogramming than is seen for the vast majority of L1-Ta subfamily elements.
The de novo L1, which arose in hiPSC-CRL2429, could be detected at its 5' L1-genome junction by site-specific PCR at time points T1 through T6 (Fig. 1C). However, as assessed by the number of unique sequencing reads generated, the PCR amplicon pool for the de novo L1 was very low in complexity at T1, perhaps due to a low percentage of cells carrying the mutation, and we therefore excluded T1 from further analysis. The de novo L1 was nonetheless consistently less methylated than its donor L1 in hiPSC-CRL2429 time points T2 through T6, with average values across these stages of 41.6% and 53.8%, respectively (Fig. 5B). Methylation ultimately increased upon the de novo L1 during neurodifferentiation, but even in neurons we observed a significant number of cells in which the de novo L1 promoter was fully demethylated. For the donor L1 and the L1-Ta subfamily, we also observed instances of cells in which these promoters were fully demethylated at various points of neuronal differentiation (Fig. 5B and 6A). These results suggest that the de novo L1 was only partially methylated subsequent to its integration into the hiPSC-CRL2429 genome and remained incompletely methylated in mature neurons.
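The two quantities discussed above, the average percentage of methylated CpGs and the presence of fully demethylated promoter copies, can both be derived from per-read CpG calls. The sketch below is illustrative only and is not the QUMA-based processing used in the study. It assumes each unique read is encoded as a string with one character per surveyed CpG ("M" methylated, "U" unmethylated) and treats each read as one sampled molecule. The function name and toy reads are hypothetical.

```python
from typing import Dict, List


def summarize_methylation(reads: List[str]) -> Dict[str, float]:
    """Summarize per-read CpG calls for one amplicon at one time point.

    Each read is a string of "M" (methylated CpG) and "U" (unmethylated CpG).
    """
    total_cpgs = sum(len(r) for r in reads)
    if total_cpgs == 0:
        raise ValueError("no CpG calls supplied")
    methylated = sum(r.count("M") for r in reads)
    # A read with no methylated CpG at all represents a fully demethylated promoter copy.
    fully_demethylated = sum(1 for r in reads if r and "M" not in r)
    return {
        "percent_methylated": 100.0 * methylated / total_cpgs,
        "fraction_fully_demethylated": fully_demethylated / len(reads),
    }


if __name__ == "__main__":
    # Toy reads covering four CpGs each; not data from the study.
    toy_reads = ["MMUU", "UUUU", "MMMM", "MUUU"]
    print(summarize_methylation(toy_reads))
```

Comparing such summaries between two loci at matched time points is the kind of input a paired test would use, although the statistical testing itself is outside this sketch.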
Given the disparate methylation levels observed for the de novo and donor L1 promoter regions compared to the level of the lineage progenitor L1, we examined predicted DNA-binding protein motifs (92) affected by sequence variation among these elements (Fig. 2). The 10-nt 5' transduction carried by the de novo L1 insertion incorporated a perfect FOX (forkhead box) protein binding motif (93). Members of the FOX protein family can act as "pioneer" factors in the developmental activation of promoters located in heterochromatin (94). In addition, the T708C nucleotide mutation present in the de novo and donor L1 copies greatly increased the predicted binding affinity for retinoid X receptor (RXR) proteins to this site. RXR proteins are known to respond to vitamin A (95), which is a component of the B-27 medium used here for neurodifferentiation. Conversely, the C581A nucleotide mutation carried by the lineage progenitor L1, and not by the de novo or donor L1 sequences or any other member of the transduction family, removed a key nucleotide mismatch from the core of a predicted PU.1 binding motif. PU.1 is established to recruit DNA methyltransferases to genomic loci and to form a repressor complex with MeCP2, which is a key mediator of L1 silencing (96-98). These in silico analyses suggested that differential DNA-binding protein activity as a result of sequence variation may impact the methylation and transcriptional state of members of the transduction family.

DISCUSSION
The L1 transduction family identified here is the largest found to date and adds to other such families characterized by previous studies (35,44,54). Although the extent of the transduction family is revealed here, it is likely that additional members will be identified in the future. It should also be noted that each transduction family member, aside from the de novo L1, was either present in the reference genome or identified by earlier works (Table 3). Unusually, in addition to 3' transduced sequences, 3 of the 14 family members carried 5' transductions. This 5' transduction frequency (21.4%) is exceptionally high, given how rarely such events are found in the human germ line (1). Two of the 5' transductions were relatively short (10 nt, de novo L1; 18 nt, Non-ref_ChrX_p11.4) and likely resulted from the L1 promoter directing mRNA transcriptional initiation upstream of L1 position +1. The third 5' transduction identified was significantly longer (539 nt, Ref_Chr1_p31.1_a) and resulted from transcription initiated by the 5' LTR of an upstream HERV-H proviral sequence, followed by splicing of this mRNA into a site adjacent to the donor L1. The inclusion of both LTR and internal HERV-H sequences in an L1 5' transduction was an intriguing result as most heritable L1 insertions appear to arise early in mammalian embryogenesis (55,56), and HERV-H elements are highly expressed in pluripotent cells (99-103). To speculate, this example demonstrates how HERV-H activation in the early embryo could lead to L1 mobilization. Nonetheless, it remains unclear why 5' transductions are generally so frequent in this family and not in other transduction families (35,44,54). One possibility, an ORF2p amino acid change supporting elevated RT processivity and therefore increased average L1 insertion length, was excluded by an inspection of nonsynonymous sequence variants in this region (Fig. 2).
Also excluded was the more likely possibility of mutations in known YY1, RUNX3, or SOX transcription factor binding sites (41,104,105) in the lineage progenitor L1 5' UTR or in alternative predicted sites located in the immediate 100 nt of its 5' genomic flank, which may alter the accuracy of RNA polymerase II transcriptional initiation (Fig. 2). Otherwise, the family exhibited extensive variation in 3' transduction and poly(A) tail length, as reported elsewhere for L1 insertions arising from a common donor L1 in the human population and cancer genomes (32,37,44,49,52,78).
The discovery of a de novo L1 insertion in hiPSC-CRL2429 corroborates previous reports of endogenous and engineered L1 retrotransposition associated with reprogramming and hiPSC cultivation (58,61). L1-mediated mutagenesis is potentially an important consideration for the use of hiPSCs in biomedical applications and as models of disease because the phenotypic properties of hiPSCs and their cellular derivatives could be compromised as a result of de novo L1 insertions (58,106). We demonstrate here that an endogenous L1 insertion arising in an hiPSC line is maintained during neurodifferentiation, indicating that such events can be present in differentiated cell lines derived from hiPSCs. In this case, the L1 was intergenic, and the accompanying transductions did not include protein-coding exons or regulatory elements (47), lessening the probability of a functional impact in neurons carrying the L1 insertion. Although endogenous L1 retrotransposition is established to occur in the neuronal lineage (65), we did not identify any additional de novo L1 insertions that were restricted to neural cells. These events were likely to each be carried by very few cells, meaning that they may not accrue sufficient RC-seq read depth to meet the detection thresholds used here. Nonetheless, it is plausible that de novo L1 insertions that impact the phenotype of hiPSC-derived cells will be identified in the future, especially as gene expression changes have been observed coincident with intronic L1 insertions arising during hiPSC generation (58).
DNA methylation is thought to be established on L1 sequences very early in mammalian embryogenesis (27,28,58,61,63,67) and maintained in mature neurons. To our knowledge, L1 promoter methylation has not been explored for the various multipotent and immature neuronal cell types that arise during neurogenesis. Using in vitro hiPSC neurodifferentiation to represent neuronal development and maturation in vivo, we found that L1 promoter methylation was highly dynamic and increased as neurons matured. In each hiPSC line studied, we observed cells at multiple stages of neurodifferentiation, including mature neurons, where the donor L1 and other L1-Ta promoters were fully demethylated. Although the donor L1 was demethylated in hiPSCs compared to the methylation level of the matching parental fibroblasts, the absolute magnitudes of this change were dissimilar in the two lines (35.5% and 14.4% for hiPSC-CRL2429 and hiPSC-CRL1502, respectively). This perhaps reflected natural variation in the cohort of RC-L1s hypomethylated in each individual, before and after reprogramming. At time point T5, which follows a gliogenic switch (107-109) during neural differentiation, we also observed a consistent reduction in L1 promoter methylation. This phenomenon could reflect a genome-wide reduction in DNA methylation specific to this stage of neurodifferentiation, perhaps due to a shift in the proportion of glial and neuronal cells present in culture, and warrants further study.
The de novo L1 insertion appeared to be rapidly targeted for repression by the host genome. During neurodifferentiation, similar transitions in methylation were observed for the de novo, donor and lineage progenitor L1s, and the L1-Ta subfamily even if the absolute methylation levels were very different among these elements. This result was consistent with epigenomic remodeling during reprogramming and neurodifferentiation (110,111) impacting the ground state of L1 methylation genome-wide. It also suggested that the de novo L1 insertion was quickly identified and regulated by the same pathways acting upon extant L1 copies on the genome even if the degree of methylation upon the de novo L1 was significantly lower than that applied to the transduction family and its ancestral L1-Ta subfamily. L1 5' UTR sequence variants, for example the C581A nucleotide mutation carried by the lineage progenitor L1 and predicted to increase DNA methylation mediated by PU.1, could contribute to differential methylation patterns among members of the transduction family. It is also notable that the de novo L1 remained retrotransposition competent, as do many other L1 insertions occurring in hiPSCs or arising during human embryogenesis (57,58). To speculate, if hiPSCs are taken as a model of very early development, a milieu where most heritable L1 insertions arise (55), it is plausible that RC-L1 insertions arising de novo in this context will be incompletely methylated during later development and therefore possess a disproportionate capacity for further mobilization in the soma. Ultimately, hiPSCs and hESCs present accessible models to predict how L1 subfamilies and individual L1 loci are regulated. Additional work is required to test whether these patterns are observed during mammalian development in vivo.

MATERIALS AND METHODS
hiPSC generation and neuronal differentiation. Human induced pluripotent stem cell lines were episomally derived as previously described (76). Neuronal differentiation was performed as described previously (112) with slight modifications. Prior to neuronal differentiation, feeder-free hiPSCs were cultured in murine embryonic fibroblast (MEF)-conditioned knockout serum replacement (KOSR) medium supplemented with 100 ng/ml basic fibroblast growth factor (b-FGF). Neuronal differentiation was initiated by supplementing KOSR medium with the dual SMAD inhibitors SB431542 (10 µM) and dorsomorphin (1 µM); this medium was gradually exchanged for 3 N medium (

Nucleic acid extraction. A total of approximately 500,000 cells per time point were pelleted (1,000 rpm for 5 min), washed with Dulbecco's phosphate-buffered saline (DPBS) (14190144; Gibco), pelleted again (1,000 rpm for 5 min), and resuspended in 100 µl of UltraPure DNase/RNase-free distilled water (10977023; Gibco). Cells were lysed in 10 mM Tris, pH 9.0, and 1 mM EDTA, with 2% SDS and 100 µg/ml proteinase K at 65°C. RNase A was added to each sample to a final concentration of 10 µg/ml, and samples were incubated at 37°C for 30 min. DNA was extracted using phenol-chloroform-isoamyl alcohol (25:24:1) and chloroform-isoamyl alcohol (24:1). DNA was precipitated with 0.1 volume of 3 M sodium acetate and 2.5 volumes of 100% isopropanol. Precipitated DNA was washed in 0.8 ml of 75% ethanol (EtOH), slightly air dried, and resuspended in 50 µl of UltraPure DNase/RNase-free distilled water (10977023; Gibco). The quality and quantity of DNA were assessed by NanoDrop (Thermo Fisher Scientific).
PCR validation of L1 insertions. RC-seq reads indicating putative de novo L1 insertions were manually inspected, and primers (Table 2) were designed to PCR amplify integration sites and identify the hallmarks of bona fide L1 retrotransposition events (117). Empty/filled-site, 5' L1-genome junction, and 3' L1-genome junction PCRs were performed. Primers were situated within flanking genomic DNA sequences for empty/filled-site PCRs. The same flanking primers were paired with appropriate L1-specific primers for L1-genome junction assays. Empty/filled-site PCRs were performed using 1.75 U of Expand Long Template enzyme (04829069001; Roche), 5 µl of 5× buffer with 12.5 mM MgCl2, 1.25 µl of 100% dimethyl sulfoxide (DMSO), 1.25 µl of 10 mM deoxynucleoside triphosphates (dNTPs), 1 µl of primer mix (25 µM each primer), 4 ng of genomic DNA template, and molecular-grade water in a final volume of 25 µl under the following PCR conditions: 92°C for 2 min, followed first by 10 cycles at 92°C for 10 s, 59°C for 15 s, and 68°C for 6.5 min and then by 30 cycles at 92°C for 2 min, 59°C for 15 s, and 68°C for 6.5 min plus 20 s of extension time per cycle, with a single extension step at 68°C for 10 min. The 5' and 3' L1-genome junction PCRs were performed using 2 U of MyTaq hot-start DNA polymerase (BIO-21112; Bioline), 1× PCR buffer, 1 µM each primer, 5 ng of genomic DNA template, and molecular-grade water in a final volume of 25 µl. Cycling conditions were as follows: 95°C for 2 min, followed by 35 cycles at 95°C for 30 s, 58°C for 30 s, and 72°C for 3 min, with a single extension step of 72°C for 5 min. Amplified fragments were resolved on 1% and 2% agarose gels (1× Tris-acetate-EDTA [TAE] buffer) stained with SybrSafe (Life Technologies) for empty/filled-site and 5' and 3' junction PCR assays, respectively, and imaged using a Typhoon FLA 9500 (GE Healthcare Life Sciences, USA).
Amplicons of the expected size were excised from the gels, and DNA was extracted using a QIAquick gel extraction kit (28704; Qiagen), followed by capillary sequencing to confirm and characterize L1 insertion structural features.
L1 genotyping and cloning. To facilitate cloning of full-length L1 insertions, a NotI restriction enzyme sequence (5'-GC/GGCCGC) was introduced at the 5' end of each forward primer close to the L1-genome junction. Purified PCR products (500 ng) approximately 6 kbp in size were digested with NotI and BstZ17I (R3138; New England Biolabs) in 1× CutSmart buffer at 37°C for 1 h. Digestion reactions were run in 2% agarose gels (1× TAE buffer), purified by phenol-chloroform extraction, and cloned using a TOPO-XL PCR cloning kit (K4700-20; Life Technologies) according to the manufacturer's instructions. Five microliters of the ligation product was used to transform One Shot TOP10 electrocompetent bacteria as per the manufacturer's instructions. Bacteria were plated on LB agar containing 0.5 µg/ml of kanamycin and incubated at 37°C overnight. Single colonies were picked and transferred to 5 ml of LB liquid containing 0.5 µg/ml of kanamycin for Miniprep plasmid purification (12143; Qiagen).
To filter induced PCR mutations and distinguish possible allelic variants, at least four independent PCR products and clones from each L1 transduction family member were capillary sequenced using 12 overlapping primer pairs (Table 2) distributed at ~500-bp intervals covering the entire L1 sequence. Each independent clone sequence was then manually assembled and aligned with the other clones of the same element using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). For each L1, a consensus sequence was obtained, and a mutation-free construct was reconstructed by performing multiple restriction enzyme digestions. The desired fragments were resolved in a 2% agarose gel (1× TAE buffer), purified, and ligated into a pCEP4 vector using T4 ligase in a 5:1 (insert/vector) ratio. Five microliters of the ligation product was used to transform One Shot TOP10 chemically competent bacteria (C404010; Invitrogen) as per the manufacturer's instructions. Bacteria were plated on LB agar containing 1 µg/ml of ampicillin and incubated at 37°C overnight. Single colonies were picked and transferred to 5 ml of LB liquid containing ampicillin for Miniprep plasmid purification. To verify the fidelity of the resultant clones, these were capillary sequenced, as described above, using 12 different primers covering the entire L1 sequence.
Retrotransposition indicator plasmids termed L1.3 and L1.3 RT− were generated through modification of the pCEP4 backbone of pJM101/L1.3 (14,91) and pJM105/L1.3 (118) by removing a BglII fragment containing the cytomegalovirus (CMV) promoter. The full L1.3 3' UTR, except for a point mutation disrupting the native L1 polyadenylation signal, was reintroduced, and a PacI site was incorporated between the L1.3 3' UTR and the Neo cassette (F. J. Sanchez-Luque and G. J. Faulkner, unpublished data). The mutation-free full-length transduction family members described above were then introduced into this retrotransposition indicator backbone.
DNA-binding protein motif analyses of the lineage progenitor, donor, and de novo L1 sequences were performed using the Catalog of Inferred Sequence Binding Preferences (CIS-BP) database (92).
Retrotransposition assay. HeLa-JVM cells were grown in a humidified, 5% CO2 incubator at 37°C in high-glucose Dulbecco's modified Eagle's medium (DMEM) without pyruvate (11965-092; Gibco), supplemented with 10% fetal bovine serum (26400-044; Gibco), 2 mM L-glutamine, 100 U/ml penicillin, and 100 µg/ml streptomycin (10378-016; Gibco) (DMEM complete). Plasmid DNA was purified using a Midi kit (13343; Qiagen) and diluted in sterile water to 0.5 µg/µl. Cells were transfected and seeded at 5 × 10^3 cells/well in six-well plates using FuGENE HD transfection reagent (Promega) at a ratio of 4 µl to 1 µg of plasmid DNA. Selection with G418 began 72 h after transfection and continued every 48 h for 14 days (6). Transfection efficiency assays were performed in parallel by cotransfection of pCAG-enhanced green fluorescent protein (EGFP) with L1 reporter plasmids, as described above, with 0.5 µg of each construct and 0.5 µg of pCAG-EGFP. Cells were analyzed by flow cytometry 48 h posttransfection on a Cytoflex flow cytometer (Beckman-Coulter) at the Translational Research Institute Flow Cytometry Core. The results were used to normalize the G418-resistant colony counts with the percentage of EGFP-positive cells for each L1 reporter construct obtained in the retrotransposition assay, as performed previously (118).

L1 CpG methylation analyses. L1-Ta subfamily-wide and L1 locus-specific bisulfite sequencing for each time point in hiPSC-CRL1502 and hiPSC-CRL2429 was performed as described previously (52). Briefly, 500 ng of gDNA was bisulfite treated using an EZ DNA Methylation Lightning kit (Zymo Research), allowing 20 min desulfonation time and eluting in a 25-µl volume.
Primers L1_Bis-F and L1_Bis-R were used to amplify the L1-Ta 5' UTR region containing a CpG island (Table 2), while for the L1 locus-specific reactions, L1_Bis-R was combined with one of three forward primers placed in the genomic flank of the lineage progenitor, donor, and de novo L1 insertions (L1_Bis-LP, L1_Bis-Donor, and L1_Bis-DN, respectively). PCRs incorporated 1 U of MyTaq hot-start DNA polymerase (BIO-21112; Bioline), 2 µl of bisulfite-treated gDNA from each sample, 1× reaction buffer, and 2 µM each primer, in a 20-µl final volume. PCR cycling conditions were as follows: 95°C for 2 min, followed by 40 cycles of 95°C for 30 s, 54°C for 30 s, and 72°C for 30 s, with a single extension step at 72°C for 5 min. Barcoded libraries were prepared from amplicons pooled by time point and sample using a TruSeq DNA PCR-free library preparation kit (FC-121-3001/2; Illumina) and subjected to multiplexed paired-end 2 × 300-mer sequencing using an Illumina MiSeq platform. Data were processed as described previously (52) and visualized using QUMA (119) with default parameters.
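The principle behind the bisulfite assay can be illustrated in silico: unmethylated cytosines are read as thymine after conversion and amplification, whereas methylated cytosines remain cytosine, so the base observed at each CpG reports its methylation state. The sketch below is a simplified top-strand model with hypothetical function names and a toy sequence. It is not part of the published analysis, which relied on the cited pipeline and QUMA.

```python
from typing import Iterable, List


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated: Iterable[int] = ()) -> str:
    """Simulate the top-strand read after bisulfite conversion and PCR.

    Cytosines at indices listed in `methylated` are protected and stay "C".
    All other cytosines are converted and read as "T".
    """
    protected = set(methylated)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    toy = "ACGTCCGA"               # toy sequence, not an L1 5' UTR fragment
    print(cpg_positions(toy))      # indices of CpG cytosines
    print(bisulfite_convert(toy, methylated=[1]))
```

Because every non-CpG cytosine should convert, the conversion rate at those positions offers a quick sanity check on a read before its CpG calls are trusted.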
Accession number(s). RC-seq FASTQ files were deposited in the European Nucleotide Archive under accession number PRJEB27103.