To overcome this challenge, we identified 515 inoviruses and 258 DJR phages from 688 high-quality Vibrio spp. genomes and manually verified the attL/R junctions for each prophage. The resulting well-curated dataset of non-tailed prophage genomes provides valuable insights into these phages and, specifically, led to (i) a systematic classification of genomic variations among non-tailed phages and a characterization of their lysogeny-related elements, (ii) the identification of a conserved lysogeny module shared by specific members within the two phage groups, and (iii) the application of a combined strategy that integrates marker-based prediction with dif-like motif detection that significantly enhances the accuracy of predicting non-tailed prophage elements in bacterial genomes.
Vibrio genomes commonly harbor multiple homologous non-tailed prophages. However, Illumina-based short-read sequencing has a limited ability to resolve long repetitive regions, which hinders the reconstruction of complete prophage sequences from fragmented bacterial genomes (Supplementary Fig. 1). To comprehensively evaluate the evolutionary and ecological characteristics of these non-tailed prophages, we retrieved and analyzed 688 complete or chromosome-level Vibrio spp. genomes from the NCBI RefSeq database, with their unique identifiers recorded in Supplementary Data 1.
Using a hidden Markov model (HMM)-based search strategy, we identified prophage regions by detecting clustered gene arrangements with significant homology to conserved phage markers. Each candidate region, as shown in Supplementary Data 1, was then manually curated to precisely delineate the prophage genomic boundaries (specifically, the two prophage-bacterium junctions, attL and attR) and thereby ensure the extraction of intact prophage genomes. This approach yielded a dataset of 515 inoviruses and 258 DJR prophages (Fig. 1a, b), with detailed genomic coordinates and attL/R junction annotations available in Supplementary Data 2.
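The clustering step can be illustrated with a minimal Python sketch. It assumes that the HMM hits have already been reduced to (contig, start, end, marker) tuples; the `max_gap` and `min_markers` thresholds, the function name, and the marker labels in the toy input are hypothetical placeholders, not the parameters used in this study.

```python
from collections import defaultdict


def cluster_marker_hits(hits, max_gap=5000, min_markers=2):
    """Group marker-gene hits into candidate prophage regions.

    hits: iterable of (contig, start, end, marker), 1-based inclusive coordinates.
    max_gap and min_markers are illustrative thresholds (assumptions).
    Returns a list of (contig, region_start, region_end, sorted_marker_names).
    """
    by_contig = defaultdict(list)
    for contig, start, end, marker in hits:
        by_contig[contig].append((start, end, marker))

    groups = []
    for contig, contig_hits in by_contig.items():
        contig_hits.sort()  # order hits by start coordinate
        current = [contig_hits[0]]
        for hit in contig_hits[1:]:
            current_end = max(h[1] for h in current)
            if hit[0] - current_end <= max_gap:
                current.append(hit)  # close enough: same cluster
            else:
                groups.append((contig, current))
                current = [hit]
        groups.append((contig, current))

    candidates = []
    for contig, group in groups:
        markers = sorted({marker for _, _, marker in group})
        if len(markers) >= min_markers:  # require several distinct markers
            region_end = max(h[1] for h in group)
            candidates.append((contig, group[0][0], region_end, markers))
    return candidates


# Hypothetical hits: two clustered markers and one isolated hit.
example_hits = [
    ("chr1", 10_000, 10_900, "zot"),
    ("chr1", 11_200, 11_600, "rstR"),
    ("chr1", 80_000, 80_500, "zot"),
]
candidate_regions = cluster_marker_hits(example_hits)
print(candidate_regions)
```

A region defined this way only brackets the marker genes; the attL/R junctions still have to be located separately, which is why the manual curation step follows.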
Our analysis revealed that ~60% (415/688) of the analyzed genomes contained at least one inovirus or DJR prophage. Notably, our dataset reflected the inherent bias of the RefSeq database toward pandemic Vibrio genomes, particularly Vibrio parahaemolyticus and V. cholerae (Fig. 1c). To quantify prophage prevalence without this bias, we calculated the proportion of genomes per species harboring either inoviruses (54.11 ± 18.76%; mean ± s.d.) or DJR phages (33.24 ± 22.02%).
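The difference between a pooled prevalence and a per-species prevalence can be sketched in Python as follows. The toy genome list is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied or whether species with few genomes were excluded.

```python
from collections import defaultdict
from statistics import mean, stdev


def per_species_prevalence(genomes):
    """Return {species: % of that species' genomes carrying a prophage}.

    genomes: iterable of (species, has_prophage) pairs, one pair per genome.
    """
    counts = defaultdict(lambda: [0, 0])  # species -> [carriers, total]
    for species, has_prophage in genomes:
        counts[species][0] += int(has_prophage)
        counts[species][1] += 1
    return {sp: 100.0 * carriers / total for sp, (carriers, total) in counts.items()}


# Hypothetical toy data: an over-sampled species A and a rarely sequenced species B.
toy_genomes = (
    [("species_A", True)] * 8 + [("species_A", False)] * 2
    + [("species_B", True)] + [("species_B", False)]
)

by_species = per_species_prevalence(toy_genomes)
pooled = 100.0 * sum(carrier for _, carrier in toy_genomes) / len(toy_genomes)
species_mean = mean(by_species.values())
species_sd = stdev(by_species.values())  # sample s.d.; an assumption

print(f"pooled prevalence: {pooled:.1f}%")
print(f"per-species prevalence: {species_mean:.1f} ± {species_sd:.1f}%")
```

In this toy case the pooled value (75%) is pulled toward the over-sampled species, whereas the per-species mean (65%) weights each species equally.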
Inoviruses establish lysogeny through one of two recombination pathways: integrase (Int)-mediated site-specific recombination or co-option of host XerC/D recombinases. In Vibrio species, the identified inoviruses consistently use the XerC/D pathway, featuring dif-like sequences at the prophage's attL/R junctions and the encapsulated phage genome's attP sites. These dif-targeting elements lack an endogenous integrase. Instead, they leverage host XerC/D recombinases for integration into host dif sites (5'-ATTTAACATAA(N)TAATRCGHASY-3'; with the 5-bp inverted repeats underlined). For this reason, they have been termed "integrative mobile elements exploiting Xer (IMEXs)". Each of the two Vibrio chromosomes contains a dif site, where chromosome dimers are resolved by XerC/D during cell division. The conserved nature of these sites makes them ideal targets for IMEX propagation. By presenting a dif-like attP, Vibrio inoviruses mimic this host strategy, with one recombined junction regenerating the canonical dif motif (Fig. 2a), thereby maintaining both host fitness and IMEX success. Although Int-encoding inoviruses are not found in Vibrio, they can be identified in other genera of γ-Proteobacteria (Supplementary Fig. 2).
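To make the degenerate dif consensus concrete, the following Python sketch translates its IUPAC codes into a regular expression. The two 11-nt arms are taken from the consensus above; the 6-nt central spacer is an assumption inferred from a 28-bp full-length site, and the test sequence is synthetic rather than a curated dif site.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

LEFT_ARM = "ATTTAACATAA"   # from the consensus in the text
RIGHT_ARM = "TAATRCGHASY"  # from the consensus in the text
SPACER_LEN = 6             # assumption: 28-bp site minus two 11-nt arms

DIF_CONSENSUS = LEFT_ARM + "N" * SPACER_LEN + RIGHT_ARM


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus)


def degeneracy(consensus):
    """Number of concrete sequences encoded by an IUPAC consensus."""
    total = 1
    for base in consensus:
        total *= len(IUPAC[base].strip("[]"))
    return total


DIF_REGEX = re.compile(iupac_to_regex(DIF_CONSENSUS))


def is_dif_like(site):
    """True if the whole string matches the full-length consensus."""
    return DIF_REGEX.fullmatch(site.upper()) is not None


synthetic_site = "ATTTAACATAATATAAATAATGCGCACT"  # synthetic example, not curated data
print(len(DIF_CONSENSUS), degeneracy(RIGHT_ARM), is_dif_like(synthetic_site))
```

The left arm is fully specified, so all sequence flexibility in this consensus resides in the spacer and in the four degenerate positions of the right arm.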
Likewise, DJR prophages employ two parallel integration strategies: (i) site-specific integration by encoding a complete recombination machinery, including Int, repressor, and occasionally an excisionase (Xis), or (ii) host XerC/D exploitation. These two variants, designated "int-type" and "dif-type", respectively, coexist in Vibrio species. Both subtypes display overall similarity across their core genomes but differ in the lysogeny module, suggesting that this variation arises from horizontal gene transfer (HGT) events. Prophage border curation revealed that dif-type members mimic full-length (28-bp), left-armed (17-bp), or right-armed (17-bp) dif-like attP (Fig. 2a) to target chromosomal dif-1 or dif-2 sites, which qualifies them as IMEXs as well. In contrast, int-type members show greater variation in their target site preferences. In addition to the previously described dusA locus encoding tRNA dihydrouridine synthase, we identified five previously unrecognized integration loci on Vibrio chromosome I (Fig. 2b).
The evolution of phages into IMEXs, which co-opt host recombination machinery rather than encoding their own, is a prime example of an evolutionary trade-off aimed at optimizing genomic resources. This strategy, which we term "evolutionary economy", is particularly advantageous for inoviruses, which are characterized by minimalist genomes encoding multifunctional proteins. For example, the zonula occludens toxin (Zot) exhibits dual functions in morphogenesis (via its N-terminal domain) and toxin secretion (via its C-terminal domain). Comparative genomics revealed that Vibrio inoviruses possess highly compact genomes (8.6 ± 2.0 kb, n = 495; average ± s.d.). For context, several Int-encoding inoviruses identified in other taxa have larger genomes, including six from γ-Proteobacteria (9.7-14.4 kb), one from α-Proteobacteria (11.5 kb), and three from Archaea (10.5-20.7 kb) (Supplementary Figs. 2 and 3). Similarly, the co-option of host recombination mechanisms by dif-type DJR phages contributes to their evolutionary economy relative to their int-type counterparts (12.9 ± 1.0 kb, n = 140 vs. 13.9 ± 0.9 kb, n = 97; Dunn-Bonferroni post-hoc test, p = 8.55e-13) (Fig. 3a, b).
Vibrio dif-type DJR variants share extensive genomic synteny with Pseudoalteromonas phage PM2, the first identified lipid-containing phage (Fig. 3a). Since the 1970s, PM2 has been intensively studied as a model for membrane morphogenesis, significantly advancing our understanding of lipid functions in phage capsid assembly and genome delivery. Although PM2 has historically been classified as strictly virulent, this designation lacks experimental support and is based solely on descriptive accounts. Our reanalysis indicates that this classification warrants reevaluation.
Mechanistically, phage integration involves reciprocal recombination between a phage attP and a bacterial attB. This process generates hybrid attL/R junctions that demarcate prophage boundaries and reposition the attP-flanking genes as prophage termini (as illustrated in Fig. 2 inset). The putative attP site of PM2 should consequently lie between the p15 and p16 genes, which are the canonical termini of dif-targeting DJR prophages. Indeed, we identified a 17-bp left-armed dif-like attP (5'-ATTTAACATAATATAAA-3'; NC_000867.1: 747-763 nt) at this precise locus. This genomic configuration is functional during lysogeny, as prophage excision has been experimentally validated in Pseudoalteromonas phage Cr39582, which harbors an identical 17-bp attP motif between p15 and p16. Of note, p16 (encoding a transcriptional repressor) and p15 (encoding a transcriptional regulator) form the sole leftward-transcribed operon in PM2-like genomes. Together with the dif-like attP, these elements appear to constitute a conserved lysogeny module. In contrast, autolykiviruses, which are capable of forming plaques across multiple Vibrio species, lack detectable lysogeny elements and retain only structural and lysis genes homologous to those of dif-type and int-type DJR subtypes (Fig. 3a). This confirms their classification as obligate lytic specialists within DJR phages. Collectively, these findings highlight the necessity of systematically characterizing phage lysogeny modules to fully elucidate phage life strategies.
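The positional argument above, that the genes flanking attP on the circular phage genome become the prophage termini after integration, can be sketched with a simple gene-order model in Python. The model ignores strand and the hybrid sequence composition of attL/R, and the gene names other than p15 and p16 are hypothetical placeholders.

```python
def integrate(phage_circular, host_linear, attp="attP", attb="attB"):
    """Model integration of a circular phage gene map into a host gene map.

    phage_circular: gene order around the circle, containing one `attp` entry.
    host_linear: host gene order, containing one `attb` entry.
    Returns the lysogen gene order with attL/attR marking the junctions.
    """
    i = phage_circular.index(attp)
    # Opening the circle at attP: genes after attP come first,
    # and genes before attP end up at the opposite terminus.
    linear_phage = phage_circular[i + 1:] + phage_circular[:i]
    j = host_linear.index(attb)
    return host_linear[:j] + ["attL"] + linear_phage + ["attR"] + host_linear[j + 1:]


def prophage_termini(lysogen):
    """Return the genes immediately inside attL and attR."""
    left = lysogen.index("attL")
    right = lysogen.index("attR")
    return lysogen[left + 1], lysogen[right - 1]


# attP lies between p15 and p16; geneX/geneY are hypothetical placeholders.
pm2_like = ["p15", "attP", "p16", "geneX", "geneY"]
host = ["hostA", "attB", "hostB"]

lysogen = integrate(pm2_like, host)
termini = prophage_termini(lysogen)
print(lysogen)
print(termini)
```

Because the genome is circular, the result does not depend on where the gene list starts; which terminus is drawn on the left depends only on the orientation of the site.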
The appearance of phage-encoded hypervariable regions (pHVRs) in inoviral genomes was first described in 1998, yet their specific roles remain poorly understood. As the characterization of anti-phage defense mechanisms has advanced, recent genomic analyses of Vibrio non-tailed prophages imply that these pHVRs carry accessory genes involved in phage defense. Our independent evidence corroborates this hypothesis (Fig. 4 and Supplementary Data 3). More specifically, our analysis clarifies which phage lineages encode pHVRs, along with their precise genomic locations, functions, and potential origins.
Through comparative genomics, we found that the presence of pHVRs in inoviruses co-occurs with the xafT gene (encoding the Xer recombination activation factor), leading us to categorize CTX-phages into two subtypes: CTX-I and CTX-II. Both subtypes begin their integrated genomes with the repressor gene rstR but exhibit distinct downstream genomic architectures (Fig. 5a): CTX-I variants contain pHVRs averaging 1.9 ± 1.3 kb in length, located between zot and xafT, whereas CTX-II variants lack pHVRs and terminate with zot, with or without a few downstream pseudogenes. Vibrio-infecting inoviruses can therefore be classified by the presence of CtxA/B and XafT: CTX+ (CtxA/B-positive and XafT-negative), CTX-I (CtxA/B-negative and XafT-positive), and CTX-II (CtxA/B-negative and XafT-negative).
Notably, dif-type DJR prophages, which encode a xafT homolog (p15), also contain pHVRs (averaging 1.9 ± 1.2 kb in length) flanked by p15 and a specific lysis gene encoding endolysin or holin (p5, gp-k, etc.). The int-type variants have pHVRs between the int gene and a specific lysis gene, with an average length of 1.7 ± 0.9 kb (Figs. 3a and 4). Collectively, these pHVRs are not randomly placed but are consistently found between the phage lysis and lysogeny modules. Despite the difference in genome size, no significant variation in pHVR length was observed among CTX-I inoviruses, dif-type, and int-type phages (Kruskal-Wallis test, p = 0.869; Fig. 4). This indicates that the genomic space saved by not encoding a full recombination system is not reallocated to expand these variable regions, which supports our hypothesis that utilizing the host recombination system may favor overall genome compaction.
HGT represents a critical driver shaping pHVR composition. Gene clusters within these regions were found to have homologs in both bacterial contigs and other temperate phage genomes (Supplementary Fig. 4), suggesting that pHVRs are hotspots for genetic exchange. Beyond diverse anti-phage roles, pHVRs may also aid the dissemination of virulence factors. For instance, several V. mimicus inoviruses encode the thermostable direct hemolysin (TDH), an exotoxin specific to V. parahaemolyticus, within their pHVRs (Supplementary Fig. 5). Nevertheless, the incorporation of pHVRs can impose a genomic burden on streamlined phage genomes. Because of their presence, CTX-I prophages are significantly larger than CTX-II prophages (9.5 ± 1.7 kb versus 6.7 ± 0.7 kb, average ± s.d.; Dunn-Bonferroni post-hoc test, p = 7.69e-43), yet the former outnumber the latter roughly 4-fold (n = 337 versus n = 78). Their cross-species relative abundance further confirms a preference for the CTX-I subtype (paired t-test, p = 4.01e-05) (Fig. 5b), suggesting that the selective advantages gained from acquiring pHVRs outweigh the genomic burdens. Notably, we identified several degraded CTX-I elements lacking canonical phage features but retaining the pHVR signature (Supplementary Fig. 6), consistent with the evolutionary paradigm that phage relics often lose functional integrity but preserve host-adaptive genetic cargo. In summary, pHVRs serve as adaptive hubs, mediating multi-directional genetic exchange between non-tailed phages, bacterial hosts, and other temperate phages. Evolutionarily, this represents a mutually beneficial compromise, with bacteria tolerating prophage colonization in exchange for weaponized genetic innovation.
Early characterizations of inovirus f1 and DJR phage PM2 in the 1960s established fundamental distinctions between the two phage types. Inoviruses possess single-stranded DNA (ssDNA) genomes that are packaged within filamentous capsids and execute chronic (non-lytic) infections, releasing progeny via membrane-associated secretion complexes. Conversely, DJR phages encapsulate their double-stranded DNA (dsDNA) genomes within icosahedral capsids and can execute lytic replication mediated by endolysin-holin systems for host cell lysis. These divergences in genetic material, morphology, and replication strategies justify their classification as biologically and phylogenetically distinct phage groups.
Despite these profound differences, comparative genomics has uncovered a strikingly conserved lysogeny module shared by CTX-I inoviruses and dif-type DJR phages, comprising three core components: P16/RstR (a transcriptional repressor), P15/XafT (a transcriptional regulator), and a dif-like attP site (Fig. 6a), with their protein sequences deposited in Supplementary Data 2. CTX-I RstR showed exclusive sequence homology to P16 of dif-type phages, with an average BLASTp identity of 30.6% ± 2.9% and bit-scores of 38.3 ± 4.8, while displaying no detectable similarity to RstR variants from CTX+ or CTX-II phages. Structural alignment of predicted P16 and CTX-I RstR protein chains confirmed their three-dimensional folds are highly conserved, with a root mean squared deviation (RMSD) value as low as 1.160. Phylogenetic reconstruction based on repressor structural models further supported their monophyletic origin (Fig. 6b), as P16 formed a distinct cluster alongside CTX-I RstR and showed no structural similarity to repressors from int-type DJR phages. The P15 and XafT regulators also fall into the same functional group despite their limited sequence similarity (BLASTp identities, 29.2% ± 2.3%; bit-scores, 36.8 ± 4.1) and moderate structural divergence (minimum RMSD, 2.979). Overall, the strong structural conservation between RstR/P16 and XafT/P15 pairs provides compelling evidence for their shared ancestry. Nevertheless, the relatively low sequence similarity suggests that any potential HGT event likely occurred in the distant past, allowing for substantial sequence divergence.
The three inoviral subtypes (CTX-I, CTX-II, and CTX+) and dif-type DJR phages are all IMEXs that integrate into host dif sites (Fig. 2a). Their single or sequential integration can lead to tandem array formation. Among the analyzed Vibrio spp. genomes, 12.6% contained at least one such array, and a substantial proportion of the identified non-tailed phages were part of these arrays (Fig. 1a, b): 19.9% of CTX-I inoviruses (67/337), 94.9% of CTX-II inoviruses (75/79), 45.7% of CTX+ inoviruses (43/94), and 25.6% of dif-type DJR phages (41/160). Two-unit arrays were the most common, accounting for 67.02% of all cases, whereas arrays with four to six units were considerably rarer (2.13%-1.06%; Supplementary Fig. 7 and Data 4). The combination of a CTX-I inovirus and a dif-type DJR phage represents approximately 43% of the two-unit arrays (27/63). It should be noted that accurate identification of prophage junctions remains challenging, as existing bioinformatic tools frequently misannotate adjacent prophages as a single entity. This mischaracterization of heterologous IMEX arrays underscores a significant source of ambiguity in related studies.
Using our curated prophage dataset as a reference standard, we quantified the accuracy of three widely used tools, CheckV, VirSorter2, and geNomad, using the intersection-over-union (IoU) metric (Fig. 7a, b). geNomad performed best, with median IoU values of 77% (CTX-I), 26% (CTX-II), 31% (CTX+), 60% (dif-type DJR), and 88% (int-type DJR). The variability in IoU is primarily attributed to the tendency of specific phage subtypes to form IMEX arrays. Indeed, geNomad's predictions were significantly more accurate for int-type DJR prophages, which cannot form tandem repetitions and instead exist as solitary elements, resembling Int-encoding tailed phages that are either resistant to or displaced by superinfection with another phage encoding the same attP.
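For reference, the IoU of a predicted and a curated prophage interval can be computed as in the following Python sketch. It assumes a single predicted interval per reference prophage and 1-based inclusive coordinates; how multiple or fragmented predictions were handled in the benchmark is not specified here, and the coordinates are hypothetical.

```python
def interval_iou(predicted, reference):
    """Intersection-over-union of two genomic intervals.

    Each interval is (start, end) in 1-based inclusive coordinates.
    Returns a value between 0.0 (no overlap) and 1.0 (identical intervals).
    """
    p_start, p_end = predicted
    r_start, r_end = reference
    overlap = max(0, min(p_end, r_end) - max(p_start, r_start) + 1)
    union = (p_end - p_start + 1) + (r_end - r_start + 1) - overlap
    return overlap / union


# Hypothetical coordinates: an 8-kb curated prophage, and a prediction that
# merges it with an adjacent 8-kb element of the same tandem array.
curated = (1001, 9000)
merged_prediction = (1001, 17000)

print(interval_iou(merged_prediction, curated))
print(interval_iou(curated, curated))
```

The merged prediction fully contains the curated prophage yet scores only 0.5, which illustrates why subtypes prone to array formation yield low IoU values when neighboring units are reported as one element.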
Precise identification of phage attL/R junctions enabled us to uncover important non-tailed phage evolutionary insights that were previously overlooked. However, manual curation of prophage borders proved time-consuming and labor-intensive, and most prophage studies must rely on automated bioinformatics tools. The growing availability of long-read sequencing data has unveiled the organizational complexity in IMEX arrays (Supplementary Fig. 8), underscoring the need for a refined methodology.
To address this limitation, we adopted a combined strategy that integrates marker-based prediction and dif-like motif detection to enhance IMEX identification (see Supplementary Note 1 for details). The marker-based approach leverages the conserved spatial organization of the IMEX attP and its flanking genes, which in turn form fixed terminal gene pairs after recombination (Fig. 2 inset). Specifically, CTX-I and dif-type DJR phages invariably have rstR (p16) and xafT (p15) as their terminal genes; CTX-II and CTX+ phages are defined by the conserved termini rstR and zot, and rstR and ctxB, respectively. Additionally, the detection of dif-like motifs flanking or located between individual units provides direct evidence for determining their exact boundaries. To enable this, we provide the Bash script find_Vibrio_dif_motif.sh for identifying full-length and shorter arm-specific dif-like motifs. The conserved motif used by the script achieved high coverage in manually curated attL/R sites: ~81% (811/1000) for inoviruses and 93% (275/296) for dif-type DJR phages. This enabled more precise identification of IMEX boundaries: among the analyzed inoviruses, 61% (312/515) retained both attL and attR sites, enabling reconstruction of complete prophage regions, and another 36% (187/515) retained one site, improving the boundary definition initially provided by the marker-based approach. Similarly, among the dif-type DJR phages examined, 74% (119/160) retained both attL/R sites and 23% (37/160) retained one site.
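The motif-detection half of this strategy can be illustrated with a short Python sketch. It is not a port of find_Vibrio_dif_motif.sh: the regular expressions are hand-expanded from the dif consensus given earlier, the 6-nt spacer and the definition of the 17-bp arm-specific motifs (one arm plus spacer) are assumptions, and the test array is synthetic.

```python
import re

# Hand-expanded consensus arms (R=[AG], H=[ACT], S=[CG], Y=[CT]).
LEFT_ARM = "ATTTAACATAA"
RIGHT_ARM = "TAAT[AG]CG[ACT]A[CG][CT]"
SPACER = "[ACGT]{6}"  # assumed 6-nt central region

# name -> (lookahead regex so overlapping hits are reported, motif length)
MOTIFS = {
    "full": (re.compile(f"(?=({LEFT_ARM}{SPACER}{RIGHT_ARM}))"), 28),
    "left_arm": (re.compile(f"(?=({LEFT_ARM}{SPACER}))"), 17),
    "right_arm": (re.compile(f"(?=({SPACER}{RIGHT_ARM}))"), 17),
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_dif_like(seq):
    """Return sorted (start, end, strand, kind) hits, 0-based half-open."""
    seq = seq.upper()
    n = len(seq)
    rc = revcomp(seq)
    hits = set()
    for kind, (pattern, size) in MOTIFS.items():
        for m in pattern.finditer(seq):
            hits.add((m.start(), m.start() + size, "+", kind))
        for m in pattern.finditer(rc):
            start = n - (m.start() + size)  # map back to forward coordinates
            hits.add((start, start + size, "-", kind))
    return sorted(hits)


def merge_sites(hits):
    """Collapse overlapping hits (e.g. full + arm hits) into single sites."""
    sites = []
    for start, end, _strand, _kind in hits:
        if sites and start < sites[-1][1]:
            sites[-1] = (sites[-1][0], max(sites[-1][1], end))
        else:
            sites.append((start, end))
    return sites


def split_units(sites):
    """Intervals between consecutive sites are candidate prophage units."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(sites, sites[1:])]


FULL_SITE = "ATTTAACATAATATAAATAATGCGCACT"  # synthetic 28-bp site
LEFT_SITE = "ATTTAACATAATATAAA"             # 17-bp left-armed motif
array = "GC" * 5 + FULL_SITE + "G" * 40 + LEFT_SITE + "C" * 30 + FULL_SITE + "GC" * 5

sites = merge_sites(find_dif_like(array))
units = split_units(sites)
print(sites)
print(units)
```

In practice, the intervals returned by `split_units` would be cross-checked against the expected terminal gene pairs (for example rstR and xafT) before being accepted as individual IMEX units.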
Benchmarking against existing tools revealed the superior performance of this strategy (Fig. 7b), which achieved median IoU values of 100% (CTX-I, n = 325), 90% (CTX-II, n = 76), 93% (CTX+, n = 94), and 100% (dif-type DJR, n = 135). However, this strategy is not recommended for the int-type DJR phages examined here. The automated detection of their attL/R sites remains challenging due to high sequence variability, a feature that nonetheless facilitates their integration into diverse genomic loci. Furthermore, unlike their dif-type counterparts, int-type elements typically possess only one invariant terminal marker (the int gene), while the other end lacks a conserved feature. A more precise alternative is manual BLASTn comparison between the parental bacterial genome and a prophage-free isogenic reference, which allows flanking host sequences to be removed and prophage boundaries to be refined.
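The logic of comparing a lysogen with a prophage-free reference can be sketched in Python. This is a deliberately simplified stand-in that uses exact prefix and suffix matching instead of BLASTn alignment, so it tolerates no mismatches or rearrangements; the sequences and the function name are hypothetical.

```python
def locate_insertion(lysogen, reference):
    """Locate the extra segment in `lysogen` relative to `reference`.

    Both arguments are the same host locus, with and without the prophage.
    Returns (start, end), 0-based half-open, of the inserted segment.
    """
    max_shared = min(len(lysogen), len(reference))
    # Length of the shared left flank.
    prefix = 0
    while prefix < max_shared and lysogen[prefix] == reference[prefix]:
        prefix += 1
    # Length of the shared right flank, not overlapping the left flank.
    suffix = 0
    while (suffix < max_shared - prefix
           and lysogen[-1 - suffix] == reference[-1 - suffix]):
        suffix += 1
    return prefix, len(lysogen) - suffix


# Hypothetical sequences: integration duplicates the 7-nt core of the target site.
core = "GATTACA"
phage = "CGCGCGCGCG"
reference = "AAACCC" + core + "TTTGGG"
lysogen = "AAACCC" + core + phage + core + "TTTGGG"

start, end = locate_insertion(lysogen, reference)
print(start, end, lysogen[start:end])
```

The reported segment contains the phage plus one copy of the duplicated core, which reflects the usual ambiguity in assigning the repeated junction sequence to the left or right boundary.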