TaxonImprint Extraction Script
Version 1.5 (Beta Release)
Eikoniki, 2025

Overview

TaxonImprint is an evidentiary metadata system for species-level taxonomy. This document contains the strict extraction protocol used to produce deterministic, zero-inference evidentiary token strings from taxonomic works.

This script is provided as part of the TaxonImprint v1.5 Beta Release. Users may apply the protocol manually or through a large language model. Reproducibility can be tested independently on any species-level taxonomic paper. No copyrighted text is included in this document.

The rules below must be followed without exception.

#
# PURPOSE:
# This file is the complete onboarding and extraction protocol for the Ev1
# evidentiary metadata system for extant Gastropoda. It is designed for an
# LLM with *no prior biological knowledge*. It contains:
#
#   • Full Revised Ev1 v1.3 Lexicon
#       – definitions for every token
#       – maximally explicit “applies_when”
#       – maximally explicit generic examples
#       – maximally explicit gastropod examples
#       – boundary notes for when NOT to apply tokens
#
#   • Complete LLM extraction protocol
#   • Strict parsing and output rules
#   • Prohibitions
#   • Internal checklist for self-verification
#   • Example token strings
#   • Example output block
#
# SAVE THIS FILE AS:
#   Ev1_LLMScript_v1.5_STRICT_FULL.txt
#
###############################################################################

===============================================================================
SECTION A — INTRODUCTION (MANDATORY)
===============================================================================
Ev1 is a controlled vocabulary that expresses the *types of evidence* used in species-level taxonomy for extant gastropods. Each token represents ONE specific, deterministic evidentiary unit. The goal of Ev1 is to label evidence, not to interpret, evaluate, or summarise it.
This script teaches an LLM exactly how to:
  • read a taxonomic paper
  • identify species-level evidentiary statements
  • map each piece of evidence to Ev1 tokens
  • produce a strict, deterministic token list per species

It allows an LLM with no background in systematics to perform correct Ev1 extraction.

-------------------------------------------------------------------------------
IMPORTANT: EXAMPLE OF WHAT A TOKEN STRING LOOKS LIKE
-------------------------------------------------------------------------------
A token string is a list of Ev1 tokens that apply to a species across its entire species-level treatment. For example, a realistic token string might look like:

  MORPH.SHELL MORPH.PROT MORPH.RAD IMAG.PHOTO GENE.LOCI.COXI PHYL.TREE STAT.ASAP VOUCHER.TYPE.HOLO VOUCHER.TYPE.PARA

This example shows:
  • shell morphology used (MORPH.SHELL)
  • protoconch explicitly used (MORPH.PROT)
  • radula evidence (MORPH.RAD)
  • photographs used as evidence (IMAG.PHOTO)
  • COI gene used (GENE.LOCI.COXI)
  • phylogenetic tree used (PHYL.TREE)
  • ASAP delimitation used (STAT.ASAP)
  • holotype and paratypes examined (VOUCHER.TYPE.HOLO, VOUCHER.TYPE.PARA)

This EXAMPLE MUST NOT be copied into your output. It is *only* instructional.

===============================================================================
SECTION B — EXECUTION PRINCIPLES (MANDATORY)
===============================================================================
You MUST follow all rules exactly:

  1. Only assign tokens defined in this file.
  2. NEVER invent new tokens or reinterpret existing ones.
  3. NEVER infer. If evidence is not explicitly stated, do NOT assign a token.
  4. UNDER-CALL rather than over-call.
  5. Do NOT use figures unless the text explicitly references the figure as evidence.
  6. SSH/PSH/OTU numbers are NOT evidence.
  7. Evidence must be species-specific.
  8. You MUST ignore general introductions, discussion, narrative, natural history, ecological background, or biogeographic commentary unless it is explicitly used as evidence in species delimitation.
  9. Every token must be supported by explicit textual evidence.
  10. Final output MUST follow the format in SECTION F.

-------------------------------------------------------------------------------
CRITICAL RULE (v1.5 — UNBREAKABLE)
-------------------------------------------------------------------------------
For EVERY species treated in a species-level taxonomic act (description, redescription, diagnosis, differential diagnosis, species delimitation), you MUST output the COMPLETE SET of Ev1 tokens that apply to that species.

Species NOT treated taxonomically MUST NOT appear in your output.

===============================================================================
SECTION C — HOW TO APPLY TOKENS (LOGICAL PROCESS)
===============================================================================
You MUST follow this process exactly:

STEP 1 — Identify focal species
  Species that are:
    • described
    • redescribed
    • diagnosed
    • compared/differentiated
    • delimited molecularly or morphologically

STEP 2 — Identify explicit evidentiary statements
  Evidence counts ONLY when explicitly used in the species-level argument.

STEP 3 — Map each explicit statement to the correct Ev1 token(s)
  A single sentence can map to multiple tokens.

STEP 4 — NEVER infer.
  • “Protoconch multispiral” → MORPH.PROT
  • BUT NOT “DEV.MODE”, unless developmental mode is explicitly stated.

STEP 5 — Produce a valid token list formatted exactly as required.
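Rules 1, 2 and 10 lend themselves to a partial mechanical check. The sketch below is a hypothetical Python helper (`validate_token_string` and `KNOWN_TOKENS` are illustrative names, not part of Ev1) that splits a whitespace-separated token string and rejects anything outside the lexicon rather than correcting it. `KNOWN_TOKENS` is deliberately partial: it lists only tokens named in this excerpt, and the authoritative set is whatever SECTION D defines. It cannot check Rule 3 (no inference); that still requires reading the source text.

```python
# Hypothetical helper, not part of the Ev1 specification.
# KNOWN_TOKENS is a PARTIAL lexicon: only tokens named in this excerpt.
KNOWN_TOKENS = frozenset({
    "IMAG.PHOTO", "IMAG.SEM", "IMAG.ILLUST", "IMAG.HIST", "IMAG.3D",
    "GENE.MULTI", "GENE.MITO",
    "GENE.LOCI.COXI", "GENE.LOCI.16S", "GENE.LOCI.28S",
    "PHYL.TREE", "PHYL.CLADE", "PHYL.DATED",
    "ECO.HABITAT", "ECO.MICROHABITAT", "ECO.DIET",
    "CHEM.ICPMS", "CHEM.RAMAN", "CHEM.ISOTOPE",
    "BIOC.VEN", "BIOC.LUM",
    "STAT.MORPH", "STAT.STRUCTURE", "STAT.ABGD", "STAT.GMYC",
    "STAT.bPTP", "STAT.BPP", "STAT.ASAP",
    "DEV.MODE", "LIFE.PARASITIC", "BEH.SPAWNING",
    "MORPH.SHELL", "MORPH.PROT", "MORPH.RAD",
    "VOUCHER.TYPE.HOLO", "VOUCHER.TYPE.PARA",
})


def validate_token_string(token_string):
    """Split a whitespace-separated token string into (accepted, rejected).

    Matching is exact and case-sensitive (STAT.bPTP is valid, STAT.BPTP is
    not). Unknown tokens are rejected, never corrected (Rules 1-2), and a
    repeated valid token is kept once, in first-seen order.
    """
    accepted = []
    rejected = []
    for token in token_string.split():
        if token not in KNOWN_TOKENS:
            rejected.append(token)
        elif token not in accepted:
            accepted.append(token)
    return accepted, rejected


accepted, rejected = validate_token_string(
    "MORPH.SHELL GENE.LOCI.COXI GENE.COI MORPH.SHELL STAT.BPTP"
)
print(accepted)  # ['MORPH.SHELL', 'GENE.LOCI.COXI']
print(rejected)  # ['GENE.COI', 'STAT.BPTP']
```

Rejecting instead of auto-correcting mirrors Rule 2: a near-miss such as `GENE.COI` is surfaced for a human decision rather than silently mapped to `GENE.LOCI.COXI`.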
===============================================================================
SECTION D — COMPLETE REVISED Ev1 v1.3 LEXICON
===============================================================================
All tokens below include:
  • definition
  • applies_when
  • maximally explicit generic example
  • maximally explicit gastropod example
  • boundary_notes
===============================================================================
DOMAIN: IMAG — Imaging Evidence
===============================================================================
IMAG — Visual documentation used explicitly in species-level taxonomy. IMAG tokens describe the imaging *modality*, not the biological trait.
-------------------------------------------------------------------------------
TOKEN: IMAG.PHOTO
-------------------------------------------------------------------------------
definition:
  Photographic images (macro, micro, stacked, or in situ) used explicitly as evidence in taxonomic reasoning.
applies_when:
  Assign IMAG.PHOTO when the manuscript explicitly cites photographs as evidence to show diagnostic characters, comparative differences, or species-level distinctions.
example_generic:
  “The authors state that photographs in Figures 2A–D show the diagnostic sculpture distinguishing this species. Because the use of photographs is explicitly invoked to support the diagnosis, IMAG.PHOTO applies.”
example_gastropod:
  “Figure 4B is referenced as illustrating the protoconch sculpture used to differentiate the volutid species. Because the text explicitly uses the photo as evidence, IMAG.PHOTO applies.”
boundary_notes:
  • Mere presence of photos is insufficient. Must be textually invoked.
  • Morphological content shown must be coded separately (e.g., MORPH.PROT).
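Rule 5 and the IMAG.PHOTO boundary notes mean a figure counts only when the text cites it. A hypothetical pre-screen, assuming figure citations follow common forms such as “Fig. 4B” or “Figures 2A–D”, could locate those citations with a regular expression. The pattern and the name `find_figure_refs` are assumptions; a match is a candidate for review, never a token assignment.

```python
import re

# Hypothetical pre-screen, not part of the Ev1 specification.
# Matches citations such as "Fig. 4B", "Figs 3-5", "Figure 3", "Figures 2A–D".
FIGURE_REF = re.compile(
    r"\bFig(?:ure)?s?\.?\s*\d+[A-Za-z]?(?:\s*[–-]\s*[A-Za-z0-9]+)?",
    re.IGNORECASE,
)


def find_figure_refs(text):
    """Return figure citations in `text`, in order of appearance.

    A hit only marks a CANDIDATE for IMAG.* review: the reader must still
    confirm that the cited figure is invoked as evidence (Rule 5).
    """
    return [m.group(0) for m in FIGURE_REF.finditer(text)]


print(find_figure_refs("The protoconch (Fig. 4B) differs; see also Figures 2A–D."))
# ['Fig. 4B', 'Figures 2A–D']
```

Returning the matched citations, rather than splitting the text into sentences, avoids the trap that “Fig.” ends in a full stop and would break a naive sentence splitter.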
-------------------------------------------------------------------------------
TOKEN: IMAG.SEM
-------------------------------------------------------------------------------
definition:
  Scanning electron microscope (SEM) imagery used explicitly as evidence in taxonomic comparison or diagnosis.
applies_when:
  Assign IMAG.SEM when SEM imaging is explicitly used to visualise microstructure, radula, periostracum, protoconch, or other features used for species-level reasoning.
example_generic:
  “SEM images in Figure 3 reveal microstructural features explicitly used to distinguish the new species. This invocation triggers IMAG.SEM.”
example_gastropod:
  “SEM of the radula is referenced to show the cusp arrangement separating this conoidean species. Therefore IMAG.SEM applies.”
boundary_notes:
  • Must be explicitly invoked in text.
  • Morphological content must ALSO be coded with the appropriate MORPH/ANAT tokens.
-------------------------------------------------------------------------------
TOKEN: IMAG.ILLUST
-------------------------------------------------------------------------------
definition:
  New scientific illustrations (line drawings, diagrams) created for the manuscript and explicitly referenced as evidence.
applies_when:
  Assign IMAG.ILLUST when the manuscript explicitly refers to new illustrations to show diagnostic characters or species differences.
example_generic:
  “The line drawings in Figure 6 are used in the diagnosis to highlight the distinctive male genital morphology. This triggers IMAG.ILLUST.”
example_gastropod:
  “Camera lucida drawings of the radula in Fig. 7 are explicitly referenced as demonstrating the species-specific cusp pattern. IMAG.ILLUST applies.”
boundary_notes:
  • Historical illustrations are IMAG.HIST, not IMAG.ILLUST.
  • Morphological content must be recorded separately.
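The IMAG boundary notes separate the imaging modality from the character the image shows. As a minimal sketch, assuming an extractor has already recorded the modality, the character, and whether the image was invoked as evidence, one statement can yield two tokens. The lookup tables and `tokens_for_imaging_statement` are hypothetical, and the character table is intentionally incomplete.

```python
# Hypothetical lookup tables, not part of the Ev1 specification.
MODALITY_TOKENS = {
    "photo": "IMAG.PHOTO",
    "sem": "IMAG.SEM",
    "new_drawing": "IMAG.ILLUST",
    "historical_plate": "IMAG.HIST",
    "3d_reconstruction": "IMAG.3D",
}
CHARACTER_TOKENS = {  # partial; other characters are left for manual coding
    "shell": "MORPH.SHELL",
    "protoconch": "MORPH.PROT",
    "radula": "MORPH.RAD",
}


def tokens_for_imaging_statement(modality, character, invoked_as_evidence):
    """Return the tokens supported by ONE imaging statement.

    The modality and the character shown are coded separately. If the image
    is not invoked as evidence, this statement supports no token (the
    character may still be coded from other explicit text).
    """
    if not invoked_as_evidence:
        return []
    tokens = [MODALITY_TOKENS[modality]]
    if character in CHARACTER_TOKENS:
        tokens.append(CHARACTER_TOKENS[character])
    return tokens


print(tokens_for_imaging_statement("sem", "radula", True))
# ['IMAG.SEM', 'MORPH.RAD']
```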
-------------------------------------------------------------------------------
TOKEN: IMAG.HIST
-------------------------------------------------------------------------------
definition:
  Historical plates, lithographs, or older illustrations reused explicitly as taxonomic evidence.
applies_when:
  Assign IMAG.HIST when historical figures are explicitly cited to confirm, reinterpret, or support diagnostic characters or synonymic reasoning.
example_generic:
  “The original lithograph from 1843 is re-examined and explicitly used to confirm the shoulder sculpture. This triggers IMAG.HIST.”
example_gastropod:
  “Reeve’s Cymbiola plate is referenced to verify historical shell colour patterns relevant to diagnosis. IMAG.HIST applies.”
boundary_notes:
  • Historical figures must be used AS EVIDENCE, not merely cited in text.
-------------------------------------------------------------------------------
TOKEN: IMAG.3D
-------------------------------------------------------------------------------
definition:
  3D reconstructions (micro-CT, structured-light scanning, etc.) explicitly used in species-level taxonomy.
applies_when:
  Assign IMAG.3D when the text states that 3D imagery reveals internal or external structures used in species-level reasoning.
example_generic:
  “Micro-CT reconstruction shows internal septation diagnostic for the species. Because the CT reconstruction is used as evidence, IMAG.3D applies.”
example_gastropod:
  “3D reconstructions reveal the internal columellar fold arrangement, which is explicitly used to separate the species. IMAG.3D applies.”
boundary_notes:
  • Imaging modality only; morphological details are coded separately.
===============================================================================
DOMAIN: GENE — Genetic Evidence
===============================================================================
General domain definition:
  GENE tokens represent explicit use of DNA sequence data in species-level taxonomy.
  These tokens capture *what genetic data were used*, not how they were analysed. Analyses are coded under STAT or PHYL.
-------------------------------------------------------------------------------
TOKEN: GENE.MULTI
-------------------------------------------------------------------------------
definition:
  Use of multiple independent genetic loci (≥2 loci) explicitly as evidence in species-level delimitation, diagnosis, or comparison.
applies_when:
  Assign GENE.MULTI when the manuscript clearly states that the authors used two or more independent genetic loci to justify species limits or compare species.
example_generic:
  “The authors report that both COI and 28S data support recognition of the species. Because two independent loci are explicitly used, GENE.MULTI applies.”
example_gastropod:
  “COI and 16S sequences were jointly analysed and cited as confirming the distinctness of the conoidean species. This triggers GENE.MULTI.”
boundary_notes:
  • Only applies when ≥2 loci are explicitly invoked as evidence.
  • If only a single locus (e.g., COI) is used, do NOT assign GENE.MULTI.
-------------------------------------------------------------------------------
TOKEN: GENE.MITO
-------------------------------------------------------------------------------
definition:
  Explicit use of complete or near-complete mitochondrial genomes as evidence in species-level taxonomy.
applies_when:
  Assign GENE.MITO when the manuscript explicitly uses mitochondrial genomes (e.g., full mitogenome, >12 protein-coding genes, etc.) to support species limits or identification.
example_generic:
  “Mitogenome analysis revealed a distinct clade corresponding to the new species. Because full mitochondrial genome data were used explicitly, GENE.MITO applies.”
example_gastropod:
  “The complete mitogenome of the volute was sequenced and used to establish its separation from congeners. This triggers GENE.MITO.”
boundary_notes:
  • Partial COI/16S data do NOT count as mitogenome data.
  • Must explicitly reference whole or nearly whole mitochondrial data.
-------------------------------------------------------------------------------
SUBDOMAIN: GENE.LOCI — Locus-Specific Evidence
-------------------------------------------------------------------------------
TOKEN: GENE.LOCI.COXI
-------------------------------------------------------------------------------
definition:
  Use of the COI gene (cytochrome c oxidase subunit I) explicitly as species-level evidence.
applies_when:
  Assign GENE.LOCI.COXI when the manuscript explicitly uses COI sequences to:
    • diagnose a species
    • confirm its distinctness
    • place it in a phylogenetic tree
    • delimit species-level units
example_generic:
  “COI sequences demonstrate clear divergence from sympatric taxa, supporting species status. Therefore GENE.LOCI.COXI applies.”
example_gastropod:
  “COI data confirm that SSH 8 and SSH 9 form distinct species. Because COI is explicitly invoked as species-level evidence, GENE.LOCI.COXI applies.”
boundary_notes:
  • COI must be explicitly referenced.
  • Barcoding alone is sufficient if used in diagnosis/delimitation.
-------------------------------------------------------------------------------
TOKEN: GENE.LOCI.16S
-------------------------------------------------------------------------------
definition:
  Use of 16S rRNA sequence data explicitly as evidence in species-level taxonomy.
applies_when:
  Assign GENE.LOCI.16S when 16S data are explicitly used to diagnose species or compare species-level units.
example_generic:
  “16S sequence differences were used to separate the new species from a morphologically similar taxon. GENE.LOCI.16S applies.”
example_gastropod:
  “16S sequences place the conoidean species in a distinct clade separate from congeners, supporting its description. This triggers GENE.LOCI.16S.”
boundary_notes:
  • Explicit reference required.
  • Must be used in species-level reasoning.
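The GENE tokens combine by simple counting rules. The hypothetical helper `gene_tokens` below, a sketch rather than part of Ev1, maps the loci that the text explicitly invokes for one species to locus tokens, adds GENE.MULTI for two or more distinct loci, and treats mitogenome use as its own explicit flag. It counts locus names only: it does not assess whether the loci are genetically independent, and a locus with no LOCI token in this excerpt still counts toward GENE.MULTI. Output is sorted so that the same input always gives the same list.

```python
# Hypothetical helper, not part of the Ev1 specification.
LOCUS_TOKENS = {
    "COI": "GENE.LOCI.COXI",
    "16S": "GENE.LOCI.16S",
    "28S": "GENE.LOCI.28S",
}


def gene_tokens(invoked_loci, mitogenome_used=False):
    """Map the loci explicitly invoked for one species to GENE tokens.

    `invoked_loci` is a set of locus names the text uses as evidence.
    GENE.MULTI is added when two or more distinct loci are named; this
    sketch does NOT judge genetic independence. Mitogenome use is a
    separate, explicit flag: partial COI/16S data never set it.
    """
    tokens = sorted(
        LOCUS_TOKENS[name] for name in invoked_loci if name in LOCUS_TOKENS
    )
    if len(invoked_loci) >= 2:
        tokens.append("GENE.MULTI")
    if mitogenome_used:
        tokens.append("GENE.MITO")
    return tokens


print(gene_tokens({"COI", "28S"}))
# ['GENE.LOCI.28S', 'GENE.LOCI.COXI', 'GENE.MULTI']
```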
-------------------------------------------------------------------------------
TOKEN: GENE.LOCI.28S
-------------------------------------------------------------------------------
definition:
  Use of 28S rRNA sequence data explicitly as evidence in species-level taxonomy.
applies_when:
  Assign GENE.LOCI.28S when 28S data are explicitly used for species delimitation or comparative diagnosis.
example_generic:
  “The 28S gene confirms that the taxon is genetically distinct from its closest relatives. GENE.LOCI.28S applies.”
example_gastropod:
  “28S sequences support the separation of the volute species from a similar lineage. GENE.LOCI.28S applies.”
boundary_notes:
  • Requires explicit reference to 28S.
  • Do NOT assign if only COI/16S were used.
===============================================================================
DOMAIN: PHYL — Phylogenetic Evidence
===============================================================================
General domain definition:
  PHYL tokens record explicit use of phylogenetic trees, clade relationships, or divergence-time analyses in species-level taxonomy.
-------------------------------------------------------------------------------
TOKEN: PHYL.TREE
-------------------------------------------------------------------------------
definition:
  Explicit use of a phylogenetic tree as species-level evidence.
applies_when:
  Assign PHYL.TREE when the manuscript explicitly uses a phylogenetic tree to:
    • justify species distinctness
    • illustrate relationships among candidate species
    • support synonymy or non-synonymy
    • confirm monophyly of a species
example_generic:
  “Figure 3 shows the phylogenetic tree that the authors explicitly use to demonstrate the distinctness of the new species. PHYL.TREE applies.”
example_gastropod:
  “The COI phylogenetic tree shows SSH 8 and SSH 9 forming separate clades and is explicitly cited to justify describing Turridrupa neojubata. Therefore PHYL.TREE applies.”
boundary_notes:
  • Tree must be explicitly referenced.
  • Implicit tree use (“data not shown”) does NOT trigger PHYL.TREE.
-------------------------------------------------------------------------------
TOKEN: PHYL.CLADE
-------------------------------------------------------------------------------
definition:
  Explicit invocation of clade membership, monophyly, sister-group placement, or branching structure as species-level evidence.
applies_when:
  Assign PHYL.CLADE when the text explicitly states that being in a particular clade, lineage, or monophyletic group supports species identity.
example_generic:
  “The taxon forms a well-supported monophyletic clade separate from similar species. Because clade structure is used diagnostically, PHYL.CLADE applies.”
example_gastropod:
  “SSH 11 forms a unique and well-supported clade distinct from T. cincta, and this clade membership is explicitly used to justify T. elongata sp. nov. PHYL.CLADE applies.”
boundary_notes:
  • Requires explicit statements about clades/monophyly.
  • Trees alone without commentary do NOT trigger this.
-------------------------------------------------------------------------------
TOKEN: PHYL.DATED
-------------------------------------------------------------------------------
definition:
  Explicit use of time-calibrated phylogenies or divergence dates as taxonomic evidence.
applies_when:
  Assign PHYL.DATED when divergence times or time-calibrated trees are explicitly cited to justify species limits or differentiation.
example_generic:
  “The divergence time estimates show the lineage diverged 3 Ma, supporting species status. PHYL.DATED applies.”
example_gastropod:
  “The dated phylogeny for stromboideans is used to separate two lineages at species level. PHYL.DATED applies.”
boundary_notes:
  • Must explicitly invoke dates or a calibrated tree.
  • Standard COI trees without dates do NOT trigger.
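The three PHYL tokens are independent: a cited tree, an explicit clade statement, and invoked divergence dates each need their own textual support. The sketch below expresses that with hypothetical per-species flags (`PhylEvidence` and `phyl_tokens` are illustrative names, not part of Ev1); setting each flag correctly from the text remains the reader's job.

```python
from dataclasses import dataclass


@dataclass
class PhylEvidence:
    """Hypothetical per-species flags; each must come from explicit text."""
    tree_referenced: bool = False   # a presented tree is cited in the argument
    clade_statement: bool = False   # explicit clade/monophyly/sister-group claim
    dates_invoked: bool = False     # divergence times or a calibrated tree cited


def phyl_tokens(ev):
    """Each PHYL token needs its own statement; none implies another.

    A tree mentioned only as "data not shown" should leave tree_referenced
    False.
    """
    tokens = []
    if ev.tree_referenced:
        tokens.append("PHYL.TREE")
    if ev.clade_statement:
        tokens.append("PHYL.CLADE")
    if ev.dates_invoked:
        tokens.append("PHYL.DATED")
    return tokens


# An undated COI tree cited without clade commentary yields PHYL.TREE only.
print(phyl_tokens(PhylEvidence(tree_referenced=True)))  # ['PHYL.TREE']
```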
===============================================================================
DOMAIN: ECO — Ecological Evidence
===============================================================================
General domain definition:
  Ecological traits explicitly used in species-level differentiation.
-------------------------------------------------------------------------------
TOKEN: ECO.HABITAT
-------------------------------------------------------------------------------
definition:
  Habitat type explicitly used as evidence in species delimitation or diagnosis.
applies_when:
  Assign ECO.HABITAT when the manuscript uses habitat differences (substrate, environment, exposure) to justify species distinctions.
example_generic:
  “One species occurs only on coral rubble while its sister taxon inhabits seagrass beds, and these habitat differences are explicitly used in the diagnosis. ECO.HABITAT applies.”
example_gastropod:
  “Turrid A lives only on steep outer reefs, while Turrid B lives on lagoonal sand flats, and this difference is invoked in comparing SSHs. ECO.HABITAT applies.”
boundary_notes:
  • Mere reporting of habitat is NOT sufficient.
  • Must be used in the argument.
-------------------------------------------------------------------------------
TOKEN: ECO.MICROHABITAT
-------------------------------------------------------------------------------
definition:
  Microhabitat characteristics explicitly used in species-level arguments.
applies_when:
  Assign ECO.MICROHABITAT when microhabitat differences (crevices, algal turf, dead coral heads) are explicitly used to support species distinctions.
example_generic:
  “The species is restricted to under-rock microhabitats, whereas similar taxa inhabit exposed surfaces; the authors use this to justify species status. ECO.MICROHABITAT applies.”
example_gastropod:
  “The conoidean species is found only inside Porites crevices, while a morphologically similar species occurs externally; this distinction is explicitly invoked.
  ECO.MICROHABITAT applies.”
boundary_notes:
  • Must appear explicitly in the species-level argument.
-------------------------------------------------------------------------------
TOKEN: ECO.DIET
-------------------------------------------------------------------------------
definition:
  Diet or host-specific feeding traits explicitly used as evidence.
applies_when:
  Assign ECO.DIET when the manuscript uses feeding or prey differences as species-level evidence.
example_generic:
  “This species feeds exclusively on sponge X, whereas the closely related species feeds on sponge Y, and this difference is explicitly used as evidence. ECO.DIET applies.”
example_gastropod:
  “The volute is reported to feed exclusively on echinoids, whereas a closely related species feeds on crabs, and this difference is used in diagnosis. ECO.DIET applies.”
boundary_notes:
  • Diet must be explicitly comparative and evidentiary.
===============================================================================
DOMAIN: CHEM — Chemical Evidence
===============================================================================
General domain definition:
  Chemical, geochemical, or biochemical traits explicitly used in species-level reasoning.
-------------------------------------------------------------------------------
TOKEN: CHEM.ICPMS
-------------------------------------------------------------------------------
definition:
  Explicit use of ICP-MS (inductively coupled plasma mass spectrometry) or related elemental composition data as evidence.
applies_when:
  Assign CHEM.ICPMS when elemental composition (e.g., trace metals) is invoked to justify species distinctions.
example_generic:
  “ICP-MS data reveal distinct elemental profiles used to separate the species. CHEM.ICPMS applies.”
example_gastropod:
  “Elemental composition of the shell (via ICP-MS) distinguishes two sympatric volutid species, explicitly referenced in the comparison. CHEM.ICPMS applies.”
boundary_notes:
  • Must explicitly invoke chemical composition.
  • Not common in Gastropoda.
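Most ECO and CHEM boundary notes turn on one distinction: a trait that is merely reported versus one that is used in the species-level argument, with ECO.DIET additionally requiring an explicit comparison. A minimal sketch, assuming each extracted statement has already been tagged with those two judgements (the `Statement` record, `REQUIRES_COMPARISON`, and `evidentiary_tokens` are hypothetical names, not part of Ev1):

```python
from dataclasses import dataclass

# ECO.DIET's boundary note additionally requires an explicit comparison.
REQUIRES_COMPARISON = {"ECO.DIET"}


@dataclass(frozen=True)
class Statement:
    """Hypothetical record of one extracted statement about a species."""
    candidate_token: str    # e.g. "ECO.HABITAT"
    used_in_argument: bool  # invoked in diagnosis/delimitation, not just reported
    comparative: bool       # explicitly contrasts the species with another taxon


def evidentiary_tokens(statements):
    """Keep each candidate token once, only if some statement is evidentiary."""
    kept = []
    for s in statements:
        if not s.used_in_argument:
            continue  # mere reporting (e.g. a habitat note) is not evidence
        if s.candidate_token in REQUIRES_COMPARISON and not s.comparative:
            continue
        if s.candidate_token not in kept:
            kept.append(s.candidate_token)
    return kept


stmts = [
    Statement("ECO.HABITAT", used_in_argument=False, comparative=False),
    Statement("ECO.DIET", used_in_argument=True, comparative=False),
    Statement("ECO.DIET", used_in_argument=True, comparative=True),
    Statement("CHEM.ICPMS", used_in_argument=True, comparative=True),
]
print(evidentiary_tokens(stmts))  # ['ECO.DIET', 'CHEM.ICPMS']
```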
-------------------------------------------------------------------------------
TOKEN: CHEM.RAMAN
-------------------------------------------------------------------------------
definition:
  Explicit use of Raman spectroscopy or equivalent vibrational spectra as taxonomic evidence.
applies_when:
  Assign CHEM.RAMAN when Raman spectra are explicitly cited to distinguish taxa.
example_generic:
  “Raman spectra differ significantly between species and are cited as evidence. CHEM.RAMAN applies.”
example_gastropod:
  “Raman data show differing shell mineral signatures between volutes and are used in diagnosis. CHEM.RAMAN applies.”
boundary_notes:
  • Explicit reference required.
-------------------------------------------------------------------------------
TOKEN: CHEM.ISOTOPE
-------------------------------------------------------------------------------
definition:
  Explicit use of stable or radiogenic isotopes as evidence.
applies_when:
  Assign CHEM.ISOTOPE when isotopic values (δ13C, δ18O, etc.) are explicitly used to differentiate species.
example_generic:
  “Oxygen isotope ratios provide evidence separating two cryptic species. CHEM.ISOTOPE applies.”
example_gastropod:
  “Isotopic signatures of the shells differ between populations and are used to argue species limits. CHEM.ISOTOPE applies.”
boundary_notes:
  • Reporting isotope values alone is insufficient.
  • Must be evidentiary.
-------------------------------------------------------------------------------
SUBDOMAIN: BIOC — Biochemical / Bio-optical / Biophysical Evidence
-------------------------------------------------------------------------------
TOKEN: BIOC.VEN
-------------------------------------------------------------------------------
definition:
  Venom composition or proteomic venom data explicitly used in species-level taxonomy.
applies_when:
  Assign BIOC.VEN when venom peptides, proteins, or venom chemistry are used to differentiate species.
example_generic:
  “Venom peptide profiles differ sharply between the species and are cited as evidence. BIOC.VEN applies.”
example_gastropod:
  “Conotoxin profiles in Conus explicitly differentiate two cryptic species. BIOC.VEN applies.”
boundary_notes:
  • Must explicitly reference venom composition.
-------------------------------------------------------------------------------
TOKEN: BIOC.LUM
-------------------------------------------------------------------------------
definition:
  Bioluminescence or photobiological traits explicitly used in species-level taxonomy.
applies_when:
  Assign BIOC.LUM when luminescent behaviour or emission spectra are invoked as diagnostic evidence.
example_generic:
  “Distinct bioluminescent signatures separate the two species. BIOC.LUM applies.”
example_gastropod:
  “A slit-limpet species emits a characteristic bioluminescent pulse not shared by its congener, and this is used in diagnosis. BIOC.LUM applies.”
boundary_notes:
  • Rare in Gastropoda.
===============================================================================
DOMAIN: STAT — Statistical / Analytical Evidence
===============================================================================
General domain definition:
  STAT tokens represent *analytical methods* used for species delimitation or population structure. These do NOT describe genetic loci (GENE) or tree topology (PHYL). They describe analysis type only.
-------------------------------------------------------------------------------
TOKEN: STAT.MORPH
-------------------------------------------------------------------------------
definition:
  Explicit use of morphometric, geometric morphometric, or quantitative shape analyses as species-level evidence.
applies_when:
  Assign STAT.MORPH when the manuscript explicitly uses quantitative or statistical analyses of morphology (e.g., PCA, LDA, Procrustes, shape vectors) to argue species distinctness.
example_generic:
  “PCA of shell shape clearly separates the new species from its closest relative, and this analysis is used as evidence. STAT.MORPH applies.”
example_gastropod:
  “Geometric morphometric analysis of shell outlines supports recognising the volute as distinct. STAT.MORPH applies.”
boundary_notes:
  • Qualitative description of morphology is NOT STAT.MORPH.
  • Requires explicit quantitative analysis.
-------------------------------------------------------------------------------
TOKEN: STAT.STRUCTURE
-------------------------------------------------------------------------------
definition:
  Explicit use of STRUCTURE, ADMIXTURE, or other population clustering tools in species delimitation.
applies_when:
  Assign STAT.STRUCTURE when STRUCTURE/ADMIXTURE results are explicitly used to support species-level divisions.
example_generic:
  “STRUCTURE analysis identifies two non-admixed clusters corresponding to the species boundary. STAT.STRUCTURE applies.”
example_gastropod:
  “STRUCTURE results show no admixture between SSH 4 and SSH 5 and are invoked to support species delimitation. STAT.STRUCTURE applies.”
boundary_notes:
  • Requires explicit use of STRUCTURE-type methods.
-------------------------------------------------------------------------------
TOKEN: STAT.ABGD
-------------------------------------------------------------------------------
definition:
  Explicit use of ABGD (Automatic Barcode Gap Discovery) as evidence.
applies_when:
  Assign STAT.ABGD when ABGD output is explicitly referenced as supporting species limits or diagnostic separations.
example_generic:
  “ABGD identifies four candidate species, and the authors explicitly use this to justify species recognition. STAT.ABGD applies.”
example_gastropod:
  “ABGD partitions separate SSH 8 and SSH 9, supporting their treatment as distinct Turridrupa species. STAT.ABGD applies.”
boundary_notes:
  • Must be part of a species-level argument, not incidental reporting.
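STAT tokens name the analysis type, so they map naturally from method names. The hypothetical lookup below, assuming the extractor has already listed the methods the text invokes for one species, covers the STAT methods defined in this domain, including those that follow (GMYC, PTP/bPTP, BPP, and ASAP). The keys are drawn from the token definitions; the table itself and `stat_tokens` are illustrative, not part of Ev1.

```python
# Hypothetical lookup, not part of the Ev1 specification. Keys are upper-cased
# method names; values keep the lexicon's exact spelling (note STAT.bPTP).
STAT_METHODS = {
    "PCA": "STAT.MORPH",
    "LDA": "STAT.MORPH",
    "PROCRUSTES": "STAT.MORPH",
    "STRUCTURE": "STAT.STRUCTURE",
    "ADMIXTURE": "STAT.STRUCTURE",
    "ABGD": "STAT.ABGD",
    "GMYC": "STAT.GMYC",
    "PTP": "STAT.bPTP",
    "BPTP": "STAT.bPTP",
    "BPP": "STAT.BPP",
    "ASAP": "STAT.ASAP",
}


def stat_tokens(methods_in_species_argument):
    """Map method names used in ONE species' argument to STAT tokens.

    Pass only methods the text invokes for that species: a global ASAP or
    GMYC summary is not evidence for a species unless used in its argument.
    Unrecognised names (e.g. tree-building software) are ignored here.
    """
    tokens = []
    for name in methods_in_species_argument:
        token = STAT_METHODS.get(name.strip().upper())
        if token is not None and token not in tokens:
            tokens.append(token)
    return tokens


print(stat_tokens(["ASAP", "bPTP", "PTP", "admixture", "RAxML"]))
# ['STAT.ASAP', 'STAT.bPTP', 'STAT.STRUCTURE']
```

Upper-casing only the lookup key keeps user input forgiving while the emitted token still matches the lexicon exactly, which matters for the case-sensitive check in SECTION B.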
-------------------------------------------------------------------------------
TOKEN: STAT.GMYC
-------------------------------------------------------------------------------
definition:
  Explicit use of GMYC (Generalized Mixed Yule-Coalescent) delimitation as species-level evidence.
applies_when:
  Assign STAT.GMYC when results of GMYC are explicitly used to justify species boundaries.
example_generic:
  “GMYC results resolving six species are explicitly used to delimit the group. STAT.GMYC applies.”
example_gastropod:
  “GMYC identifies multiple Turridrupa lineages and this is used to support species recognition. STAT.GMYC applies.”
boundary_notes:
  • GMYC must be explicitly used, not just mentioned.
-------------------------------------------------------------------------------
TOKEN: STAT.bPTP
-------------------------------------------------------------------------------
definition:
  Explicit use of PTP (Poisson Tree Processes) or bPTP (Bayesian PTP) delimitation results as species-level evidence.
applies_when:
  Assign STAT.bPTP when the manuscript explicitly uses PTP/bPTP partitions to justify species distinctions.
example_generic:
  “bPTP results indicate three supported species, and these are referenced in the diagnosis. STAT.bPTP applies.”
example_gastropod:
  “PTP partitions separate the conoidean specimens into two species-level units, and the authors explicitly cite these results. STAT.bPTP applies.”
boundary_notes:
  • Must be explicitly referenced as evidence.
-------------------------------------------------------------------------------
TOKEN: STAT.BPP
-------------------------------------------------------------------------------
definition:
  Explicit use of BPP (Bayesian Phylogenetics & Phylogeography) or multispecies coalescent models in species delimitation.
applies_when:
  Assign STAT.BPP when BPP results are explicitly invoked to support species boundaries, distinctness, or validity.
example_generic:
  “BPP supports two species with posterior probability >0.99.
  Because this is used as evidence, STAT.BPP applies.”
example_gastropod:
  “BPP corroborates separation of SSH 12 and SSH 13 as distinct conoidean species. STAT.BPP applies.”
boundary_notes:
  • Must be a species-level claim.
-------------------------------------------------------------------------------
TOKEN: STAT.ASAP
-------------------------------------------------------------------------------
definition:
  Explicit use of ASAP (Assemble Species by Automatic Partitioning) as evidence.
applies_when:
  Assign STAT.ASAP when ASAP partitions or ranks are explicitly used to justify species distinctions.
example_generic:
  “ASAP identifies distinct clusters corresponding to the new species, and the authors invoke this directly in delimitation. STAT.ASAP applies.”
example_gastropod:
  “ASAP partitions separate SSH 8 and SSH 9 and are explicitly used to justify describing Turridrupa neojubata sp. nov. STAT.ASAP applies.”
boundary_notes:
  • Global ASAP summaries MUST NOT be applied to species unless explicitly used in that species’ argument.
===============================================================================
DOMAIN: DEV — Developmental Evidence
===============================================================================
-------------------------------------------------------------------------------
TOKEN: DEV.MODE
-------------------------------------------------------------------------------
definition:
  Explicit use of developmental mode (e.g., planktotrophic, lecithotrophic, direct development) as species-level evidence.
applies_when:
  Assign DEV.MODE when developmental mode is explicitly stated and used to differentiate species or justify species boundaries.
example_generic:
  “The authors explicitly state that the new species is direct-developing, while the similar taxon is planktotrophic, and use this difference in diagnosis.
DEV.MODE applies.” example_gastropod: “The volutid species is explicitly described as planktotrophic based on its protoconch whorls, and this developmental mode is used comparatively in the diagnosis. DEV.MODE applies.” boundary_notes: • Protoconch morphology alone does NOT trigger DEV.MODE unless developmental mode is explicitly stated. =============================================================================== DOMAIN: LIFE — Life-History Evidence =============================================================================== ------------------------------------------------------------------------------- TOKEN: LIFE.PARASITIC ------------------------------------------------------------------------------- definition: Explicit use of parasitism (or lack thereof) as species-level evidence. applies_when: Assign LIFE.PARASITIC when the manuscript explicitly uses parasitic vs. free-living life history to justify species distinctions. example_generic: “This species is parasitic while its relative is free-living, and this is explicitly used to support species delimitation. LIFE.PARASITIC applies.” example_gastropod: “The eulimids differ in parasitic association with echinoids, explicitly invoked in diagnosis. LIFE.PARASITIC applies.” boundary_notes: • Rare in most gastropod groups. =============================================================================== DOMAIN: BEH — Behavioural Evidence =============================================================================== ------------------------------------------------------------------------------- TOKEN: BEH.SPAWNING ------------------------------------------------------------------------------- definition: Explicit use of spawning behaviour, spawning season, mating behaviour, or reproductive timing as evidence in species-level taxonomy. applies_when: Assign BEH.SPAWNING when reproductive behaviour is explicitly invoked to distinguish species. 
example_generic: “The species spawn in different months and the authors use this difference to justify species recognition. BEH.SPAWNING applies.” example_gastropod: “Different spawning periods are invoked to differentiate two sympatric muricids. BEH.SPAWNING applies.” boundary_notes: • Behaviour must be explicitly used for taxonomic reasoning. =============================================================================== DOMAIN: PHYSIOL — Morphology & Anatomy =============================================================================== PHYSIOL represents morphological or anatomical traits explicitly used in species-level taxonomy. These tokens encode *what characters* are used, not the imaging method (IMAG handles that). ------------------------------------------------------------------------------- SUBDOMAIN: MORPH — Hard-Part Morphology ------------------------------------------------------------------------------- TOKEN: MORPH.SHELL ------------------------------------------------------------------------------- definition: Adult shell morphology explicitly used as species-level evidence, including sculpture, whorl profile, spire height, aperture characters, colour pattern, shoulder features, canal length, periostracal traits, etc. applies_when: Assign MORPH.SHELL when shell morphology is explicitly invoked in diagnosis, redescription, or comparison. example_generic: “The diagnosis explicitly uses the shell’s strong axial ribs and rounded aperture to separate the species. MORPH.SHELL applies.” example_gastropod: “The volute species is distinguished by its tall spire and inflated last whorl, explicitly referenced in the diagnosis. MORPH.SHELL applies.” boundary_notes: • Most gastropod papers trigger this token. • Imaging modality must be coded separately (IMAG.*). 
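
For readers building tooling around the lexicon rather than pasting it into an LLM, each token entry can be held as a small record. The sketch below is a hypothetical Python representation, not part of the Ev1 specification: the `LexiconEntry` class and `register` helper are illustrative names, only two entries (STAT.GMYC and MORPH.SHELL, with abridged text) are loaded, and the two example fields are omitted for brevity. `register` refuses duplicate names so that no token can be defined twice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LexiconEntry:
    """One Ev1 token definition (hypothetical representation, not part of Ev1)."""
    token: str
    definition: str
    applies_when: str
    boundary_notes: Tuple[str, ...] = field(default_factory=tuple)

    @property
    def domain(self) -> str:
        # The domain prefix is the part before the first dot, e.g. "STAT".
        return self.token.split(".", 1)[0]


def register(entries: List[LexiconEntry]) -> Dict[str, LexiconEntry]:
    """Index entries by token name, refusing duplicates."""
    index: Dict[str, LexiconEntry] = {}
    for entry in entries:
        if entry.token in index:
            raise ValueError(f"duplicate token: {entry.token}")
        index[entry.token] = entry
    return index


# Two abridged entries taken from the lexicon text above.
LEXICON = register([
    LexiconEntry(
        token="STAT.GMYC",
        definition="Explicit use of GMYC delimitation as species-level evidence.",
        applies_when="GMYC results are explicitly used to justify species boundaries.",
        boundary_notes=("GMYC must be explicitly used, not just mentioned.",),
    ),
    LexiconEntry(
        token="MORPH.SHELL",
        definition="Adult shell morphology explicitly used as species-level evidence.",
        applies_when="Shell morphology is explicitly invoked in diagnosis, redescription, or comparison.",
        boundary_notes=("Imaging modality must be coded separately (IMAG.*).",),
    ),
])

print(sorted(LEXICON))
```

Keeping entries frozen means a loaded lexicon cannot be edited during extraction, which mirrors the rule that the schema must not be modified.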

-------------------------------------------------------------------------------
TOKEN: MORPH.PROT
-------------------------------------------------------------------------------
definition:
  Protoconch morphology (shape, whorl count, sculpture) explicitly used as
  species-level evidence.
applies_when:
  Assign MORPH.PROT when protoconch features are explicitly used in a
  diagnosis or to separate species.
example_generic:
  “The protoconch is described as multispiral and ornamented, and this is
  used to distinguish the species. MORPH.PROT applies.”
example_gastropod:
  “Turridrupa species differ in protoconch whorl count, explicitly invoked in
  separating SSH 8 and SSH 9. MORPH.PROT applies.”
boundary_notes:
  • Protoconch morphology alone does NOT imply developmental mode unless this
    is explicitly stated.

-------------------------------------------------------------------------------
TOKEN: MORPH.OPERC
-------------------------------------------------------------------------------
definition:
  Operculum morphology explicitly used in species-level taxonomy.
applies_when:
  Assign MORPH.OPERC when operculum shape, size, or structure is explicitly
  used to justify species distinctness.
example_generic:
  “The operculum’s ovate shape and apical nucleus are used to diagnose the
  new species. MORPH.OPERC applies.”
example_gastropod:
  “The turbinid species is separated based on operculum shape, explicitly
  cited in the diagnosis. MORPH.OPERC applies.”
boundary_notes:
  • Not applicable to taxa that lack an operculum.

-------------------------------------------------------------------------------
TOKEN: MORPH.RAD
-------------------------------------------------------------------------------
definition:
  Radula morphology explicitly used as species-level evidence.
applies_when:
  Assign MORPH.RAD when radular features (cusp counts, tooth shape, formula)
  are used in species diagnosis or comparison.
example_generic:
  “The radula has three cusps on the central tooth, and this is used to
  separate the species. MORPH.RAD applies.”
example_gastropod:
  “The radula, with 45 rows of teeth and a distinct marginal cusp
  arrangement, is explicitly referenced in the diagnosis. MORPH.RAD applies.”
boundary_notes:
  • Must be explicitly invoked.
  • Imaging method (photo/SEM) is coded separately.

-------------------------------------------------------------------------------
TOKEN: MORPH.MICRO
-------------------------------------------------------------------------------
definition:
  Shell microstructure or fine-scale external sculpture explicitly used as
  species-level evidence.
applies_when:
  Assign MORPH.MICRO when microstructural features (spiral lirae, granules,
  cancellate sculpture, periostracal microstructure) are explicitly
  referenced as diagnostic evidence.
example_generic:
  “The fine cancellate microstructure is explicitly used to distinguish the
  species. MORPH.MICRO applies.”
example_gastropod:
  “SEM microstructure of the protoconch is invoked to separate two conoidean
  species. MORPH.MICRO applies.”
boundary_notes:
  • SEM modality does NOT imply MORPH.MICRO unless the text mentions
    microstructure.

-------------------------------------------------------------------------------
SUBDOMAIN: ANAT — Soft-Part Anatomy
-------------------------------------------------------------------------------
TOKEN: ANAT.ANIM
-------------------------------------------------------------------------------
definition:
  External anatomy of the animal (mantle, foot, siphon, tentacles,
  pigmentation) explicitly used as evidence.
applies_when:
  Assign ANAT.ANIM when external soft-part traits are explicitly invoked to
  differentiate species.
example_generic:
  “The mantle colour pattern is explicitly used to distinguish this species.
  ANAT.ANIM applies.”
example_gastropod:
  “The conoidean species has a uniquely coloured siphon, explicitly invoked
  in the comparison. ANAT.ANIM applies.”
boundary_notes:
  • Must be explicitly used for taxonomy.
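
The boundary notes for MORPH.RAD and MORPH.MICRO keep the imaging modality and the character apart: SEM imaging alone never yields MORPH.MICRO. A minimal Python sketch of that rule follows, assuming a reader has already judged each statement and recorded three hypothetical fields (`character_token`, `imaging_token`, `mentions_microstructure`). The `Statement` class and `morph_tokens` function are illustrative only and are not part of Ev1; IMAG.SEM is the token name used in the protocol's own STEP 3 mapping examples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    """One evidentiary statement as judged by a reader (hypothetical structure)."""
    character_token: Optional[str]   # e.g. "MORPH.RAD" when the radula is used as evidence
    imaging_token: Optional[str]     # e.g. "IMAG.SEM" only when the text names the modality
    mentions_microstructure: bool    # True only when the text explicitly discusses microstructure


def morph_tokens(s: Statement) -> List[str]:
    """Return modality and character tokens independently; neither implies the other."""
    tokens: List[str] = []
    if s.imaging_token:
        tokens.append(s.imaging_token)
    if s.character_token:
        tokens.append(s.character_token)
    # SEM imaging alone never implies MORPH.MICRO; explicit microstructure text is required.
    if s.mentions_microstructure and "MORPH.MICRO" not in tokens:
        tokens.append("MORPH.MICRO")
    return tokens


# SEM image of the radula, no microstructure discussed.
print(morph_tokens(Statement("MORPH.RAD", "IMAG.SEM", False)))  # → ['IMAG.SEM', 'MORPH.RAD']
# Same statement, but the text explicitly describes microstructure.
print(morph_tokens(Statement("MORPH.RAD", "IMAG.SEM", True)))   # → ['IMAG.SEM', 'MORPH.RAD', 'MORPH.MICRO']
```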

-------------------------------------------------------------------------------
TOKEN: ANAT.FOREGUT
-------------------------------------------------------------------------------
definition:
  Foregut anatomy explicitly used in species diagnosis or comparison.
applies_when:
  Assign ANAT.FOREGUT when the buccal tube, oesophagus, or foregut glands are
  described as evidence.
example_generic:
  “The foregut gland structure separates the two species. ANAT.FOREGUT
  applies.”
example_gastropod:
  “Foregut anatomy of the turrid species is invoked to justify its
  separation from a related species. ANAT.FOREGUT applies.”
boundary_notes:
  • Extremely rare except in studies that include dissection.

-------------------------------------------------------------------------------
TOKEN: ANAT.DIGESTIVE
-------------------------------------------------------------------------------
definition:
  Digestive anatomy beyond the foregut explicitly used as evidence.
applies_when:
  Assign ANAT.DIGESTIVE when midgut, hindgut, or digestive gland structures
  are cited as species-level evidence.
example_generic:
  “Distinct digestive gland morphology is invoked in the diagnosis.
  ANAT.DIGESTIVE applies.”
example_gastropod:
  “The structure of the digestive caecum differs between two volute species
  and is used in the comparison. ANAT.DIGESTIVE applies.”
boundary_notes:
  • Rare; must be explicit.

-------------------------------------------------------------------------------
TOKEN: ANAT.REPRO
-------------------------------------------------------------------------------
definition:
  Reproductive anatomy explicitly used as species-level evidence.
applies_when:
  Assign ANAT.REPRO when reproductive organs (male, female, hermaphroditic)
  are explicitly referenced in species delimitation or diagnosis.
example_generic:
  “Differences in penis shape are explicitly used to diagnose the new
  species. ANAT.REPRO applies.”
example_gastropod:
  “The arrangement of the reproductive glands in the muricid species is
  invoked to justify species status. ANAT.REPRO applies.”
boundary_notes:
  • Must be explicitly invoked.

-------------------------------------------------------------------------------
TOKEN: ANAT.ORG
-------------------------------------------------------------------------------
definition:
  Organ-level soft anatomy (heart, kidney, ctenidium, osphradium) explicitly
  used as species-level evidence.
applies_when:
  Assign ANAT.ORG when internal organ morphology is explicitly cited.
example_generic:
  “The ctenidial arrangement distinguishes the species. ANAT.ORG applies.”
example_gastropod:
  “A unique osphradium shape is explicitly invoked in the diagnosis.
  ANAT.ORG applies.”
boundary_notes:
  • Must be explicit.

-------------------------------------------------------------------------------
TOKEN: ANAT.CIRC
-------------------------------------------------------------------------------
definition:
  Circulatory system anatomy explicitly invoked as evidence.
applies_when:
  Assign ANAT.CIRC when heart morphology, haemocoel chambers, or related
  structures are used in species comparison.
example_generic:
  “Heart morphology differs between the species and is used in the
  diagnosis. ANAT.CIRC applies.”
example_gastropod:
  “The arrangement of the auricle is explicitly cited to separate two
  conoideans. ANAT.CIRC applies.”
boundary_notes:
  • Rare; must be explicit.

-------------------------------------------------------------------------------
TOKEN: ANAT.TOX
-------------------------------------------------------------------------------
definition:
  Venom apparatus anatomy (morphology of the venom gland, duct, and radular
  sac) explicitly used as species-level evidence.
applies_when:
  Assign ANAT.TOX when venom apparatus anatomy is explicitly cited in a
  diagnosis or comparison.
example_generic:
  “Venom gland morphology differs and is explicitly used in the diagnosis.
  ANAT.TOX applies.”
example_gastropod:
  “The conoidean venom bulb and duct differ between the species, and the
  difference is invoked explicitly. ANAT.TOX applies.”
boundary_notes:
  • Distinct from BIOC.VEN (biochemical venom evidence).

===============================================================================
DOMAIN: SPATIAL — Geospatial, Bathymetric, Altitudinal & Temporal Evidence
===============================================================================

General domain definition:
  The SPATIAL domain records geographic, bathymetric, altitudinal, temporal,
  and barrier-related evidence *when explicitly used* to justify
  species-level distinctions. These tokens capture **the evidentiary role of
  spatial factors**, not the mere reporting of occurrences or collection
  localities.

-------------------------------------------------------------------------------
SUBDOMAIN: GEOD — Geospatial / Environmental Evidence
-------------------------------------------------------------------------------
TOKEN: GEOD.LOC
-------------------------------------------------------------------------------
definition:
  Explicit use of locality or geographic range as evidence in species-level
  differentiation.
applies_when:
  Assign GEOD.LOC when the manuscript explicitly uses locality differences
  (e.g., micro-range, region, biogeographic units) to justify species limits,
  diagnoses, or comparisons.
example_generic:
  “The two taxa occur in mutually exclusive regions (north vs. south side of
  the strait), and this geographic separation is explicitly used to justify
  treating them as distinct species. GEOD.LOC applies.”
example_gastropod:
  “SSH 5 occurs only in Polynesia while SSH 6 is restricted to the
  Philippines, and this range difference is explicitly invoked in species
  delimitation. GEOD.LOC applies.”
boundary_notes:
  • Merely listing localities or distribution does NOT trigger GEOD.LOC.
  • Locality must be used as an *argument* in species delimitation.
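
All GEOD tokens share one gate: a locality, depth, age, altitude, or current is coded only when the text uses it as an argument for species limits, never when it is merely reported. A minimal Python sketch of that gate, assuming each spatial statement has already been classified by a reader; `SpatialStatement`, its `kind` values, and `spatial_tokens` are hypothetical names, while the token names are the GEOD tokens of this domain.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical statement kinds mapped to the GEOD tokens of the SPATIAL domain.
GEOD_TOKENS = {
    "locality": "GEOD.LOC",
    "depth": "GEOD.DEPTH",
    "time": "GEOD.TIME",
    "altitude": "GEOD.ALT",
    "current": "GEOD.CURRENT",
}


@dataclass
class SpatialStatement:
    kind: str               # one of the GEOD_TOKENS keys
    used_as_argument: bool  # True only if the text uses it to justify species limits


def spatial_tokens(statements: Iterable[SpatialStatement]) -> List[str]:
    """Collect GEOD tokens, skipping statements that merely report occurrences."""
    tokens: List[str] = []
    for s in statements:
        if not s.used_as_argument:
            continue  # e.g. a bare list of collection localities
        token = GEOD_TOKENS[s.kind]  # unknown kinds raise KeyError instead of inventing a token
        if token not in tokens:
            tokens.append(token)
    return tokens


example = [
    SpatialStatement("locality", False),  # localities listed in the material examined
    SpatialStatement("depth", True),      # depth ranges invoked to separate two species
    SpatialStatement("depth", True),      # a second depth argument adds nothing new
]
print(spatial_tokens(example))  # → ['GEOD.DEPTH']
```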

-------------------------------------------------------------------------------
TOKEN: GEOD.DEPTH
-------------------------------------------------------------------------------
definition:
  Explicit use of bathymetric differences (depth ranges) as evidence.
applies_when:
  Assign GEOD.DEPTH when depth ranges are explicitly used to justify species
  differences or boundaries.
example_generic:
  “One species occurs only below 200 m while a similar form inhabits shallow
  reefs, and this distinction is explicitly used in the diagnosis.
  GEOD.DEPTH applies.”
example_gastropod:
  “SSH 3 inhabits 40–80 m while SSH 4 is found at 200–300 m, and these depth
  ranges are explicitly invoked to separate Turridrupa species. GEOD.DEPTH
  applies.”
boundary_notes:
  • Reporting depth alone is insufficient.
  • Depth must explicitly contribute to the taxonomic argument.

-------------------------------------------------------------------------------
TOKEN: GEOD.TIME
-------------------------------------------------------------------------------
definition:
  Explicit use of geological age or temporal occurrence as species-level
  evidence.
applies_when:
  Assign GEOD.TIME when the manuscript explicitly states that a species’ age
  or the time of its fossil occurrence contributes to species-level taxonomy.
example_generic:
  “One lineage originates from a Pliocene deposit and the other is Holocene,
  and this difference is explicitly used in the comparison. GEOD.TIME
  applies.”
example_gastropod:
  “A Miocene volutid lineage is explicitly distinguished from a Pleistocene
  one in a species-level diagnosis. GEOD.TIME applies.”
boundary_notes:
  • Rare in extant-only papers.

-------------------------------------------------------------------------------
TOKEN: GEOD.ALT
-------------------------------------------------------------------------------
definition:
  Explicit use of altitudinal range differences as species-level evidence.
applies_when:
  Assign GEOD.ALT when altitude differences are explicitly used to justify
  species-level distinctions.
example_generic:
  “The species occurs above 1200 m whereas its congener is restricted to
  lowland forest, and this is explicitly used to justify species status.
  GEOD.ALT applies.”
example_gastropod:
  “Intertidal vs. subtidal zonation is explicitly invoked to distinguish
  species. GEOD.ALT applies.”
boundary_notes:
  • Rare in marine taxa; more common in pulmonates.

-------------------------------------------------------------------------------
TOKEN: GEOD.CURRENT
-------------------------------------------------------------------------------
definition:
  Explicit use of ocean currents, flow barriers, or environmental circulation
  patterns as evidence.
applies_when:
  Assign GEOD.CURRENT when current systems are explicitly invoked as barriers
  or isolation mechanisms supporting species boundaries.
example_generic:
  “The authors argue that two species are isolated by a major west-flowing
  current, which is explicitly invoked in delimitation. GEOD.CURRENT
  applies.”
example_gastropod:
  “The South Equatorial Current is explicitly cited as separating SSH 8 and
  SSH 9. GEOD.CURRENT applies.”
boundary_notes:
  • Must be explicitly stated; cannot be inferred.

===============================================================================
DOMAIN: MODEL — Species Distribution Models (SDMs/ENMs)
===============================================================================

-------------------------------------------------------------------------------
SUBDOMAIN: DISTR
-------------------------------------------------------------------------------
TOKEN: DISTR.ALG
-------------------------------------------------------------------------------
definition:
  Explicit use of an SDM/ENM algorithm (e.g., MaxEnt) as species-level
  evidence.
applies_when:
  Assign DISTR.ALG when the manuscript explicitly uses SDM/ENM outputs to
  justify species distinctions.
example_generic:
  “MaxEnt distribution modelling identifies two ecologically distinct niches,
  explicitly used to justify species limits. DISTR.ALG applies.”
example_gastropod:
  “SDM outputs for two turrid species show non-overlapping climatic niches,
  and this is used in the diagnosis. DISTR.ALG applies.”
boundary_notes:
  • The model must be explicitly used in species delimitation.

===============================================================================
DOMAIN: VOUCHER — Type Material / Examined Specimens
===============================================================================

General domain definition:
  VOUCHER tokens capture the formal use of type material in taxonomic acts.

-------------------------------------------------------------------------------
SUBDOMAIN: TYPE — Name-Bearing Type Material
-------------------------------------------------------------------------------
TOKEN: VOUCHER.TYPE.HOLO
-------------------------------------------------------------------------------
definition:
  Holotype (or lectotype/neotype) explicitly examined or designated in the
  treatment of a species.
applies_when:
  Assign VOUCHER.TYPE.HOLO when the manuscript:
  • describes a new species (sp. nov.) and designates a holotype
  • redescribes a species with reference to the holotype
  • re-examines the holotype as species-level evidence
example_generic:
  “The holotype is examined and explicitly cited in the diagnosis.
  VOUCHER.TYPE.HOLO applies.”
example_gastropod:
  “The holotype of Turridrupa cincta is re-examined and used to confirm shell
  characters in the diagnosis. VOUCHER.TYPE.HOLO applies.”
boundary_notes:
  • All new species MUST receive VOUCHER.TYPE.HOLO if a holotype is
    designated; designation counts as evidentiary use.
  • In all other cases, if a holotype is mentioned but NOT used as evidence,
    do NOT assign.
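
The two boundary notes for VOUCHER.TYPE.HOLO combine into a small decision rule: designating a holotype for a new species always qualifies, and in every other case the holotype (or lectotype/neotype) must be explicitly used as evidence. A sketch in Python, assuming the three yes/no judgements have already been made by a reader; the function name and its parameters are hypothetical.

```python
def assign_voucher_type_holo(
    is_new_species: bool,
    holotype_designated: bool,
    holotype_used_as_evidence: bool,
) -> bool:
    """Decide whether VOUCHER.TYPE.HOLO applies (hypothetical helper, not part of Ev1)."""
    # New species with a designated holotype always receive the token.
    if is_new_species and holotype_designated:
        return True
    # Otherwise a bare mention is not enough: the name-bearing type must be
    # explicitly examined or cited as species-level evidence.
    return holotype_used_as_evidence


# Print the full decision table for the three judgements.
for new in (True, False):
    for designated in (True, False):
        for used in (True, False):
            print(new, designated, used, assign_voucher_type_holo(new, designated, used))
```

Writing the rule as a single function makes the precedence explicit: the designation clause is checked first, so a newly designated holotype never depends on how often the text later cites it.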

-------------------------------------------------------------------------------
TOKEN: VOUCHER.TYPE.PARA
-------------------------------------------------------------------------------
definition:
  Paratypes or other type material that is not a holotype, lectotype, or
  neotype (e.g., syntypes, paralectotypes) explicitly examined and invoked in
  species-level taxonomy.
applies_when:
  Assign VOUCHER.TYPE.PARA when paratypes (or syntypes, paralectotypes) are
  explicitly examined and used in a diagnosis or comparison.
example_generic:
  “Paratypes are examined and compared to document intraspecific variation
  and justify species limits. VOUCHER.TYPE.PARA applies.”
example_gastropod:
  “Paratypes of the conoidean species are explicitly used to document
  variation in radular morphology. VOUCHER.TYPE.PARA applies.”
boundary_notes:
  • Only assign when paratypes are explicitly used in reasoning.
  • Listing paratypes without evidentiary use does NOT trigger this token.

===============================================================================
SECTION E — PROHIBITIONS (ABSOLUTE)
===============================================================================

YOU MUST NOT:
- Infer evidence from context or figures.
- Treat SSH/PSH/OTU codes as evidence.
- Use statements of general biology not tied to species-level arguments.
- Create new tokens, rename tokens, or modify the schema.
- Use any knowledge outside the manuscript.
- Output evidence snippets, quotes, or explanations.
- Output commentary or narrative.
- Copy the example species in SECTION I.
- Output ANYTHING except the required species:token blocks.

===============================================================================
SECTION F — INTERNAL SELF-CHECK BEFORE OUTPUT
===============================================================================

Before generating your final answer, you MUST verify:

1. Every species you list is actually treated in a species-level act.
2. You have under-called rather than over-called.
3. Every token assigned is supported by explicit text.
4. No token has been invented or renamed.
5. All formatting matches the required block style (SECTION H).
6. No commentary, quotes, or narrative exist in the output.
7. You did NOT include the example species.
8. Your output ends immediately after the last species block.

===============================================================================
SECTION G — COMPLETE LLM EXTRACTION PROTOCOL
===============================================================================

This protocol governs EXACTLY how you (the LLM) must read, parse, interpret,
and extract evidence from a taxonomic manuscript. You must obey every rule
below without exception.

-------------------------------------------------------------------------------
STEP 1 — IDENTIFY FOCAL SPECIES
-------------------------------------------------------------------------------

A focal species is ANY species for which the manuscript performs a:
• description
• redescription
• diagnosis
• differential diagnosis
• species comparison
• species delimitation (molecular or morphological)

You MUST list ONLY these species in the output.

You MUST NOT:
• include higher taxa
• include genera
• include species only mentioned in passing
• include species used only as outgroups
• include species not treated taxonomically

-------------------------------------------------------------------------------
STEP 2 — IDENTIFY SPECIES-LEVEL EVIDENTIARY STATEMENTS
-------------------------------------------------------------------------------

You MUST scan the manuscript for any sentence, clause, or figure reference
that explicitly supports a species distinction. These include:
• diagnostic shell characters
• protoconch differences
• radula morphology
• genetic loci used (COI, 16S, 28S, multi-locus, mitogenome)
• phylogenetic trees or clades used in species justification
• species delimitation analyses (ASAP, ABGD, GMYC, PTP, BPP, STRUCTURE)
• ecological differences explicitly used as evidence
• spatial / depth / current barriers explicitly used
• type material examined (holotype, paratypes)

If it is NOT explicitly used for species-level reasoning, you must ignore it.

-------------------------------------------------------------------------------
STEP 3 — MAP EACH EVIDENCE STATEMENT TO Ev1 TOKENS
-------------------------------------------------------------------------------

For each explicit evidentiary statement, you must map it to ALL applicable
tokens in the Ev1 lexicon.

Examples of multi-token mapping:

“SEM image of the radula shows three major cusps used to distinguish the
species.”
  → IMAG.SEM (imaging modality)
  → MORPH.RAD (radular morphology)
  → MORPH.MICRO (only if microstructure is explicitly discussed)

“COI gene confirms the distinctness of the species in the phylogenetic tree.”
  → GENE.LOCI.COXI
  → PHYL.TREE

“ASAP separates SSH 9 as an independent species.”
  → STAT.ASAP

“Holotype examined; paratypes illustrate morphological variation.”
  → VOUCHER.TYPE.HOLO
  → VOUCHER.TYPE.PARA

-------------------------------------------------------------------------------
STEP 4 — DO NOT INFER, DO NOT GUESS, DO NOT HALLUCINATE
-------------------------------------------------------------------------------

All of the following are strictly forbidden:
• inferring developmental mode from the protoconch unless it is stated
• inferring ecology from distribution
• inferring clades from trees that are not described
• inferring imaging modality from figure style
• over-calling genetic evidence
• inventing tokens
• “interpreting” beyond what is stated

If unsure → DO NOT assign the token.
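
Steps 3 and 4 together amount to a filter over candidate tokens: every candidate must be a lexicon token, and only candidates with explicit textual support survive. The Python sketch below assumes a reader has already produced (species, token, explicit) triples; the `assemble` function, the abridged `VALID_TOKENS` set, and the species names `Genus alpha` and `Genus beta` are hypothetical. Dropping a species that has no explicitly supported token is an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Abridged subset of the Ev1 lexicon, enough for this illustration.
VALID_TOKENS: Set[str] = {"IMAG.SEM", "MORPH.RAD", "MORPH.MICRO", "STAT.ASAP"}


def assemble(
    candidates: Iterable[Tuple[str, str, bool]],
    valid_tokens: Set[str],
) -> Dict[str, List[str]]:
    """Group (species, token, explicit) triples into per-species token lists."""
    result: Dict[str, List[str]] = {}
    for species, token, explicit in candidates:
        if token not in valid_tokens:
            raise ValueError(f"token not in lexicon: {token}")  # never invent tokens
        if not explicit:
            continue  # Step 4: if unsure, do not assign
        tokens = result.setdefault(species, [])
        if token not in tokens:  # keep first-seen order, no duplicates
            tokens.append(token)
    return result


candidates = [
    ("Genus alpha", "IMAG.SEM", True),
    ("Genus alpha", "MORPH.RAD", True),
    ("Genus alpha", "MORPH.MICRO", False),  # microstructure not discussed
    ("Genus alpha", "MORPH.RAD", True),     # repeated evidence, same token
    ("Genus beta", "STAT.ASAP", True),
]
print(assemble(candidates, VALID_TOKENS))
# → {'Genus alpha': ['IMAG.SEM', 'MORPH.RAD'], 'Genus beta': ['STAT.ASAP']}
```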

-------------------------------------------------------------------------------
STEP 5 — ASSEMBLE THE TOKEN LIST PER SPECIES
-------------------------------------------------------------------------------

You must produce ONE block per species:

SpeciesName:
  TOKEN
  TOKEN
  TOKEN

Rules:
• Tokens MUST match the token names in the lexicon EXACTLY.
• One token per line.
• Two spaces of indentation before each token.
• No commentary.
• No extra text.
• No explanations.

-------------------------------------------------------------------------------
STEP 6 — APPLY THE FINAL SANITY CHECK (SECTION F)
-------------------------------------------------------------------------------

You must not output anything unless:
• all rules have been followed
• the output matches the mandated style
• no example species appear
• no text other than species names and tokens appears (no additions or
  substitutions)

-------------------------------------------------------------------------------
STEP 7 — PRODUCE THE FINAL OUTPUT AND STOP
-------------------------------------------------------------------------------

As soon as the last species block is printed, STOP. Do NOT give
acknowledgements, notes, summaries, or commentary.

===============================================================================
SECTION H — OUTPUT FORMAT (MANDATORY)
===============================================================================

Your final output MUST follow this exact form:

SpeciesName:
  TOKEN
  TOKEN
  TOKEN

SpeciesName:
  TOKEN
  TOKEN

Rules:
• One blank line between species blocks.
• Tokens must be valid Ev1 tokens.
• Tokens must be indented exactly two spaces.
• The species name must appear exactly as in the manuscript headings.
• DO NOT include genus-only names.
• DO NOT include any text other than the species:token blocks.
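
The block format in STEP 5 and SECTION H is regular enough to generate and check mechanically. A minimal Python sketch, assuming per-species token lists are already available; `format_blocks` and `validate` are hypothetical helpers, the valid-token set passed in would have to be the full Ev1 lexicon in real use, and treating any single-word name as genus-only is only an approximation of the "no genus-only names" rule. The two example species from SECTION I are listed so that they can be rejected.

```python
import re
from typing import Dict, List, Set

# Species names from the SECTION I example, which must never appear in output.
EXAMPLE_SPECIES = {"Turridrupa exemplarensis", "Turridrupa ficta"}
TOKEN_LINE = re.compile(r"^  (\S+)$")  # exactly two spaces, then one token


def format_blocks(blocks: Dict[str, List[str]]) -> str:
    """Render species:token blocks with one blank line between blocks."""
    rendered = []
    for species, tokens in blocks.items():
        lines = [f"{species}:"] + [f"  {token}" for token in tokens]
        rendered.append("\n".join(lines))
    return "\n\n".join(rendered)


def validate(output: str, valid_tokens: Set[str]) -> List[str]:
    """Return a list of format problems; an empty list means the output passes."""
    errors: List[str] = []
    for block in output.rstrip("\n").split("\n\n"):
        header, *token_lines = block.split("\n")
        if not header.endswith(":") or header.startswith(" "):
            errors.append(f"bad species line: {header!r}")
            continue
        species = header[:-1]
        if species in EXAMPLE_SPECIES:
            errors.append(f"example species in output: {species}")
        if len(species.split()) < 2:
            errors.append(f"genus-only name: {species}")
        for line in token_lines:
            match = TOKEN_LINE.match(line)
            if match is None:
                errors.append(f"bad indentation: {line!r}")
            elif match.group(1) not in valid_tokens:
                errors.append(f"unknown token: {match.group(1)}")
    return errors


valid = {"MORPH.SHELL", "MORPH.PROT", "IMAG.PHOTO"}
out = format_blocks({"Genus alpha": ["MORPH.SHELL", "IMAG.PHOTO"], "Genus beta": ["MORPH.PROT"]})
print(out)
print(validate(out, valid))  # → []
```

The validator reports every problem it finds instead of stopping at the first, so a single pass covers items 4, 5, and 7 of the SECTION F self-check.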

===============================================================================
SECTION I — EXAMPLE OUTPUT BLOCK
===============================================================================

This example shows the correct format AND MUST NOT be copied into your
results:

-------------------------------------------------------------------------------
EXAMPLE ONLY — DO NOT COPY INTO OUTPUT
-------------------------------------------------------------------------------

Turridrupa exemplarensis:
  MORPH.SHELL
  MORPH.PROT
  MORPH.RAD
  IMAG.PHOTO
  GENE.LOCI.COXI
  PHYL.TREE
  STAT.ASAP
  VOUCHER.TYPE.HOLO
  VOUCHER.TYPE.PARA

Turridrupa ficta:
  MORPH.SHELL
  MORPH.PROT
  IMAG.PHOTO

-------------------------------------------------------------------------------
END OF EXAMPLE — DO NOT REPRODUCE THESE SPECIES OR THESE TOKEN LISTS
-------------------------------------------------------------------------------

===============================================================================
SECTION J — CLOSING NOTES
===============================================================================

• This file (Ev1_LLMScript_v1.5_STRICT_FULL.txt) must be pasted verbatim
  into the context of any LLM performing Ev1 token extraction.
• The LLM must have NO ambiguity about:
  – what the tokens mean
  – when they apply
  – when they do not
  – the exact output format
  – the mandatory under-calling rule
• Once generated, the output token lists can be used in:
  – species accounts
  – systematic databases
  – metadata extraction pipelines
  – FAIR-compliant taxonomic publishing
  – Eikoniki production workflows
• No modification of this file is allowed during extraction.

END OF Ev1_LLMScript_v1.5_STRICT_FULL.txt