Eighty-two bench-ready peptides at sub-PIONEER oral doses, validatable in three weeks for roughly ten thousand dollars apiece. Three thousand eight hundred sequences in total, every one of them disclosed as public-domain prior art under 35 U.S.C. §102. The GLP-1 family of incretin and metabolic-peptide receptor agonists underpins a market exceeding USD 100 billion in projected annual sales; this deposit closes a substantial fraction of the sequence space surrounding the seven approved and clinical-stage molecules in that estate. Each candidate ships with a complete bench-execution package: predicted pharmacology, full Fmoc SPPS synthesis route, RP-HPLC purification protocol, radioligand-displacement binding affinity assay, Cisbio HTRF cAMP functional confirmation, sema-class developed-form pharmacokinetic profile, projected receptor occupancy and required weekly dose at the PIONEER-1 anchor, and predicted clinical efficacy at week 26 and week 68 calibrated to STEP-1, SURPASS-2, SURPASS-3, SUSTAIN-7, and PIONEER-1 Phase 3 endpoints. One thousand of those candidates additionally carry an ESMFold v1 backbone PDB and per-residue confidence scores. Once published, no party may claim composition-of-matter novelty on any of the 3,800 specific disclosed sequences in applications filed after the disclosure date.
Why this disclosure exists
The GLP-1 family of incretin and metabolic-peptide receptor agonists is the most commercially valuable peptide drug class in modern pharmaceuticals. Approved members (semaglutide, liraglutide, tirzepatide, dulaglutide, exenatide) and clinical-stage molecules (retatrutide, cotadutide, survodutide, mazdutide, cagrilintide) collectively underpin a market exceeding USD 100 billion in projected annual sales. The originator IP estate is heavily concentrated and weakly differentiated at the sequence level: most claims rest on small substitution patterns around a conserved central scaffold, and the extant patent corpus does not adequately disclose the full sequence space surrounding each approved peptide.
This deposit closes a substantial fraction of that sequence space. Once published, no party may claim composition-of-matter novelty on any of the 3,800 specific disclosed sequences in applications filed after the disclosure date. The deposit does not block obvious variants under §103, does not block method-of-use or method-of-treatment claims, does not block formulation or combination-therapy claims, and does not block compositions that include one of the disclosed peptides as a component; those scopes are explicitly preserved (see "What this disclosure does not claim" below). The 158 viable candidates within the deposit (Tier A and Tier B, defined below) function both as prior art and as independently developable molecules; predicted pharmacology suggests they could plausibly enter clinical development under generic, biosimilar, or de-novo paths.
Disclosure scope
3,800 sequence-distinct peptide analogs across five structural classes:
| Fold class | Count | Receptors targeted | Anchor reference peptides |
|---|---|---|---|
| GLP-1R mono-agonists | 700 | GLP-1R | semaglutide, liraglutide, exenatide, dulaglutide, lixisenatide |
| Glucagon mono-agonists | 700 | GCGR | glucagon |
| Dual agonists | 700 | GLP-1R / GCGR | cotadutide, survodutide, mazdutide, tirzepatide (GLP-1R / GIPR) |
| Triple agonists | 700 | GLP-1R / GCGR / GIPR | retatrutide |
| Oxyntomodulin-like | 1,000 | GLP-1R / GCGR (extended) | oxyntomodulin |
Every disclosed sequence is novel against a 2,583-patent corpus surveyed on 2026-04-25, covering originator and generics filings on GLP-1 family agonists across major jurisdictions. The corpus snapshot date is recorded in every per-candidate dossier. Subsequent filings between the snapshot date and any later assertion of prior art should be re-checked.
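The dossiers report a nearest published peptide and an edit distance for each candidate. The deposit does not state which metric or search procedure was used, so the following is only a minimal sketch that assumes plain Levenshtein distance over one-letter sequences. The function names and the reference list passed to `nearest` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free if residues match)
            ))
        prev = curr
    return prev[-1]


def nearest(candidate: str, references: list[str]) -> tuple[str, int]:
    """Return the closest reference sequence and its edit distance.

    A distance of 0 would mean the candidate is identical to a reference,
    i.e. not sequence-novel against that reference.
    """
    return min(((ref, levenshtein(candidate, ref)) for ref in references),
               key=lambda pair: pair[1])
```

Under this assumption, a candidate is sequence-distinct from the surveyed references exactly when the returned distance is at least 1.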
Notation
Sequences use standard one-letter amino-acid codes with one deliberate non-IUPAC convention: U denotes Aib (α-aminoisobutyric acid), the non-canonical α-methyl alanine residue used for DPP-4 resistance and helix stabilisation in semaglutide, tirzepatide, and retatrutide. U here is not selenocysteine (the IUPAC default). The convention is documented in every per-candidate dossier and in the deposit README; on synthesis, Aib (U) is incorporated as Fmoc-Aib-OH (Bachem 04-13-1029, CAS 94744-50-0). For ESMFold v1 inference, Aib was replaced with alanine, and the deposited PDB files reflect that substitution.
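A minimal sketch of how a reader might handle the U = Aib convention when parsing sequences. The helper names are hypothetical. The example sequence is illustrative only (native GLP-1(7-37) with Aib at position 2) and is not taken from the deposit.

```python
AIB = "U"  # deposit convention: U means Aib, NOT selenocysteine


def aib_positions(seq: str) -> list[int]:
    """Return the 1-based positions of Aib (U) in a one-letter sequence."""
    return [i for i, residue in enumerate(seq, start=1) if residue == AIB]


def to_esmfold_input(seq: str) -> str:
    """Replace Aib (U) with alanine (A), mirroring the substitution the
    deposit describes for ESMFold v1 inference."""
    return seq.replace(AIB, "A")


# Illustrative sequence, not a deposit candidate.
example = "HUEGTFTSDVSSYLEGQAAKEFIAWLVKGRG"
```

Tools that follow the IUPAC default would read U as selenocysteine, so applying a conversion like `to_esmfold_input` (or an explicit Aib-aware parser) before passing sequences to third-party software avoids silent misinterpretation.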
Readiness tier system
Each candidate carries a single readiness tier (A / B / C / D) based on predicted Kd at the primary target, projected weekly oral dose to reach receptor occupancy of 0.6, and predicted RO at the standardised PIONEER-1 anchor dose (14 mg/day oral semaglutide, equivalent to 98 mg/week).
| Tier | Definition | Count |
|---|---|---|
| A — bench-ready | Primary Kd ≤ 1 nM, required dose ≤ 100 mg/week, RO ≥ 50% at PIONEER-1 dose. Therapeutic-grade potency at sub-PIONEER dose. Worth synthesising and binding-assaying directly. | 82 |
| B — formulation-required | Primary Kd ≤ 10 nM, required dose ≤ 500 mg/week, RO ≥ 20%. Viable lead with formulation lift (depot SC, absorption enhancer beyond SNAC, or higher PIONEER dose multiplier). | 76 |
| C — feasible weak | Required dose ≤ 1 g/week. Substantial dose escalation needed; useful for scaffold-breadth exploration. | 32 |
| D — §102 prior-art only | Required dose > 1 g/week or primary Kd unavailable. Disclosed for patent foreclosure under §102; not pursued clinically. | 3,610 |
One hundred fifty-eight candidates (Tier A + Tier B = 4.2% of the 3,800 deposited sequences) are immediately bench-ready or formulation-near. The remaining 3,642 (95.8%, Tier C and Tier D) serve primarily as patent-foreclosure prior art, anticipating composition-of-matter claims in adjacent sequence space.
Tier classification thresholds are pure pharmacology readouts (Kd, RO, dose). They do not expose the underlying Kd-prediction methodology, which remains proprietary to Coracle Research.
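The thresholds in the tier table can be read as a cascade. The sketch below is one possible reading, not the deposit's own implementation. It assumes that tiers are tested in order A, B, C, D, that RO is expressed as a fraction, and that the Tier B RO threshold is also evaluated at the PIONEER-1 anchor dose. The function and parameter names are hypothetical.

```python
from typing import Optional


def readiness_tier(kd_nm: Optional[float],
                   dose_mg_week: float,
                   ro_pioneer: float) -> str:
    """Assign A/B/C/D from the published thresholds.

    kd_nm        : predicted Kd at the primary target in nM (None if unavailable)
    dose_mg_week : required weekly dose to reach RO = 0.6, in mg
    ro_pioneer   : predicted RO (0..1) at the PIONEER-1 anchor dose
    """
    if kd_nm is None:
        return "D"  # primary Kd unavailable
    if kd_nm <= 1 and dose_mg_week <= 100 and ro_pioneer >= 0.50:
        return "A"
    if kd_nm <= 10 and dose_mg_week <= 500 and ro_pioneer >= 0.20:
        return "B"
    if dose_mg_week <= 1000:  # 1 g/week
        return "C"
    return "D"
```

Under this reading, a candidate that meets the Tier A potency and dose limits but falls short on RO drops to Tier B rather than being excluded.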
What ships in each dossier
Each per-candidate HTML dossier (≈ 22 KB, 3,800 total) carries the following sections:
- Sequence. Full one-letter sequence, length, Aib (U) positions, composition.
- Predicted structure. Topology, helix count, helical residue fraction, mean Chou-Fasman Pα helix propensity. For 1,000 top-ranked candidates, an additional "Predicted 3D atomic structure" section links the deposited PDB file with mean and worst-residue ESMFold v1 pLDDT confidence values. ESMFold v1 (Lin et al. 2023) was chosen over alternative models (AlphaFold-Multimer, AlphaFold 3) for its open weights and fully reproducible single-chain inference without API gating; receptor-bound conformations are out of scope for this drop and would be the natural next-generation step. For ESMFold inference, Aib (U) was replaced with alanine; the deposited PDBs reflect that substitution. Aib's α-methyl group restricts the backbone dihedrals more tightly than alanine does, so the deposited structure is best read as the alanine-substituted backbone scaffold, not the fully Aib-locked geometry of the synthesised candidate. The dossier's "Helical residue fraction" and "Mean helix propensity" columns are computed from the per-class scaffold template plus the actual sequence (using the published Aib Pα=1.50 estimate of Karle et al. 1996 / Toniolo 1989) and are unaffected by this substitution.
- Solvent / environment robustness. Cα RMSD vs aqueous reference for water, lipid bilayer, and D₂O.
- Sequence novelty. Design strategy, nearest published peptide and edit distance, parent reference and substitution count, §102 prior-art declaration with corpus snapshot date.
- Predicted pharmacology. Primary target, selectivity profile, predicted half-life, predicted oral bioavailability, per-receptor predicted Kd, and a calibration note. The calibration reports a median fold-error of 2× at the primary target across a 9-peptide held-out panel of approved and clinical-stage GLP-1 family analogs: semaglutide, liraglutide, exenatide, lixisenatide, dulaglutide, glucagon, oxyntomodulin, cotadutide, and tirzepatide. Ground-truth Kd values are taken from peer-reviewed pharmacology literature; the calibration is retrospective and shows no systematic directional bias.
- Readiness tier. Single-line summary with rationale.
- Mutation map. Position-by-position table of substitutions versus the closest published sequence.
- Synthesis profile and synthesis route. Aggregation index, charge imbalance, hydrophobic fraction, plus a complete Fmoc SPPS recipe with resin, coupling chemistry, cleavage cocktail, non-canonical residue handling (Aib coupling protocol with reagent CAS), and recommended scale.
- Purification and QC. Preparative RP-HPLC, lyophilisation, ESI-MS identity, AAA quantification, analytical HPLC purity ≥ 95% release.
- Binding affinity assay. Universal radioligand-displacement protocol (Cheng-Prusoff IC₅₀ → Kd correction). Per-receptor parameter table specifying cell line, membrane, tracer, buffer, and anchor native peptide Kd from published literature.
- Functional cAMP confirmation. Cisbio HTRF cAMP Gs Dynamic Kit protocol, 384-well format, IBMX-supplemented stim buffer.
- Pharmacokinetic profile. Sema-class developed-form parameters (Vd 0.18 L/kg, t½ 168 h, ka 0.07 h⁻¹ with SNAC carrier, F 1.0%) anchored to PIONEER-1, with the assumed development modifications (Aib2 backbone protection plus C18 di-acid lipidation at an internal Lys) called out as a starting point. Unmodified-sequence F and t½ are retained as a diagnostic gap.
- Projected weekly dose and receptor occupancy. Steady-state plasma exposure at PIONEER-1 dose, per-receptor RO, required weekly dose to achieve RO=0.6.
- Predicted clinical efficacy. HbA1c reduction at week 26 and body-weight reduction at week 68, projected via single-target E_max model anchored to STEP-1 (Wilding 2021), SURPASS-2 (Frias 2021), SURPASS-3 (Ludvik 2021), SUSTAIN-7 (Pratley 2018), and PIONEER-1 (Aroda 2019) Phase 3 endpoints with full citations.
- Estimated cost and timeline. Synthesis, binding, and functional cAMP costs scaled per candidate; total bench cost band; parallel timeline to first Kd plus EC₅₀ measurement.
- Provenance. Disclosure source, generation timestamp, deterministic candidate identifier.
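The binding-assay item above refers to the Cheng-Prusoff correction, which converts a competition IC₅₀ into the competitor's equilibrium dissociation constant. A minimal sketch of the standard formula follows, assuming simple competitive binding at a single site. The function name and the numeric values in the test are hypothetical and are not taken from any dossier.

```python
def cheng_prusoff_ki(ic50_nm: float, tracer_nm: float, tracer_kd_nm: float) -> float:
    """Cheng-Prusoff correction: Ki = IC50 / (1 + [L] / Kd_L).

    ic50_nm      : measured IC50 of the unlabelled peptide, nM
    tracer_nm    : free radioligand (tracer) concentration in the assay, nM
    tracer_kd_nm : Kd of the radioligand at the same receptor, nM
    """
    if ic50_nm <= 0 or tracer_nm < 0 or tracer_kd_nm <= 0:
        raise ValueError("concentrations must be positive (tracer may be zero)")
    return ic50_nm / (1.0 + tracer_nm / tracer_kd_nm)
```

When the tracer concentration is well below its own Kd, the correction factor approaches 1 and the IC₅₀ approximates the Kd directly, which is why displacement protocols usually keep the tracer concentration low.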
A single Tier A candidate is bench-validatable for approximately USD 7-12k and 2-3 weeks of parallel synthesis-and-assay work. The dossier carries every catalog reference, vendor SKU, buffer recipe, and protocol step required to execute the validation without further consultation of the originator.
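The pharmacokinetic and dose-projection items in the dossier list connect Kd to a required weekly dose. The deposit does not disclose its projection model, so the sketch below is an illustration under textbook assumptions: linear one-compartment kinetics, average steady-state total plasma concentration, single-site occupancy RO = C / (C + Kd), and no correction for plasma protein binding. Vd, t½, and F come from the dossier description. The body weight, the molecular weight, and all function names are assumptions.

```python
import math

VD_L_PER_KG = 0.18     # from the dossier PK profile
T_HALF_H = 168.0       # from the dossier PK profile
F_ORAL = 0.01          # F = 1.0 %, from the dossier PK profile
TAU_H = 168.0          # one-week dosing interval
BODY_WEIGHT_KG = 70.0  # ASSUMPTION: reference adult body weight
MW_G_PER_MOL = 4113.6  # ASSUMPTION: approximate semaglutide MW; replace per candidate


def css_avg_nm(dose_mg_week: float) -> float:
    """Average steady-state plasma concentration (nM) for a weekly dose (mg)."""
    clearance_l_per_h = math.log(2) * VD_L_PER_KG * BODY_WEIGHT_KG / T_HALF_H
    css_mg_per_l = F_ORAL * dose_mg_week / (clearance_l_per_h * TAU_H)
    return css_mg_per_l / MW_G_PER_MOL * 1e6  # mg/L -> mmol/L -> nM


def occupancy(conc_nm: float, kd_nm: float) -> float:
    """Single-site receptor occupancy, RO = C / (C + Kd)."""
    return conc_nm / (conc_nm + kd_nm)


def required_dose_mg_week(kd_nm: float, ro_target: float = 0.6) -> float:
    """Weekly dose (mg) whose average steady-state exposure gives ro_target."""
    target_nm = kd_nm * ro_target / (1.0 - ro_target)  # invert RO = C / (C + Kd)
    return target_nm / css_avg_nm(1.0)  # linear PK: exposure scales with dose
```

Because exposure is linear in dose under these assumptions, the required dose scales directly with Kd. That is consistent with the tier table pairing tighter Kd limits with lower dose limits, although the deposit's actual projection may differ.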
Where to start
After extracting drop_04_dossiers.zip and drop_04_structures.zip alongside the navigation files, the bundle is browseable from a local file viewer; no server is required. A pragmatic reading order for a medchem or clinical-development reader:
- Start with the cover PDF (8 pages) for the narrative overview.
- Open top-25-overall.html for the highest-composite candidates across all classes. Six are Tier A, sixteen are Tier B, three are Tier C.
- Open one of the per-class shortlists for fold-class-specific top-50 picks.
- Filter summary.csv by readiness_tier == "A" to retrieve all 82 bench-ready candidates as a single tabular extract.
- Open any individual CR-D04-NNNN.html for a complete per-candidate dossier, including sequence, predicted pharmacology, synthesis recipe, assay protocols, PK projection, and predicted clinical efficacy.
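The filtering step in the reading order can be done with the standard library alone. This is a minimal sketch that assumes summary.csv is a comma-separated file with a header row. Only the readiness_tier column name comes from the text above; the function name is hypothetical and no other columns are assumed.

```python
import csv


def load_tier(path: str, tier: str = "A") -> list[dict]:
    """Return all rows of the summary file whose readiness_tier equals `tier`."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle)
                if row.get("readiness_tier") == tier]


# Usage (run from the directory holding the extracted bundle):
#   tier_a = load_tier("summary.csv", "A")
#   print(len(tier_a))
```

If the deposit is internally consistent, the Tier A extract should contain 82 rows, which gives a quick sanity check that the bundle was extracted completely.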
What this disclosure does not claim
The deposit does not assert any judgment on freedom-to-operate beyond §102 novelty (and the equivalent provisions in EPC Art. 54, UK Patents Act §2, JP Patent Act §29, and CN Patent Law Art. 22) against the surveyed corpus. Method-of-use and method-of-treatment claims, formulation claims, and combination-therapy claims are unaffected by this disclosure. Filings between the corpus-snapshot date and any later assertion of prior art should be re-checked.
The deposit does not claim any in-vivo activity, clinical safety, or therapeutic efficacy. Predicted pharmacology and projected clinical efficacy are computational estimates calibrated to literature; bench validation and clinical trials are required before any therapeutic assertion.
The deposit does not disclose Coracle Research's underlying computational methodology. The pipeline internals, including scoring and design-space exploration logic, are not part of this deposit.