If the N-terminal half of stereocilin is intrinsically disordered and functionally dispensable, removing it creates a truncated protein that fits in a single AAV vector. 20 AlphaFold 3 experiments plus six independent web prediction tools (SignalP, NetGPI, DeepLoc, NetNGlyc, IUPred3) support this hypothesis.
STRC's coding sequence is 5,325 bp. An AAV vector (with promoter, ITRs, polyA) has ~4,400 bp of cargo space. The gene is 925 bp too large. Current approaches split it across two vectors and hope both reach the same cell.
Iranfar et al. (2026) rescued hearing in DFNB16 mice using dual-AAV: ~60% of outer hair cells got both vectors. But mice have 3.5 mm cochleas with ~3,300 OHCs. Humans have 35 mm cochleas with ~12,000 OHCs. Our gamma-Poisson stochastic model, calibrated from both Omichi 2020 data points, predicts:
A single-vector approach eliminates the co-transduction bottleneck entirely. Every cell that gets the vector gets the full gene. The 2.8x advantage at R=50% grows to 4.7x at R=30%, and the gap widens further at lower titers.
AlphaFold 3 predicts stereocilin's structure with wildly different confidence along the protein. The N-terminal half (residues 1-615) has no stable 3D structure at all (pTM 0.27, 38% disordered). The C-terminal half folds into a well-defined structure. If the disordered region is functionally dispensable, removing it brings the gene into single-AAV range.
Mini-STRC doesn't just survive truncation — it folds significantly better without the N-terminal. The disordered half was actively destabilizing the protein. This is unusual and provides strong computational evidence that the truncation is not just safe, but beneficial.
Beyond the original mini-STRC, we tested increasingly aggressive truncations. All three produce well-folded proteins that fit in a single AAV.
| Construct | Residues | Length | CDS | AAV headroom | pTM | Disorder |
|---|---|---|---|---|---|---|
| Mini-STRC conservative | 616-1775 | 1,160 aa | 3,480 bp | 920 bp | 0.81 | 7% |
| Shorter mini-STRC recommended | 700-1775 | 1,076 aa | 3,228 bp | 1,472 bp | 0.86 | 4% |
| C-term only aggressive | 1075-1775 | 701 aa | 2,103 bp | 2,597 bp | 0.87 | 6% |
| Delta LRR linker not recommended | custom | 989 aa | 2,967 bp | 1,733 bp | 0.80 | 8-12% |
Truncation solves the size problem, but creates a new one. Stereocilin is a GPI-anchored surface protein: it needs a signal peptide to enter the ER, get glycosylated, receive its GPI anchor, and reach the cell surface. The native signal peptide (residues 1-26) is in the removed N-terminal. No signal peptide = protein stays cytoplasmic = doesn't work.
Solution: prepend an exogenous signal peptide to the construct. The IgK signal peptide (20 aa, 60 bp) is the most widely used in gene therapy vectors and adds minimal size. The C-terminal GPI-anchor signal (hydrophobic tail + omega site ~S1763) is preserved intact in all constructs.
Full-length stereocilin has 14 predicted N-linked glycosylation sites (NxS/T motifs). Truncation removes sites in the N-terminal:
Impact unknown. The lost N-terminal glycosylation sites may affect protein folding, ER quality control, or tectorial membrane anchoring. This is the primary experimental risk.
We tested mini-STRC (616-1775) against every relevant cochlear partner. No direct protein-protein interaction was detected. Critically, full-length STRC also doesn't interact in these assays, confirming that negative results are not truncation artifacts but AF3's inability to model membrane-associated or glycosylation-dependent interactions. The shorter construct (700-1775) is a strict subset of mini-STRC: it removes 84 more N-terminal residues that are disordered in all models. Interaction results transfer directly.
| Experiment | ipTM | Cross-PAE | Verdict |
|---|---|---|---|
| Full STRC + TMEM145 | 0.47 | 8.6 A | Low confidence |
| Mini-STRC + TMEM145 | 0.43 | — | Truncation doesn't hurt (0.47 → 0.43) |
| Mini-STRC + Piezo2 CED | 0.30 | 14-17 A | No interaction (expected) |
| Mini-STRC + Otoancorin | 0.29 | 17-21 A | No direct contact |
| Mini-STRC + Tectorin ZP | 0.24 | 16-17 A | No binding |
| Mini-STRC + TMC1 | 0.20 | 19-21 A | No interaction (expected) |
| Mini-STRC homodimer | 0.20 | 20-30 A | No dimerization |
| Full STRC homodimer (control) | 0.24 | 26-29 A | Also no dimer — truncation didn't cause this |
| NFATC1 + Calcineurin A/B (positive control) | 0.73 | 0.9-2.1 A | ✓ Known complex correctly detected |
Key insight: The positive control (NFAT + Calcineurin, ipTM 0.73-0.91) proves our methodology works. The negative results across all partners suggest stereocilin functions through membrane-associated or glycosylation-mediated interactions that AF3 cannot model in solution. This is consistent with stereocilin being a GPI-anchored glycoprotein embedded in the tectorial membrane matrix.
This is not a new idea. The dystrophin gene (11,000 bp) was far too large for AAV. Researchers created "micro-dystrophin" by removing non-essential spectrin-like repeats, fitting it into a single AAV vector. Sarepta's SRP-9001 (delandistrogene moxeparvovec) using this approach was FDA-approved in 2023.
Removing the N-terminal half of stereocilin also removes its native signal peptide (residues 1-26). Without a signal peptide, the protein cannot enter the endoplasmic reticulum, which means no GPI-anchor attachment, no glycosylation, no surface display. The protein would be stuck in the cytoplasm, functionally dead.
We solve this by prepending an IgK signal peptide (immunoglobulin kappa chain leader, 21 aa). IgK is the most widely used exogenous SP in gene therapy vectors because it's short (63 bp), efficiently cleaved, and well-characterized across cell types.
To validate computationally, we submitted three sequences to SignalP 6.0 (DTU Health Tech), the gold-standard neural network for signal peptide prediction. It classifies proteins as SP/non-SP and predicts the exact cleavage site.
| Construct | SP probability | Cleavage site | Cleavage Pr | Status |
|---|---|---|---|---|
| Full STRC (native, 1775 aa) | 93.4% | pos 24-25 | 0.689 | ✓ SP detected |
| IgK-SP + shorter mini-STRC (our construct) | 99.97% | pos 20-21 | 0.978 | ✓ SP detected |
| Shorter mini-STRC (no SP, control) | 0.00% | — | — | ✗ No SP |
Result: The IgK signal peptide is recognized with 99.97% confidence, higher than even the native STRC signal peptide (93.4%). Cleavage is predicted cleanly at position 20-21 with probability 0.978. The negative control (no SP) scores exactly 0%, confirming that without a signal peptide the construct has no ER entry pathway.
IgK-SP + shorter mini-STRC (our construct)
Full STRC (native signal peptide)
SignalP 6.0 probability plots. Green: signal peptide region. Red: cleavage site. The IgK construct shows a sharper, more confident cleavage prediction than native STRC.
STRC's native signal peptide is unusually long (26 aa) with a hydrophobic core of 17 consecutive leucines. While functional, this is atypical: most Sec/SPI signal peptides are 15-22 aa with a mixed hydrophobic core. The IgK SP (21 aa, METDTLLLWVLLLWVPGSTGD) follows the canonical n-h-c pattern: charged N-terminus (M-E), hydrophobic core (TLLLWVLLLWV), polar c-region (PGSTGD) with a clean Ala-x-Ala motif at the cleavage site. This makes it one of the most reliably processed signal peptides in mammalian expression systems.
We established residue 700 as the cut point based on the pLDDT disorder dip at positions 685-694. But is 700 truly optimal? We tested four truncation boundaries (±50 residues) and also verified that prepending the IgK signal peptide doesn't destabilize the fold.
| Cut point | Residues | CDS (bp) | pTM | Ranking | Disorder | AAV headroom |
|---|---|---|---|---|---|---|
| 650 | 650-1775 | 3,378 | 0.84 | 0.88 | 7% | 1,322 bp |
| 680 | 680-1775 | 3,288 | 0.84 | 0.87 | 6% | 1,412 bp |
| 700 ★ | 700-1775 | 3,228 | 0.86 | 0.88 | 4% | 1,472 bp |
| 720 | 720-1775 | 3,168 | 0.86 | 0.89 | 4-5% | 1,532 bp |
Result: Residues 700 and 720 are both optimal (pTM 0.86), while cutting at 650 or 680 includes the disorder dip (residues 685-694, pLDDT 31-39) and gives worse results. This confirms the pLDDT-based domain boundary. The cut at 700 sits at the exact recovery point where structural confidence returns.
| Construct | pTM | Ranking | Disorder |
|---|---|---|---|
| Shorter mini-STRC (700-1775), no SP | 0.86 | 0.88 | 4% |
| IgK-SP + shorter mini-STRC (700-1775) | 0.85 | 0.88 | 5% |
The 21-amino acid IgK signal peptide has no meaningful effect on the fold (pTM 0.86 → 0.85, within noise). This is expected: the SP is cleaved co-translationally in the ER and is not part of the mature protein.
Stereocilin is a GPI-anchored protein: a glycosylphosphatidylinositol (GPI) lipid is attached post-translationally to tether it to the outer surface of the hair cell membrane. This is essential for its function at stereocilia tips and for forming horizontal top connectors with the tectorial membrane. The GPI signal is encoded in the C-terminal ~25 amino acids, which all mini-STRC constructs preserve intact.
We submitted three constructs to NetGPI 1.1 (DTU Health Tech), a neural network that predicts GPI-anchor signals and omega sites (the residue where GPI attachment occurs).
| Construct | Prediction | Omega site | Likelihood |
|---|---|---|---|
| Full STRC (1775 aa) | GPI-Anchored | S1749 | 0.471 |
| IgK-SP + shorter mini-STRC | GPI-Anchored | S1071* | 0.471 |
| Shorter mini-STRC (no SP, control) | GPI-Anchored | S1050* | 0.471 |
* Position in construct numbering. Corresponds to S1749 in full STRC (same residue, different offset due to truncation).
Result: All three constructs are predicted as GPI-anchored with identical likelihood (0.471) and the same omega site (serine). The GPI signal is in the C-terminal tail, which is fully preserved in all truncation variants. N-terminal removal has zero effect on GPI-anchor prediction. Even the no-SP control is predicted GPI-anchored, though without a signal peptide it would never reach the ER where GPI attachment actually occurs.
Where does the protein end up inside the cell? DeepLoc 2.1 (DTU Health Tech) uses a protein language model (ProtT5) to predict subcellular localization and membrane association from sequence alone. For a GPI-anchored extracellular protein like stereocilin, we expect: extracellular localization + lipid anchor (GPI) or soluble.
| Construct | Top localization | Probability | Membrane type | Lipid anchor | SP detected |
|---|---|---|---|---|---|
| Full STRC | Extracellular | 74.1% | Soluble | 28.4% | ✓ Yes |
| IgK-SP + mini-STRC | Extracellular | 57.4% | Lipid anchor | 72.1% | ✓ Yes |
| No SP (control) | Cell membrane | 42.9% | Lipid anchor | 71.4% | ✗ Artifact |
Result: Our construct (IgK-SP + mini-STRC) is correctly predicted as extracellular with lipid anchor (GPI) association. The lipid anchor probability is actually higher for mini-STRC (72.1%) than full STRC (28.4%), suggesting that removing the disordered N-terminal domain makes the GPI signal more prominent to the model. The no-SP control loses extracellular localization (drops to Cell membrane), confirming the signal peptide is essential for proper trafficking.
IgK-SP + shorter mini-STRC (our construct)
Full STRC (native)
DeepLoc 2.1 attention maps showing which residues contribute to localization prediction. Note the strong N-terminal signal peptide attention and C-terminal GPI signal.
Truncating the N-terminal removes 9 of 14 potential N-glycosylation sites. Does this matter? NetNGlyc 1.0 (DTU Health Tech) uses neural networks trained on known glycoproteins to score each Asn-Xaa-Ser/Thr sequon. Scores above 0.5 = predicted glycosylated; jury agreement (x/9 networks) indicates confidence.
Full STRC: 14 sequons, 13 predicted glycosylated
| Position | Sequon | Score | Jury | Status | In mini-STRC? |
|---|---|---|---|---|---|
| N65 | NISS | 0.757 | 9/9 | +++ | Lost |
| N202 | NATG | 0.520 | 5/9 | + | Lost |
| N297 | NLSW | 0.653 | 8/9 | + | Lost |
| N366 | NFSI | 0.628 | 7/9 | + | Lost |
| N427 | NLSF | 0.641 | 9/9 | ++ | Lost |
| N476 | NETL | 0.707 | 9/9 | ++ | Lost |
| N540 | NDTM | 0.570 | 8/9 | + | Lost |
| N565 | NDTC | 0.619 | 9/9 | ++ | Lost |
| N656 | NCSF | 0.553 | 8/9 | + | Lost |
| N696 | NPSS | 0.577 | 7/9 | + PRO-X1 | Lost (cut zone) |
| N824 | NDSV | 0.452 | 4/9 | − | Retained |
| N916 | NQSV | 0.648 | 8/9 | + | ✓ Retained |
| N964 | NGTL | 0.598 | 8/9 | + | ✓ Retained |
| N1179 | NLTL | 0.678 | 8/9 | + | ✓ Retained |
| N1274 | NESI | 0.517 | 6/9 | + | ✓ Retained |
Result: All 5 retained glycosylation sites in mini-STRC are confidently predicted (scores 0.52-0.72, jury 6-9/9). The IgK signal peptide doesn't affect glycosylation scores (within 0.003 of no-SP control). The only site below threshold in the entire protein (N824, 0.452) happens to be in the retained region but is likely not glycosylated anyway. N696, the last site before our cut point, sits in the disorder zone (IUPred3 score 0.82) and has a PRO-X1 warning, suggesting it's a poor substrate in vivo even in the full protein.
9
Sites lost (N-terminal)
All in disordered region
5
Sites retained
All high-confidence
0.66
Avg retained score
vs 0.63 full protein avg
NetNGlyc 1.0 (Gupta & Brunak 2002). Job ID: 69BB95040023E1E80CE6DABF. All 3 constructs submitted as FASTA.
AlphaFold 3 pLDDT scores showed the N-terminal is disordered, but that's one method. IUPred3 (ELTE Budapest) uses energy-based pairwise potentials to predict intrinsic disorder from sequence alone. It's algorithmically independent from AlphaFold. If both methods agree, the disorder finding is robust.
IUPred3 long disorder prediction for Stereocilin (UniProt Q7RTU9). Scores above 0.5 (dashed line) = disordered. The N-terminal shows multiple disorder peaks; the C-terminal (our mini-STRC, residues 700-1775) is almost entirely ordered.
| Region | Residues | Avg IUPred3 | % disordered | AF3 avg pLDDT | Agreement |
|---|---|---|---|---|---|
| N-terminal | 1-114 | 0.220 | 12.3% | 40.9 | ✓ Both low |
| Middle | 115-615 | 0.254 | 9.6% | 50.2 | ✓ Both moderate |
| Transition zone | 616-699 | 0.282 | 25.0% | 60.2 | ✓ Both worst |
| Mini-STRC | 700-1775 | 0.187 | 2.5% | 71.2 | ✓ Both ordered |
| C-terminal core | 1075-1775 | 0.184 | 3.9% | 73.0 | ✓ Both ordered |
AlphaFold 3 pLDDT
Res 691: 31.2 (minimum)
Res 700: 57.1 (recovery begins)
Res 714: 78.0 (confident)
IUPred3 disorder score
Res 691: 0.819 (disordered)
Res 700: 0.263 (ordered)
Res 714: 0.030 (highly ordered)
Disordered stretches (IUPred3 score > 0.5, length ≥ 3 residues)
Red: in truncated region (removed). Blue: in mini-STRC (retained, minor). The 676-696 stretch (21 residues) is the longest disordered region and sits exactly at our cut zone, confirming the truncation boundary is optimal.
Conclusion: Two algorithmically independent methods (AlphaFold 3 deep learning + IUPred3 energy-based potentials) produce convergent disorder profiles. Both identify residue 691 as the disorder peak and residue 700 as the order boundary. The mini-STRC region (700-1775) is only 2.5% disordered by IUPred3 versus 25.0% in the transition zone. This cross-validation rules out the possibility that the disorder finding is an artifact of any single algorithm.
IUPred3 (Erdős et al. 2021, Nucleic Acids Research). UniProt Q7RTU9 (Stereocilin, Homo sapiens, 1775 aa). Long disorder mode, medium smoothing.
Primary candidate: Shorter mini-STRC (residues 700-1775). Best fold-to-coverage ratio (pTM 0.86, 4% disordered). 1,472 bp AAV headroom allows any promoter, full UTRs, and optimal regulatory elements. Retains both the LRR domain and the C-terminal functional core.
Backup: C-term only (1075-1775). If experiments show the LRR domain is dispensable, this construct offers the best fold (pTM 0.87) and massive headroom (2,597 bp). But it removes 60% of the protein, so functional validation is critical.
What we've established computationally:
Each one either de-risks the construct or reveals a problem before lab experiments.