Part 4

Computational Toolkit

A collection of computational tools used in STRC variant analysis and hearing loss genetics research. Most tools are freely accessible, which enables independent genetic variant research.

143

Total tools

45

Verified / Used

49

E1659A tested

AI & Foundation Models

1 tool

AlphaGenome

Google DeepMind's genomic foundation model. Predicts chromatin accessibility, gene expression, splicing, histone modifications, TF binding, and 3D contact maps from DNA sequence.

E1659A: 54,276 scores — Splice quantile 0.997+

Structural Biology

22 tools

AlphaFold 3 Server

Predicts 3D structures of protein complexes (protein-protein, protein-DNA, protein-ligand)

AlphaFold Database

Provides predicted 3D protein structures for nearly every known protein

E1659A: pLDDT 68.75 — Low confidence band (50-70) at E1659, just under the 70 cutoff for confident

DDGun

Untrained algorithm predicting the folding stability impact of amino acid substitutions.

DUET

Integrates mCSM and SDM into a consensus prediction of protein stability upon mutation.

DynaMut

Analyzes mutational impacts on protein dynamics and vibrational entropy.

E1659A: -0.913 kcal/mol — Destabilizing
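To give the DynaMut number some scale, the sketch below converts a predicted ΔΔG into a ratio of folding equilibrium constants. It assumes the DynaMut sign convention (negative ΔΔG means destabilizing), a simple two-state folding model, and a temperature of 298.15 K; the function names are illustrative and not part of DynaMut.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def folding_equilibrium_ratio(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Return K_fold(mutant) / K_fold(wild type) for a two-state folder.

    Assumes ddG = dG_stability(mutant) - dG_stability(wild type),
    so a negative value lowers the folded population.
    """
    return math.exp(ddg_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def stability_label(ddg_kcal_per_mol: float) -> str:
    """Label the sign of ddG using the negative-is-destabilizing convention."""
    return "Destabilizing" if ddg_kcal_per_mol < 0 else "Stabilizing"


ddg = -0.913  # value reported above for E1659A
ratio = folding_equilibrium_ratio(ddg)
print(stability_label(ddg), f"folding equilibrium ratio ~{ratio:.2f}")
```

Under these assumptions, a ΔΔG just under 1 kcal/mol shifts the folded/unfolded balance by a factor of roughly five. The input is itself a prediction, so the ratio is an order-of-magnitude guide only.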

ELASPIC

Evaluates the effect of mutations on protein folding and protein-protein interactions.

ESMFold

Predicts 3D protein structure from amino acid sequence alone

FoldX

Calculates empirical energy terms for evaluating wild-type and mutant protein stability.

IFUM

Jointly estimates absolute folding stability (ΔG) and equilibrium structural ensembles.

IUPred3

Predicts intrinsic disorder from protein sequence

Mol Star Viewer

Interactive 3D protein structure visualization

NetGPI

Predicts GPI-anchor signal presence

NetNGlyc

Predicts N-linked glycosylation sites (NXS/NXT motifs)

PoPMuSiC

Predicts changes in thermodynamic stability caused by single site mutations.

PyMOL

Publication-quality 3D protein structure rendering

RoseTTAFold

Predicts 3D structures and models complex multi-protein biological assemblies.

RosettaDDG

Python wrapper automating high-throughput free energy calculations for variants.

SDM

Calculates the difference in stability based on environment-specific amino acid substitution tables.

STRUM

Predicts ΔΔG using quantitative structure-activity relationship (QSAR) models.

SWISS-MODEL

Builds 3D protein models by homology to known experimental structures

SignalP

Predicts signal peptide presence and type (Sec/SPI, Sec/SPII, Tat/SPI)

mCSM

Predicts stability and binding affinity changes utilizing graph-based spatial signatures.

Variant Effect Prediction

41 tools

Allen Brain Atlas

High-resolution imaging and transcriptomics atlas mapping gene expression onto brain anatomy.

E1659A: N/A — STRC not highly expressed in brain

AlphaMissense

Predicts pathogenicity of missense variants (amino acid substitutions)

E1659A: 0.9016 — Likely Pathogenic
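The "Likely Pathogenic" label follows from the score cutoffs published with AlphaMissense (below 0.34 likely benign, above 0.564 likely pathogenic, ambiguous in between). A minimal sketch of that mapping, with a hypothetical function name:

```python
def alphamissense_class(score: float,
                        benign_below: float = 0.34,
                        pathogenic_above: float = 0.564) -> str:
    """Map an AlphaMissense pathogenicity score (0 to 1) to its class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("AlphaMissense scores lie between 0 and 1")
    if score < benign_below:
        return "likely_benign"
    if score > pathogenic_above:
        return "likely_pathogenic"
    return "ambiguous"


print(alphamissense_class(0.9016))  # score reported above for E1659A
```

The cutoffs are exposed as parameters because other calibrations of the same score may use different thresholds.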

BayesDel

Evaluates coding and non-coding variants utilizing a Bayesian modeling framework.

E1659A: 0.2255 — Damaging

CADD

Scores all variant types: missense, synonymous, intronic, intergenic, UTR, splice

E1659A: PHRED 25.5 — Top 0.3% deleterious
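The "Top 0.3%" reading comes from how CADD PHRED scores are scaled: PHRED = -10 * log10(rank fraction), so 10 means top 10%, 20 means top 1%, and 30 means top 0.1% of all possible single-nucleotide variants. A short sketch of the conversion (function names are illustrative):

```python
import math


def phred_to_top_fraction(phred: float) -> float:
    """Fraction of scored variants ranked at or above this PHRED value."""
    return 10 ** (-phred / 10)


def top_fraction_to_phred(fraction: float) -> float:
    """Inverse conversion: rank fraction back to a PHRED-scaled score."""
    return -10 * math.log10(fraction)


fraction = phred_to_top_fraction(25.5)  # value reported above for E1659A
print(f"PHRED 25.5 -> top {fraction * 100:.2f}% of scored variants")
```

For PHRED 25.5 this gives about 0.28%, which rounds to the 0.3% quoted above.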

CIViC

Open-source, crowd-sourced database for clinical interpretation of variants in cancer.

Caduceus

Genomic language model leveraging the Mamba architecture with reverse-complement equivariance.

ClinPred

Meta-predictor combining functional VEP scores with clinical allele frequencies.

E1659A: 0.9869 — Very strong pathogenic signal

DANN

Uses deep neural networks to score the deleteriousness of genetic variants.

E1659A: 0.9946 — Highly deleterious

DNABERT-2

Multi-species modeling tool employing byte pair encoding and refined transformer configurations.

ESM1v

Evaluates missense variants using zero-shot protein language modeling.

E1659A: -2.733 (ESM-1v) / -1.718 (ESM-2) — Damaging. E is most preferred residue at position 1659. A ranks 3rd worst of 20.
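Zero-shot scores of this kind are log-likelihood ratios: the model's log-probability for the mutant residue minus that for the wild-type residue at the same position, so negative values mean the model prefers the wild type. The sketch below shows the arithmetic and the "rank from worst" calculation. The log-probabilities are invented placeholders for a five-residue subset, not ESM outputs, and the helper names are hypothetical.

```python
def llr_scores(log_probs: dict, wild_type: str) -> dict:
    """Log-likelihood ratio of every residue relative to the wild-type residue."""
    reference = log_probs[wild_type]
    return {residue: value - reference for residue, value in log_probs.items()}


def rank_from_worst(scores: dict, residue: str) -> int:
    """1 = most disfavored substitution, len(scores) = most favored residue."""
    ordered = sorted(scores, key=scores.get)  # most negative score first
    return ordered.index(residue) + 1


# Placeholder log-probabilities at one position (illustrative only).
toy_log_probs = {"E": -0.5, "D": -1.2, "A": -3.2, "W": -4.0, "P": -4.5}

scores = llr_scores(toy_log_probs, wild_type="E")
print(f"LLR for E->A: {scores['A']:.3f}")
print(f"A ranks {rank_from_worst(scores, 'A')} from worst of {len(scores)}")
```

With a real model, the dictionary would hold all 20 amino acids, taken from the model's output distribution at the masked position.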

EVE (Evolutionary model of Variant Effect)

Predicts the pathogenicity of human missense variants using deep generative models trained on evolutionary sequence data.

Eigen

Uses unsupervised spectral approaches to aggregate functional annotations into a single score.

E1659A: N/A — Scores available via dbNSFP, unsupervised genome-wide functional score

Evo / Evo 2

Genomic foundation model with up to 40 billion parameters (Evo 2) for prediction and generation tasks across DNA and RNA.

Expression Atlas

Database of gene and protein expression across species and biological conditions.

E1659A: 4562 experiments — STRC expression data across tissues

FATHMM-MKL

Integrates multiple kernel learning to predict the functional consequences of SNVs.

E1659A: 0.9748 — Strong damaging signal

FAVOR

Facilitates the annotation of variants based on their functional consequences.

Franklin

AI-assisted ACMG/AMP variant classification

Galaxy Project

Open-source, web-based platform for highly accessible and reproducible genomic data analysis.

HGMD Professional

Exhaustive, expert-curated database of germline mutations underlying inherited human diseases.

Human Protein Atlas

Spatial omics atlas mapping all human proteins across tissues, blood, and single cells.

HyenaDNA

Genomic sequence model utilizing implicit convolutions to achieve million-token data contexts.

InterVar

Automated application of ACMG/AMP criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7)

E1659A: Deleterious — SIFT 0.0 + PolyPhen 0.991 via Ensembl VEP

LINSIGHT

Estimates the fitness consequences of non-coding mutations.

E1659A: N/A — Non-coding focused, less relevant for coding variant

MPC

Combines missense badness, PolyPhen-2, and regional constraint into a single missense deleteriousness score.

E1659A: N/A — Score available via dbNSFP for constrained regions

Mastermind

AI-driven genomic intelligence platform indexing variants from over 11 million full-text articles.

MetaRNN

Prioritizes rare non-synonymous SNVs via recurrent neural networks.

E1659A: 0.8552 — Damaging

MutScore

Assesses the specific fitness effects and loss-of-function potential of genetic variants.

Nucleotide Transformer (NTv3)

Unified foundation model pre-trained on 9 trillion base pairs for molecular phenotype prediction.

Open Targets

Platform supporting the systematic identification and prioritization of therapeutic drug targets.

E1659A: 0.731 — 73 STRC disease associations

PharmGKB

Comprehensive resource detailing how genetic variation directly affects drug response.

E1659A: PA38082 — STRC gene entry exists, no pharmacogenomic interactions

PrimateAI-3D

Evaluates missense variants by integrating 3D protein structures and primate genomics.

E1659A: N/A — Primate-specific pathogenicity, available via SpliceAI Lookup

REVEL

Ensemble score from 13 tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, phastCons

E1659A: 0.789 — Pathogenic range

SnpEff

Annotates variants and predicts their effects on genes, reported as Sequence Ontology terms.

SparkINFERNO

Analyzes variants for broad phenotypic and functional impacts in scalable cloud environments.

Terra.bio

Scalable platform enabling researchers to run bioinformatics tools securely on Google/Azure clouds.

VEST4

Random forest classifier predicting the statistical probability of a variant being pathogenic.

E1659A: 0.5900 — Moderate pathogenic signal

VarSome

Aggregates: ClinVar, gnomAD, REVEL, CADD, SpliceAI, conservation scores, literature

E1659A: VUS — PM2_Supporting + BP1
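One way to see why PM2_Supporting plus BP1 lands on VUS is the points-based reading of the ACMG/AMP framework (Tavtigian et al., 2020): supporting = 1, moderate = 2, strong = 4, very strong = 8, with benign criteria counted as negative. The sketch below applies that scheme. It is an assumption that this matches VarSome's internal logic, and the stand-alone BA1 criterion is deliberately not handled.

```python
import re

STRENGTH_POINTS = {"Supporting": 1, "Moderate": 2, "Strong": 4, "VeryStrong": 8}
DEFAULT_STRENGTH = {"PVS": "VeryStrong", "PS": "Strong", "PM": "Moderate",
                    "PP": "Supporting", "BS": "Strong", "BP": "Supporting"}


def criterion_points(code: str) -> int:
    """Points for one criterion, e.g. 'PM2_Supporting' -> +1, 'BP1' -> -1."""
    base, _, override = code.partition("_")
    match = re.fullmatch(r"(PVS|PS|PM|PP|BS|BP)\d+", base)
    if match is None:
        raise ValueError(f"unsupported criterion: {code}")
    strength = override or DEFAULT_STRENGTH[match.group(1)]
    sign = 1 if base.startswith("P") else -1  # pathogenic vs benign evidence
    return sign * STRENGTH_POINTS[strength]


def classify(codes: list) -> str:
    """Sum the points and map the total to a five-tier classification."""
    total = sum(criterion_points(code) for code in codes)
    if total >= 10:
        return "Pathogenic"
    if total >= 6:
        return "Likely pathogenic"
    if total >= 0:
        return "VUS"
    if total >= -6:
        return "Likely benign"
    return "Benign"


print(classify(["PM2_Supporting", "BP1"]))  # criteria reported above
```

Here the two criteria cancel (+1 and -1), leaving a total of 0, which falls in the VUS band of 0 to 5 points.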

dbNSFP

30+ predictor scores per variant: REVEL, CADD, SIFT, PolyPhen-2, MutationTaster, FATHMM, GERP, PhyloP, PhastCons, AlphaMissense, and more

E1659A: 40 scores — All extracted via myvariant.info
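A minimal sketch of how dbNSFP scores could be pulled from myvariant.info and counted. It assumes the public `/v1/variant/<id>` endpoint with its `fields` and `assembly` parameters, and that predictor scores sit under keys named `score` in the returned JSON. The variant ID in the example is an arbitrary ID used only to show the format; it is not the STRC variant, whose genomic coordinates are not given here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://myvariant.info/v1/variant/"


def myvariant_url(chrom: str, pos: int, ref: str, alt: str,
                  fields: str = "dbnsfp", assembly: str = "hg38") -> str:
    """Build a lookup URL for one SNV identified by a genomic HGVS-style ID."""
    variant_id = f"chr{chrom}:g.{pos}{ref}>{alt}"
    query = urlencode({"fields": fields, "assembly": assembly})
    return BASE_URL + quote(variant_id, safe=":") + "?" + query


def score_paths(node, prefix=""):
    """Yield the dotted path of every key named 'score' in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key == "score":
                yield path
            else:
                yield from score_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from score_paths(item, prefix)


# Format example only (arbitrary variant, hg19 coordinates).
print(myvariant_url("1", 35367, "G", "A", assembly="hg19"))

# Toy response with invented values, used to show the counting step.
toy_response = {"dbnsfp": {"revel": {"score": 0.5},
                           "sift": {"score": [0.1], "pred": "D"}}}
print(list(score_paths(toy_response)))
```

Counting the paths returned by `score_paths` on a real response would give a figure comparable to the "40 scores" above, depending on which keys are treated as scores.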

fitCons

Clusters genomic positions by functional annotations to estimate fitness consequences.

E1659A: N/A — Evolutionary fitness score available via dbNSFP

gMVP

Predicts missense variant pathogenicity employing a sophisticated graph neural network.

E1659A: N/A — Graph-based structural score, requires local computation

variant tools (vtools)

Flexible command-line toolset for the storage, annotation, and dynamic filtering of sequence variants.

Splicing Prediction

12 tools

AbSplice2

Tissue-specific contextual filter estimating the probability of aberrant splicing events.

E1659A: N/A — Not splice-affecting (missense variant)

FRASER2

Intron-centric aberrant splicing caller utilizing empirical RNA-seq data.

GeneSplicer

Detects splice sites in genomic DNA sequences using maximal dependence decomposition.

E1659A: N/A — Not splice-affecting (missense variant)

HAL

High-throughput alternative splicing prediction platform.

MMSplice

Modular modeling framework predicting the usage of cassette exons.

E1659A: N/A — Not splice-affecting (missense variant)

MaxEntScan

Evaluates 5' and 3' splice site strength employing the Maximum Entropy Principle.

E1659A: N/A — Not splice-affecting (missense variant)

NNSplice

Neural network approach to locating consensus splice sites in primary DNA sequence.

E1659A: N/A — Not splice-affecting (missense variant)

Pangolin

Predicts splice site strength and aberrant usage across multiple mammalian tissues.

SPANR

Predicts the percentage of spliced-in (PSI) events across different tissues.

SPiP

Bioinformatics pipeline utilizing statistical thresholds for deep splicing analysis.

E1659A: N/A — Not splice-affecting (missense variant)

SpliceAI

Predicts splice site creation/disruption from DNA sequence

E1659A: Low — Missense, not splice-disrupting

SpliceRover

Deep convolutional neural network for splice site prediction in whole genomes.

Regulatory & Non-Coding

10 tools

Basenji

Predicts cell-type-specific epigenetic and transcriptional profiles from raw DNA sequence.

DeepSEA

Deep learning-based sequence model for predicting the chromatin effects of sequence alterations.

E1659A: N/A — Coding variant, regulatory prediction less relevant

ENCODE

Encyclopedia of DNA elements providing massive functional genomic datasets.

Enformer

Maps DNA sequences to RNA expression and chromatin states using transformer networks.

GTEx Portal

Comprehensive database of tissue-specific gene expression and eQTLs.

HaploReg

Explores functional annotations of the noncoding genome at specific haplotype blocks.

E1659A: N/A — Coding variant, regulatory LD analysis less relevant

PIQ

Computational method modeling the magnitude and shape of genome-wide DNase profiles.

RegulomeDB

Identifies functional DNA features and regulatory elements in non-coding genomic regions.

E1659A: N/A — No regulatory variants (coding region)

Roadmap Epigenomics

Atlas of human epigenomes mapping regulatory elements across multiple cell types.

TLand

Organ-specific machine-learning architecture for prioritizing regulatory variants.

Population Databases

13 tools

ALFA

Allele Frequency Aggregator analyzing dbSNP data across diverse populations.

BRAVO / TOPMed

Variant browser providing allele frequencies for over 868 million variants from whole genomes.

E1659A: Not found — Absent from TOPMed

Biobank Japan

Prospective genome biobank offering summary statistics for ~260,000 Japanese individuals.

E1659A: Not found — Absent from Japanese population GWAS

CMDB

High-quality database containing 9.04 million SNVs from 141,431 healthy Chinese individuals.

ClinGen

Gene-disease validity curation (is STRC definitively linked to hearing loss?)

ExAC

Historical exome aggregation consortium (largely superseded by gnomAD).

KoB KDNA

The National Project of Bio-Big Data in South Korea, which aims to sequence 1 million genomes.

E1659A: Not found — Absent from Korean population data

LOVD

Leiden Open Variation Database providing locus-specific gene variant data.

E1659A: N/A — STRC variants present, E1659A absent

SAGE

Comprehensive repertoire integrating 154 million genetic variants from South Asians.

UK Biobank

Massive biomedical database containing over 500,000 sequenced genomes.

dbSNP

Foundational archive for single nucleotide polymorphisms and multiple small-scale variations.

E1659A: N/A — No rsID assigned

gnomAD

Population allele frequencies across diverse ancestries

E1659A: Not found — Absent from 251K controls (PM2)

seqr

Variant search across 70K+ rare disease cases

Clinical Databases

7 tools

CGAR

Interactive web application for prioritizing clinically implicated variants via ancestry composition.

ClinVar

Archives variant classifications from clinical labs and research groups

E1659A: N/A — Not yet submitted

ClinicalTrials.gov

Registry of 400,000+ clinical studies worldwide

DECIPHER

Maps genomic variants (especially CNVs/SVs) to clinical phenotypes

DGIdb

Drug-Gene Interaction Database mapping genes to potential therapeutic compounds.

GWAS Catalog

Extensively curated database of human genome-wide association studies and summary statistics.

OMIM

Comprehensive gene-disease relationship catalog

Gene-Level Resources

4 tools

ARCHS4

Resource providing uniformly processed gene counts from over 137k GEO/SRA RNA-seq samples.

Ensembl REST API

Variant Effect Predictor (VEP) — predict consequences of variants
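As a sketch of how a VEP lookup could be scripted, the code below builds a request to the Ensembl REST `vep/:species/hgvs/:hgvs_notation` endpoint for the HGVS description listed under Nomenclature & Validation. The `refseq=1` option is assumed to be needed for RefSeq (NM_) transcripts; the helper names are illustrative, and the network call is defined but not executed here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def vep_hgvs_url(hgvs: str, species: str = "human", refseq: bool = True) -> str:
    """Build a VEP request URL for one HGVS description."""
    params = {"content-type": "application/json"}
    if refseq:
        params["refseq"] = 1  # assumption: needed for NM_ transcript IDs
    path = f"/vep/{species}/hgvs/{quote(hgvs, safe=':')}"
    return ENSEMBL_REST + path + "?" + urlencode(params)


def fetch_vep(hgvs: str) -> list:
    """Query the live service (requires network access)."""
    with urlopen(vep_hgvs_url(hgvs), timeout=30) as response:
        return json.load(response)


print(vep_hgvs_url("NM_153700.2:c.4976A>C"))
```

Calling `fetch_vep` should return a list of annotation records, from which fields such as the predicted consequence and SIFT/PolyPhen values can be read when the service provides them.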

UCSC Genome Browser

Visualize any genomic region with dozens of annotation tracks

UniProt

Protein sequences (reviewed SwissProt + unreviewed TrEMBL)

Hearing Loss & Inner Ear

9 tools

AAV Capsid Database

Catalog of natural and engineered AAV capsids

ASGCT Gene Therapy Database

Tracks gene therapy clinical trials globally

AudioGene

Supervised machine learning suite predicting patient genotypes strictly from audiometric data.

E1659A: N/A — Audiogram-to-gene prediction, not variant-specific

DB-OTO

Repository of clinical trial parameters for dual-AAV gene therapy treating OTOF mutations.

Deafness Variation Database

Variant classifications specific to hearing loss genes

HIEDRA

Single-nucleus RNAseq atlas of human inner ear development (1st and 2nd trimesters).

Hereditary Hearing Loss Homepage

Complete list of deafness genes and loci

Sensorion AAV Platform

High-throughput in vivo screening database identifying and validating AAV vectors for the inner ear.

gEAR

Gene Expression Analysis Resource for comparing in vivo cochlear data to in vitro organoids.

Conservation & Evolution

5 tools

Clustal Omega

Multiple sequence alignment (MSA) of protein or DNA sequences

ConSurf

Estimates the evolutionary conservation of amino acid and nucleic acid positions from the phylogenetic relationships between homologous sequences.

EVcouplings

Predicts 3D structural conformations and mutation fitness landscapes from evolutionary sequence covariation.

OrthoDB

Ortholog clusters across 2,000+ species

Rate4Site

Scoring algorithm for amino acid conservation taking into account stochastic evolutionary processes.

Structural Variants & CNV

7 tools

AnnotSV

Compiles functionally and clinically relevant information to interpret SV pathogenicity quickly.

CNVnator

Copy number variant caller based on read-depth signals from whole-genome sequencing.

CNest

Advanced copy number estimator and variant caller designed for large-scale cohort analysis.

ClassifyCNV

Command-line tool calculating the pathogenicity of germline duplications and large deletions.

Delly

Integrates paired-end and split-read analyses to discover precise genomic rearrangements.

GATK gCNV

Broad Institute pipeline for robustly detecting germline copy number variants in WES and WGS data.

Manta

Rapid structural variant caller tailored specifically for high-throughput clinical sequencing pipelines.

Nomenclature & Validation

3 tools

CrossMap

Versatile tool for converting genome coordinates in various file formats (BAM, BED, VCF).

Mutalyzer

Checks sequence variant nomenclature according to strict, up-to-date HGVS guidelines.

E1659A: Valid — NM_153700.2:c.4976A>C confirmed

VariantValidator

Validates HGVS sequence descriptions and maps variants between transcript and genomic coordinates.

E1659A: Valid — HGVS confirmed, MANE select
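The link between c.4976A>C and E1659A can be checked by hand: cDNA position 4976 is the second base of codon 1659, and changing the middle A of either glutamate codon (GAA or GAG) to C gives an alanine codon (GCA or GCG). A small sketch of that arithmetic follows. The reference codon actually used in the transcript is not stated here, so both glutamate codons are tried, and the codon table covers only the four codons needed.

```python
import re

# Subset of the standard genetic code, sufficient for this check.
CODON_TABLE_SUBSET = {"GAA": "E", "GAG": "E", "GCA": "A", "GCG": "A"}


def parse_cdna_substitution(hgvs: str):
    """Split 'NM_x:c.123A>C' into (transcript, position, ref, alt)."""
    match = re.fullmatch(r"([^:]+):c\.(\d+)([ACGT])>([ACGT])", hgvs)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs}")
    transcript, pos, ref, alt = match.groups()
    return transcript, int(pos), ref, alt


def codon_location(cdna_pos: int):
    """Return (1-based codon number, 0-based offset within the codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def substitute(codon: str, offset: int, ref: str, alt: str) -> str:
    """Apply the substitution after checking the reference base matches."""
    if codon[offset] != ref:
        raise ValueError("reference base does not match codon")
    return codon[:offset] + alt + codon[offset + 1:]


_, pos, ref, alt = parse_cdna_substitution("NM_153700.2:c.4976A>C")
codon_number, offset = codon_location(pos)
print(f"codon {codon_number}, base {offset + 1} of 3")
for glu_codon in ("GAA", "GAG"):
    mutant = substitute(glu_codon, offset, ref, alt)
    print(glu_codon, "->", mutant, "=", CODON_TABLE_SUBSET[mutant])
```

Either glutamate codon yields alanine, so the cDNA and protein descriptions agree regardless of which codon the transcript carries.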

Literature Mining

8 tools

Europe PMC

Repository providing comprehensive access to worldwide life sciences articles and preprints.

LitSense

Finds the best-matching sentences for a query using a neural embedding approach.

LitSuggest

Web-based system utilizing AI and machine learning for document classification and literature triage.

LitVar 2.0

Advanced semantic search engine linking genomic variant data in PubMed, PMC, and dbSNP.

E1659A: N/A — E1659A not found in literature

PubMed

Search biomedical literature (abstracts and full text)

PubTator

Text-mining tool annotating entire articles with key biological entities (genes, mutations, diseases).

E1659A: N/A — E1659A not found in PubTator literature

Semantic Scholar

AI-backed search engine focusing entirely on scientific literature and citation graphs.

E1659A: 13 papers — None mention E1659A

tmVar

Text-mining tool that extracts sequence variant mentions at both the protein and gene level.

Workflow Platforms

1 tool

Illumina Connected Insights

Comprehensive software enabling AI-assisted somatic oncology variant interpretation and reporting.

Research Impact

143

Computational tools

16

AlphaFold 3 experiments

$50-100

Total AI cost for full analysis

All tools are designed for independent research. Most databases are free with academic access. The complete methodology is documented to enable reproduction by any family facing similar variant uncertainty.