Bayesian updates from negative genetic results
I recently had a genetic panel done. Hundreds of genes were tested and I came back as a carrier for a couple of things. This isn't surprising:
- Lazarin et al (2012) find that "24% of individuals were identified as carriers for at least one of 108 disorders, and 5.2% were carriers for multiple disorders".
- Fridman et al (2021) find that "based on 6,447 exome sequences of healthy, genetically unrelated Europeans of two distinct ancestries, we estimate that every individual is a carrier of at least 2 pathogenic variants in currently known autosomal-recessive (AR) genes and that 0.8%–1% of European couples are at risk of having a child affected with a severe AR genetic disorder."
Updates from positive tests
One of the genes I'm a carrier for is associated with Glycogen storage disease type II.
Knowing I'm a carrier means it's straightforward to use Mendelian genetics to determine that my siblings have a 50% chance of being carriers and my niblings have a 25% chance of being carriers.
The background carrier frequency of this mutation is ~1 in 530, which is also the probability that a randomly selected partner would be a carrier. If we were both carriers and had a child, there'd be a 25% chance said child would inherit the mutation from both of us and be affected as a result (ignoring the mysteriousness of incomplete penetrance).
Multiple mutations can cause GSDII, and its overall background rate is ~1 in 40,000. Since I only have the one mutation, pre-test (that is, before a partner is tested) the odds that a randomly selected partner and I would have a child with GSDII are $\frac{1}{530}\cdot\frac{1}{4}=\frac{1}{2,120}$.
Note that this only accounts for a partner carrying the same variant I do. GSDII can also arise from compound heterozygosity — my child inheriting my mutation from me and a different GAA mutation from my partner. Since the disease requires two carrier parents who each pass on the mutation, the disease incidence equals $q^2 \cdot \frac{1}{4} = \frac{1}{40,000}$, giving an overall carrier frequency of $q = \frac{1}{100}$. That's considerably higher than the 1/530 rate for my specific variant, so the total pre-test risk is higher than 1/2,120. Either way, post-test these odds can be reduced to essentially zero if my partner were to do similar testing.
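As a rough check of the arithmetic above, here is a small Python sketch. The variable names are my own, and it assumes, as the text does, that disease incidence equals $q^2 \cdot \frac{1}{4}$ and that a random partner is a carrier with probability $q$.

```python
from fractions import Fraction
from math import isqrt

# Figures quoted in the text above.
disease_incidence = Fraction(1, 40_000)  # background rate of GSDII
my_variant_freq = Fraction(1, 530)       # carrier frequency of my specific variant

# incidence = q^2 * 1/4  =>  q^2 = 4 * incidence
q_squared = 4 * disease_incidence
# Exact rational square root; valid here because both parts are perfect squares.
q_all = Fraction(isqrt(q_squared.numerator), isqrt(q_squared.denominator))

# Risk to my child, given that I am a carrier and my partner is untested.
risk_same_variant = my_variant_freq * Fraction(1, 4)
risk_any_variant = q_all * Fraction(1, 4)

print(q_all, risk_same_variant, risk_any_variant)  # 1/100 1/2120 1/400
```

Under these assumptions the total pre-test risk comes out at about 1 in 400 rather than 1 in 2,120.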
For my siblings, the odds of their children having GSDII with a randomly selected partner are $\frac{1}{530}\cdot\frac{1}{2}\cdot\frac{1}{4}=\frac{1}{4,240}$, and the odds that the couple are both carriers and have a carrier child are $\frac{1}{530}\cdot\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{2,120}$. The additional $\frac{1}{2}$ term is the probability that my sibling is also a carrier, given Mendelian genetics. The disease risk is, of course, much higher than the $\frac{1}{40,000}$ background rate, so the test results in a big update for their state of knowledge.
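The sibling calculation can be sketched the same way. This is only an illustration with names I made up, and it inherits the text's simplification of comparing the single-variant risk against the all-variant background rate.

```python
from fractions import Fraction

partner_carrier = Fraction(1, 530)  # chance a random partner carries my variant
sibling_carrier = Fraction(1, 2)    # chance a full sibling inherited it
both_pass_it_on = Fraction(1, 4)    # chance two carrier parents both transmit it

sibling_child_affected = partner_carrier * sibling_carrier * both_pass_it_on
background = Fraction(1, 40_000)    # population rate of GSDII

ratio = float(sibling_child_affected / background)
print(sibling_child_affected)  # 1/4240
print(round(ratio, 1))         # roughly 9.4 times the background rate
```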
I also tested negative for many things. For autosomal recessive disorders that means there's a ~0% chance of my children having them sans de novo mutation.
Updates from negative tests
Positive carrier results give clear and significant updates for my siblings. But what about all the things I tested negative for — can my siblings learn anything from those results too? The answer is less obvious and requires a bit of work.
To figure this out, we need to pull out the Bayesian blender and pour (a) the background rate of the condition, (b) my parents' possible genotypes, and (c) Mendelian inheritance into it.
Let's call $q$ the frequency of carriers of a mutation in the population and $p=1-q$ the frequency of non-carriers.
Now, let's work through my parents' potential genotypes. (Note: in the following "non-carrier" means not heterozygous. Strictly, this lumps together homozygous wild-type (AA) and homozygous affected (aa) individuals, but for rare diseases the aa frequency is negligible and the distinction doesn't matter.)
- Both parents non-carriers:
  - Prior probability: $p^2$
  - Probability I'm a non-carrier given this scenario = 1
  - Probability my siblings are non-carriers = 1
- One parent's a carrier, one's a non-carrier:
  - Prior probability: $2pq$
  - Probability I'm a non-carrier given this scenario = 1/2
  - Probability my siblings are non-carriers = 1/2
- Both parents carriers:
  - Prior probability: $q^2$
  - Probability I'm a non-carrier given this scenario = 1/4
  - Probability my siblings are non-carriers = 1/4
Now, Bayes' theorem says that
P(parental genotype | I'm a non-carrier) =
P(I'm a non-carrier | parental genotype)
* P(parental genotype) / P(I'm a non-carrier)
We have P(I'm a non-carrier) = $p^2\cdot1 + 2pq\cdot\frac{1}{2} + q^2\cdot\frac{1}{4}=p^2+pq+q^2/4$.
Therefore, we have:
- P(both parents non-carriers | Me non-carrier) = $\frac{p^2}{p^2+pq+q^2/4}$
- P(one parent carrier | Me non-carrier) = $\frac{pq}{p^2+pq+q^2/4}$
- P(both carrier | Me non-carrier) = $\frac{q^2}{4}\frac{1}{p^2+pq+q^2/4}$
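The three posteriors can be computed numerically with a short Python sketch. The function name and the dictionary layout are my own choices, and the example value q = 1/276 is just one illustrative carrier frequency.

```python
def parental_posterior(q):
    """Posterior over parental genotype pairs, given that I am a non-carrier."""
    p = 1 - q
    # scenario -> (prior probability, P(I'm a non-carrier | scenario))
    scenarios = {
        "both non-carriers": (p * p, 1.0),
        "one carrier": (2 * p * q, 0.5),
        "both carriers": (q * q, 0.25),
    }
    # P(I'm a non-carrier) = p^2 + pq + q^2/4
    evidence = sum(prior * like for prior, like in scenarios.values())
    return {name: prior * like / evidence
            for name, (prior, like) in scenarios.items()}

post = parental_posterior(1 / 276)
for name, prob in post.items():
    print(f"{name}: {prob:.6f}")
```

The three values sum to 1, which is a quick way to confirm the normalisation.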
Now,
P(sib non-carrier) = Σ[P(sibling is non-carrier | parental genotype) * P(parental genotype | I'm a non-carrier)]
plugging it all in we get: $$ P(\textrm{sib non-carrier}) = \frac{p^2 + pq/2 + q^2/16}{p^2+pq+q^2/4} $$
The numerator and denominator are both perfect squares: $p^2 + pq/2 + q^2/16 = (p+q/4)^2$ and $p^2+pq+q^2/4=(p+q/2)^2$, so this simplifies to
$$ P(\textrm{sib non-carrier}) = \left(\frac{p+q/4}{p+q/2}\right)^2 $$
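To guard against algebra slips, the closed form can be compared against the explicit sum over parental scenarios. This is a minimal sketch with function names of my own choosing.

```python
def sib_noncarrier_sum(q):
    """Explicit sum over the three parental scenarios."""
    p = 1 - q
    # prior * P(I'm a non-carrier | scenario) * P(sib non-carrier | scenario)
    numerator = p**2 * 1 * 1 + 2 * p * q * 0.5 * 0.5 + q**2 * 0.25 * 0.25
    # P(I'm a non-carrier)
    denominator = p**2 + p * q + q**2 / 4
    return numerator / denominator

def sib_noncarrier_closed(q):
    """Closed form ((p + q/4) / (p + q/2))^2."""
    p = 1 - q
    return ((p + q / 4) / (p + q / 2)) ** 2

for q in (0.001, 0.01, 0.1, 0.5, 0.9):
    assert abs(sib_noncarrier_sum(q) - sib_noncarrier_closed(q)) < 1e-12
```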
Note that this derivation assumes full siblings sharing both parents; the update differs for half-siblings.
A condition I'm not a carrier for is Sandhoff disease, which has a carrier frequency of between 1:310 and 1:276.
Choosing the higher frequency, q=1/276, we plug and chug to get P(sib non-carrier)=99.8%. Before I did this test, the population base rate gives the prior probability that my siblings would be non-carriers: p=99.6%. So the test represents a 0.2pp update for them.
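Here is the same plug-and-chug as a Python sketch, run for both ends of the quoted carrier-frequency range. The function name is my own.

```python
def sib_noncarrier(q):
    """P(sibling is a non-carrier | I am a non-carrier), full siblings."""
    p = 1 - q
    return ((p + q / 4) / (p + q / 2)) ** 2

for denom in (310, 276):
    q = 1 / denom
    prior = 1 - q
    posterior = sib_noncarrier(q)
    update_pp = 100 * (posterior - prior)
    print(f"q = 1/{denom}: prior {prior:.2%}, "
          f"posterior {posterior:.2%}, update {update_pp:.2f} pp")
```

Both ends of the range give an update of a little under 0.2 percentage points.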
Plotting the magnitude of the update across all possible carrier frequencies gives us a sense of when negative results start to matter:
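# Plot the sibling update, in percentage points, against carrier frequency x = q.
set terminal svg size 800,500 font "Arial,14"
set output "autosomal-recessive-bayesian-update.svg"
set xrange [0:1]
set xlabel "Carrier frequency (q)"
set ylabel "Update given I'm not a carrier (pp)"
# Posterior ((p + q/4) / (p + q/2))^2 minus prior p, with p = 1 - x.
plot 100*((((1-x)+x/4)/((1-x)+x/2))**2 - (1-x)) t ""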
From this we see that we'd need a carrier frequency of 20-40% before we got significant updates.
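For readers without the plot to hand, a few points on the curve can be tabulated in Python. This is a sketch using the same expression as the plot, with a function name of my own.

```python
def update_pp(q):
    """Sibling update in percentage points: posterior minus prior."""
    p = 1 - q
    return 100 * (((p + q / 4) / (p + q / 2)) ** 2 - p)

for q in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"q = {q:.2f}: update = {update_pp(q):.1f} pp")
```

At q = 0.2 the update is about 9 pp and at q = 0.4 about 17 pp, whereas at q = 0.01 it is only about half a percentage point.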
We can also use this to double-check our work. At 0% frequency the update goes to 0 as it should. At the other extreme, consider q=99.99%. In this case both parents are almost certainly carriers, and the population base rate puts my sibling's prior probability of being a non-carrier at essentially p ≈ 0%. My negative test is surprising, but it barely changes what we believe about my parents. With two carrier parents, Mendelian inheritance gives my sibling a 1/4 chance of being a non-carrier, so the posterior is roughly 25%, an update of about 25pp relative to the base rate.
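The two limits, plus a small-$q$ approximation that I'm adding as a further sanity check, can be written out from the closed form with $p = 1-q$:

$$ \lim_{q\to 0}\left[\left(\frac{p+q/4}{p+q/2}\right)^2 - p\right] = 1 - 1 = 0, \qquad \lim_{q\to 1}\left[\left(\frac{p+q/4}{p+q/2}\right)^2 - p\right] = \left(\frac{1/4}{1/2}\right)^2 - 0 = \frac{1}{4} $$

$$ \frac{p+q/4}{p+q/2} = \frac{1-\tfrac{3}{4}q}{1-\tfrac{1}{2}q} \approx 1-\frac{q}{4} \quad\Rightarrow\quad \left(\frac{p+q/4}{p+q/2}\right)^2 - p \approx \left(1-\frac{q}{2}\right) - (1-q) = \frac{q}{2} $$

So for rare conditions the update is roughly half the carrier frequency. For q=1/276 that gives about 0.18pp, consistent with the ~0.2pp figure for Sandhoff disease.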