Wednesday, May 28, 2008

Common ancestors: analysis

After estimating via simulation the kinship distribution K(n) of our simple population model, let us try to solve the problem analytically. As was stated before, K(n) = 0 for n odd, so we will concentrate only on even values of n. First we will calculate K(n) recursively by solving the cases K(0), K(2) and the inductive step K(n + 2). In what follows N is the (constant) size of any generation and x and y denote arbitrary individuals of the same generation. We also define

s(n) := n·r(n)/∑m m·r(m),
S := ∑ n·s(n),
D(n) := 1 − (K(0) + K(2) + ··· + K(n)) = P(k(x,y) > n).

s(n) expresses the probability that an individual have n siblings including herself, as proved in a prior entry.

K(0): this value is simply the probability that x and y are the same individual:

K(0) = P(x = y) = 1/N.

K(2): this is the probability that x and y are siblings. If x and her siblings add up to n this probability is (n − 1)/N, so in general

K(2) = P(x and y are siblings) =
= ∑n > 0 P(x and her siblings add up to n) P(y is one of x's siblings) =
= ∑n > 0 s(n)(n − 1)/N = (S − 1)/N.

K(n + 2), n ≥ 2: this value is the probability that k(x,y) = n + 2, i.e. x and y have a youngest common ancestor (n + 2)/2 generations before theirs. We can express this probability like this:

P(k(x,y) = n + 2) = P(k(x,y) = n + 2 | k(x,y) > n) · P(k(x,y) > n) =
= P(k(x,y) = n + 2 | k(x,y) > n) · D(n),

where we have just conditioned the event "k(x,y) = n + 2" to the enclosing event "k(x,y) > n".

Now, from the very definition of the kinship proximity function, k(x,y) > n if and only if

k(m(x),m(y)) > n − 2,
k(m(x),f(y)) > n − 2,
k(f(x),m(y)) > n − 2,
k(f(x),f(y)) > n − 2;

so

P(k(x,y) = n + 2 | k(x,y) > n) =
= P([k(m(x),m(y)) = n | k(m(x),m(y)) > n − 2] or
[k(m(x),f(y)) = n | k(m(x),f(y)) > n − 2] or
[k(f(x),m(y)) = n | k(f(x),m(y)) > n − 2] or
[k(f(x),f(y)) = n | k(f(x),f(y)) > n − 2]).

Without further justification (though the final results are consistent), we assume that these clauses are statistically independent:

P(k(x,y) = n + 2 | k(x,y) > n)=
= 1 − P([k(m(x),m(y)) ≠ n | k(m(x),m(y)) > n − 2] and
[k(m(x),f(y)) ≠ n | k(m(x),f(y)) > n − 2] and
[k(f(x),m(y)) ≠ n | k(f(x),m(y)) > n − 2] and
[k(f(x),f(y)) ≠ n | k(f(x),f(y)) > n − 2]) =
= 1 P(k(m(x),m(y)) ≠ n | k(m(x),m(y)) > n − 2) ·
· P(k(m(x),f(y)) ≠ n | k(m(x),f(y)) > n − 2) ·
· P(k(f(x),m(y)) ≠ n | k(f(x),m(y)) > n − 2) ·
· P(k(f(x),f(y)) ≠ n | k(f(x),f(y)) > n − 2).

As kinship proximity is not affected by the individuals' sex (beyond level 0), the four factors have the same value and the expression is equivalent to:

P(k(x,y) = n + 2 | k(x,y) > n)=
1 − (1 P(k(x',y') = n | k(x',y') > n − 2))4,

where x' and y' are arbitrary individuals of the generation preceding x and y. Using basic probability properties we have

P(k(x',y') = n | k(x',y') > n − 2) =
P(k(x',y') = n and k(x',y') > n − 2)/P(k(x',y') > n − 2) =
P(k(x',y') = n)/P(k(x',y') > n − 2) =
K(n)/D(n − 2),

and thus

K(n + 2) = D(n)(1 − (1 − K(n)/D(n − 2))4) =
= D(n)(1 − ((D(n − 2) − K(n))/D(n − 2))4) =
= D(n)(1 − (D(n)/D(n − 2))4)=
= D(n) − D(n)5/D(n − 2)4.

To summarize, the recursive calculation of K(n) can be performed as follows:

  • K(0) ← 1/N
  • D(0) ← (N − 1)/N
  • K(2) ← (S − 1)/N
  • D(2) ← (NS)/N
  • K(n + 2) ← D(n) − D(n)5/D(n − 2)4
  • D(n + 2) ← D(n) − K(n + 2)

We can also provide a closed formula for K(n). The recursive equation

K(n + 2) = D(n) − D(n)5/D(n − 2)4

can be rewritten as

D(n) − D(n + 2) = D(n) − D(n)5/D(n − 2)4

or

D(n + 2) = D(n)5/D(n − 2)4.

If we define

d(n) := ln D(2n)

the equality can be expressed then as

d(n + 2) = 5d(n + 1) − 4d(n),

which is a standard second-order linear recurrence equation. The roots of the associated characteristic polynomial are 1 and 4, so the classical theory of linear recurrence equations tells us that d(n) can be expressed as

d(n) = a + 4n,

or, turning to D(n),

D(n) = A·B2n,

where A and B can be determined by using the known values D(0) and D(2). If we solve this and express K(n) in terms of D(n) we have:

K(0) = 1/N,
K(n) = A · (B2n − 2B2n), n ≥ 2,
A = (N − 1)4/3/(N·(NS)1/3),
B = ((NS)/(N − 1))1/3,

which is the closed expression we were after. This formula matches perfectly the experimental results we had previously obtain via simulation. A further confirmation that the analysis is consistent comes from the fact that ∑ K(n) = 1, as it must be for a probability distribution: ∑ K(n) = 1 − limn → ∞D(n), and D(n) = A·B2n tends to 0 when n → ∞ because B = ((NS)/(N − 1))1/3 < 1 (SR and R = 2 by hypothesis).

1 comment :

  1. Well, as I promised, and with your diagrams help... here you have my little hummingbird, hope you enjoy it!!

    photo 1
    photo 2
    photo 3
    photo 4
    photo 5

    And pls make more models, your origami habilities are great.

    (Related 1, Related 2)

    ReplyDelete