My friends Jackson and Monica recently had their second baby. (Congratulations! 🥳) Obviously, this means their first child has a new younger brother. What a privilege!

But how much of a privilege is it? Exactly how many people have younger brothers? I thought this would be fairly easy to look up or estimate, but it turned out to be pretty tricky. A quick Google search didn’t give me the statistic I wanted, so I had to do some thinking.

A Quick Heuristic

First, I thought about families that I know. I am the youngest of five, so that means my four siblings have a younger brother, and I don’t. That’s 4/5 of us that have a younger brother. My brother has five kids, and their youngest is a boy. Another 4/5. My sister has two kids, and the youngest is a girl, so her family has 0/2. My girlfriend has three siblings, and the youngest is a boy, so they’re 3/4. Her sister has two kids, and the youngest is a boy, so they’re 1/2. Between these families, 12 people have younger brothers out of 18 people total, so we’re at 67% from that quick heuristic.

These are probably bigger-than-average families from a clearly biased sample. The percentage is probably lower than that, but how much lower?

Actual Math

Given a number of siblings $N$, we can write a formula for the probability people from those families having younger brothers:

$$ p_N = \frac{1}{N2^N} \sum_{n=0}^{N-1}{n2^n} $$

This is very easily translated to Python code as follows

def probabilityGivenFamilySize(N):
    """
    Computes the probability that a person from a family with N children has a younger brother
    """
    m = 0
    for n in range(1, N):
        m += n * 2**n
    return m / (N * 2 ** N)

for i in range(1,10):
    print(f"| {i} | {probabilityGivenFamilySize(i):.3f} |")

Here are the probabilities in table form:

$N$ $p_N$
1 0.000
2 0.250
3 0.417
4 0.531
5 0.613
6 0.672
7 0.717
8 0.751
9 0.778

Seeing this, I thought that the probability is almost certainly less than 50% since most families have three or fewer kids. But then you have to consider that families with four kids represent four times as many people as the same number of families with one kid. I need actual data to properly compute the weighted average.

Actual Data

This blog post has some interesting information and some nice plots about family composition. I used this data from the US Census for the number of children living in various types of households.

Sibling Plot

If we estimate the number of children living in each household, we get about 30% of kids under 18 are currently living with a younger brother.

Honestly, that surprised me to see a percentage that high after seeing the table. Not only that, this is almost certainly an underestimate of the original statistic that I wanted to know.

  • The 3+ families lumped together is certainly an under-representation of very large families (like my own).
  • This also wouldn’t count families where older children have moved out. I would have been counted as an only child for 6 years according to these data, yet I have four older siblings with a younger brother.
  • The age span between the oldest and youngest will mean that the number of children will skew low compared to the number of siblings a child actually has. This skew is also worse for large families.
  • Two siblings under 18 who are living apart will count as two “zeros” instead of two “ones”.
  • And with parents having on average fewer children than the number of children their families had, the older generations will certainly have a higher percentage probability of having younger brothers than the current generation.

Clearly this isn’t exactly the dataset I want, but it does provide a good lower bound for the question I asked.

So…what’s the answer?

I don’t know. It’s greater than 30%, though. 🤷 Maybe 35%? Or even 40%?

If you have any ideas for how I could improve this analysis, let me know!