Showing posts with label math. Show all posts

Saturday, March 29, 2025

Bootstrapping and the Central Limit Theorem

If you've ever seen a data visualization, you've probably seen a bell curve, the shape of a normal distribution. This shape emerges so often in data because of the law of large numbers and the central limit theorem.

The central limit theorem tells us that the distribution of a normalized sample mean converges to a standard normal distribution as the sample size grows, regardless of the shape of the underlying distribution.

For example, let's say that we wish to chart the forty most popular science-fiction books on Goodreads by the number of pages they contain.

Our initial sample will look something like this:

page_counts = np.array([
    324, 216, 384, 194, 480, 368, 374, 268, 244, 258, 
    476, 472, 391, 390, 144, 288, 118, 592, 224, 342,
    382, 336, 450, 500, 304, 297, 192, 320, 487, 260,
    250, 525, 182, 275, 400, 576, 518, 318, 208, 256
])

If we want to plot our original sample of books, we could do something like:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

page_counts = np.array([
    324, 216, 384, 194, 480, 368, 374, 268, 244, 258,
    476, 472, 391, 390, 144, 288, 118, 592, 224, 342,
    382, 336, 450, 500, 304, 297, 192, 320, 487, 260,
    250, 525, 182, 275, 400, 576, 518, 318, 208, 256
])

plt.figure(figsize=(7, 5))
sns.histplot(page_counts, bins=10, kde=False, color='#1f77b4', edgecolor='black')
plt.title('Histogram of Book Pages')
plt.xlabel('Page Count')
plt.ylabel('Frequency')
plt.savefig("histogram.jpg", dpi=300, bbox_inches='tight')
plt.close()

This will produce a histogram of the raw page counts, saved as histogram.jpg.

But if we want to normalize and bootstrap our dataset, we will have to resample it. Bootstrapping samples with replacement, which works like this. Suppose we have a dataset of only three values:

page_counts = np.array([
    216, 324, 385
])

The resampling process will randomly sample from this set. For example:

  • Resample #1: [216, 324, 324] -> mean = 288.0
  • Resample #2: [385, 385, 216] -> mean = 328.67
  • Resample #3: [324, 216, 216] -> mean = 252.0
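A resampling round like the ones above can be sketched with NumPy. The seed and draws here are arbitrary, so the exact resamples will differ from the bullets:

```python
import numpy as np

# Toy dataset of three page counts, as above
page_counts = np.array([216, 324, 385])

# Seeded generator so the run is repeatable (seed choice is arbitrary)
rng = np.random.default_rng(42)

for i in range(3):
    # Draw a resample of the same size as the dataset, with replacement
    resample = rng.choice(page_counts, size=len(page_counts), replace=True)
    print(f"Resample #{i + 1}: {resample.tolist()} -> mean = {resample.mean():.2f}")
```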

If we repeat this process many times, the distribution of our resampled means will approximate a normal distribution, as predicted by the Central Limit Theorem. We can append the following Python code to bootstrap our dataset and graph it:

np.random.seed(42)
num_samples = 10000
# Draw 10,000 resamples (with replacement) and record each resample's mean
bootstrap_means = np.random.choice(
    page_counts, (num_samples, len(page_counts)), replace=True
).mean(axis=1)

plt.figure(figsize=(7, 5))
sns.histplot(bootstrap_means, bins=30, kde=True, color='#ff7f0e', edgecolor='black')
plt.title('Bootstrapped Distribution of Page Counts')
plt.xlabel('Mean Page Count')
plt.ylabel('Frequency')
plt.savefig("bootstrapped_distribution.jpg", dpi=300, bbox_inches='tight')  
plt.close() 

This process is extremely useful for both modeling and hypothesis testing. If we want to make a claim about a dataset, such as the page counts of science-fiction books, but we only have a small sample of books to work with, we can use bootstrapping to simulate many resamples of the dataset and study the distribution of the statistic we want to inquire about.

It's important to note that resampling isn't done to estimate the distribution of the data itself; our sample already serves as our model of the data. In this case, it models the page counts of science fiction books.

Rather, by resampling, we approximate the sampling distribution of a given statistic, such as the mean. This may allow us to make inferences about the broader dataset, even when the original sample size is small.

For example, we could additionally assess confidence intervals, which we'll discuss in a future post.

Wednesday, March 26, 2025

Unlearning, or Proof by Contradiction

Sometimes, we have to unlearn the things we initially learned. And I don't mean this in the sense of having been deliberately deceived. Rather, I mean that to some extent, there are actually many situations in life that involve necessary lies—or believing things that are wrong for perfectly rational reasons. Sometimes it is only after we have consumed and digested such a falsehood that we can see the truth at all. Really, this form of learning is not unlike some parts of math.

Consider a mathematical proof in which we begin by assuming that something is one way. But by the end of the proof, we may realize, through contradiction, that it's actually another way.

Let us take the number 2 and generously hypothesize that its square root is actually rational. If this assumption were true, we could write the square root of 2 as a fraction $\frac{p}{q}$ in lowest terms.

Squaring both sides of $\sqrt{2} = \frac{p}{q}$ gives:

\[ 2 = \frac{p^2}{q^2} \]

If we multiply both sides by $q^2$, we can get rid of the denominator, leaving $ p^2 = 2q^2 $. From here, we must infer that $p$ is an even number: $p^2$ equals $2q^2$, so $p^2$ is even, and since squares of odd numbers are odd while squares of even numbers are even, $p$ itself must be even.

Since $p$ is even, we can write $p = 2r$, where $r$ is an integer. Substituting $2r$ for $p$ and simplifying both sides:

\[ (2r)^2 = 2q^2 \] \[ 4r^2 = 2q^2 \]

Uh oh. Now we've hit a snag. From here, if we attempt to divide by two on both sides, we get:

\[ 2r^2 = q^2 \]

And now the equation tells us something we never asked for: $q^2$ equals $2r^2$, so $q^2$ is even, which means $q$ must be even as well.

How can this be? Remember, our initial hypothesis that the square root of 2 was rational rested on the assumption that $\frac{p}{q}$ was in its lowest form. But now here we see that $2r^2$ is equal to $q^2$. In other words, both $p$ and $q$ are divisible by 2—which contradicts our original claim that $\frac{p}{q}$ was in lowest terms.

This means $p$ and $q$ still share a common factor, so the fraction was never in lowest terms to begin with. By contradiction, the square root of two is not rational after all.

Saturday, November 30, 2024

Repetitions are Sequences

When doing a task like working out, a common pattern is to perform something like 100 reps, then 90 reps, then 80, and so on, until you’ve completely counted down to zero. But this pattern can also be expressed arithmetically.

We say that there are 11 terms in this sequence: 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, and 0. Alternatively, we could count the terms by solving:

\[ 0 = 100 - 10 (n - 1) \]

Now, let \( S_n \) represent the sum of all the terms in our sequence, \( n \) represent the number of terms, and \( t_1 \) and \( t_n \) represent the first and last terms of the sequence.

\[ S_n = \frac{n}{2} \cdot (t_1 + t_n) \]

If we're beginning at 100 and counting all the way down to zero, we plug those values into our equation to get the total sum of 550.

\[ S_n = \frac{11}{2} \cdot (100 + 0) = 550 \]
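Both the term count and the sum are easy to double-check in a few lines of Python:

```python
# Terms of the countdown: 100, 90, ..., 10, 0
terms = list(range(100, -1, -10))
n = len(terms)

# Closed form: S_n = n/2 * (t_1 + t_n)
closed_form = n / 2 * (terms[0] + terms[-1])

print(n)            # 11
print(sum(terms))   # 550
print(closed_form)  # 550.0
```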

Wednesday, November 13, 2024

Frequentism and Bayesianism

From Frequentism and Bayesianism, A Practical Introduction:

For frequentists, probability only has meaning in terms of a limiting case of repeated measurements. That is, if I measure the photon flux F from a given star (we’ll assume for now that the star’s flux does not vary with time), then measure it again, then again, and so on, each time I will get a slightly different answer due to the statistical error of my measuring device. In the limit of a large number of measurements, the frequency of any given value indicates the probability of measuring that value. For frequentists probabilities are fundamentally related to frequencies of events. This means, for example, that in a strict frequentist view, it is meaningless to talk about the probability of the true flux of the star: the true flux is (by definition) a single fixed value, and to talk about a frequency distribution for a fixed value is nonsense.

For Bayesians, the concept of probability is extended to cover degrees of certainty about statements. Say a Bayesian claims to measure the flux F of a star with some probability P(F): that probability can certainly be estimated from frequencies in the limit of a large number of repeated experiments, but this is not fundamental. The probability is a statement of my knowledge of what the measurement result will be. For Bayesians, probabilities are fundamentally related to our own knowledge about an event. This means, for example, that in a Bayesian view, we can meaningfully talk about the probability that the true flux of a star lies in a given range. That probability codifies our knowledge of the value based on prior information and/or available data.

Friday, August 18, 2023

A Parlor Trick with Primitive Roots

OK, the title is a bit of a pun -- really, the parlor trick uses elementary number theory. A primitive root modulo n, however, is an integer g such that every integer relatively prime to n is congruent to some power of g modulo n. In other words, the powers of g generate all numbers relatively prime to n.

When dealing with modular arithmetic, cyclic groups, and primitive roots, patterns emerge. For example, modulo 7 the powers of 3 follow a cyclic pattern: 3, 2, 6, 4, 5, 1 -- and then it loops back to 3.

\begin{array}{rcrcrcrcrcr}3^{1}&=&3^{0}\times 3&\equiv &1\times 3&=&3&\equiv &3{\pmod {7}}\\3^{2}&=&3^{1}\times 3&\equiv &3\times 3&=&9&\equiv &2{\pmod {7}}\\3^{3}&=&3^{2}\times 3&\equiv &2\times 3&=&6&\equiv &6{\pmod {7}}\\3^{4}&=&3^{3}\times 3&\equiv &6\times 3&=&18&\equiv &4{\pmod {7}}\\3^{5}&=&3^{4}\times 3&\equiv &4\times 3&=&12&\equiv &5{\pmod {7}}\\3^{6}&=&3^{5}\times 3&\equiv &5\times 3&=&15&\equiv &1{\pmod {7}}\\3^{7}&=&3^{6}\times 3&\equiv &1\times 3&=&3&\equiv &3{\pmod {7}}\\\end{array}
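The cycle in the table above can be generated directly; a minimal sketch:

```python
# Successive powers of 3 modulo 7, computed iteratively as in the table
cycle = []
value = 1
for _ in range(6):
    value = (value * 3) % 7
    cycle.append(value)

print(cycle)  # [3, 2, 6, 4, 5, 1]
```

Because the cycle hits every nonzero residue modulo 7, this also confirms that 3 is a primitive root of 7.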

This kind of repetition shows up even in something as simple as the last digit of powers. A neat trick is using this modular property to deduce the last digit of a large integer.

For example, consider the integer 7. As we increment the powers of 7, the last digit begins to repeat: 7, 9, 3, 1, and so on.

Those four numbers repeat over and over again. So, we say that it has a cycle length of four:

\begin{align*} 7^1 &= 7 \\ 7^2 &= 49 \\ 7^3 &= 343 \\ 7^4 &= 2401 \\ 7^5 &= 16807 \\ 7^6 &= 117649 \\ 7^7 &= 823543 \\ 7^8 &= 5764801 \\ 7^9 &= 40353607 \\ \end{align*}

Why do powers repeat like this? The answer is modular arithmetic, again. Just as primitive roots generate cycles, the last digit of a number is its value mod 10 -- and since there are only ten possible digits, the last digits of successive powers must eventually cycle, too.

So, how can we use this knowledge to do a fun parlor trick, like guess the last digit of an extremely large integer, such as \(7^{3001}\)?

As demonstrated, the powers of 7 have a cycle length of four, so the last digit repeats every 4 numbers. It follows that we first divide our exponent, 3001, by 4.

\[ 3001 \div 4 = 750 \text{, with a remainder of } 1 \]

Since we are left with a remainder, we use it, calculating \(7^{1} = 7\). Thus, we know the last digit of the number \(7^{3001}\) is 7.

But what if the division has no remainder? For example, let's consider \(7^{3000}\). Dividing 3000 by 4 gives:

\[ 3000 \div 4 = 750 \text{, with a remainder of } 0 \]

Since there's no remainder, the exponent is a perfect multiple of the cycle length four, so we look at the last digit of \(7^4\), which equals 2401. The last digit of \(7^{3000}\) is 1.
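Python's three-argument pow computes powers modulo a number efficiently, so both answers can be verified directly:

```python
# Last digit of 7**3001: 3001 mod 4 = 1, so it matches the last digit of 7**1
print(pow(7, 3001, 10))  # 7

# Last digit of 7**3000: 3000 mod 4 = 0, so it matches the last digit of 7**4
print(pow(7, 3000, 10))  # 1
```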

Patterns like these show up so often in number theory that once you see them, you can’t unsee them.

Sunday, July 09, 2023

Savagery vs Science

Savagery is sort of the opposite of science, in that it's a kind of impulsive readiness to believe or disbelieve with absolute certainty, often followed by false religious zeal or dogma.

Monday, July 03, 2023

Euler's Continued Fraction in Lisp

In 1748, Leonhard Euler published a formula describing an identity that connected and generalized an infinite series and infinite continued fraction.

Saturday, November 19, 2022

James Garfield's Pythagorean Proof

Today I learned that James Garfield, who worked as a lawyer, served as a Civil War general, and became the 20th President of the United States, was math savvy and published a novel proof of the Pythagorean theorem.[1]

\[ \text{Area}_{\text{trapezoid } ACED} = \frac{1}{2} \cdot (AC + DE) \cdot CE = \frac{1}{2} \cdot (a + b) \cdot (a + b) = \frac{(a + b)^2}{2} \]

\[ \begin{aligned} \text{Area}_{\text{trapezoid } ACED} &= \text{Area}_{\Delta ACB} + \text{Area}_{\Delta ABD} + \text{Area}_{\Delta BDE} \\ &= \frac{1}{2}(a \times b) + \frac{1}{2}(c \times c) + \frac{1}{2}(a \times b) \end{aligned} \]

\[ (a + b) \times \frac{1}{2}(a + b) = \frac{1}{2}(a \times b) + \frac{1}{2}(c \times c) + \frac{1}{2}(a \times b) \]

\[ a^{2} + b^{2} = c^{2} \]

Small Pieces

We can take this in smaller pieces. First, we can find the area of the right-angled trapezoid with the following equation:

\[ \text{Area}_{\text{trapezoid}} = \frac{1}{2} \cdot (a + b) \cdot (a + b) = \frac{(a + b)^2}{2} \]

We can find the area of each of the two outer triangles with the following:

\[ \text{Area}_{\text{triangle}} = \frac{ab}{2} \]

And the area of the inner triangle with:

\[ \text{Area}_{\text{inner triangle}} = \frac{c^2}{2} \]

Proof

Now we can work through to the end. Setting the trapezoid's area equal to the sum of the three triangle areas gives:

\[ \frac{(a + b)^2}{2} = 2 \cdot \frac{ab}{2} + \frac{c^2}{2} \]

Then we expand \( (a + b)^2 \) on the left hand side. And our equation on the right can also be simplified since we're both multiplying and dividing \( ab \) by 2:

\[ \frac{a^2 + 2ab + b^2}{2} = ab + \frac{c^2}{2} \]

Multiply both sides by 2 to eliminate denominators:

\[ a^2 + 2ab + b^2 = 2ab + c^2 \]

Lastly, subtract \( 2ab \) from both sides:

\[ a^2 + b^2 = c^2 \]
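As a sanity check, the two area expressions agree for any concrete right triangle; here is a quick sketch using a 3-4-5 triangle (my choice of example):

```python
# A right triangle with legs a, b and hypotenuse c
a, b, c = 3, 4, 5

# Trapezoid area computed two ways, as in the proof
trapezoid = (a + b) ** 2 / 2
triangles = a * b / 2 + c * c / 2 + a * b / 2

print(trapezoid)            # 24.5
print(triangles)            # 24.5
print(a**2 + b**2 == c**2)  # True
```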

Footnotes

  1. Mathematical Treasure: James A. Garfield’s Proof of the Pythagorean Theorem

Saturday, September 03, 2022

Euler's Equation

Euler's number, e, approximately 2.71828, is the base of the natural logarithm. As n approaches infinity, Euler's number is the limit of:

\[ e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^{n} \]

It's used frequently across the sciences. It can also be elegantly expressed as an infinite series, like so:

\begin{align*} {\displaystyle e=\sum \limits _{n=0}^{\infty }{\frac {1}{n!}}=1+{\frac {1}{1}}+{\frac {1}{1\cdot 2}}+{\frac {1}{1\cdot 2\cdot 3}}+\cdots .} \end{align*}

Separately, the imaginary unit i, \({\displaystyle {\sqrt {-1}}}\), represents the imaginary solution to the quadratic equation \( x^2 + 1 = 0 \). The value can also be used to extend the real numbers to the complex numbers.

And π is pi, the irrational number we all know and love, approximately 3.14159, representing the ratio of a circle's circumference to its diameter.

Mysterious as it may seem, we can join the three numbers in a seemingly bizarre identity that just works.

\( {\displaystyle e^{i\pi }=-1} \)
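The identity is a special case of Euler's formula, \( e^{ix} = \cos x + i \sin x \), evaluated at \( x = \pi \):

\[ e^{i\pi} = \cos \pi + i \sin \pi = -1 + i \cdot 0 = -1 \]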

Tuesday, August 30, 2022

Abelian Groups

Commutative groups, those in which the order of operands does not change the result, are called Abelian groups: "7 × 3 = 3 × 7". When this condition is not satisfied, we say the group is non-commutative.

Cyclic groups are a special case of Abelian groups: they are monogenous, meaning generated by repeatedly applying a single invertible operation to a single element.

Consider a set in which, by repeatedly applying a particular operation to one element, we can derive all of the elements of the set.

For a finite cyclic group, let G be the group, n be the size of the set, and e be the identity element, such that \( g^i = g^j \) whenever \( i \equiv j \pmod{n} \); like so:

\[ G = \{ e, g, g^2, \ldots, g^{n-1} \} \]

The commutative property also holds over the additive group of ℤ, the integers, which is isomorphic to every infinite cyclic group. Similarly, the additive group of ℤ/nℤ, the integers modulo n, is isomorphic to the finite cyclic group of order n.

Since all cyclic groups commute, they are all Abelian groups, and every finitely generated Abelian group is a direct product of cyclic groups.

For example, the powers of 10 form an infinite subset G = {…, 0.001, 0.01, 0.1, 1, 10, 100, 1000, …} of the rational numbers.

With 10 as a generator, the set G is a multiplicative cyclic group, and for any element a of the group one can derive \( \log_{10} a \).

Our set contains 10 and 100. The product \(10^1 \cdot 10^2\) is equivalent to \(10^{1+2} = 1000\). Every cyclic group G is Abelian because if \({\displaystyle x}, {\displaystyle y}\) are in \({\displaystyle G}\), then:

\( {\displaystyle xy=a^{m}a^{n}=a^{m+n}=a^{n}a^{m}=yx}\)

This homomorphic property is relevant in cryptography. It's also useful for computing commitments. For example, we can perform operations to verify information, like so. Let m be a message and r be a random value:

\( {\displaystyle C(m_{1},r_{1})\cdot C(m_{2},r_{2})=C(m_{1}+m_{2},r_{1}+r_{2})}\)

That is to say, with this special property, one could compute and verify the sums of values without knowing the actual values being committed.
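As a sketch of that idea, here is a toy multiplicative commitment of the form \( C(m, r) = g^{m} h^{r} \bmod p \). The parameters below are made up for illustration and are far too small to be secure:

```python
# Toy commitment C(m, r) = g^m * h^r mod p.
# p, g, h are illustrative placeholders, not real cryptographic parameters.
p, g, h = 1009, 3, 7

def commit(m, r):
    return (pow(g, m, p) * pow(h, r, p)) % p

m1, r1 = 12, 34
m2, r2 = 56, 78

# Multiplying two commitments yields a commitment to the sums
# of the messages and of the random values, by the law of exponents.
product = (commit(m1, r1) * commit(m2, r2)) % p
combined = commit(m1 + m2, r1 + r2)

print(product == combined)  # True
```

The verifier can confirm a claimed sum m1 + m2 without ever learning m1 or m2 individually.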
