Jim Fowler - Blog

On the Popularity of Certain Numbers.

I searched for each number between 1 and 500 on Google, and recorded the (estimated) number of hits. I’m not aware of anyone having done this before; in any case, I made a chart:

Click on the above chart to see a bigger version. You can also look more closely at the first hundred numbers, or look at the above data with a log scale on the y-axis.

I have some observations and questions:

There’s some periodicity in the above data (every 5, every 10, every 100).
Can you explain how quickly the distribution falls off (is it exponentially decaying, for instance)?
The most popular numbers are, in decreasing order of popularity: 2, 3, 10, 4, 5, 11, 6, 7, 8, 20, 15, 30, 14, 18, 1, 24, 21, 19, 25, 22, 28, 29, 50, and so on.
The most popular numbers ending in 0 are, in decreasing order of popularity and having been divided by ten: 1, 2, 3, 5, 10, 4, 8, 9, 7, 20, 6, 50, 15, 12, 30, 25, 11, 40, 13, 18, 16, 14, and so on. Is the distribution of numbers ending in 0 related to the distribution of all numbers?
Are certain families of numbers more popular? Are prime numbers or square numbers particularly popular?

You can download my comma-separated data file if you would like to play with the data yourself. Note, however, that I got this data from Google’s SOAP interface, which, for reasons I don’t understand, doesn’t give the same number of “estimated hits” as the web page interface.
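For anyone who does want to play with the data, here is a minimal sketch of one way to probe the question about families of numbers. It assumes each row of the file looks like `number,hits`; that layout is a guess, and the helper names (`read_hits`, `median_hits`) are invented for this illustration.

```python
import csv
import math
from statistics import median


def read_hits(lines):
    """Parse rows of the assumed form `number,hits` into a dict {number: hits}."""
    hits = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            hits[int(row[0])] = float(row[1])
        except ValueError:
            continue  # skip a header row or non-numeric cells
    return hits


def is_prime(n):
    """Trial division; plenty fast for numbers up to 500."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def is_square(n):
    """True when n is a perfect square."""
    return n >= 0 and math.isqrt(n) ** 2 == n


def median_hits(hits, predicate):
    """Median hit count over the numbers satisfying `predicate` (None if empty)."""
    selected = [h for n, h in hits.items() if predicate(n)]
    return median(selected) if selected else None
```

Passing an open file to `read_hits` and then comparing `median_hits(hits, is_prime)` with `median_hits(hits, is_square)` gives a first rough answer, and a predicate such as `lambda n: n % 10 == 0` probes the periodicity. Because popularity falls off with size, a fair comparison should probably restrict each family to numbers of similar magnitude.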

Posted: Sunday, December 3, 2006 6:36:47am
Category: Personal

Most numbers are boring, asymptotically speaking.

Let $f(n)$ be the number of Google hits for the integer $n$. Then $f(578)$ is about 100 million, and $f(1156)$, that is, the number of hits for a number twice as big, is about 40 million, a bit less than half as big. Doubling the input continues to halve the output: $f(2312)$ is about 20 million (half again!), and $f(4624)$ is about 8 million, and $f(9248)$ is about 4 million.
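The halving can be checked directly from the five rounded figures quoted above. The sketch below uses only those figures (in millions of hits); the variable names are invented here, and because the inputs are rounded, the resulting exponent is only a rough estimate.

```python
import math

# Approximate hit counts quoted in the text, in millions of hits.
quoted = {578: 100, 1156: 40, 2312: 20, 4624: 8, 9248: 4}

numbers = sorted(quoted)

# Ratio f(2n) / f(n) for each consecutive doubling; 0.5 would be exact halving.
ratios = [quoted[b] / quoted[a] for a, b in zip(numbers, numbers[1:])]

# If f(x) is proportional to 1 / x**alpha, then over the whole range
# alpha = log(f(first) / f(last)) / log(last / first).
first, last = numbers[0], numbers[-1]
alpha = math.log(quoted[first] / quoted[last]) / math.log(last / first)

print("ratios per doubling:", ratios)
print("implied exponent:", round(alpha, 3))
```

The ratios alternate between 0.4 and 0.5, and the implied exponent comes out a little above 1, about 1.16 for these rounded values.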

There are about half as many pages talking about numbers that are twice as big. This is an example of a power law, and indeed, a log-log plot of $f$ looks linear to my blurry vision:

Doing a linear regression in R gives the red line, or in symbols, $$f(x) \approx 5{,}800{,}000{,}000 / x^{1.029}.$$ Rather humorously, since the exponent is so close to 1, this means that $f(a)/f(b) \approx b/a$. In the end, this is not so surprising. Zipf’s law says that, in a corpus of naturally occurring text, the frequency of a word is inversely proportional to its rank; a similar phenomenon is at work here: roughly, the popularity of a number is inversely proportional to its size.
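The regression itself was done in R and its code is not shown here. As a rough Python equivalent, the sketch below fits a straight line to $(\log x, \log y)$ by ordinary least squares and converts the slope and intercept back into a power law. The function names are invented for this illustration, and the check at the end uses values generated from the fitted formula rather than the Google data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ C / x**alpha by least squares on (log x, log y).

    Returns the pair (C, alpha).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def fitted_f(x, C=5.8e9, alpha=1.029):
    """The fitted curve reported above: C / x**alpha."""
    return C / x ** alpha


# Sanity check on synthetic points taken from the curve itself (not real data).
xs = list(range(1, 501))
ys = [fitted_f(x) for x in xs]
C, alpha = fit_power_law(xs, ys)

# Under the fit, doubling x multiplies f by 2**-1.029, slightly less than 1/2.
doubling_ratio = fitted_f(2000) / fitted_f(1000)
```

On exact power-law input the fit recovers the constants it was given, which only shows that the procedure is self-consistent. Real hit counts are noisy, so a fit to them will look less clean.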

In other words, while the number of integers expressible with fewer than $n$ bits grows exponentially in $n$, the number of pages discussing integers expressible with fewer than $n$ bits grows linearly in $n$; being silly, I’d say that this is an asymptotic version of the claim that most large numbers are uninteresting. After all, popular numbers have a lot of fan sites.
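The linear growth follows from the harmonic sum: if $f(x) \approx C/x$, treating the exponent as exactly 1 (an assumption; the fitted 1.029 is only close to it), then $\sum_{x < 2^n} C/x \approx C \ln(2^n) = C\,n \ln 2$. A small numerical sketch of this, taking $C = 1$ and with names invented for the illustration:

```python
import math


def harmonic_by_bits(max_bits):
    """Return [S(1), ..., S(max_bits)], where S(n) sums 1/x over 1 <= x < 2**n."""
    totals = []
    running = 0.0
    x = 1
    for n in range(1, max_bits + 1):
        while x < 2 ** n:
            running += 1.0 / x
            x += 1
        totals.append(running)
    return totals


totals = harmonic_by_bits(16)

# Growth contributed by each extra bit; a linear trend means these
# increments settle down to a constant, here ln 2.
steps = [b - a for a, b in zip(totals, totals[1:])]

print("last increment:", steps[-1])
print("ln 2:          ", math.log(2))
```

Each additional bit doubles the count of integers but adds only about $\ln 2$ to the sum, which is the sense in which the total grows linearly in $n$.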

Posted: Sunday, December 10, 2006 8:03:55am
Category: Personal