On the Popularity of Certain Numbers.

I searched for each number between 1 and 500 on Google, and recorded the (estimated) number of hits. I’m not aware of anyone having done this before; in any case, I made a chart:

Figure: Google search results for the numbers 1 to 500.

Click on the above chart to see a bigger version. You can also look more closely at the first hundred numbers, or look at the above data with a log scale on the y-axis.

I have some observations and questions:

  • There’s some periodicity in the above data (every 5, every 10, every 100).
  • Can you explain how quickly the distribution falls off (is it exponentially decaying, for instance)?
  • The most popular numbers are, in decreasing order of popularity: 2, 3, 10, 4, 5, 11, 6, 7, 8, 20, 15, 30, 14, 18, 1, 24, 21, 19, 25, 22, 28, 29, 50, and so on.
  • The most popular numbers ending in 0 are, in decreasing order of popularity and after dividing each by ten: 1, 2, 3, 5, 10, 4, 8, 9, 7, 20, 6, 50, 15, 12, 30, 25, 11, 40, 13, 18, 16, 14, and so on. Is the distribution of numbers ending in 0 related to the distribution of all numbers?
  • Are certain families of numbers more popular? Are prime numbers or square numbers particularly popular?

You can download my comma-separated data file if you would like to play with the data yourself. Note, however, that I got this data from Google’s SOAP interface, which, for reasons I don’t understand, doesn’t give the same number of “estimated hits” as the web page interface.
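One way to start on the question about families of numbers is to compare the average hit count within each family. The following is only a sketch: the function names are my own, and `placeholder_hits` is made-up data standing in for the real file, whose layout is not described here.

```python
from math import isqrt
from statistics import mean


def is_prime(n):
    """Trial division; fine for numbers as small as 1..500."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def is_square(n):
    """True when n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n


def mean_hits_by_family(hits):
    """Average hit count per family, given a dict {number: hits}.

    Families with no members in `hits` are skipped.
    """
    families = {
        "all": lambda n: True,
        "multiple of 5": lambda n: n % 5 == 0,
        "multiple of 10": lambda n: n % 10 == 0,
        "prime": is_prime,
        "square": is_square,
    }
    result = {}
    for name, belongs in families.items():
        counts = [h for n, h in hits.items() if belongs(n)]
        if counts:
            result[name] = mean(counts)
    return result


# Placeholder data only (NOT the real Google counts): hits fall off like 1/n.
placeholder_hits = {n: 1_000_000 // n for n in range(1, 501)}

for family, average in mean_hits_by_family(placeholder_hits).items():
    print(f"{family:>15}: {average:,.0f}")
```

Raw averages mix up two effects, since small numbers are popular simply for being small; a fairer comparison would set each number against its neighbours of similar size.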

Most numbers are boring, asymptotically speaking.

Let $f(n)$ be the number of Google hits for the integer $n$. Then $f(578)$ is about 100 million, and $f(1156)$, that is, the number of hits for a number twice as big, is about 40 million, a bit less than half as big. Doubling the input continues to roughly halve the output: $f(2312)$ is about 20 million (halved again!), $f(4624)$ is about 8 million, and $f(9248)$ is about 4 million.
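The doubling pattern can be checked directly from the rounded figures quoted above. This is a small sketch with hypothetical helper names; it uses only those five approximate values, not the full data set.

```python
from math import log2

# Approximate hit counts quoted in the text, in millions.
quoted_hits = {578: 100, 1156: 40, 2312: 20, 4624: 8, 9248: 4}


def doubling_ratios(hits):
    """Return {n: f(2n) / f(n)} for every n whose double is also in `hits`."""
    return {n: hits[2 * n] / hits[n] for n in sorted(hits) if 2 * n in hits}


def local_exponent(ratio):
    """If f(x) = C / x**a, then f(2x) / f(x) = 2**(-a), so a = -log2(ratio)."""
    return -log2(ratio)


for n, ratio in doubling_ratios(quoted_hits).items():
    print(f"f({2 * n}) / f({n}) = {ratio:.2f}, "
          f"local exponent {local_exponent(ratio):.2f}")
```

A ratio of exactly one half corresponds to a local exponent of 1; ratios a little below one half correspond to exponents a little above 1.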

There are about half as many pages talking about numbers that are twice as big. This is an example of a power law, and indeed, a log-log plot of $f$ looks linear to my blurry vision:

Doing a linear regression in R gives the red line, or in symbols, $$f(x) \approx 5{,}800{,}000{,}000 / x^{1.029}.$$ Rather humorously, since the exponent is so close to 1, this means that $f(a)/f(b) \approx b/a$. In the end, this is not so surprising: Zipf’s law says that, in a corpus of naturally occurring text, the frequency of a word is inversely proportional to its rank. Here we have a similar phenomenon at work: roughly, the popularity of a number is inversely proportional to its size.
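For readers who would rather not use R, the same kind of fit can be sketched in plain Python: an ordinary least-squares line through the points $(\log x, \log f(x))$. This is not the original regression. The function name is hypothetical, and the sample points are generated from the fitted formula above rather than taken from the real data, so the sketch only shows that the procedure recovers the constants it is given.

```python
from math import exp, log


def fit_power_law(points):
    """Fit y = C / x**a by least squares on (log x, log y).

    `points` is a list of (x, y) pairs with x > 0 and y > 0.
    Returns the pair (C, a).
    """
    xs = [log(x) for x, _ in points]
    ys = [log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line
    intercept = mean_y - slope * mean_x    # equals log C
    return exp(intercept), -slope


# Synthetic points from the fitted formula (an assumption, not real counts).
synthetic = [(x, 5_800_000_000 / x ** 1.029) for x in range(1, 501)]

C, a = fit_power_law(synthetic)
print(f"C is about {C:.3g}, exponent is about {a:.3f}")
```

On real data the points scatter around the line, so the recovered constants depend on which range of numbers is included in the fit.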

In other words, while the number of integers expressible with fewer than $n$ bits grows exponentially in $n$, the number of pages discussing integers expressible with fewer than $n$ bits grows linearly in $n$; being silly, I’d say that this is an asymptotic version of the claim that most large numbers are uninteresting. After all, popular numbers have a lot of fan sites.
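To spell out the step behind "linearly in $n$": here is a rough sketch, assuming the exponent $1.029$ is rounded to exactly $1$, so that $f(x) \approx C/x$ with $C$ the fitted constant, and approximating the sum by an integral. Adding up the hits for all positive integers below $2^n$ gives $$\sum_{x=1}^{2^n - 1} f(x) \approx C \sum_{x=1}^{2^n - 1} \frac{1}{x} \approx C \ln\left(2^n\right) = (C \ln 2)\, n.$$ So there are about $2^n$ such integers, but the total number of hits for them grows only in proportion to $n$. Taken literally, an exponent slightly above $1$ would make the sum level off eventually, so the linear growth is best read as a description of the range where the exponent is well approximated by $1$.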