Evolution of Books – Words, Pages and “Book multiplication law”


I accidentally found a couple of nice charts I made while working on linguistic data and, as usual, thought they were worth sharing. REALLY (there is a new LAW at the end)!

Dataset: Google Books Ngram (1-gram) total counts data

What we work with here is historical data on printed English-language books (scanned by Google) from 1500 to today.

By simply dividing match_count by page_count, we get the historical average number of words per page:


We can see a wave-like behavior that “stabilizes” towards 200 words per page, probably due to the technological evolution of printing and adjustment to optimal human reading. Now, in the world of digital books, the notion of “number of pages” belongs to the past. It is interesting to ask when (maybe already today) writers will stop counting pages altogether and editors will keep page granularity only for special cases.
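The words-per-page series can be reproduced with a few lines of Python. This is a minimal sketch, assuming the total-counts file is plain text made of whitespace-separated `year,match_count,page_count,volume_count` records (check the layout of the copy you download); `YearTotals`, `parse_total_counts` and `words_per_page` are hypothetical helper names, not part of any Google tool.

```python
from typing import Dict, NamedTuple


class YearTotals(NamedTuple):
    match_count: int   # number of 1-gram matches (words) scanned for the year
    page_count: int    # number of scanned pages for the year
    volume_count: int  # number of scanned books (volumes) for the year


def parse_total_counts(text: str) -> Dict[int, YearTotals]:
    """Parse whitespace-separated 'year,match_count,page_count,volume_count' records.

    Assumes the record layout described above; adjust if your file differs.
    """
    totals: Dict[int, YearTotals] = {}
    for record in text.split():
        year, matches, pages, volumes = (int(field) for field in record.split(","))
        totals[year] = YearTotals(matches, pages, volumes)
    return totals


def words_per_page(totals: Dict[int, YearTotals]) -> Dict[int, float]:
    """Average words per page for each year: match_count / page_count."""
    return {year: t.match_count / t.page_count
            for year, t in sorted(totals.items())
            if t.page_count > 0}
```

With a local copy of the file (the name `totalcounts.txt` is only a placeholder), `words_per_page(parse_total_counts(open("totalcounts.txt").read()))` gives the year-by-year series plotted above.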

By dividing page_count by volume_count, we get the historical average number of pages per book:


We can see a peak in pages per book at the beginning of the 19th century (1811, with 765 pages per book, excluding the dispersed early part of the chart, which is explained later), after which the evolution of books pushed the trend asymptotically towards an average of 500 pages per book. This is probably driven by human factors (the granularity of writers’ payoffs, the size of a hand, fear of fat books in the store), and the number of pages per book is likely to change in the next decade as well, so only try to realize the Truth… There is no spoon...
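Finding the peak while ignoring the noisy early years can be sketched as follows. The `MIN_VOLUMES` cut-off of 100 books per year is a hypothetical threshold chosen for illustration, not the one used for the chart; years below it are dropped, mirroring the exclusion of the dispersed beginning.

```python
from typing import Dict, Tuple

# Hypothetical cut-off for "too few books to trust the average"; tune to taste.
MIN_VOLUMES = 100


def pages_per_book(page_count: Dict[int, int], volume_count: Dict[int, int],
                   min_volumes: int = MIN_VOLUMES) -> Dict[int, float]:
    """Average pages per book (page_count / volume_count) for years with enough books."""
    return {year: page_count[year] / volume_count[year]
            for year in sorted(page_count)
            if volume_count.get(year, 0) >= min_volumes}


def peak_year(series: Dict[int, float]) -> Tuple[int, float]:
    """Return the year with the highest value, together with that value."""
    year = max(series, key=lambda y: series[y])
    return year, series[year]
```

Without the threshold, a year with only a handful of unusually thick volumes could easily show up as the “peak”, which is exactly the statistical variability visible at the start of the charts.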

And finally, by dividing match_count by volume_count, we get the historical average number of words per book:


What can we see here? The same wave-like evolutionary transition, stabilizing around 100k words per book. The question is whether, and how, the digitization of books is going to change it. Will it shrink as marketing forces drive finer granularity? Or will it grow as multi-part series are merged into big virtual roll-ups?
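The three charts are not independent: words per page times pages per book equals words per book, because page_count cancels out. A small sketch (the `BookRatios` and `ratios` names are illustrative only) makes the link explicit and shows why the ~200 words-per-page and ~500 pages-per-book plateaus go together with ~100k words per book.

```python
from typing import NamedTuple


class BookRatios(NamedTuple):
    words_per_page: float
    pages_per_book: float
    words_per_book: float


def ratios(match_count: int, page_count: int, volume_count: int) -> BookRatios:
    """All three per-year averages computed from one year's totals."""
    return BookRatios(
        words_per_page=match_count / page_count,
        pages_per_book=page_count / volume_count,
        words_per_book=match_count / volume_count,
    )


# (match/page) * (page/volume) == match/volume, so the plateaus are consistent:
# 200 words per page * 500 pages per book == 100,000 words per book.
plateau = ratios(match_count=100_000, page_count=500, volume_count=1)
```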

The dispersion at the beginning of the charts is caused by the small number of books in those years (high statistical variability):


Now the amazing part is that log(# of books) is remarkably linear:


Equation: “log10(# of books) = 0.0123 × #Years + 0.0177”
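A line like this can be obtained with an ordinary least-squares fit of log10 of the yearly book count against the year. This sketch assumes “# of books” means the per-year volume_count and uses the calendar year as x, so its intercept will not equal the 0.0177 above (that value depends on how #Years is counted); the slope per year is the comparable part. `fit_log10_linear` is a hypothetical helper name.

```python
import math
from typing import Dict, Tuple


def fit_log10_linear(volumes: Dict[int, float]) -> Tuple[float, float]:
    """Least-squares fit of log10(volumes) = slope * year + intercept.

    Years with zero books are skipped because log10(0) is undefined.
    """
    points = [(year, math.log10(n)) for year, n in volumes.items() if n > 0]
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```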

So here comes the “Book multiplication law”: “The number of books grows tenfold roughly every 80 years.”
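The “80 years” follows directly from the slope: log10(# of books) rises by 0.0123 per year, so the count grows tenfold once log10 has risen by 1, i.e. after 1 / 0.0123 years. The same slope also gives a doubling time and an annual growth rate, worked out below (these derived figures are simple arithmetic on the published slope, not separate measurements).

```python
import math

SLOPE = 0.0123  # slope of the fitted log10(# of books) line, per year

# Tenfold growth: log10 must increase by 1.
tenfold_years = 1 / SLOPE               # about 81.3 years
# Doubling: log10 must increase by log10(2).
doubling_years = math.log10(2) / SLOPE  # about 24.5 years
# Yearly multiplication factor minus 1.
annual_growth = 10 ** SLOPE - 1         # about 2.9% per year
```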

Will it ever saturate? But… after all, what matters is quality, not quantity, right? 😉


Author: Andrey Gabdulin Product Development


