I tried to do some research based on the number of results that dear Google reports and found that it… does not work. Well, actually it reports inconsistent numbers, so I am not sure which one to trust.
Want some details? Enjoy:
When you search for anything on Google, it shows an approximate number of results based on crawler indexing (and apparently on a cache of that index). You see it as “About xxx results (0.yy seconds)”. Even accepting that this is just a number representing some snapshot in time, I would expect it to actually behave like one.
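If you want to log these numbers instead of copying them by hand, here is a minimal sketch of pulling the count out of that status line. The function name `parse_result_stats` and the sample string are my own illustration, and it assumes the line looks exactly like the “About xxx results (0.yy seconds)” pattern above; it is not anything Google provides.

```python
import re

# Matches e.g. "About 25,200,000,000 results (0.45 seconds)".
# The exact wording of the status line is an assumption based on the pattern above.
_STATS_RE = re.compile(r"(?:About\s+)?([\d,]+)\s+results?\s+\(([\d.]+)\s+seconds\)")


def parse_result_stats(line):
    """Return (result_count, seconds) parsed from a result-stats line."""
    match = _STATS_RE.search(line)
    if match is None:
        raise ValueError(f"not a result-stats line: {line!r}")
    count = int(match.group(1).replace(",", ""))  # drop thousands separators
    seconds = float(match.group(2))
    return count, seconds


count, seconds = parse_result_stats("About 25,200,000,000 results (0.45 seconds)")
print(f"{count // 1_000_000:,}M results in {seconds} s")
```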
I wanted to check the number of results for, let’s say, “a”. The first query gave me 25,200,000,000 (from here on I will count in millions, i.e. 25,200M). Then I went to page 10 (at 10 results per page) and the number dropped to 11,860M. When I went back to the first page, the count stayed at 11,860M ever since. Let’s say it was my own glitch (possibly in my mind), BUT I had already captured the value, so I insisted on getting a proper result.
So I found an article in which Google “explains” similar inconsistencies by the fact that it “caches” the result count, and says that the way to get a “more accurate” number is to run a more complex (non-cached) search. That might even increase the count, since the drill-down could arrive at a bigger number than the cached one. Well, not cool, but as long as it is logical, we can find a workaround. Unfortunately, that is not the case…
As of 24/08/14, the behavior is as follows:
For example, let’s search for “abc” while excluding some totally weird (negligible, never-existing) string, let’s say “egdfvbgxzgfad”:
Let’s capture 192M results for the first run
Let’s double check that there are no “egdfvbgxzgfad” results:
OK. For the first check, let’s go to page #10 and enjoy the result changing to 710M.
It stays that way even back on the first page. This might be explained by the same caching: the cached count got a “new update” because you were the first one to reach page #10 since the time the result was 192M:
Now let’s add our “cache-refreshing” zero-availability string and see that the number has grown by 8M results:
This is weird, but… what is 8M next to 710M? Embrace approximation as a way of life! One more step to make sure we stay stable: let’s append something to “egdfvbgxzgfad”, for example “1” (which should not affect our check much). BUT:
We now get 2330M results! Waaaait. There is a NUMBER there, not something like “A huge amount of results that you will never be able to read”. Playing with this a bit more, you eventually find a duality in the results: every time you slightly change the excluded token, the count switches between 718M and 2330M:
Well… The golden rule “If you cannot vouch for a number, do not use it” applies not only to small start-up companies but to giants like Google as well.
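To put the captured numbers side by side, here is a small sketch that compares each count with the first run. The labels and the helper name `max_to_min_ratio` are mine; the values are the ones captured above, in millions.

```python
# Result counts captured during the experiment, in millions (M).
counts_m = {
    "first run": 192,
    "after visiting page #10": 710,
    "with the excluded token": 718,
    "excluded token + '1'": 2330,
}


def max_to_min_ratio(counts):
    """How many times the largest reported count exceeds the smallest."""
    values = list(counts.values())
    return max(values) / min(values)


baseline = counts_m["first run"]
for label, value in counts_m.items():
    # Show each count and its size relative to the very first capture.
    print(f"{label:<26} {value:>5}M  x{value / baseline:.2f}")

print(f"largest / smallest: x{max_to_min_ratio(counts_m):.2f}")
```

For what is supposed to be the same set of pages, the largest count is roughly twelve times the smallest.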
I guess I would have to abandon this data source. 😦
P.S. And if you go to the last page (yes, it is closer than you think), the count drops dramatically to a prosaic 106 results (it sometimes goes up to 400 results in other searches)…
At the bottom, though, you can find a remark:
It helps… to extend the number of pages to 300:
Oh, Google. It is so impressive to see those astronomical numbers of pages! So many opportunities, so much unexplored! A great marketing move for a product that actually exposes about 15×10^-6 % of the promised web ocean depth.
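For the curious, a quick sketch of where a percentage like that comes from. Pairing the 106 reachable results with the 710M reported count is my assumption about which numbers to divide; the helper name `reachable_percent` is made up for illustration.

```python
def reachable_percent(reachable, reported):
    """Share of the reported result count you can actually page through, in percent."""
    return reachable / reported * 100


# Assumption: 106 reachable results against the 710M reported for the same query.
print(f"{reachable_percent(106, 710_000_000):.2e} %")

# The other captured count for comparison.
print(f"{reachable_percent(106, 2_330_000_000):.2e} %")
```

Whichever reported count you pick, the reachable share stays in the range of a few millionths of a percent.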