Overlap in the Web Search Results of Google and Bing

Authors

  • Rakesh Agrawal, Data Insights Laboratories
  • Behzad Golshan, Boston University
  • Evangelos Papalexakis, Carnegie Mellon University

DOI:

https://doi.org/10.1561/106.00000005

Abstract

Google and Bing have emerged as the diarchy that arbitrates what documents are seen by Web searchers, particularly those desiring English-language documents. We seek to study how distinctive the top results presented to users by the two search engines are. A recent eye-tracking study has shown that web searchers decide whether to look at a document primarily based on the snippet and secondarily on the title of the document on the web search result page, and rarely based on the URL of the document. Given that the snippet and title generated by different search engines for the same document are often syntactically different, we first develop tools appropriate for conducting this study. Our empirical evaluation using these tools shows a surprising agreement in the results produced by the two engines for a wide variety of queries used in our study. Thus, this study raises the open question of whether it is feasible to design a search engine that would produce results that are distinct from those produced by Google and Bing and that users will find helpful.

Published

2016-05-10

Section

Articles