Different Topic, Different Traffic: How Search and Navigation Interplay on Wikipedia



Published Jun 25, 2019
  • Dimitar Dimitrov
  • Florian Lemmerich
  • Fabian Flöck
  • Markus Strohmaier


As one of the richest sources of encyclopedic information on the Web, Wikipedia offers large-scale article access data that allows us to compare articles with respect to the two main paradigms of information seeking: search, by formulating a query, and navigation, by following hyperlinks. Using such data from the English Wikipedia, we study access behavior with two main metrics that characterize articles: (i) searchshare, the relative amount of views an article received through search, and (ii) resistance, the ability of an article to relay traffic to other Wikipedia articles. We demonstrate that articles in distinct topical categories differ substantially in these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly reached through navigation. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach and explore the importance of individual features. Our results constitute a step towards understanding human information seeking behavior and may help identify focal points for future improvements of Wikipedia and similar systems.
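To make the searchshare metric concrete, the following minimal Python sketch computes it as the fraction of an article's views that arrived through search. It assumes that every view is attributed to either search or navigation, which the abstract does not state explicitly. The function name, the article titles, and the view counts are hypothetical and are not taken from the study.

```python
def searchshare(search_views: int, navigation_views: int) -> float:
    """Fraction of an article's views that arrived via search.

    Assumption: total views = search views + navigation views.
    """
    total = search_views + navigation_views
    if total == 0:
        raise ValueError("article has no recorded views")
    return search_views / total


# Hypothetical (search_views, navigation_views) pairs for illustration only.
counts = {
    "Example building": (900, 100),
    "Example battle": (200, 800),
}

shares = {title: searchshare(s, n) for title, (s, n) in counts.items()}

for title, share in shares.items():
    print(f"{title}: searchshare = {share:.2f}")
```

Under this reading, a value close to 1 marks an article reached mostly through search, and a value close to 0 marks one reached mostly by following links.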

