Multi-Cultural Interlinking of Web Taxonomies with ACROSS


  • Natalia Boldyrev, Max Planck Institute for Informatics, Saarland Informatics Campus
  • Marc Spaniol, Université de Caen Normandie, Caen
  • Gerhard Weikum, Max Planck Institute for Informatics, Saarland Informatics Campus



The Web hosts a huge variety of multi-cultural taxonomies. They encompass e-commerce product catalogs, general-purpose knowledge bases, and numerous domain-specific category systems. The enormous heterogeneity of these sources makes it challenging to interlink multiple taxonomies. In this paper we introduce the ACROSS system, which supports the alignment of independently created Web taxonomies. To map categories across different taxonomies, ACROSS harnesses instance-level features as well as distant supervision from an intermediate source such as multiple Wikipedia editions. ACROSS includes a reasoning step based on combinatorial optimization. To reduce the run time of the reasoning procedure without sacrificing quality, we study two models of user involvement. Our experiments with heterogeneous taxonomies from different domains demonstrate the viability of our approach and its improvement over state-of-the-art baselines.
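
The abstract does not spell out which instance-level features or which optimization model ACROSS uses. As a rough illustration of the general idea only, the following sketch scores category pairs by the overlap of their instance sets and then picks the one-to-one mapping with the highest total score. The Jaccard measure, the brute-force search, the function names, and the toy catalogs are all assumptions made for this example, not the ACROSS method itself.

```python
from itertools import permutations


def jaccard(a, b):
    """Instance overlap between two categories, each given as a set of instance ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(source, target):
    """Return the one-to-one category mapping with the highest total overlap.

    Stand-in for a combinatorial reasoning step: exhaustive search over all
    assignments. Only viable for tiny inputs; assumes len(source) <= len(target).
    """
    src = list(source)
    tgt = list(target)
    best_score, best_map = -1.0, {}
    for perm in permutations(tgt, len(src)):
        score = sum(jaccard(source[s], target[t]) for s, t in zip(src, perm))
        if score > best_score:
            best_score, best_map = score, dict(zip(src, perm))
    return best_map


# Hypothetical toy catalogs: category name -> set of instance ids.
shop_a = {"Laptops": {"x1", "x2", "x3"}, "Phones": {"p1", "p2"}}
shop_b = {
    "Notebooks": {"x1", "x2"},
    "Smartphones": {"p1", "p2", "p9"},
    "Cameras": {"c1"},
}

mapping = align(shop_a, shop_b)
print(mapping)
```

A realistic system would replace the exhaustive search with a scalable solver and combine several evidence sources, but the sketch shows how a global assignment can overrule locally ambiguous pairwise scores.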


