Wikipedia XML Database Dump: Data Tag Parser

In this chapter

This chapter explains how to import a Wikipedia XML database dump and parse its data during import (on-ingest data tagging).

What is a Wikipedia XML Database Dump

Wikipedia provides its complete content as XML database dumps. These XML dumps can be imported into QueryIO and parsed. QueryIO provides a sample program to import an XML dump and a data tag parser (WikiTextParser.jar) to parse the data during import. The data tag parser JAR is available at $INSTALL_HOME/demo/WikiTextParser.jar.
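To make the structure of a dump concrete, the following is a minimal sketch of reading page titles and wikitext from a Wikipedia XML dump with the standard Java StAX API. It is not the implementation of WikiTextParser.jar and does not use any QueryIO API; the class name WikiDumpSketch and its Page type are hypothetical names chosen for illustration. It assumes the usual dump layout, in which each page element contains a title element and a revision element holding a text element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of QueryIO or WikiTextParser.jar.
final class WikiDumpSketch {

    /** One extracted page: its title and the raw wikitext of its revision. */
    static final class Page {
        final String title;
        final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Streams the dump and collects (title, text) for every page element. */
    static List<Page> parsePages(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Dumps carry no DTD; disabling DTD support is a defensive default.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        List<Page> pages = new ArrayList<>();
        String title = null;
        String text = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName ignores the MediaWiki export namespace.
                    String name = reader.getLocalName();
                    if ("page".equals(name)) {
                        title = null;
                        text = null;
                    } else if ("title".equals(name)) {
                        title = reader.getElementText();
                    } else if ("text".equals(name)) {
                        text = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    pages.add(new Page(title, text));
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }

    public static void main(String[] args) throws XMLStreamException {
        // A tiny inline sample standing in for a real dump file.
        String sample =
            "<mediawiki><page><title>Example</title>"
          + "<revision><text>An [[example]] page.</text></revision>"
          + "</page></mediawiki>";
        InputStream in =
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (Page page : parsePages(in)) {
            System.out.println(page.title + " -> " + page.text);
        }
    }
}
```

Streaming with StAX keeps memory use roughly constant per page, which matters because full Wikipedia dumps are far too large to load as a single in-memory document. For a real dump, the inline sample would be replaced by a file input stream.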

Registering Wikipedia Data Tag Parser

QueryIO provides built-in data tag parsers for Wikipedia XML dumps.

You can find sample .wiki files in $INSTALL_HOME/demo/Data.zip. Upload these files and use Query Designer to query the metadata extracted using this parser.



Copyright © 2018 QueryIO Corporation. All Rights Reserved.

QueryIO, "Big Data Intelligence" and the QueryIO Logo are trademarks of QueryIO Corporation. Apache, Hadoop and HDFS are trademarks of The Apache Software Foundation.