This chapter explains how to import a Wikipedia XML database dump and parse the data during import (on-ingest data tagging).
Wikipedia provides its complete content in XML format. These XML dumps can be imported into QueryIO and parsed. QueryIO provides a sample program to import an XML dump and a data tag parser (WikiTextParser.jar) to parse the data during import. The data tag parser JAR is available at $INSTALL_HOME/demo/WikiTextParser.jar.
QueryIO provides built-in data tag parsers for Wikipedia XML dumps.
You can find sample .wiki files in $INSTALL_HOME/demo/Data.zip. Upload these files and use Query Designer to query the metadata extracted using this parser.
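
To give a rough idea of what on-ingest tagging over a Wikipedia dump involves, the following is a minimal Python sketch that streams a MediaWiki XML export and collects a few metadata fields per page. It is not the WikiTextParser.jar implementation and does not use any QueryIO API. The function names (iter_page_tags, local_name), the choice of fields, and the inline sample data are assumptions made purely for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_tags(source):
    """Yield one metadata dict per <page> element, streaming the dump.

    Hypothetical helper: the extracted fields are illustrative and are not
    the tags produced by QueryIO's parser.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        tags = {"title": None, "page_id": None, "timestamp": None, "text_length": 0}
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                tags["title"] = child.text
            elif name == "id":
                # Direct child of <page>, so this is the page id, not the revision id.
                tags["page_id"] = child.text
            elif name == "revision":
                for rev_child in child:
                    rev_name = local_name(rev_child.tag)
                    if rev_name == "timestamp":
                        tags["timestamp"] = rev_child.text
                    elif rev_name == "text":
                        tags["text_length"] = len(rev_child.text or "")
        yield tags
        elem.clear()  # drop the page's children once its tags have been consumed


# Tiny stand-in for a real dump (assumed sample data, one page only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>42</id>
    <revision>
      <id>1001</id>
      <timestamp>2020-01-01T00:00:00Z</timestamp>
      <text>'''Example''' is a page.</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_page_tags(io.BytesIO(SAMPLE)))
for page in pages:
    print(page)
```

Streaming with iterparse avoids loading the whole dump into memory, which matters because full Wikipedia dumps are very large. A real dump file could be passed to iter_page_tags as an open binary file object instead of the in-memory sample.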