by zepearl
2652 days ago
I used Python to load the contents of the articles into a DB. This is a possibly wrong extract of very old code: I have something like 20 different versions lying around, so I'm not 100% sure that this one worked well.
===
from lxml import etree

sInputFileName = "/my/input/wiki_file.xml"

# Stream the file, handling each <doc> element once it has been fully parsed
context = etree.iterparse(sInputFileName, events=('end',), tag='doc')
for event, elem in context:
    sPageContents = elem.text or ""  # elem.text is None for an empty <doc>
    iThisArticleCharLength = len(sPageContents)
    sPageURL = elem.get("url", "")[0:4000]
    sPageTitle = elem.get("title", "")[0:4000]
    # <do what you want with these vars...>
    elem.clear()  # release the parsed content to limit memory use
===
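The snippet above stops before the actual DB load, so here is a minimal sketch of that step. Everything in it is an assumption for illustration: the comment does not say which database was used, so this picks SQLite from the standard library, and the `articles` table, its columns, and the `load_articles` helper are hypothetical names.

```python
import sqlite3
from itertools import islice


def load_articles(conn, rows, batch_size=1000):
    """Insert (url, title, contents) tuples into a hypothetical `articles` table.

    `rows` can be any iterable, for example a generator wrapped around the
    iterparse loop, so the whole dump never has to sit in memory.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT, title TEXT, contents TEXT, char_length INTEGER)"
    )
    rows = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so each transaction stays small
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO articles (url, title, contents, char_length) "
            "VALUES (?, ?, ?, ?)",
            [(url, title, text, len(text)) for url, title, text in batch],
        )
        conn.commit()
        total += len(batch)
    return total
```

To connect it to the parsing loop, one option is to turn the loop into a generator that yields `(sPageURL, sPageTitle, sPageContents)` for each `doc` element and pass that generator to `load_articles` together with a `sqlite3.connect(...)` connection.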