|
For some URLs, add_url throws an exception, for example http://en.wikipedia.org/wiki/Horse (I don't like snakes):
File "/home/drace/dev/NLUlite/client_python/NLUlite.py", line 375, in add_url
parser.feed(page)
File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
self.goahead(0)
File "/usr/lib/python2.7/HTMLParser.py", line 158, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.7/HTMLParser.py", line 305, in parse_starttag
attrvalue = self.unescape(attrvalue)
File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
File "/usr/lib/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

It's also really slow at learning. I have plenty of everything (cores, memory, etc.) and it still takes minutes to process web pages. I guess you do say on the website that the free version is slow. |
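A possible workaround, as a minimal sketch: the traceback suggests that the page is handed to `parser.feed()` as a raw byte string, and that Python 2.7's `HTMLParser.unescape` then mixes those bytes with unicode entity replacements, which triggers an implicit ASCII decode (0xe2 is the lead byte of many UTF-8 sequences, such as an en dash). Decoding the page to text before feeding it would likely avoid this. I have not seen the NLUlite client code, so `TitleCollector` and `feed_decoded` below are hypothetical names, and UTF-8 is an assumed encoding.

```python
# -*- coding: utf-8 -*-
try:
    from HTMLParser import HTMLParser  # Python 2.7, as in the traceback
except ImportError:
    from html.parser import HTMLParser  # Python 3


class TitleCollector(HTMLParser):
    """Hypothetical stand-in for whatever parser add_url uses."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive here already unescaped by HTMLParser.
        for name, value in attrs:
            if name == "title":
                self.titles.append(value)


def feed_decoded(parser, raw_bytes, encoding="utf-8"):
    """Decode the downloaded bytes to text before parsing.

    Assumption: the page is UTF-8. Undecodable bytes are replaced
    rather than raising, so one bad byte does not abort the page.
    """
    text = raw_bytes.decode(encoding, "replace")
    parser.feed(text)
    parser.close()
    return parser


# A non-ASCII character plus an entity inside one attribute value,
# which is the combination the traceback points at.
raw = u'<a href="/wiki/Horse" title="Horse \u2013 Wikipedia &amp; more">Horse</a>'.encode("utf-8")
collector = feed_decoded(TitleCollector(), raw)
```

In `add_url` this would amount to replacing `parser.feed(page)` with something like `parser.feed(page.decode("utf-8", "replace"))`, ideally using the charset from the HTTP response headers instead of hard-coding UTF-8.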