
by PaulHoule 2139 days ago

I parse big XML and similarly structured files, convert them into RDF, and then puff them up into a hypergraph (still RDF, but with a lot of blank nodes). That lets me load the content into a single database and trace that two facts are related and that they come from this part of document A and that part of document B.

My document parsing and SPARQL queries can take a few minutes, and I'd like to run them frequently so I can keep all parts of the system up to date.

I've only benchmarked it a bit, but I got roughly the fivefold speed-up that PyPy promised. This is with the PyPy release based on Python 3.6. I think PyPy is switching to cffi as the way to connect to C code so most native code "just works" now.

I had to backport my code from Python 3.8. Python 3.6 lacks contextvars, but there is a polyfill for that; otherwise there were no problems.
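For readers unfamiliar with contextvars, here is a minimal sketch of the kind of code that depends on it. The post doesn't say which polyfill was used; one option is the `contextvars` backport package on PyPI, which installs under the same module name, so the import below should work unchanged on Python 3.6 once the backport is installed. The names `current_document`, `parse` and `describe` are made up for illustration.

```python
import contextvars  # stdlib from Python 3.7; on 3.6 this can come from a backport package

# Hypothetical context variable tracking which document is being processed.
current_document = contextvars.ContextVar("current_document", default=None)


def describe():
    # Reads the value set by whoever is calling us, without passing it as an argument.
    return "parsing {}".format(current_document.get())


def parse(name):
    # set() returns a token that lets us restore the previous value afterwards.
    token = current_document.set(name)
    try:
        return describe()
    finally:
        current_document.reset(token)


result = parse("a.xml")
```

After `parse` returns, `current_document.get()` is back to its default of `None`, because the token was reset in the `finally` block.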

I stayed away from PyPy for a long time because it was tied to Python 3.5, which was busted in various ways. One of those was that the filesystem path objects were half-implemented: you should have been able to pass them into anything from the stdlib that expected a string path, and at that time you couldn't. Little accidents like that can slow the adoption of a technology like PyPy.
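To make the path-object complaint concrete, here is a small sketch. On Python 3.6 and later, `open()` and most of the stdlib accept `pathlib.Path` objects directly; on 3.5 the usual workaround was to convert with `str()` first. The helper name `read_text_compat` and the file name are illustrative only.

```python
import os
import tempfile
from pathlib import Path


def read_text_compat(path):
    # str(path) works on 3.5 as well, where open() rejected Path objects.
    with open(str(path), encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "doc.xml"
    doc.write_text("<root/>", encoding="utf-8")

    # Python 3.6+: the Path object can be passed straight to open().
    with open(doc, encoding="utf-8") as f:
        direct = f.read()

    # Python 3.5-style workaround: convert to a string first.
    compat = read_text_compat(doc)

    # os.fspath() (added in 3.6) is the protocol behind the direct form.
    same_path = os.fspath(doc) == str(doc)
```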

1 comment

rciorba 2139 days ago

> I think PyPy is switching to cffi as the way to connect to C code so most native code "just works" now.

As far as I know, extensions need to be written for cffi specifically.

cffi is a newer way of writing C extensions, developed by the PyPy project. It was designed to have a smaller and cleaner interface for calling C code from Python. Here's Armin Rigo talking about it at EuroPython: https://www.youtube.com/watch?v=ejUzVcvTLgI
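As a rough illustration of that interface, here is a minimal sketch of calling a C function through cffi in its ABI mode. It assumes cffi is installed (it ships with PyPy; on CPython it is a third-party package) and a POSIX system, where `ffi.dlopen(None)` loads the standard C library. The helper name `c_strlen` is made up, and it returns `None` when cffi isn't available so the snippet still runs.

```python
try:
    from cffi import FFI  # third-party on CPython, bundled with PyPy
except ImportError:
    FFI = None  # cffi not installed; c_strlen() will return None


def c_strlen(data):
    """Return strlen(data) as computed by the C library, or None without cffi."""
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C signature we want to call, using plain C syntax.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) loads the standard C library on POSIX systems (not Windows).
    libc = ffi.dlopen(None)
    # A Python bytes object can be passed where a const char * is expected.
    return libc.strlen(data)


length = c_strlen(b"hello")
```

Note that nothing here touches `PyObject` or reference counts: the C declaration is written in C syntax and the call looks like an ordinary Python function call.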

The CPython way of writing extensions is documented at https://docs.python.org/3/extending/extending.html and seems to require you to deal with the internals of the CPython interpreter (PyObject structs, reference counting, etc.).

I know PyPy has some support for CPython extensions, but it has to emulate some internals and it's slower as a result.