Hacker News
by chaxor 1132 days ago
This is a decent way to explain a good use case: exploring new priors and ideas. For example, I was looking for the simplest possible way to do streaming XML parsing. One way is to use one of the various packages in different languages and hand-write all the logic for pausing and so on. Another option (apparently a language, but a domain-specific one) is XSLT. I already knew these things, but by using GPT-4 I found out how to meet a very long list of requirements with just one small XSLT 3.0 script and a one-liner in bash, as opposed to several hundred lines of Python, Julia, or Rust.

Specifically, though, GPT pointed me to Xalan, which I had never heard of and would never have seriously considered while looking around for XSLT processors, because I didn't realize it was the only option for stream-parsing XML via XSLT v3. It told me this much more directly and explained why it was the best option for my list of requirements. Even combining Python scripts with XSLT wouldn't make stream parsing possible, because none of the available packages could handle v3.

So yeah, it is quite useful for exploring program design against a huge list of requirements. But you have to ask explicitly: it can't read minds, and if you don't specify, it will just pick some design.


To parse XML in Python one element of interest at a time, without loading the whole document into memory:

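  import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

  def getelements(filename_or_file, tag):
      """Yield each element named `tag` once its end tag has been parsed."""
      context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
      _, root = next(context)  # the first event is the root's 'start'
      for event, elem in context:
          if event == 'end' and elem.tag == tag:
              yield elem
              root.clear()  # drop processed children so memory stays bounded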
https://stackoverflow.com/questions/7697710/python-running-o...

Usage is simple: getelements() yields the matching elements one by one. Found by searching Google for "xml memory iterparse".
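A minimal usage sketch, assuming a small in-memory document: the `<record>` tag, its `id` attribute, and the sample XML are made up for illustration. The generator is repeated (with the non-deprecated `xml.etree.ElementTree` import) so the snippet runs on its own. The loop pulls out what it needs from each element right away instead of keeping the elements around, which would defeat the memory savings.

```python
import io
import xml.etree.ElementTree as etree


def getelements(filename_or_file, tag):
    """Yield each element named `tag` once its end tag has been parsed."""
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context)  # the first event is the root's 'start'
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()  # discard already-processed children of the root


# Hypothetical sample input; a file path or any binary file object also works.
SAMPLE = b"""<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <note>ignored</note>
  <record id="3"><name>gamma</name></record>
</records>"""

names = []
for record in getelements(io.BytesIO(SAMPLE), 'record'):
    # Extract the data now rather than storing the element itself.
    names.append((record.get('id'), record.findtext('name')))

print(names)  # [('1', 'alpha'), ('2', 'beta'), ('3', 'gamma')]
```

Non-matching elements such as `<note>` stay attached to the root only until the next matching element triggers `root.clear()`.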

Yeah, I'm aware of that option, but it gets quite ugly once there are more than a few rules.
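For comparison, a rough sketch of one way several rules could be kept in one place with `iterparse`: a dict mapping tag names to handler functions, so adding a rule means adding a handler rather than another branch in the loop. The `stream_apply` function, the `RULES` table, the handlers, and the sample tags are all hypothetical. This only covers rules keyed on a single tag; rules that depend on an element's ancestors, siblings, or document order would need extra bookkeeping (for example, a stack of open tags).

```python
import io
import xml.etree.ElementTree as etree


def on_book(elem, out):
    # Hypothetical rule 1: collect every book title.
    out.setdefault('titles', []).append(elem.findtext('title'))


def on_magazine(elem, out):
    # Hypothetical rule 2: collect magazine issue numbers as ints.
    out.setdefault('issues', []).append(int(elem.findtext('issue')))


# One entry per rule: tag name -> handler run on that tag's 'end' event.
RULES = {
    'book': on_book,
    'magazine': on_magazine,
}


def stream_apply(source, rules):
    """Stream `source`, dispatching each completed element to its rule."""
    out = {}
    context = iter(etree.iterparse(source, events=('start', 'end')))
    _, root = next(context)  # the first event is the root's 'start'
    for event, elem in context:
        handler = rules.get(elem.tag) if event == 'end' else None
        if handler is not None:
            handler(elem, out)
            root.clear()  # same memory trick as getelements()
    return out


SAMPLE = b"""<library>
  <book><title>Dune</title></book>
  <magazine><issue>42</issue></magazine>
  <book><title>Emma</title></book>
</library>"""

result = stream_apply(io.BytesIO(SAMPLE), RULES)
print(result)  # {'titles': ['Dune', 'Emma'], 'issues': [42]}
```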