New tool from curl creator – trurl – for URL parsing and manipulation (daniel.haxx.se)
140 points by michalg82 1169 days ago
8 comments

Might be more useful to link to the blog post: https://daniel.haxx.se/blog/2023/04/03/introducing-trurl/
Previously posted:

https://news.ycombinator.com/item?id=35419573 - 22 points, 2 comments

https://news.ycombinator.com/item?id=35426050 - 5 points, 0 comments

Ok, we've changed the URL to that from https://github.com/curl/trurl. Thanks!
https://github.com/curl/trurl

Apparently it requires curl 7.81 to compile, but I'm on Ubuntu 20 (curl 7.6x) and wanted a quick try, so this patch makes it build (trivially) by removing the couple of new symbols that depend on the newer version. Just a quick hack to compile.

   --- a/trurl.c
   +++ b/trurl.c
   @@ -362,11 +362,12 @@ static void get(struct option *op, CURLU *uh)
         case CURLUE_NO_PORT:
         case CURLUE_NO_QUERY:
         case CURLUE_NO_FRAGMENT:
   -            case CURLUE_NO_ZONEID:
   +            // would require 7.81
   +            // case CURLUE_NO_ZONEID:
           /* silently ignore */
           break;
         default:
   -              fprintf(stderr, PROGNAME ": %s (%s)\n", curl_url_strerror(rc),
   +              fprintf(stderr, PROGNAME ": %d (%s)\n", (int)rc,
            variables[i].name);
           break;
         }
   @@ -589,8 +590,8 @@ static void singleurl(struct option *o,
           CURLU_GUESS_SCHEME|CURLU_NON_SUPPORT_SCHEME);
          if(rc) {
     if(o->verify)
   -          errorf(ERROR_BADURL, "%s [%s]", curl_url_strerror(rc), url);
   -        warnf("%s [%s]", curl_url_strerror(rc), url);
   +          errorf(ERROR_BADURL, "%d [%s]", (int)rc, url);
   +        warnf("%d [%s]", (int)rc, url);
          }
          else {
     if(o->redirect)
That's an ... interesting ... decision to put inputs before -- and commands/options after --

Most tools assume the opposite default (xargs, for example). Sure, you can specify a different order, but why make life harder for no reason?

Usage: " PROGNAME " [options] [URL]

But you can put options wherever you want, they are identified by the leading '-' char.

Oh, c'mon, “purl” was right there!
Given what it does, and to avoid confusion with perl, "transform url" seems more reasonable. Though I probably would have gone with "turl" so it's easier to pronounce.
It seems like a nod to the “tr” command.
Yup, I guessed the same, and indeed it explicitly states so in the blog post linked by another commenter.

> trurl is a tool in a similar spirit of tr but for URLs. Here, tr stands for translate or transpose.

Though it hardly gets any use because s/// is so much more flexible, Perl also has a tr/// builtin that replicates the behavior of the command line tool.
I wonder if the name is a Lem reference: https://en.wikipedia.org/wiki/The_Cyberiad
Pardon my ignorance, serious question: why is this a big deal? I'm probably underestimating the work needed, but it doesn't look like a hard thing to write. What am I missing?
Over the years, curl itself has had 9 CVEs relating to handling URLs [0] so this is most definitely not a trivial piece of code to write. The basic case is easy, yes. Getting everything in the spec right and then some is hard.

[0] https://curl.se/docs/security.html
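
To make the "basic case is easy" point concrete, here is a small Python sketch. It uses Python's standard urllib.parse, not trurl or libcurl, and compares a naive string-splitting host extractor with a real parser. The `naive_host` and `parsed_host` helpers are hypothetical names written only for this illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def naive_host(url: str) -> str:
    """Hypothetical 'good enough' extractor: whatever sits between '//' and the next '/'."""
    return url.split("//", 1)[1].split("/", 1)[0]


def parsed_host(url: str) -> Optional[str]:
    """Let the standard library separate userinfo, host and port."""
    return urlsplit(url).hostname


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",          # the easy case: both agree
        "https://user:secret@example.com:8443/x",  # userinfo and port leak into the naive result
        "http://[::1]:8080/",                      # IPv6 literal with a port
        "https://EXAMPLE.com?q=1",                 # no path, the query follows the host directly
    ):
        print(f"{url!r}: naive={naive_host(url)!r} parsed={parsed_host(url)!r}")
```

Only the first URL gives the same answer from both helpers. The others are exactly the kind of edge case that hand-rolled parsers tend to get wrong.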

And, now I'm scared of establishing a TLS connection with untrusted servers after reading CVE-2021-22901 from that page. Remote code execution from an adversarial *server*. I can understand an adversarial client, but that just expanded the things of which I'm wary.
I had to implement a routine involving URL parsing in a library that is supposed to be behavior-identical across implementations in multiple languages. That was fun.
Is it somehow better than the stuff available in mainstream languages (Go net/url, Python yarl, etc.)?

Or is it just that this is focused on command-line usage?

I will be using this at work tomorrow.

I currently parse out this stuff using a flaky little bit of python I cooked up myself and it gives me no end of grief when scaled. So many awful edge cases.

The author wrote curl, so I know it's going to do what it says it does well.

That's why it's getting love. It's a rock star dev putting out open source code many of us will absolutely be using regularly.

Way classier than yet another "product" built on the OpenAI API that will be gone in a year.

Why not use urllib.parse?
Hah, I may well do! I think the other thing here is me not realising I needed something till I saw it :-)
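
For anyone going the urllib.parse route, one behavior worth knowing: a URL typed without a scheme is not treated as a host. The sketch below shows that and a possible workaround. `split_with_default_scheme` is a hypothetical helper, and its "prepend http://" rule is an assumption made for illustration; it is not how libcurl's scheme guessing actually works.

```python
from urllib.parse import SplitResult, urlsplit


def split_with_default_scheme(url: str, default: str = "http") -> SplitResult:
    """Parse url, prepending '<default>://' when no '://' separator is present.

    Illustrative assumption only; not libcurl's guessing logic.
    """
    if "://" not in url:
        url = f"{default}://{url}"
    return urlsplit(url)


if __name__ == "__main__":
    raw = urlsplit("example.com/path")
    # Without a scheme, everything lands in .path and .netloc stays empty.
    print(repr(raw.netloc), repr(raw.path))

    guessed = split_with_default_scheme("example.com/path")
    print(repr(guessed.scheme), repr(guessed.netloc), repr(guessed.path))
```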
From the blog post introducing trurl:

> URLs are tricky to parse and there are numerous security problems in software because of this. trurl wants to help soften this problem by taking away the need for script and command line authors everywhere to re-invent the wheel over and over.

When I was building my CI jobs at $job I needed url manipulation in shell. Had to use python inline, but it was long and ugly... trurl simplified it a bit.
Pretty dang cool, but I might have missed a feature: can trurl do multiple manipulations on a single url, as opposed to a single manipulation on multiple urls (which the blog post says is supported)?

For example, you want to apply a sequence of predefined normalization steps, such as removing the user part, converting http to https, etc etc. You put these steps in a file and then invoke trurl with that file and pointed at your url or urls. Very much like what you would do in sed, say. Possible?

I think it can do what you're suggesting, if I'm understanding correctly:

> $ trurl --url https://example.com --set path=/abc/123 --set scheme=ftp --set host=foo.bar

> ftp://foo.bar/abc/123
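
For comparison, the same "several changes to one URL" idea can be sketched in Python with the standard urllib.parse module. This is not trurl's implementation; `apply_sets` and `normalize` are hypothetical helper names, and the normalization steps (drop the user part, upgrade http to https) are just the ones mentioned in the question above.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_sets(url: str, **components: str) -> str:
    """Replace named components (scheme, netloc, path, query, fragment) in one pass."""
    return urlsplit(url)._replace(**components).geturl()


def normalize(url: str) -> str:
    """Example pipeline: drop the userinfo part, then upgrade http to https."""
    parts = urlsplit(url)
    netloc = parts.netloc.rpartition("@")[2]  # keep only host[:port]
    scheme = "https" if parts.scheme == "http" else parts.scheme
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Mirrors the --set path=... --set scheme=... --set host=... invocation quoted above.
    print(apply_sets("https://example.com", path="/abc/123", scheme="ftp", netloc="foo.bar"))
    print(normalize("http://user:pw@example.com:8080/a?b=1#c"))
```

Keeping each step as a plain function makes it easy to chain them over a list of URLs, which is roughly the sed-like workflow described in the question.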

The readme should include a pronunciation guide for the name.

tru-rul? tee-ru-rul?

Trurl is the name of a character in the science fiction short story collection "The Cyberiad" by Polish author Stanisław Lem. Trurl is a highly intelligent robot and inventor who, along with his friend Klapaucius, goes on various adventures throughout the universe. — via CGPT
A great book that prefigured the kinds of interactions people are now having with ChatGPT. For example:

“Have it compose a poem — a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter S!!”

“Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed,

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide.”

I had a feeling that ChatGPT would have trouble with some aspects of a request that specific, but here's what it gave me:

She sat with scissors, ready to snip,

Slicing strands of hair, curl by curl, clip by clip,

Shears shimmering, sharp as swords in sunlight,

Swiftly, skillfully, she shortened the sight,

Saddened by strands strewn on the floor,

She stood, serene, satisfied, the style was no more.

Clearly nowhere near as good, and it couldn't get all the words to start with an 'S', even after a few attempts to correct it.

Still, some software should succeed someday, sans skipped S's. Substituting some synonyms, surely.

Stanisław Lem’s "Cyberiad" is a great book that predicted machine learning and various modern technologies, although credit for this particular poem should probably go to Michael Kandel for his brilliant English translation. It's very different in other languages:

https://mwichary.medium.com/seduced-shaggy-samson-snored-725...

Maybe "tee" "arr" url? I'm guessing the "tr" is for https://en.wikipedia.org/wiki/Tr_(Unix)
> We say "trurel". As if there was an 'e' between the r and l at the end.

From https://curl.se/trurl/

You pronounce it like curl, but with the "c" sound replaced by "tr".
Where did the extra syllable come from? Like tr -ee and -c url trurl.
I would say, "rhymes with rural"
Only two unit tests?
I’m curious about why people dislike this question. This is a specialized library; it should be testing everything including crazy edge cases that most of us haven’t even considered. Also, someone stated that there have been several URL-related security vulnerabilities in curl.