New tool from curl creator – trurl – for URL parsing and manipulation

	New tool from curl creator – trurl – for URL parsing and manipulation (daniel.haxx.se)
	140 points by michalg82 1169 days ago

8 comments

Groxx 1169 days ago

Might be more useful to link to the blog post: https://daniel.haxx.se/blog/2023/04/03/introducing-trurl/

kencausey 1169 days ago

Previously posted:

https://news.ycombinator.com/item?id=35419573 - 22 points, 2 comments

https://news.ycombinator.com/item?id=35426050 - 5 points, 0 comments

dang 1169 days ago

Ok, we've changed the URL to that from https://github.com/curl/trurl. Thanks!

gadrev 1169 days ago

https://github.com/curl/trurl

Apparently, it requires curl 7.81 to compile, but I'm on ubuntu 20 (7.6x) and wanted to do a quick try, so this patch makes it work (trivially) by removing the couple of new symbols depending on that newer version. Just a quick hack to compile.

   --- a/trurl.c
   +++ b/trurl.c
   @@ -362,11 +362,12 @@ static void get(struct option *op, CURLU *uh)
         case CURLUE_NO_PORT:
         case CURLUE_NO_QUERY:
         case CURLUE_NO_FRAGMENT:
   -            case CURLUE_NO_ZONEID:
   +            // would require 7.81
   +            // case CURLUE_NO_ZONEID:
           /* silently ignore */
           break;
         default:
   -              fprintf(stderr, PROGNAME ": %s (%s)\n", curl_url_strerror(rc),
   +              fprintf(stderr, PROGNAME ": %d (%s)\n", (int)rc,
            variables[i].name);
           break;
         }
   @@ -589,8 +590,8 @@ static void singleurl(struct option *o,
           CURLU_GUESS_SCHEME|CURLU_NON_SUPPORT_SCHEME);
          if(rc) {
     if(o->verify)
   -          errorf(ERROR_BADURL, "%s [%s]", curl_url_strerror(rc), url);
   -        warnf("%s [%s]", curl_url_strerror(rc), url);
   +          errorf(ERROR_BADURL, "%d [%s]", (int)rc, url);
   +        warnf("%d [%s]", (int)rc, url);
          }
          else {
     if(o->redirect)

theamk 1169 days ago

That's an ... interesting ... decision to put inputs before -- and commands/options after --

Most tools assume the opposite default (xargs, for example). Sure, you can specify a different order, but why make life harder for no reason?

ftrobro 1169 days ago

Usage: " PROGNAME " [options] [URL]

But you can put options wherever you want, they are identified by the leading '-' char.

obelos 1169 days ago

Oh, c'mon, “purl” was right there!

Izkata 1169 days ago

Given what it does, and to avoid confusion with perl, "transform url" seems more reasonable. Though I probably would have gone with "turl" so it's easier to pronounce.

enneff 1169 days ago

It seems like a nod to the “tr” command.

jcul 1169 days ago

Yup, I guessed the same, and indeed it explicitly states so in the blog post linked by another commenter.

> trurl is a tool in a similar spirit of tr but for URLs. Here, tr stands for translate or transpose.

obelos 1169 days ago

Though it hardly gets any use because s/// is so much more flexible, Perl also has a tr/// builtin that replicates the behavior of the command line tool.

sidpatil 1169 days ago

That would collide with this: https://en.m.wikipedia.org/wiki/Persistent_uniform_resource_...

twic 1169 days ago

I wonder if the name is a Lem reference: https://en.wikipedia.org/wiki/The_Cyberiad

Goofy_Coyote 1169 days ago

Pardon my ignorance, serious question: Why is this a big deal? I'm probably underestimating the work needed, but it doesn't look like a hard thing to write. What am I missing?

Etheryte 1169 days ago

Over the years, curl itself has had 9 CVEs relating to handling URLs [0] so this is most definitely not a trivial piece of code to write. The basic case is easy, yes. Getting everything in the spec right and then some is hard.
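To illustrate how the "basic case" can mislead, here is a minimal Python sketch (my own illustration, not taken from curl or trurl). It compares a naive string split with the standard library's `urllib.parse.urlsplit` on a URL that carries a userinfo part. The `naive_host` helper and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def naive_host(url: str) -> str:
    """Hypothetical naive parser: take whatever sits between '://' and the next '/'."""
    rest = url.split("://", 1)[1]
    return rest.split("/", 1)[0]


url = "https://example.com@evil.example/login"

# The naive split returns the whole authority, userinfo included,
# so a check like "starts with example.com" would wrongly pass.
print(naive_host(url))          # example.com@evil.example

# A real parser separates the userinfo from the host.
print(urlsplit(url).hostname)   # evil.example
print(urlsplit(url).username)   # example.com
```

This is only one of many edge cases; it says nothing about which specific bugs the curl CVEs involved.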

[0] https://curl.se/docs/security.html

gcoakes 1168 days ago

And, now I'm scared of establishing a TLS connection with untrusted servers after reading CVE-2021-22901 from that page. Remote code execution from an adversarial *server*. I can understand an adversarial client, but that just expanded the things of which I'm wary.

organsnyder 1169 days ago

I had to implement a routine involving URL parsing in a library that is supposed to be behavior-identical across implementations in multiple languages. That was fun.

skrtskrt 1169 days ago

is it somehow better than the stuff available in mainstream languages (go net/url, Python yarl, etc)?

Or is it just that this is focused on command-line usage

specproc 1169 days ago

I will be using this at work tomorrow.

I currently parse out this stuff using a flaky little bit of python I cooked up myself and it gives me no end of grief when scaled. So many awful edge cases.

The author wrote curl, so I know it's going to do what it says it does well.

That's why it's getting love. It's a rock star dev putting out open source code many of us will absolutely be using regularly.

Way classier than yet another "product" built on the OpenAI API that will be gone in a year.

scheme271 1169 days ago

Why not use urllib.parse?
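For reference, a short sketch of what that looks like with only the standard library. The example URL is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical example URL, chosen to exercise every component.
parts = urlsplit("https://user@example.com:8443/a/b?x=1&y=2#frag")

print(parts.scheme)           # https
print(parts.hostname)         # example.com
print(parts.port)             # 8443
print(parts.path)             # /a/b
print(parse_qs(parts.query))  # {'x': ['1'], 'y': ['2']}
print(parts.fragment)         # frag
```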

specproc 1169 days ago

Hah, I may well do! I think the other thing here is me not realising I needed something till I saw it :-)

nerdponx 1169 days ago

Better yet, https://pypi.org/project/yarl, or even better still, https://pypi.org/project/hyperlink .

lultimouomo 1169 days ago

From the blog post introducing trurl:

> URLs are tricky to parse and there are numerous security problems in software because of this. trurl wants to help soften this problem by taking away the need for script and command line authors everywhere to re-invent the wheel over and over.

sashk 1169 days ago

When I was building my CI jobs at $job I needed url manipulation in shell. Had to use python inline, but it was long and ugly... trurl simplified it a bit.

kjellsbells 1169 days ago

Pretty dang cool, but I might have missed a feature: can trurl do multiple manipulations on a single url, as opposed to a single manipulation on multiple urls (which the blog post says is supported)?

For example, you want to apply a sequence of predefined normalization steps, such as removing the user part, converting http to https, etc etc. You put these steps in a file and then invoke trurl with that file and pointed at your url or urls. Very much like what you would do in sed, say. Possible?

skibz 1169 days ago

I think it can do what you're suggesting, if I'm understanding correctly:

> $ trurl --url https://example.com --set path=/abc/123 --set scheme=ftp --set host=foo.bar

> $ ftp://foo.bar/abc/123
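For comparison, the kind of fixed normalization sequence asked about above (drop the user part, upgrade http to https) could be sketched in Python with `urllib.parse`. This is an assumption-laden illustration, not how trurl works internally; the `normalize` function name and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Hypothetical helper: apply two normalization steps to one URL."""
    parts = urlsplit(url)

    # Step 1: drop userinfo by rebuilding the authority from host and port.
    # Note: hostname strips the brackets from IPv6 literals, so those would
    # need to be re-added; this sketch does not handle that case.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Step 2: upgrade plain http to https, leaving other schemes alone.
    scheme = "https" if parts.scheme == "http" else parts.scheme

    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize("http://alice:secret@example.com:8080/abc/123?q=1"))
# https://example.com:8080/abc/123?q=1
```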

neuracnu 1169 days ago

The readme should include a pronunciation guide for the name.

tru-rul? tee-ru-rul?

marban 1169 days ago

Trurl is the name of a character in the science fiction short story collection "The Cyberiad" by Polish author Stanisław Lem. Trurl is a highly intelligent robot and inventor who, along with his friend Klapaucius, goes on various adventures throughout the universe. — via CGPT

myhf 1169 days ago

A great book that prefigured the kinds of interactions people are now having with ChatGPT. For example:

“Have it compose a poem — a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter S!!”

“Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed,

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide.”

F2hP18Foam 1169 days ago

I had a feeling that ChatGPT would have trouble with some aspects of a request that specific, but here's what it gave me:

She sat with scissors, ready to snip,

Slicing strands of hair, curl by curl, clip by clip,

Shears shimmering, sharp as swords in sunlight,

Swiftly, skillfully, she shortened the sight,

Saddened by strands strewn on the floor,

She stood, serene, satisfied, the style was no more.

Clearly nowhere near as good, and it couldn't get all the words to start with an 'S', even with a few attempts to correct it.

Still, some software should succeed someday, sans skipped S's. Substituting some synonyms, surely.

omoikane 1169 days ago

Stanisław Lem’s "Cyberiad" is a great book that predicted machine learning and various modern technologies, although credit for this particular poem should probably go to Michael Kandel for his brilliant English translation. It's very different in other languages:

https://mwichary.medium.com/seduced-shaggy-samson-snored-725...

cadizm 1169 days ago

Maybe "tee" "arr" url? I'm guessing the "tr" is for https://en.wikipedia.org/wiki/Tr_(Unix)

jayrhynas 1160 days ago

> We say "trurel". As if there was an 'e' between the r and l at the end.

From https://curl.se/trurl/

otikik 1169 days ago

You pronounce it like curl, but with the "c" sound replaced by "tr".

b4ke 1169 days ago

where did the extra syllable appear from? like tr -ee and -c url trurl.

pimlottc 1169 days ago

I would say, "rhymes with rural"

Only two unit tests?

I’m curious about why people dislike this question. This is a specialized library; it should be testing everything including crazy edge cases that most of us haven’t even considered. Also, someone stated that there have been several URL-related security vulnerabilities in curl.
