| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by clausecker 215 days ago

You can do it like this, assuming A is the mask of newlines and B is the mask of non-spaces.

1. Compute M1 = ~A & ~B, which is the mask of all spaces that are not newlines 2. Compute M2 = M1 + (A << 1) + 1, which is the first non-space or newline after each newline and then additional bits behind each such newline. 3. Compute M3 = M2 & ~M1, which removes the junk bits, leaving only the first match in each section

Here is what it looks like:

    10010000 = A
    01100110 = B
    00001001 = M1 = ~A & ~B
    00101010 = M2 = M1 + (A << 1) + 1
    00100010 = M3 = M2 & ~M1

Note that this code treats newlines as non-spaces, meaning if a line comprises only spaces, the terminating NL character is returned. You can have it treat newlines as spaces (meaning a line of all spaces is not a match) by computing M4 = M3 & ~A.

1 comments

zokrezyl 214 days ago

Thanks for the suggestion. Indeed, that would be the first approach. That is how I started. This is however not considering the state (am I inside a 'statement' or not)

statement meaning string from first non-space till next EOL or EOF.

Problem starts when you need to cover the "corner cases". Without the corner cases the algo is not algo.

link

zokrezyl 214 days ago

Do you mind trying out your solution? The code is in https://github.com/zokrezyl/yaal-cpp-poc Thanks a lot!

Obviously if your solutions gets closed to the memory bandwith limit, we will proudly mention it!

link

clausecker 213 days ago

So I've thought about it and I don't really feel like spending more time to convince you that this works. If you have questions I am happy to answer them, but please write your own code.

link

zokrezyl 212 days ago

It's fine and thank you! I am playing arround with the idea, in theory all is good.. Only thing is that things like "first non ..." often involve branching that corrupts the prediction ability of the CPU. Therefore I kindly invited you to show it in code.

link

clausecker 211 days ago

You can find the first set bit in an integer with a machine instruction, it's completely branch free. gcc has __builtin_ctz() for this. You'll either need to iterate over all set bits (so one branch per set bit) or use a compression instruction (requiring AVX-512) to turn the bit set into a set of integers.

That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.

link

clausecker 213 days ago

This does track the state. If you want to track it across multple vectors of input, you'll need to carry it over manually.

link