TextAnalysisTool.NET (textanalysistool.github.io)
104 points by gadiyar 882 days ago
7 comments

https://lnav.org/

Is a good Linux command line tool in the same genre

> Is a good Linux command line tool in the same genre

It is a good OS X/FreeBSD command line tool as well.

(Not .NET I should say ;))
> (Not .NET I should say ;))

Perhaps it might be[0]?

I'm not quite sure how I feel about that however... :-D

0 - https://learn.microsoft.com/en-us/dotnet/core/install/

Everyone has autonomy over their own system. Who are we to judge if they want to do something like that?
> do something like that

What's wrong with this? Genuine question.

Modern .NET is fully open-source with a permissive MIT licence. This includes the compiler and analyser infrastructure (Roslyn), the package manager (NuGet), and even the shell language (PowerShell).

It is a superb alternative to Java, Go, and similar languages. Why is using .NET on Linux or macOS such a weird thing?

Looks nice! Especially the SQL query feature.
This is grep++ right?

My guess is that it's aimed more at the humanities. Hence the GUI. My experience: in the world of humanities text analysis, there are just a ton of Java programs which were funded by some academic grant. Mostly they are closed source, not updated, might have a horrible GUI, and the website is always written in 8 point font.... Don't hate them for what they are....

Reminds me a bit of klogg https://github.com/variar/klogg which is more for log files and is based on glogg, which went dead. It has nice filtering and highlighting features. It's great for live views of log files.
This looks like the app I wanted ages ago when I tried to find a TextAnalysisTool.net replacement for OSX.

https://superuser.com/questions/706761/textanalysistool-net-...

Marginally related, but this is one of the things I'm bullish on ChatGPT for. Too frequently, I've gotten hundreds of lines of malformed textual data that I need to standardize. This is nearly impossible with regex, but I can drop it into GPT and it does this wonderfully.
There is, however, no indication if it failed on a line when using ChatGPT; it could provide you with a slightly incorrect result.
Yeah, that's always been a fear, but I always dogfood and I've had no issues yet.
I've done it when transforming data (for example, pasting a table in and asking it to turn it into LaTeX) and had the occasional issue with it misordering or forgetting things. It didn't take long for me to spot the error, though.
You could run it through thrice with a different prompt/temperature/model and pick the majority result (or exit with success on the first two passing runs).
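A minimal sketch of that majority-vote idea, in case it helps. The `runners` are hypothetical callables (each one standing in for a single prompt/temperature/model combination), and comparing outputs by exact string equality is an assumption; real outputs may need normalizing first.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_result(
    text: str,
    runners: Sequence[Callable[[str], str]],
) -> Optional[str]:
    """Return the output that a strict majority of runners agree on, else None.

    Each runner is a hypothetical wrapper around one model call.
    Runners are tried in order, and the loop stops as soon as one
    output has reached a majority (e.g. the first two of three agree).
    """
    needed = len(runners) // 2 + 1
    counts: Counter = Counter()
    for run in runners:
        out = run(text).strip()  # ignore leading/trailing whitespace only
        counts[out] += 1
        if counts[out] >= needed:
            return out
    return None  # no majority: treat as a failure and inspect by hand


if __name__ == "__main__":
    # Stand-in runners; a real one would call a model.
    fake_runners = [lambda t: t.upper(), lambda t: t.upper(), lambda t: t.lower()]
    print(majority_result("Some Line", fake_runners))
```

Returning `None` when nothing reaches a majority gives you the failure signal that a single run lacks.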
Good idea. If the data is a list of records where the order isn't important, randomly permuting them (ETA: then sorting the final outputs) would be another option.

ETA2: Would the downvoter care to explain why? Genuinely puzzled.
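For what it's worth, the permute-then-sort check could look roughly like this. `transform` is a hypothetical stand-in for whatever sends the records to the model and returns the cleaned records; the number of trials and the fixed seed are arbitrary assumptions.

```python
import random
from typing import Callable, List, Optional


def stable_under_shuffle(
    records: List[str],
    transform: Callable[[List[str]], List[str]],
    trials: int = 3,
    seed: int = 0,
) -> Optional[List[str]]:
    """Run transform on shuffled copies of records.

    Returns the sorted output if every trial produces the same sorted
    output, otherwise None. Only meaningful when record order does not
    matter.
    """
    rng = random.Random(seed)
    baseline: Optional[List[str]] = None
    for _ in range(trials):
        shuffled = records[:]  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        result = sorted(transform(shuffled))
        if baseline is None:
            baseline = result
        elif result != baseline:
            return None  # outputs disagree across orderings
    return baseline


if __name__ == "__main__":
    # Stand-in transform; a real one would call a model.
    print(stable_under_shuffle(["b", "a", "c"], lambda rs: [r.upper() for r in rs]))
```

Agreement across orderings does not prove the output is right, only that it is not sensitive to position in the input.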

I have no idea how Regex became the standard. The syntax is impossible to remember unless you write regex expressions daily. Most people only rarely need regex so it needs to be relearned every time. It is also incredibly unsatisfying to write (and read).
I used it a lot for a few years decades ago, and only use it rarely now. I remember the syntax well and I am not known for having a great memory. I think its terseness suits the extreme focus of its use perfectly.
As much as I hate to hop on the AI bandwagon, this is definitely where tools like ChatGPT shine.

Not to mention, non-tech people will now be able to do what once could’ve only been done with cryptic regex.

The only good thing to come out of regular expressions is https://regexcrossword.com/
What would you replace regex with?
I tried using ChatGPT (4) for format conversion. I had a draft YAML file and needed some differently structured JSON, mainly with the same content.

If you just want to change the format, it works. If you need more than programming skills, it seems to fail due to the amount of text.

E.g. if you have a list of items and want ChatGPT to generate a meta field which it cannot generate using simple Python code, it stops after 10 to 20 elements.

Thus at least the cloud version doesn't work so well here.

I also wanted it to help me fill out my i18n file with translations and plural forms. Even though it got every word correct, I needed to split it into multiple requests. Not sure if the API would have worked better (I used the web frontend).

For the plural forms I finally added them myself as it was way faster for my natural language than copy pasting all the small chunks. Really hoped for more help there.
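The manual splitting into multiple requests described above can be sketched as a small batching helper. `process_batch` is a hypothetical callable standing in for one request to the model, and the default batch size of 10 is an assumption taken from the "stops after 10 to 20 elements" observation.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    items: Sequence[T],
    process_batch: Callable[[Sequence[T]], List[R]],
    size: int = 10,
) -> List[R]:
    """Apply process_batch to each batch and concatenate the results.

    Raises ValueError if a batch returns a different number of results
    than it was given, which catches silently truncated output.
    """
    results: List[R] = []
    for batch in batches(items, size):
        out = process_batch(batch)
        if len(out) != len(batch):
            raise ValueError(
                f"expected {len(batch)} results for a batch, got {len(out)}"
            )
        results.extend(out)
    return results


if __name__ == "__main__":
    # Stand-in for a model call: add a "meta" field to each item.
    items = [{"name": f"item{i}"} for i in range(25)]
    done = process_in_batches(
        items, lambda b: [{**x, "meta": len(x["name"])} for x in b]
    )
    print(len(done))
```

The length check is the useful part: it turns "it stopped after some elements" into an explicit error per batch.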

Hey, if you are searching for really seamless i18n with nice DX, check out https://inlang.com – JS library, web editor, automation CLI & VS Code extension are just some of the completely free and open source offerings.
Agreed. It works especially well for formatting where semantics matter, such as separating the term and definitions of flashcards. Hard to do with code, but easy with GPT.
Where is the source code? It looks like they only host releases on GitHub, but the license is MIT.
The MIT license just gives you permission to use the work as published. Normally that work would be in source form, but there is nothing in the MIT license requiring that. In this case, it seems that the authors chose to release the binaries under the MIT license.
Just for completeness:

> I open-source pretty much all my work, but TextAnalysisTool.NET is an exception due to a variety of historical reasons. Sorry about that!

-- https://github.com/TextAnalysisTool/Releases/issues/22

I wonder what the historical reasons are.
It was originally an internal tool, so I would guess either A) he doesn't have permission from all the contributors or B) he reused code from elsewhere within Microsoft that wasn't open source compatible.
This tool is pretty good. Used it to find the meaningful errors from giant MSBuild logs
Whenever convenient, the MSBuild binary log file in combination with MSBuild Structured Log Viewer is a better fit.
If I had to choose a name for it, it probably would be "Regex 401".