by abarrettwilsdon 2379 days ago
Is the code open source? How would one know with certainty their data isn't being surreptitiously copied?

(You can also recreate this functionality locally with a few lines of Python.)

    import os

    import pandas as pd

    # Collect every CSV file in the current directory.
    list_of_files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith(".csv")]
    print(f"Files that will be concatenated are: {list_of_files}")

    # Read each file as-is (empty cells stay empty strings rather than NaN),
    # then concatenate once. pd.concat raises ValueError if no CSV files were found.
    frames = [pd.read_csv(file, keep_default_na=False) for file in list_of_files]
    original_df = pd.concat(frames, axis=0, ignore_index=True)
    original_df.to_csv("Output file.csv", index=False)
4 comments

The code is not open source. However, if you inspect the network traffic in Chrome developer tools, you can verify that no POST requests are being made.

True, you can do this in many languages, like Python (quite easily with pandas, as you pointed out), or with awk, Perl, even cat (minus ordering), etc. For power users, Excel Power Query is your friend. But as you know, these all require script setup and come with memory considerations.

This tool is geared toward ease of use: anyone can run it without the fuss of custom scripts or setup, and for the most part with almost no memory concerns, since the output is written on the fly.
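
For what it's worth, the "written on the fly" behaviour can be approximated locally with the standard library alone. This is only a minimal sketch, assuming every input file shares the same header row; `stream_concat` is a made-up helper name and says nothing about how the tool itself is implemented.

```python
import csv


def stream_concat(paths, out_path):
    """Append the rows of each CSV in `paths` to `out_path`, one row at a time.

    Assumes all inputs share the same header, which is written only once.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header: {file_header}")
                # Only one row is held in memory at any time.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Write the output somewhere the input listing won't pick it up, otherwise a second run would feed the previous result back in.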

Your snippet just concatenates the contents of two CSVs (while adjusting headers)?

Here's how it's done in JS: https://repl.it/@caub/csv-merge
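
If "adjusting headers" means lining up files whose columns differ, here is a rough standard-library sketch of that idea in Python. It assumes missing cells should simply be left blank; `merge_with_header_union` and its `fill` parameter are hypothetical names, not taken from the snippet above or from the linked JS.

```python
import csv


def merge_with_header_union(paths, out_path, fill=""):
    """Merge CSVs whose columns may differ; missing cells get `fill`.

    Returns the combined header that was written.
    """
    # First pass: collect the union of column names, in first-seen order.
    fieldnames = []
    for path in paths:
        with open(path, newline="") as src:
            for name in csv.DictReader(src).fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)

    # Second pass: write every row under the combined header.
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval=fill)
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as src:
                writer.writerows(csv.DictReader(src))
    return fieldnames
```

Reading each file twice keeps memory use flat, at the cost of extra I/O.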

I like glob for getting the list of files:

    import glob
    list_of_files = glob.glob("*.csv")
You can also use pathlib:

    from pathlib import Path
    # Path.glob returns a generator; wrap it in list() to reuse it.
    list_of_files = list(Path(".").glob("*.csv"))
glob is great, but it doesn't guarantee the order of the files it returns (the order depends on the OS and filesystem). If order matters, sort the results before processing.
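
A minimal sketch of the sorted variant, wrapped in a helper so the directory can be passed in; `sorted_csv_files` is just an illustrative name.

```python
import glob
import os


def sorted_csv_files(directory="."):
    """Return the CSV paths in `directory`, sorted by name for a stable order."""
    # glob.escape keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(directory), "*.csv")
    return sorted(glob.glob(pattern))
```

Note that this is a plain string sort, so `file10.csv` comes before `file2.csv`; pass a `key` to `sorted` if you need numeric ordering.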
Obligatory plug for csvkit, which lets you do that and much more without writing any code: https://csvkit.readthedocs.io/en/latest/