Hacker News new | ask | show | jobs
by MooMooMilkParty 2234 days ago
Posting this verbatim from my reddit comment, but I ran a few quick fits and the (arbitrarily scaled) Planck distribution looks better than standard fits of some other distributions. Of course I ran absolutely no statistics on these or tried any sort of real analysis that should have been included in the paper in the first place, so take it with multiple grains of salt:

I fit a few distributions to the data and actually the Planck distribution at 9000K seems to be noticeably better. I did no further statistics and haven't even looked at the fitted parameters for this. Note: I am an earth scientist, not a physicist. https://imgur.com/Qg4ixF2

Edit: More distributions: https://imgur.com/IrF6lsV

Edit: Filtering out sodium and potassium to try to account for some of the low-wavelength counts doesn't seem to help fitting the distributions either: https://imgur.com/dHDUS9Z

You can get the data here: https://physics.nist.gov/PhysRefData/ASD/lines_form.html

And here's the (garbage) code to reproduce my plots:

    %pylab inline
    import pandas as pd
    import scipy
    import scipy.stats
    
    mpl.style.use('seaborn-muted')
    mpl.rcParams['figure.figsize'] = (18, 12)
    mpl.rcParams['font.size'] = 16
    df = pd.read_csv('nist.csv')
    
    def filter_string(x):
        x = str(x).replace('=', '').replace('"', '').replace('*', '').replace('+', '').replace('(', '').replace(')', '')
        if len(x) == 0:
            x = 'NaN'
        return x
    
    asl_comp = df['ritz_wl_vac(nm)'].apply(filter_string).astype(float)
    asl_ver  = df['obs_wl_vac(nm)'].apply(filter_string).astype(float)
    sp_num = df['sp_num'].astype(float)
    
    df_neutral = df[sp_num == 1]
    asl_comp = df_neutral['ritz_wl_vac(nm)'].apply(filter_string).astype(float)
    asl_ver  = df_neutral['obs_wl_vac(nm)'].apply(filter_string).astype(float)
    sp_num = df_neutral['sp_num'].astype(float)
    
    asl_ver = asl_ver[~asl_ver.isna()]
    
    def planck(wav, T=9000.0):
        h = 6.626e-34
        c = 3.0e+8
        k = 1.38e-23
        a = 2.0*h*c**2
        b = h*c/(wav*k*T)
        intensity = (a / ( (wav**5)) * (1 / (np.exp(b) - 1.0) ))
        return intensity
    
    
    dist_names = ['beta', 'lognorm', 'pearson3', 'gumbel_r']
    x = np.arange(0, 1500, 5)
    intensity = planck(x / 1e9)
    size = len(asl_ver)
    
    yvals, xvals = np.histogram(asl_ver, bins=300, normed=True)
    
    for dist_name in dist_names:
        dist = getattr(scipy.stats, dist_name)
        param = dist.fit(asl_ver)
        pdf_fitted = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1])
        plt.plot(x, pdf_fitted, label=dist_name, linewidth=4,)
    
    plt.hist(asl_ver, bins=300, density=True, color='grey');
    plt.plot(x, intensity / 1.1e17, linewidth=4, color='red', linestyle='--', label='scaled planck (9000k)')
    plt.legend()
    plt.xlabel(r'$\lambda (nm)$')
    plt.ylabel('Density')
2 comments

Neat! Still, I wouldn't say the Planck distribution is noticeably better. The reason the other 4 distributions appear to have the peak too low is because they're doing a tradeoff: they do that to better fit the values of the left hand side of the graph.

The Planck fit gets the peak height right, but mainly because it has no chance of fitting the left hand side at all (since it increases much more slowly than the other distributions), so it doesn't even try.

This is why I said it would be better to have actual statistical tests -- we shouldn't have to have this kind of qualitative discussion when it's already a solved problem.

Can you link the discussion on Reddit?
Seemingly multiple people downvoted this. Why??