by j2kun 3383 days ago
I sure hope this catches on, but we should all be aware of the hurdles:

- Little incentive for researchers to do this beyond their own good will.

- Most ML researchers are bad writers, and it's unlikely that the editing team will do the work needed (which is often a larger reorganization of a paper and ideas) to improve clarity.

- Producing great writing and clear, interactive figures, and managing an ongoing github repo require nontrivial amounts of extra time, and researchers already have strained time budgets.

- It requires you to learn git, front-end web design, random javascript libraries (I for one think d3 is a nuisance), exacerbating the time suck on tangents to research.

Maybe you could convince researchers to contribute with prizes that aligned with their university's goals. Just spitballing here, but maybe for each "top paper" award, get a team together to further clarify the ideas for a public audience, collaborate with the university and their department and some pop-science writers, and get some serious publicity beyond academic circles. If that doesn't convince a university administration that the work is worth the lower publication count, what will?

In the worst case it'll be the miserable graduate students' jobs to implement all these publication efforts, and they won't be able to spend time learning how to do research.


You're absolutely right that this is a lot of work, and not many ML researchers have all the skills needed for it.

In the short term, Distill's editorial assistance will help authors produce outstanding papers, although they need to be willing to work as well.

In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

And in the very long term, I think the right solution is to add a new component to the research ecosystem. Just as we have people who specialize as research engineers, theoreticians, and experimentalists, I'd like to have a respected "research distiller" specialization. Eventually, I'd like to try to start special grants for research groups to have someone focused on this.

I fall into the longer-term category as a front-end data visualization person who would like to learn more ML. Please reach out to me if you're looking for a JS volunteer to help with code review, visualization polish, or implementing new visualizations.
I already know a guy who's doing this. Although he chose to publish very short videos on various research topics (including many in AI/ML), the concept and goal are more or less the same.

Two Minute Papers on YouTube:

https://www.youtube.com/user/keeroyz/videos

Karoly does lovely work! :)
As another designer + researcher with a varied background and an interest in data viz as well as ML, I am super interested in this as a potential contributor. I have experience creating an interactive visualization interface for simple ML algorithms (which has been used by professors in the life sciences department to understand / get a new perspective on what's happening). I would LOVE to be able to be involved with Distill.

I have actually been meaning to write a paper on my findings and have been looking for journals to write for. However it doesn't quite "fit" with most journals. Distill looks like it's more catered to "professional" machine learning people, at least for now. Is there any way that somebody with my background (design+data viz+development+interest and curiosity to learn ML) could be involved with Distill?

> Is there any way that somebody with my background (design+data viz+development+interest and curiosity to learn ML) could be involved with Distill?

Absolutely. We know a number of leading ML researchers who would love to publish papers as Distill articles but don't have the design/data vis skills. We'd like to facilitate collaborations which would lead to data vis people co-authoring cutting edge research papers.

This is very exciting. How are you looking at facilitating these collaborations? Will there be a listing of sorts, where, say, ML researchers would say "I need a dataviz guy" and then dataviz specialists can apply, almost like a job (or rather more like matchmaking I guess--bad analogy)?

Or would said facilitation be done by the admins / editors / steering committee? If so, then how do you plan on finding dataviz people? I'm asking this in particular because I would imagine that people who have ML findings to talk about would probably contact you ("I researched such and such, and found such and such. Now I would love to publish in Distill"). But I wonder if data visualization specialists would do the same thing. Reaching out with "hey, I love data viz, would love to collaborate with somebody looking for one" feels a little inappropriate to me.

Thoughts?

> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

As a data viz person, I would be absolutely thrilled to work on this. I'm trying to carve out time here and there to position myself better in that respect, learning more and trying to bridge that gap.

I left a comment on your blog announcement to this effect, but I'd love to be a "research distiller" :)
You already are. :) Love your blog.
Well, to be paid as such :)

And I rarely cover recent work.

  > In the longer-term, I'd like to explore match making
  > between data visualization people who would like to
  > get into machine learning and machine learning
  > researchers publishing papers.
I'm into data viz and interested in doing this. I'm currently plowing through the Fast.AI course, and was actually already considering creating visualisations to help test my thinking.
Thanks for bringing these points up j2kun.

I'm a junior faculty member working in ML with no personal knowledge of web development, d3, etc. While the papers currently on Distill are absolutely gorgeous and will be an invaluable tool for learning advanced ML concepts, I simply cannot see myself or my students putting in the time to actually create something like that.

Unless a student is especially adept at the specific tools needed to create these and especially enthusiastic at using them, I will actively discourage them from doing it. The time needed is simply not worth it right now.

I would be happy and grateful if tools for creating these articles become easier to learn and use eventually, such that even the lower-budget, time-constrained researchers could afford to create them.

From my experience, most ML researchers are in your camp. They are primarily interested in the ML, and good (not-just-in-your-head) visualizations are at best icing on the cake of their understanding.
I disagree with the first point. I'm working on a Distill article with Chris and Shan, and the major draw for this has been impact. It seems very plausible that an article on Distill has the potential to reach a far broader (and different) audience than a paper in even a top-tier mathematical journal like SIAM would.

I won't deny that the time commitment needed for a Distill article is nontrivial; it is far more work than a technical blog. But in terms of a pure tradeoff of time per publication, the calculus makes sense. Most of the work of research distillation and synthesis is already part of the research process, and writing a Distill article is just a matter of putting it all down on paper. Doing research is a far more time-consuming and less predictable process.

I meant incentive with respect to career advancement, in the narrow sense of what metrics hiring and tenure committees use to make decisions.
To get a grad student or post-doc position, you're really just trying to convince a specific human that you're smart, useful, and to some extent personable. Metrics are a good argument for that, but having them know who you are before you apply is even better.

This applies especially if you write the distillation targeted at the lab you want to hire you.

Good points. We do believe that well-written articles save readers time on the other end, which hopefully will offset some (if not all) of the cost of producing them. We also believe that taking the time to edit your ideas not only helps your audience but helps your own thinking. Outsourcing the work to others would most likely just lead to adding a veneer to an article rather than a substantive improvement. Instead of outsourcing we're thinking about how to foster collaborations in the future.
I think you have emphasized the main point: a lot of work for a low reward. Research is more about exploring the state of the art and new avenues; popularization and graphics are more akin to bookselling (for example, Nielsen's work on open science and other interesting books). But for a young researcher, the most important and rewarding goal is to publish.
I think it depends on what type of researcher you are. In every field there are always authoritative leaders who are comfortable writing "survey papers", which are perhaps most comparable to what the "research distiller" is all about. Except these guys know from experience that visualization of complexity is perhaps the most direct way of communicating to the brain... and the real-time interactive nature of such technologies goes far beyond "book sellers", closer to how you can imagine the future of human communication more generally (perhaps with support from real-time speech recognition and graphics-generation AI, e.g.)... but I digress. This is most certainly a fantastic move in the right direction for the research community at large, and especially for the machine learning community, where so much is happening so fast and we really do need people to stop and help us "distill". :) I have fond memories of finally understanding LSTMs thanks to Christopher Olah's blog, and if we can somehow scale this up and out to other areas, I'll gladly invest time, money, and energy into helping pursue the bigger opportunities here...
Well, now you need a Distill WYSIWYG, to make it usable (for most of the intended audience).

Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so). This is really cool, but requires way too many skills (in js/css3/html5/distill-extensions and node.js).

Personally, my team and I had a really great experience with sharelatex.com, even though I was the only one who knew LaTeX. I liked that it's also open source with a permissive license. Next time I would rather self-host it on sandstorm.io, or just pay for the comfort offered by overleaf.com (I've never seen such a beautiful collaborative LaTeX editor).

• What about vendor lock-in?

• Can you export to LaTeX, Word or PDF?

• Can you selfhost it for your team or company?

> Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so).

What field? TeX is pretty much de rigueur in Math/CS/Physics graduate schools in the U.S.

To my surprise, certain subfields of CS rarely or never use LaTeX and use MS Word instead. You kind of have no choice, since the conference/journal templates are only provided in one format (well, you can create your own template, but only if they accept PDF submissions... yes, some only accept Word files).
I agree with you here, and have no idea what the OP is talking about. In Math and ML, TeX is so ingrained in the culture that there are _jokes_ based on TeX puns. (Where do mathematicians go for a rational rack of ribs? The \mathbb Q)
You're right. I've been using git, GitHub, Keynote, ffmpeg, Medium, JS, Python, d3 and others to build blog posts myself.

I clearly don't expect people to do that much. I can only do it because I'm coming from web development, and very nice tools have started to appear recently.

People in research need a design framework, like a set of templates for Keynote/PPT/JS/CSS (think about how much traction Bootstrap got). Distill is doing an awesome job of showing what you could do.

Maybe Distill could open-source the templates they use to build those blog posts?

They did actually! [0] The blog posts are also online on their GitHub site.

[0] https://github.com/distillpub/template

Awesome!
up next: a neural net that reorganizes research papers to improve clarity
Your criticism is spot on. If something like Distill existed for my own research area I would applaud it, but probably not use it because of time constraints.

On the other hand, being able to write well and to create good interactive illustrations are valuable skills. Maybe we could incorporate these things into seminars or otherwise crowdsource the creation of e.g. individual figures?

I'm not in academia, but I guess the impact (citations) you could get with a Distill-like paper will be higher than what you'd get from a traditional paper-based journal.

So I guess this will help Distill get traction.