Why don't companies feel comfortable "code dumping"? Just throw everything online as a tarball, and say "we aren't supporting this and we don't want to have anything to do with this, but here's the source."
DataJoy Co-founder here. A lot of the DataJoy code IS available (https://github.com/sharelatex/web-sharelatex/tree/datajoy). Our other product ShareLaTeX has an open-source version that you can run locally and is very similar to the version we host at sharelatex.com. DataJoy naturally shares a lot of code with ShareLaTeX (if you look at the two products, you'll see they're very similar). However, with DataJoy, we never got the product to a stage where we felt it made sense to invest time into 'good open source' (documentation, installation guides, etc), but the 'code dump open source' version has always been there.
The main thing that isn't open source with DataJoy is our backend for running code. At the moment this is so tied into Docker, S3, and how we deploy it in our infrastructure, that I don't think it would be much use to anyone else. The innovations here have been in how we deploy and provision it, not in the code itself.
If it's innovative, putting it up is still useful as a case study on how to perform said innovation. It's not bringing you any revenue anyway, so it's not like you'd lose anything. Just take out all of the keys and passwords etc from the repo!
Indeed, and one of the reasons is that ShareLaTeX's success means it takes our team's whole attention to keep up with its growth and to keep investing in feature development to meet demand. ShareLaTeX isn't going anywhere.
So that sounds like success (even though it's sad for DataJoy users)! You guys tested the waters with 2 products, one has found product-market fit, and now you guys are focusing on growing that one.
> Why don't companies feel comfortable "code dumping"? Just throw everything online as a tarball, and say "we aren't supporting this and we don't want to have anything to do with this, but here's the source."
- It may contain configuration information.
- It may contain private keys or passwords.
- It may contain customer-specific code (if you maintain customer-specific features either via feature toggles or branches), which may leak information about your paying customers.
- It may have unintended copyright violations.
- It may contain software that is licensed in a way that makes it a copyright violation to distribute your software outside of your company (publishing it is distributing it). This may also apply if you distribute your sources without any (source or binary) parts of the proprietary dependency.
- It may fall under the export restrictions for cryptographic software (these have been mainly dropped, but not completely).
- It may directly or indirectly make your patent violations public (oh, you have them already, but nobody knows about them).
- It's of embarrassing quality.
- It may make it public that your company has defrauded its customers and/or users.
- It may make it public that your company has supported its customers in committing fraud and/or other crimes (the RICO act makes this easier for law enforcement to follow up on).
I have never worked on a non-trivial project where almost all of these issues weren't present.
I know this isn't a general rule, but based on my experiences with proprietary projects, even with the best of intentions most became a jenga-esque pile of hacks and shortcuts that few developers would want to show off in public even if the overall system works well.
I'm a proponent of publishing code that's useless, obsolete, buggy, poorly documented, etc, if the rights holders are so inclined. The reason is that I think it can still be useful as training data for, say, designing new programming languages/tools around coding patterns seen in the wild.
If you have a product people have paid money for, and say you're releasing it as open source because it isn't sustainable, the people looking for that code aren't typically interested in "training data".
I get your point that messy code may be useful for other purposes, but believe that the vast majority of users are looking for code to solve their problems, not "training data" or research.
That said, as you say, rights holders can do what they will.
Yep, I'm just saying that if someone can and wants to release their code, they shouldn't let "is this even useful to anyone?" stop them. It could turn out to be useful in ways that nobody has foreseen.
> The reason is that I think it can still be useful as training data for, say, designing new programming languages/tools around coding patterns seen in the wild.
Do you happen to have any evidence of that ever being done even once in the history of mankind?
It's a nice idea, but you should know by now the world isn't a slave to your desires.
I don't have evidence of this approach actually being used to inform language or tool design, but it's not like it's an outrageous concept that nobody's thought of before. Your hostility is perplexing.
No examples come to mind of someone just releasing a bunch of proprietary code without some effort involved. But people sometimes poke fun at the Apache Foundation as a place for companies to wash their hands of old code that they don't want to keep maintaining, and there are some success stories there.
Speaking as part of a startup that just went under (by "just", I mean literally yesterday): they can't release the code, either due to fiduciary responsibility to investors (it could be worth something, particularly if someone decides to dump more money in or the company somehow miraculously recovers) or because of contracts they have already entered into with investors.
Even if they wanted to open source or just share the source, there could be some real effort involved in reviewing the code to ensure that nothing sensitive will be exposed.
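To give a feel for what the first pass of that review might look like, here is a minimal sketch in Python that flags lines that look like private keys, AWS-style access key IDs, or hard-coded passwords. The rule names, the regexes, and the `scan_text` / `scan_tree` helpers are all made up for illustration; a real review would need a much larger rule set, a look through the version control history, and a human reading the results.

```python
import re
from pathlib import Path

# Illustrative rules only (assumptions, not a complete rule set).
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(
        r"(?i)(password|passwd|secret|api_key)\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root):
    """Scan every readable file under root; returns {path: findings}."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        findings = scan_text(text)
        if findings:
            report[str(path)] = findings
    return report
```

Calling `scan_tree(".")` from the top of a checkout would return a dict of file paths to flagged lines. A clean result only means these few patterns didn't match, not that the tree is safe to publish.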
Re the other comment about owning copyright on everything (e.g. due to components being licensed from other vendors), they could rip that out. Thinking about this triggered a memory of when the Descent 1 source was published: they excluded the sound library for this reason. Could be tough to excise something more integral though, I guess.
Can you legally do this? Not sure how the legal terms around assets are normally structured...
If the code is your sole asset, wouldn't investors get access to it? This is kind of like lighting your office chairs on fire vs returning them to investors.
If my current company went under, I don't see who would put in the effort to sift through all 3rd party items we use for licenses to see if we're even allowed to do it.
Someone also owns the code, I guess, and they might want to use it for something else at some point.
Some products might have been hack jobs from the beginning that ballooned. They might work, but they could be total hack jobs that would be either A) worthless on their own or B) slightly embarrassing to show.
I've not been in the situation from a company perspective, but I have taken a product I've made off the shelves, and the factors above came into play. I am sure there are more as well.
Was this 3rd party code that you bought or stuff you found on github? In the latter case, basically, you're saying that you didn't know and didn't care about compliance because no one could have found out?
> Was this 3rd party code that you bought or stuff you found on github? In the latter case, basically, you're saying that you didn't know and didn't care about compliance because no one could have found out?
That's a rather uncharitable interpretation.
I think it's more likely that they don't know if they're licensed to publish the code in question.