by jandrewrogers 4916 days ago
There are (at least) three valid perspectives on the patent situation within the software community, but part of the reason so little constructive progress is made is that many people with strong opinions will flat-out deny or reject the validity of the other perspectives because those perspectives are outside their experience.

Three perspectives of which most people ignore one or more:

- Computer technology is over-run with frivolous, vague, stupid, conflicting, and contradictory patents. Any pretense of quality control by the USPTO was lost in the 1990s. This imposes a non-trivial cost on the entire ecosystem and a complete absence of quality control is arguably worse than no patents at all.

- R&D into new computer algorithms is a non-trivial investment, frequently requiring years and millions of dollars. There is a tendency among programmers to discount the level of effort required to develop a new computer algorithm that materially extends the state-of-the-art even though most could not develop such algorithms themselves and have never been involved in such R&D. Organizations that make this investment do so looking for a return.

- Academia is already facing difficulties in computer science because much of the state-of-the-art research is being done by private companies. Much of this research is being treated as trade secrets because (ironically) patents offer flimsy practical protection. As a consequence, there are a number of areas in computer science where the leading academic papers are literally a good half decade behind the state-of-the-art that is buried in NDAs. Lack of publication means that a lot of smart people are wasting time duplicating work. Patents were originally invented precisely to avoid this outcome. One of the reasons that I stopped reading academic computer science in some areas that interest me is that I see computer science under NDA that is much more sophisticated, which is a shame.

Any practical policy will need to take into consideration all of these perspectives. It is not as convenient and simple as "all software patents are evil!" or "software patents FTW!" but it more closely reflects the real tradeoffs.


>R&D into new computer algorithms is a non-trivial investment, frequently requiring years and millions of dollars. There is a tendency among programmers to discount the level of effort required to develop a new computer algorithm that materially extends the state-of-the-art even though most could not develop such algorithms themselves and have never been involved in such R&D. Organizations that make this investment do so looking for a return.

I don't think critics of software patents are in the practice of claiming that software R&D is always quick or inexpensive. Rather, the claim is that the patent system is demonstrably incapable of improving that situation, and in the meantime has spawned enormously wasteful multi-billion dollar litigation between otherwise upstanding major companies and struck fear into the hearts of small developers who can no longer produce a successful innovative product without risking a shakedown by despicable parasites.

Companies expecting a return have numerous other, less innovation-damaging alternatives to software patents. First-to-market advantage, copyright, and trade secrets cover the field pretty well on their own, and no one can accuse any of those things of causing the average software entrepreneur to lose sleep over the prospect of totally unpredictable, ruinous litigation.

The litigation is very damaging. It's also true, however, that research is being doubled, tripled, etc. when it's done in private by multiple organisations and kept as trade secrets, which is inefficient when you consider the world as a whole.

It's just that we can't presently see any regulation based solution for that.

I understand that duplication is inefficient, but how is the patent system doing anything productive whatsoever to address it in the software industry? If something with market value can be kept as a trade secret then the incentive will be to do so notwithstanding the patent system, and then to patent various other things for legal defensive purposes which are less valuable in the market but more valuable in the courtroom. And things which can't effectively be held as trade secrets won't be wastefully duplicated anyway because the first to produce such a product is by stipulation unable to keep the secret.

More than that, if we were at all concerned about the use of trade secrets causing wasteful duplication then we cannot consistently allow the law to protect them as something of value. If we are so keen on disclosure and reducing duplication then industrial espionage should be fully legal as an efficient means of distributing knowledge to other market participants, and holders of trade secrets should be directed to the Patent Office as their sole means of protection. But I think we are not so keen because the problem of duplication is not so large, and the incentive provided by the patent system would not compare well with the incentive provided by the alternative.

>Academia is already facing difficulties in computer science because much of the state-of-the-art research is being done by private companies...As a consequence, there are a number of areas in computer science where the leading academic papers are literally a good half decade behind the state-of-the-art that is buried in NDAs.

Can you elaborate on which areas of CS academia you think are more sophisticated in the corporate wilds? Other than systems work at Google, I haven't encountered any subtopics which aren't dominated by ideas from traditional research centers (either university labs or academic research units at MSR, IBM, etc...)

One area that's heavily under NDA (though perhaps not quite the situation described) is graphics drivers. NVidia and AMD both have high-performance graphics drivers, but the open-source alternatives (reverse-engineered for NVidia, but AMD sponsors their open-source Linux driver) are far behind. They're reluctant to even share hardware specs, let alone the code from their proprietary drivers.

Part of the reluctance is that they don't want to compromise the DRM systems that their products are complicit in, but most of it is patent related. They already have to pay royalties to many graphics has-beens for things like S3TC, and the fear is that if the details of what their hardware and software is doing were publicly available, they'd be painting targets all over themselves. The people AMD employs to help with their open-source Linux drivers are constantly citing "legal review" as the hold-up for releasing new specs or code, but never that they're concerned about making it easier for NVidia to reverse-engineer their stuff.

> One area that's heavily under NDA (though perhaps not quite the situation described) is graphics drivers.

But that's not because the algorithms are new and awesome, it's mostly just because the exact specs of the hardware are not happily shared.

CS research is not about lack of hardware specs.

It is also not about protecting someone's DRM.

It is also not about crappy patents.

And it is also not about how easy your published work is to reverse-engineer.

None of those are CS research problems.

> "But that's not because the algorithms are new and awesome, it's mostly just because the exact specs of the hardware are not happily shared."

No, that doesn't fully explain the situation. They're worried about far more than their competitor knowing how many ALUs are on their GPU. Graphics drivers have to solve several hard problems: optimizations for shader compilers, scheduling, and memory management, and that's before you even get into the graphics-specific stuff. In recent years, GPUs have been one of the most active areas of research into computer architectures. For you to suggest that a modern GPU and its drivers don't embody any hardcore CS research is just plain stupid; if GPUs were simple to build, then Intel would have shipped a good IGP by now.

I'm not disagreeing that they include core CS problems; I just don't believe their solutions to the core of those problems are that far ahead of academia.

In terms of the vendor-specific features, sure, neither academia nor anyone else knows much about them. But in the fundamentals of scheduling, binning, etc., I don't think anyone is very far ahead of all of academia.

Just because a grad student somewhere has discovered an algorithm doesn't mean their knowledge is on par with the company that knows when that algorithm is actually useful, and has shipped code using it. Writing a textbook on matrix decompositions and factorizations doesn't mean you would be able to create a Google-quality search engine given a large enough server farm and a few months to crawl the web. The state-of-the-art is far more than what's theoretically possible, and the really abstract stuff like proofs of bounds on the asymptotic running time of a solution to a problem have never been the kind of research that is patentable. Even if the equation is known in academia, if a company spent years and millions of dollars to find the right coefficients, then the company is ahead of academia in a non-trivial way. Reduction to practice matters.
I consider corporate research centers to be non-academic. I know for a fact they produce a considerable number of CS advances that are not published but which are used internally or quietly embedded into expensive products.

In all three areas where I have been involved in R&D -- distributed spatial indexing, parallel graph analysis, databases -- the state-of-the-art has been under NDA for years. Basically, any company where pushing the envelope on their advanced software systems is a significant competitive advantage.

The most obvious example is parallelizing graph analysis. Benchmarks like Graph500 are dominated by systems operating at a scale far, far beyond the reach of any algorithm in literature. There is ample evidence that vastly superior algorithms exist relative to what CS academia is producing. In fact, when I was active in this particular area almost four years ago, there were (at least) two different algorithms being used to achieve this kind of scale-out.

On the other hand, most people are not familiar with what is in the literature.

The premise of the patent system is that people wouldn't naturally document a secret process and so providing an external incentive to do so is a net benefit. This is a superficial presumption at best...

Since research must be peer-reviewed to be debunked or even understood, there is a limit to the benefits of trade secrets and NDAs as a hedge against competition. You might be able to get a short-term advantage, but it's unsustainable because the knowledge cannot transfer. An employer ties his or her own hands by internalizing process knowledge (ironically, not what was intended): the shrinking number of people with the specialized knowledge gain ever more leverage, and profits are destroyed when your only choice is to pay a small number of in-house experts whatever they want, because no one else can do the job. This doesn't happen overnight, mind you, but it would happen within a single generation at most (the point at which you have to train your successors or close shop).

IMHO, the case for patents is hollow.

> "On the other hand, most people are not familiar with what is in the literature."

This is the larger friction. We will continue to grasp at straws until people have thorough and integrative education and access to existing techniques without exorbitant costs or fear of prosecution. Even major (though more frequently minor) improvements over existing techniques are only temporary wins; research that never becomes widely disseminated and understood ultimately becomes a sunk cost.

To refer to your original 3-points:

1) I totally agree, "computer technology is over-run with frivolous, vague, stupid, conflicting, and contradictory patents."

2) I do not discount the level of investment required to push the boundaries of process knowledge, but I would say that it is part of normal competition and those costs are part of being in the game. A trade secret is only valuable for a few years, after which it can and should be disclosed to maintain a low cost of employment (unless you want your employees becoming your partners). My point is that patents don't provide any significant benefit toward this end, and it has always been the case that if a company or person doesn't have to reveal its process, it won't. (And the employment of patent lawyers and trolls is not itself a valuable end.)

3) "Academia is already facing difficulties..." Again, I would say this is a temporary situation at worst. We should not extrapolate from the first few points on this curve. We know more about the context: companies that don't publish (internalize all their knowledge) will eventually have no ground to stand on since no one will be able to contribute without an understanding of their internal processes. At most a company can keep only as much knowledge as they can afford to convey, at their expense, to a new hire. They would have to turn their back on a lot of our publicly standardized and centralized educational framework. Unless something were to fundamentally change, I don't see any way those costs can be justified.

Databases. Oracle, Microsoft, etc. have figured out a lot about how to make high-performance query execution engines and transactional storage systems and written about very little of it.

Research has caught up some, but it definitely lags.

You're confusing solid engineering with research.

SQL Server started as a fork of Sybase (Microsoft bought Sybase's source code and started hacking). Sybase, in turn, was based on Ingres, and "Ingres was first created as a research project at the University of California, Berkeley, starting in the early 1970s and ending in the early 1980s" (http://en.wikipedia.org/wiki/Ingres_(database)).

SQL Server is literally based on technology from the '70s.

Ingres was started by Michael Stonebraker, who then did PostgreSQL (which added extensions to the relational model that were novel at the time), then Aurora, then C-Store and Vertica (column-oriented databases), then Morpheus, then H-Store and then VoltDB.

Stonebraker did more research (as in: creating novel things) than Microsoft as a whole did in SQL Server.

SQL Server is a great database but it's a result of Microsoft paying an army of programmers for 24 years to work on improving a single product. It's a result of running a profiler often, not some unknown-to-the-world algorithms.

That has happened in the past too, in many areas; e.g., compiler research used to be like this. But many of the companies died, and things were reinvented, probably wastefully (or differently, who knows).
I am sure they have a lot of tiny performance improvements. But since when is CS research about tiny performance improvements?

It's not like Oracle or anyone else has any secret algorithm which runs in linear time when all of academia only knows of exponential time solutions for the same class of problems.

If "CS research" includes "Software Engineering" then yes, query plan optimisers are definitely currently researched.
Do you think quicksort was an unimportant advance? After all, it's only O(n log n) in the average case, and is O(n^2) in the worst case. By your standards, it shouldn't have been seen as any kind of improvement over mergesort.
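The average-versus-worst-case point can be made concrete. Below is a minimal Python sketch (not code from the thread; the function and parameter names are illustrative) that counts pivot comparisons for a naive first-element pivot versus a random pivot on already-sorted input. The naive pivot hits the quadratic worst case, n(n+1)/2 - 1 comparisons for this counting scheme, while the random pivot stays near the n log n average, which is why a worst-case bound alone says little about quicksort's practical value.

```python
import random

def quicksort(xs, pick_pivot, counter):
    """Return a sorted copy of xs; counter[0] accumulates pivot comparisons."""
    if len(xs) <= 1:
        return list(xs)
    pivot = pick_pivot(xs)
    less, equal, greater = [], [], []
    for x in xs:
        counter[0] += 1  # count the three-way test against the pivot once per element
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
        else:
            equal.append(x)
    return (quicksort(less, pick_pivot, counter) + equal
            + quicksort(greater, pick_pivot, counter))

# Already-sorted input is the classic worst case for a first-element pivot.
data = list(range(200))
naive, randomized = [0], [0]
random.seed(0)
assert quicksort(data, lambda xs: xs[0], naive) == data
assert quicksort(data, random.choice, randomized) == data
print("first-element pivot:", naive[0], "comparisons")
print("random pivot:", randomized[0], "comparisons")
```

The random pivot is only one way to avoid the bad case; median-of-three and introsort-style fallbacks are common alternatives.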
> there are a number of areas in computer science where the leading academic papers are literally a good half decade behind the state-of-the-art that is buried in NDAs.

At a former company, I learned of the existence of a code packaging technology that would apply arbitrary code changes to a running VM instance atomically. By 2002, it was possible to implement a web application server whose pages were JIT compiled to machine code yet still upload and apply a code change atomically. With some care, transactions in flight could even be stopped mid-way and continue execution on the new code base. On top of that, it was possible to debug page renderings and change and recompile code in the debugger. Not so fancy now, but this was pretty slick in 2002.

From what I understand, a lot of the FPGA and vector calculus stuff that financial-services companies are doing is leaps and bounds ahead of the published materials. But I'm not under NDA in any related projects, so that's just hearsay.
I'm pretty sure we have Wall Street quants here. Unless one of them offers a convincing argument that yes, they are well ahead of academia, I don't believe it.

In my anecdotal experience, the financial industry is light years behind most postgraduate-level mathematicians and physicists. Financial math is kind of scary for anyone who knows more math than the average banker.

But that's the thing, right? Most of the people working on this stuff are disincented from discussing it, both contractually and in terms of their own profit motives.
That's because you never hear about what's going on outside of the research centers- they're all under NDA.
Do companies want to profit from this computer science research being done under a NDA? Don't they have to release these advances in the form of a product to earn a profit?

Then show me some products which demonstrate this "cutting-edge" computer science research.

They don't necessarily have to, and even if they did, it may not be obvious.

Consider a financial company working to improve prediction algorithms for in-house use, hiring smartypants PhDs and giving them free rein and great pay. After ten years the result could be way ahead (or even just a little ahead) of the academic world's work, and the company might never release a product with a sticker for a big shiny new algorithm.

The firm wouldn't even have to stand out in its success; it could do reasonably well compared to others, and just attribute a lot of its modest success to its algorithmic insights.

Even in externally released products, really clever ways to get around things aren't necessarily visible. Just today I was reading about Jonathan Blow's work [1] on localised kriging [2] for his upcoming game The Witness. He's drawing on advanced academic geostatistics for a little feature he wanted in a game, and if he hadn't blogged about it (and then discussed enhancements in the comments), no one would know it existed, even once the game is released. A small example to be sure, but I think it exemplifies the point.

[1] http://the-witness.net/news/2010/05/kriging-is-cool/
[2] http://en.wikipedia.org/wiki/Kriging

I don't see how this situation is an issue of patents, really. A company that is decidedly keeping its prediction technology secret to gain an edge in a market which is entirely about being better at prediction than the other guys is not going to patent and therefore open source their technology no matter what happens to the patent system.
Patents exist to encourage inventions to be published, in exchange for exclusive licensing rights of the invention. Their entire purpose is to be an alternative to trade secrets. So it is fair to say that effective patents are those that are best at convincing people that they should publish their work instead of keeping it secret, and that there will be no financial impact to them doing so.
A company could instead profit by trading using their results, and some results could belong to governments.

I'm reminded that my graph theory professor said that he could factor polynomials over finite fields in polynomial time, but that he could not tell me how to do it.

...but doesn't this sort of argument reduce to "you can't disprove the existence of unicorns?". Eventually this cutting-edge stuff has to come out from under NDA, so we should at least know of major advances from the 90s which were only revealed later.
> R&D into new computer algorithms is a non-trivial investment, frequently requiring years and millions of dollars.

There is a strong opinion among computer researchers that pure algorithms are indistinguishable from math, and thus should be unpatentable. The math guy doing research on sound waves might be slightly annoyed when he gets denied a patent, but the guy working "on a computer" with the exact same math, doing the exact same research, gets a patent because he described the math "on a computer".

Of course, we could just start letting math also be patentable, because it too requires non-trivial investment, frequently requiring many many years and millions of dollars. That is, if the work behind something alone is enough to qualify for patentability.

I'm interested in a reply to iskander's question as well as some concrete examples of new "computer algorithms... frequently requiring years and millions of dollars".
Think of the algorithms sitting in the base band of your cell phone. They implement things like modulation, hand-offs, power control, etc. For example, there is some algorithm that controls the transmit power of your device to maintain the minimal transmit power necessary to close the link to the base station while you move around, walk into buildings, etc. There's another algorithm that tries to optimize hand-offs as you go from one cell site to another. Etc. These algorithms are very expensive to develop because: 1) you need very expensive people (PhDs with decades of experience) to work on them; and 2) getting them right takes a ton of experimentation and tuning with real equipment in a variety of physical scenarios.
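For readers who haven't seen one, the textbook skeleton of such a loop is tiny. Below is a hypothetical Python sketch of a one-bit up/down closed-loop power controller; every parameter (5 dB target SNR, -100 dBm noise floor, 1 dB step, power limits) is an illustrative assumption, not a value from any real standard or handset. The comment's point is that getting a deployed loop right takes a great deal of experimentation and tuning with real equipment beyond a skeleton like this.

```python
def power_control(path_loss_db, target_snr_db=5.0, noise_dbm=-100.0,
                  step_db=1.0, p_min_dbm=-50.0, p_max_dbm=24.0, p0_dbm=0.0):
    """Simulate a hypothetical one-bit up/down closed-loop power controller.

    path_loss_db holds one path-loss value per control slot. Returns the
    handset transmit power (dBm) after each slot's command is applied.
    All default parameter values are illustrative assumptions.
    """
    p = p0_dbm
    history = []
    for loss in path_loss_db:
        snr_db = p - loss - noise_dbm  # received SNR at the base station, in dB
        # The base station compares against the target and commands "up" or "down".
        p += step_db if snr_db < target_snr_db else -step_db
        p = min(max(p, p_min_dbm), p_max_dbm)  # stay within the radio's power limits
        history.append(p)
    return history

# Walk outdoors (100 dB path loss), then into a building (115 dB path loss).
trace = power_control([100.0] * 50 + [115.0] * 50)
print(trace[45:50], trace[95:100])
```

In steady state the power dithers by one step around the level that just meets the target, and when the path loss jumps it ramps up at one step per slot; real systems have to decide how fast to react, how to filter noisy measurements, and much more.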
All of your examples sound like variations on fundamental CS problems, and most CS undergrads should have been exposed to them and to their solutions.

Are you saying industry is leaps and bounds ahead in the fundamentals of CS theory? Or just that there is a lot of vendor specific detail in the hardware and infrastructure? Because the latter is not CS.

They're not just variations on fundamental CS problems because the complexity of the problem is dominated by physics + hardware + infrastructure. E.g., while most CS undergraduates are exposed to control theory, a power control loop isn't a simple application of a controller. It has to deal with the physics of signal propagation, knowledge of the kinds of environments users encounter, the characteristics of the underlying radio, and the nature of the network infrastructure. All that insight and experimental validation is ultimately packaged as an algorithm (although a very specific and detailed one).

To analogize to another domain: a power control loop in a cell phone base band is as much "just a variation on fundamental CS problems" as is register allocation for a hairy architecture like x86. Yes, graph coloring gives you a conceptual framework to start with, but that gets you 10% of the way to a usable solution.
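To show what that starting framework looks like, here is a deliberately naive Python sketch of greedy interference-graph coloring (a swapped-in illustration, not a production allocator such as Chaitin-Briggs; all names are hypothetical). It ignores much of what makes x86 hairy, such as register classes, precolored registers required by specific instructions, and move coalescing, which is roughly the remaining 90% the comment refers to.

```python
def color_registers(interference, num_regs):
    """Greedily assign physical registers to virtual registers.

    interference maps each virtual register to the set of virtual registers
    that are live at the same time (the graph is assumed symmetric).
    Returns (assignment, spilled): a vreg -> register-index dict and a list
    of vregs that found no free register and must live in memory.
    """
    assignment = {}
    spilled = []
    # Visit the most-constrained (highest-degree) nodes first: a simple
    # heuristic, not an optimal coloring.
    for vreg in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        taken = {assignment[n] for n in interference[vreg] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if free:
            assignment[vreg] = free[0]
        else:
            spilled.append(vreg)
    return assignment, spilled

# a, b, c are all live at once (a triangle); d only overlaps with a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color_registers(graph, 3))
print(color_registers(graph, 2))
```

With three registers everything fits; with two, one member of the triangle has to be spilled to memory.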

Sure, but that's not CS, is it? It's physics and hardware engineering, which just happens to be codified in software.
You seem to be saying that it's "fundamental computer science" to be aware of a problem and the naive ways to solve it, but not fundamental CS to know what methods are actually usable in the real world. That definition may have its merits, but is certainly not appropriate to use in a patent law discussion.
Yep, examples would be interesting. There are counterexamples, like the open Opus audio codec, which was developed through a collaboration of various engineers and which is state of the art precisely because they weren't burdened by stupid patenting issues and could instead concentrate on creating a beautiful technology.
How is it a counter example? Open Opus is built on patented technologies that are licensed royalty free by the original inventors (Broadcom, Xiph.org, Microsoft, etc). It certainly wasn't cheap to develop--the development cost is just being subsidized so the end result can be given out freely.
"Patented" doesn't equal "was expensive to develop".

In the context of codecs, we're actually seeing how patents prohibit improvements in the technology.

Broadly speaking (i.e. I'm sure there are counter-examples, but they are not significant in the big picture), people didn't patent compiler technology, and we had great progress in compiler technology.

People didn't patent database technology and we had great progress in database technologies, from both academic research and competition.

People didn't patent word-processing technologies, so Google can re-implement Word functionality in the browser: none of the fundamental techniques have been patented. Again, lots of progress from competition.

For whatever reason, audio and video codecs are heavily patented, and the technology is ludicrously outdated compared to what it could have been with progress from competition, academic work, or free software implementations. The patents cover basic ideas in compression, so no one can build on them, and patent holders have no incentive to improve the technology because it's much easier (and more profitable) to just collect royalty checks.

However, developing video codecs isn't any more expensive than developing databases or compilers or word processors.

See, e.g., x264, an open-source H.264 encoder built by a few amateurs, which is widely regarded as being of better quality than commercial offerings costing thousands of dollars (they cost that much because of patent monopolies, not because they were so expensive to develop).

I don't disagree with your premise per se, but I do think it's pretty easy for me, personally, to dismiss some of these positions wholesale.

1) I agree with

2) Computer technology moves at such a breakneck pace and is generally so hard to reverse-engineer that this point doesn't hold water for me. We know that Google's search algorithm works, OK, but can we reverse-engineer it from the outside? The answer seems to be no, so why protect it with patents? Even if we could, by the time we were done Google would already be on to the 'next big improvement'. There is no catching up in software short of a drastic stumble by the incumbent, and should we really protect the incumbent from messing up?

3) The entire concept of going to college for software-related endeavors is in such dire straits that I don't think we should even consider the effects of academia on software development; it's well in the past at this point, regardless of the patent situation.

We know how Google's algorithm works in no small part because Larry and Sergey came up with it as part of their graduate studies at Stanford. Stanford owns the patent(s?) on it and made hundreds of millions licensing it to Google.
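The Stanford work in question is PageRank. As a reference point, here is a minimal power-iteration sketch in Python of the textbook formulation (not Google's code; the damping factor of 0.85 is the commonly cited value, and spreading dangling pages' rank uniformly is one common convention, both assumptions here). As the following replies point out, this published core says little about how production ranking works today.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outbound links ("dangling" pages) spread their rank evenly over all pages.
    Returns a dict of scores that sums to 1.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus redistributed dangling rank, for every page.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))
```

Page c, which collects links from three pages, ends up ranked first, while d, which nothing links to, keeps only the random-jump share.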
You know how it worked ten years ago. Do you really think that has much to do with how it works today?
As of four years ago (last time I had any meaningful contact with search quality people at Google), it was a very convoluted layer cake of diverse signals and carefully tuned heuristics.
I guess that's what they needed Go (golang) for.
Well said points!

Real innovation takes time, and paying people so they can focus on solving hard problems. And it's sad when the result then has to become proprietary so that the operation can protect any ROI.

Sharing is caring, and if that means you can still have some revenue, great!