by ck2 4504 days ago
It would be funny if netflix just looked for one of their hubs/datacenters and moved in next door on purpose.

ISPs are common carriers and must be regulated as such, because as soon as Comcast makes its own netflix-like service, you can forget getting netflix to stream smoothly.


I've done this before for large scraping projects. I find the datacenter the target website is hosted in, then get a dedicated server right next to it. I've never gotten better performance.
Putting aside legal issues, you don't have any moral problems scraping large volumes of content that is not yours?
Certainly not! If I was re-selling the data, maybe. But I'm generally using it for statistics and data viz. I include the source of the data and I always obey robots.txt. Sometimes I'm even able to talk with the owners beforehand to get their ok.
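
For anyone wondering what "obey robots.txt" can look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `stats-bot` user agent, and the `allowed_paths` helper are all made up for illustration; a real scraper would fetch the site's actual file first.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the subset of paths that robots_txt lets user_agent fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt content, not taken from any real site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    candidates = ["/index.html", "/private/data.csv"]
    print(allowed_paths(EXAMPLE_ROBOTS, "stats-bot", candidates))
```

Filtering the URL list up front, rather than checking inside the fetch loop, keeps the disallowed paths out of the crawl queue entirely.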

(Don't downvote him, it's a valid question)

Now you'll have to tell us about the project...

300TB is quite a lot, even today.

Over time, I've learned to wget every web page and content archive I want to keep. The Internet forgets.
In an earlier age, I ran everything through squid to consolidate browser caches. About five minutes after setting it up, I realised that pulling all the references in the log file and then indexing the lot with htdig would be tremendously useful when I was on the road without internet access.

I spent way too much time pruning stupid crap such as slashdot and started to learn this 'Bayesian classifier' thing.

Your idea is much better.
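
The "pull all the references out of the log file" step could be sketched roughly like this. It assumes squid's default native access.log layout, where the request method is the sixth whitespace-separated field and the URL is the seventh; a custom logformat would need different indices. The function name and sample lines are invented for illustration.

```python
from __future__ import annotations

from typing import Iterable


def urls_from_squid_log(lines: Iterable[str]) -> list[str]:
    """Collect unique GET URLs from squid native-format log lines, in first-seen order."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        method, url = fields[5], fields[6]
        # CONNECT entries only carry host:port, so keep plain GET requests.
        if method == "GET" and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Made-up sample lines in the assumed native format.
    sample = [
        "1286536309.450 921 192.168.0.68 TCP_MISS/200 507 GET http://example.com/a - DIRECT/93.184.216.34 text/html",
        "1286536310.120 12 192.168.0.68 TCP_HIT/200 507 GET http://example.com/a - NONE/- text/html",
    ]
    for url in urls_from_squid_log(sample):
        print(url)
```

The resulting list is what you would then hand to an indexer such as htdig.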

That's personal use, I have no problem with that. The above project sounds commercial in nature.
That seems pretty presumptuous...
Why should he? It's publicly available information.
How large of a scraping project are we talking about in terms of throughput?
Largest was over 300TB. I talked with the owner beforehand and got access to the internal IP address so traffic wouldn't leave the datacenter (free of cost).

I offered to help them set up an API instead of scraping, but they decided scraping was easier in the short term.

dumb question, but how were you finding their datacenter?
traceroute is a good starting point. Sometimes I have to try a couple different datacenters until I hit <4ms ping time. Sometimes I just ask the datacenter if website x is hosted there.
The easiest way is usually to run a 'whois' on the IP of the server (not the domain).
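
The "under 4 ms" check mentioned above can be approximated from a candidate server without ping. This is only a sketch: it times a TCP handshake instead of an ICMP echo (ICMP usually needs raw-socket privileges), so the numbers are a stand-in for ping time rather than the same measurement. The function names, the default port, and the 4 ms default threshold are assumptions for illustration; pass an IP address rather than a hostname so DNS lookup time is not included.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int = 80, timeout: float = 2.0) -> float:
    """Time a single TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def best_of(host: str, port: int = 80, attempts: int = 5) -> float:
    """Take the minimum of several handshakes to filter out scheduling noise."""
    return min(tcp_connect_ms(host, port) for _ in range(attempts))


def is_nearby(rtt_ms: float, threshold_ms: float = 4.0) -> bool:
    """True if the measured round trip is under the threshold."""
    return rtt_ms < threshold_ms


if __name__ == "__main__":
    target = "example.com"  # placeholder; substitute the site's IP address
    rtt = best_of(target)
    print(f"{target}: {rtt:.2f} ms, nearby={is_nearby(rtt)}")
```

Running this from a trial server in each candidate datacenter and comparing the results is one way to narrow down where the target is hosted.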
> as soon as Comcast makes its own netflix-like service, you can forget getting netflix to stream smoothly.

Comcast owns NBCUniversal and a 1/3 share of hulu, as well as an "ondemand" service through the comcast cablebox and the internet, so "soon" has already happened. And in many places netflix on comcast already does not stream smoothly (almost unquestionably due to throttling of netflix traffic).

Comcast agreed to not throttle competing video services, at least until 2018, back when it acquired NBC a couple of years back (and with it part of Hulu, whose management it had to step back from).

So if you can make a case that they're "almost unquestionably" throttling Netflix, you could probably find some people very interested in that... but it sounds more like they were just saturated and are now taking steps to alleviate it.

Comcast has its own netflix-like service. They call it something like Xfinity OnDemand. It's actually not terrible.
If you're talking about streampix, read the disclaimers [1] and see why Netflix still wins (availability, selection, not tethered to one set-top-box, always on-demand, no random fees/taxes tacked on, doesn't require bundling to a subscription level). Until that changes, Netflix has nothing to worry about.

"Not available in all areas. Set-top box required to access On Demand on TV. Programming not available On Demand in all areas. Basic Service subscription required to receive other levels of service. Not all programming available in all areas. Equipment, installation, taxes and franchise fees extra. Pricing subject to change. Streampix included with the following tiers of service: Blast Plus, HD Preferred Plus XF Triple Play, HD Premier XF Triple Play and HD Complete XF Triple Play. Services and features subject to change..."

[1] http://www.comcast.com/streampix (see the legalese at bottom)

Except for the STB interface. At least the mobile app is decent.
Are you on X1?
Not available here. A Google Fiber truck was spotted in Atlanta, so I wouldn't be surprised if I can get that out in Winder before X1.
It's not like Netflix is fantastic by itself. It's serviceable, but the player is very rudimentary, and often has severe issues with things like subtitles.
There are only a handful of buildings where you need to be in order to just be a cross-connect or switch fabric away from nearly every network in the world.
Comcast already has a competing service: Cable TV