Hacker News | additive's comments

Is it faster than pdftohtml?


Probably not, as font conversion is slow. pdftohtml does not extract fonts for now.


It's amusing to see Bill Gates upset about college students not learning much and many not finishing. He's a billionaire. But he's also a dropout. And now he's reading self-help books.

I'd like to see Bill Gates go back to school and earn a degree or two. Is that a bad thing to do? Why? He obviously has the time and money. But how dare I even suggest the idea? Who am I compared to Bill Gates? A mere plebeian. So why would I suggest it? Because it would be a great example to set. In my opinion. Not sure if he is a believer in setting examples and the tendency of young people to emulate "role models". Like, e.g., billionaire dropouts.


That would also be a good way for him to see what college is really like, since he is so intent on fixing it.

(I'm a dropout too)


From the indictment: JSTOR is a not-for-profit (=no tax liability?). JSTOR charges universities annual subscription fees as high as $50,000.

Are we told anywhere what JSTOR's actual costs are for scanning documents and serving PDFs? Yes. Someone provided a link to details of JSTOR's budget in the other Swartz thread. Very interesting. They appear to be some very well paid "librarians" (archivists). And lo and behold, they are trying to figure out how to make Google bucks. Seed the Google index with links to JSTOR articles that sit behind a paywall, then charge $10 or more for a la carte access. (Can you feel the desperation?)

Sounds great, they can piggyback on Google, maybe run some SEO, do some behavioural tracking and all that. But there's just one problem: this is library material. Library as in the kind that is funded by grants, taxes, tuition or endowments. Non-commercial. And even more, does Google Scholar show ads? Do they charge anyone for access?

I'd put my money on projects like archive.org or publicresource.org, who charge nothing, before I'd bet on these guys. Sadly, this criminal case may really be all for nothing. Because businesses like JSTOR will likely fail, not because of kids like Swartz, but because they simply are not as smart about technology as the folks who run sites like archive and publicresource.


I'm pretty confident all the data in PACER will be made public at no cost at some point in time. I think the $.08/page charge is supposed to cover the administrative costs (like paying for photocopies) but the PACER program actually runs at a surplus. I think this may have even been part of the rationale for the trial they ran which Swartz used to do the downloads. The government wants to open up PACER.

But what struck me as particularly stupid about what Swartz did (besides the fact that the data will probably be released anyway, in due course, without the need for "activism") is that he installed stealth code on a computer in a Federal Court Law Library. Of all places he chose a federal building, and a Federal Court Law Library. This just sounds idiotic.

And the irony of it all, at least to me, is I just downloaded his Superseding Indictment for free from archive.org. It appears others are succeeding in making court documents publicly available without installing stealth code on federally-owned computers. Maybe they do not have everything in PACER yet, but I think it's only a matter of time. Courts are perhaps a little slow to change with new technology but despite their budget constraints they are definitely making progress. And publicresource.org seems to be getting bulk data with the blessing of the courts and without installing any stealth scripts on Law Library computers.

If one really wanted to engage in some sort of activism to free up (what should be free) legal documents, maybe a better focus is Lexis-Nexis. A true monopoly, founded on a dubious interpretation of copyright law. Can you copyright court decisions? They managed to do it. And the founder is on the Forbes list.


PACER is partially open now; for example, I went looking a little while ago for filings related to modafinil prosecutions, and wound up paying nothing at all because I fell below their $10/monthly cap or whatever.


This is one of my favourites. So many nuggets in there. A very nice interview. Programmers like Thompson are the only thing that keeps me interested in computers.

It's really sad that the loudest voices in computing are no longer thinkers like Ken Thompson. Look at some of the comments in this thread. Pathetic. It's like Slashdotters insulting W. Richard Stevens after his passing. Extreme stupidity.

How many programmers these days can build from the bottom up? How many can start with a blank canvas?

It's a little troubling to me that he has to work for Google (as much as I love GOOG their work is not exactly stuff of Bell Labs: how can we serve more ads?), but I guess you have to do what you have to do.

The comments about his music collection are great. "Illegal downloading!" :) The legal department just looked the other way.

He also more or less says, in this field, there are really no new ideas. This is hard for today's programmers to accept I guess: the idea it's all been done before. But oh how I wish today's self-proclaimed "productive" programmers - who are alarmingly ignorant of history and even contemptuous toward any code that does not meet their strange notions of "freshness" - how I wish they would take Thompson's comments to heart.

Now, here's a question: How similar is Go to Sean Dorward's Limbo? If we put them side-by-side how many similarities would we see? If no one answers, I may just do this myself. I think it would be interesting.


> It's a little troubling to me that he has to work for Google (as much as I love GOOG their work is not exactly stuff of Bell Labs: how can we serve more ads?)

google has some very interesting systems problems to be solved that are almost entirely divorced from the fact that the monetary goal is serving ads, or even that the core goal is organising the world's information. the company is seriously attempting to push the boundaries of "how can we make large clusters of people, machines and networks more efficient and productive", and thompson's work on go is squarely in that realm.

(disclaimer: i work for google, but i was a fan of that aspect of the company long before i joined)


I agree with your gentle disagreement with your parent.

Google serves ads on top of search results for the same reason that ATT sent bills for phone service: to make money. If you focus on Google's ad serving as their reason to exist, then you might as well focus on ATT as a billing company, as those are the direct ways that both make money.

And in that vein, Google and ATT (well, the old ATT) are much more alike than different. They both took existing fledgling technology and essentially re-invented it into a profoundly reliable and life changing system; in fact they all but invented a new science.

There are orders of magnitude difference in depth and impact between twisting two wires together and the physical and information science discovered and invented by ATT, and there are equally orders of magnitude difference between serving up an html page of a computer's directory and the physical and information science discovered and invented by Google.

Either one is a fine place for Ken Thompson or anyone else to work.


Good points. But... why did Google kill Google Labs? Why not have a separate entity for basic research, like Bell Labs? There is no shortage of funding.

I do see a major difference between selling ads to advertisers (and organising the world's personal information for profit) and selling phone service(s) to customers, but maybe that's just my perspective - I've been using the telephone and the web much longer than most people working at Google - I've seen how things could be done differently.


You seem to have misunderstood what Google Labs was. It was not an organization doing basic research. It was (crappy) infrastructure for making neat little experiments available to the public without making them production quality.


Yeah, I knew that. I didn't mean to imply it was a separate entity for doing basic research. What I meant was the fact they closed it seemed to suggest that the idea of research for its own sake is not really something Google is interested in. At least, that's how it looks from outside the Googleplex. Inside it may look different. (And yes, I am aware that employees are allowed to publish. But that research is almost always for the ultimate purpose of furthering ad sales, though it may not be immediately obvious to the outside observer.)

It's also interesting how Google employees, at least the ones who post on HN, seem to be isolated from what is going on elsewhere in the company. Is there cross-pollination between functional groups within Google? Great research organisations often have this quality. (Prepare for waxing poetic from young Google employees who think they are changing the world...)


"labs" was an unfortunate name, google labs was about product experiments rather than basic research. what you want is http://en.wikipedia.org/wiki/Google_X_Lab, whence emerged (in the "what has google done for us lately" department) the self-driving car and the augmented-reality glasses.


Yes, I thought of it as a cross between beta testing and A/B testing. Or throwing spaghetti on the wall and seeing what sticks. Certainly not basic or serious research, which is probably going on right now somewhere in Google.


> Now, here's a question: How similar is Go to Sean Dorward's Limbo? If we put them side-by-side how many similarities would we see? If no one answers, I may just do this myself. I think it would be interesting.

Quite a few, and you might note that Sean Dorward is now also at Google working on Spanner.

And for details on Limbo, see this paper by Dennis Ritchie: http://doc.cat-v.org/inferno/4th_edition/limbo_language/limb... and this one by Kernighan: http://doc.cat-v.org/inferno/4th_edition/limbo_language/desc...

And as others pointed out, it is also worth checking Alef: http://doc.cat-v.org/plan_9/2nd_edition/papers/alef/


If you do compare Limbo to Go, compare them to Alef as well.


Hypothesis: Limbo --> Alef --> Go?

Is this wrong?



While I do not disagree, one point I would like to add regarding the notion of "breaking into a computer closet, etc.". Someone, a Harvard student, recently posted to HN a "love letter" to MIT. The letter went into detail about how liberal MIT is with its resources for students. And how that really has benefitted her studies. It seems MIT is somewhat unique, at least vis-a-vis other universities in the region, in their approach to making resources available to "almost anyone" (i.e. you do not need to be an MIT student) for academic purposes.

Is it possible that if he were to have tried this stunt at another institution he would not have so easily succeeded? Was he simply taking advantage of MIT's liberal policies with respect to computer resources? Or is MIT's "do whatever you need to do" environment irrelevant... as we ponder thoughts of "breaking and entering". Just a thought. Maybe it's irrelevant. What do you think?


MIT alum here. Lots of MIT resources are accessible to almost anyone. That doesn't mean they all are -- it's easy to get into our computer labs, but the network closets are actually off-limits to anyone other than network admins. Some things are more liberal, but that's not a free license to do whatever you want.

Nor does it mean it's acceptable to abuse MIT's trust. In particular, presumably as a result of this case, JSTOR now requires strong authentication from the individual MIT account holder, instead of permitting access from MIT's IP address space as they used to.

Finally, yes, MIT does have an "it's better to ask forgiveness than permission" culture. But that very clearly only applies to legitimate MIT affiliates. I know of at least one other legal case (of perhaps equivalent importance) where MIT's lawyers said, if this guy were an MIT student or staff member, we'd go to bat for him, but since he's not, take the content down.


Thanks for this. I was just curious. "Abusing trust" is exactly the type of thought I had when I first read about this case. It sounded to me like MIT is very generous with letting people use the computer labs and he really took these privileges a little too far. But being far from MIT I can only form a picture from what I read. Thanks for the color.

