
This change makes a ton of sense if RECAP wants to make these documents accessible. Aggregate the data and make it easy for wholesale users. Soon enough, numerous sites/services will crop up, likely funded by ads, subscription fees or non-profit status, competing on usability for retail consumers. That will drive down profits and promote UI development and curation. Compare, e.g., ERISA and EDGAR data.


I'm the author of the article linked at the start of the thread. Replying to try to focus the discussion on the specific change I was writing about.

Piker, can you say more about how "the change" makes the documents "accessible" (or more accessible)? They were already at the Internet Archive just fine. Several sites already copied the documents from IA and added their own presentation, cross-linking, notifications, and other services on top. I don't see the proposed changes as helping with this. Indeed, by sending the latest data only to CourtListener but not to IA, the proposed changes stand in the way of the other sites and services you envision -- as it seems they'll now have to license the data from FLP/CL (on a paid basis), rather than get it free directly from IA. These are the general concerns I was trying to present in my article.


>> FLP also proposed to upload litigation materials to IA in only machine-readable formats compressed into enormous multi-gigabyte tarballs, ending the human-readable individual HTML files that have for years made it easy for normal users with standard web browsers to see court records.

Perhaps the "only" is telling here. Were they previously also uploading the tarballs? No wholesale user would want to scrape the thousands of extra pages of HTML to download the content. So if they weren't already uploading the tarballs, this is actually a beneficial change.


Previously FLP was uploading files that users can read with a web browser -- HTML, PDF, and also XML with metadata. I could and did link directly to HTML and PDFs, including circulating these materials with coauthors and research assistants and members of the press.

If FLP begins uploading only huge tarballs, and not the individual constituent files, I won't be able to do any of that.


"Soon enough, numerous sites/services will crop up, likely funded by ads, subscription fees or non-profit status, competing on usability for retail consumers."

Will they compete with the Internet Archive, which does not need to serve ads or charge fees?

The author's question remains unanswered. Is there any reason for the data to now be withheld from the Internet Archive?

This comment appears to be an appeal for "competition" that involves eliminating a competitor: the Internet Archive.

But consider that there are some users who prefer the Internet Archive for usability. Perhaps this is why the author writes about this on his blog. Anyone wishing to compete on usability can copy the data from Internet Archive and reformat it as they wish.

In the same way, there are some users who prefer EDGAR for SEC filings versus alternatives in terms of usability. Anyone can copy the data from EDGAR, repackage it and then compete on usability.

The existing source of the public data may present the data in a format free from JavaScript tracking, third-party advertising, paywalls and "free APIs" that would allow access to be limited and optionally denied. (Such limits would contradict the stated objective of "making these documents accessible" and would instead restrict access.) For some users, being free from these impediments makes the data highly accessible.

With respect to those users, any new proposed source must compete with the existing source, not to mention the potential the existing source allows for any others who may wish to reformat/repackage the data.

When the people behind the new proposed source call for the discontinuation of the original source, this raises a red flag.

Eliminating a competitor is not a prerequisite for "competition".

Finally, the suggestion of "non-profit status" is interesting.

Let's say someone who enjoys programming wants to start a project/company that repackages donated public information in a way that she believes is more usable than the alternatives.

Let's assume the costs of doing this are not that much, mainly just her time.

If she charges fees to users or advertisers for access, her income from this effort might exceed her costs.

She might reinvest the surplus into the project. She might pay herself a salary.

Is there a limit on how much she can pay herself while the business still remains tax-exempt?


In fairness to free.law, which appears to provide unrestricted bulk access and uses open formats, what stops anyone, including the blog author, from downloading the bulk data and then uploading it to the Internet Archive?

To some users, bulk access to raw data is actually preferable to thousands of individual HTML pages on a www server, because it may be less work to transform the raw data into some other format (e.g., thousands of pages of HTML) than to scrape all those HTML pages from a www server, then process and store them in a searchable format.
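
To make the "transform the raw data" direction concrete, here is a rough Python sketch that turns a bulk tarball into one browsable HTML page per document. The layout it assumes (plain `.txt` files inside the archive) and the function name `tarball_to_html` are made up for illustration; I have not checked what the actual bulk files contain.

```python
import html
import tarfile
from pathlib import Path


def tarball_to_html(tar_path, out_dir):
    """Write one minimal HTML page per .txt file found in a tarball.

    Returns the list of pages written. The one-.txt-per-document layout
    is an assumption for illustration, not the real bulk-data format.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:*" lets tarfile detect gzip/bzip2/xz compression on its own.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            # Skip directories, links, and anything that is not plain text.
            if not member.isfile() or not member.name.endswith(".txt"):
                continue
            raw = tar.extractfile(member).read()
            text = raw.decode("utf-8", errors="replace")
            # Use only the base name so archive paths cannot escape out_dir.
            # (Files with the same base name would overwrite each other.)
            stem = Path(member.name).stem
            page = out / f"{stem}.html"
            page.write_text(
                "<!doctype html><title>{0}</title><pre>{1}</pre>".format(
                    html.escape(stem), html.escape(text)
                ),
                encoding="utf-8",
            )
            written.append(page)
    return written
```

Something like `tarball_to_html("bulk.tar.gz", "site/")` would then leave a directory of pages that can be linked individually. It still has to be hosted somewhere, which is the part the blog author is objecting to.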

Consider raw text versus PDF. PDF may look great, but text is more flexible and more easily searchable. While it's easy to convert text to PDF for reading, converting from PDF to text for searching is fraught with difficulty and error-prone.

If one accepts

   more difficult: PDF -> text 
   less difficult: text -> PDF
then it stands to reason that text is the preferable format to start with, because it's both easy to search and easy to convert to PDF for aesthetics and reading.
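
As a small illustration of the "easy to search" half, searching plain text takes only a few lines of Python with no extraction or OCR step. The `search_text` helper and the sample document below are hypothetical, just to show the idea.

```python
def search_text(docs, term):
    """Return (name, line_number, line) for every line containing term.

    docs maps a document name to its plain-text contents; matching is
    case-insensitive.
    """
    needle = term.casefold()
    hits = []
    for name, text in docs.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((name, number, line.strip()))
    return hits


# Hypothetical sample data, not a real filing.
docs = {"order.txt": "Motion to dismiss\nDENIED in part"}
print(search_text(docs, "denied"))  # [('order.txt', 2, 'DENIED in part')]
```

Doing the same against PDFs first requires a text-extraction pass, which is where the errors creep in.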


There is: a reasonable salary that's decided upon by an independent board of directors or an independent compensation committee, based on evidence that the salary is in line with market rates for similar work.

Some details: http://blueavocado.org/content/how-much-pay-executive-direct...



