Hacker News | NewDimension's comments

Anyone know where such websites get their data? I'm looking for an API for Netflix that would let me list all movies that have specific subtitles available. Netflix.com/browse/subtitles only shows a limited number of movies.


Netflix closed their API some years ago. Only a few grandfathered sites still have access.

E.g., I use instantwatcher.com.


They maintain scrapers/indexers undoubtedly. In the absence of an API, you can scrape the aggregators yourself.


How does one approach companies for consulting when it's process improvement? I have a product for a specific industry I'm part of; it needs to be adopted company-wide, as it touches all departments. Ideally, a company would let me spend a few days to understand their processes, and I'd tell them how they can improve.

I would do the discovery phase for free, but I'm not sure how to phrase it. I fear using the word "free", as I feel it changes the dynamics and would set the expectation of a cheap product down the line.


1) Work your network. Our first few customers were personal connections or connections of a strategic business angel of ours. Identify the ones that most feel the pain you're solving. I'm talking about specific roles in these companies, not the companies as an entity. For example, if you're building a tool that improves collaboration between marketers and devs, find out who needs it more: the CMO? The CTO? The VP Eng? A project manager? Find your champion and deeply understand his pains.

2) Even if your technology solves a company-wide process, I wouldn't sell it like this at the beginning. Touching many departments is a pain in the ass for your champion: he needs to convince a lot of people, etc., so it's unlikely he'll do it. Optimize for your champion's pains and find a reduced version of your product that solves them.

3) Engage your champions in the product discovery, and make them feel as if they're also defining the features (but don't build exactly what they ask for!!). Once they're excited about it, start selling them the grand vision so they can sell it internally and involve more departments.


Somewhat offtopic, do you know of a library that would allow me to select an area of a PDF through a GUI and only read the text in those coordinates?


The tesseract-cli (and so I'm sure the library also) will give you HOCR output, which is an HTML format that gives you the text, with bounding boxes around paragraphs and individual characters.

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line...

It's not quite what you want, but I think you could probably filter the output based on the selected region and pretty quickly get what you want.
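The filtering step could look roughly like this. It is only a sketch: it assumes the hOCR contains word-level `ocrx_word` spans whose `title` attribute starts with `bbox x0 y0 x1 y1` in image pixels, and the names `HocrWords`, `words_in_region` and the `SAMPLE` markup are made up for illustration.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (bbox, text) for every ocrx_word span in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of ((x0, y0, x1, y1), text)
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or ""):
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append((self._bbox, "".join(self._text).strip()))
            self._bbox = None


def words_in_region(hocr, region):
    """Return the words whose boxes lie fully inside region (x0, y0, x1, y1)."""
    rx0, ry0, rx1, ry1 = region
    parser = HocrWords()
    parser.feed(hocr)
    return [text for (x0, y0, x1, y1), text in parser.words
            if x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1]


# Hand-written stand-in for real hOCR output (hypothetical data).
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 300 40'>
<span class='ocrx_word' title='bbox 10 10 80 40; x_wconf 96'>Invoice</span>
<span class='ocrx_word' title='bbox 90 10 150 40; x_wconf 93'>total</span>
<span class='ocrx_word' title='bbox 400 500 470 530; x_wconf 91'>footer</span>
</span>
"""

print(words_in_region(SAMPLE, (0, 0, 200, 50)))  # ['Invoice', 'total']
```

Requiring the whole box to be inside the region is a choice; testing the box centre instead would be more forgiving of sloppy selections.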


Try tabula[0]

It is open source and runs on Java. You can also extract the areas of interest in the PDF and run it via the command line[1]. You can get more details, if required, on my blog[2].

[0]https://tabula.technology/

[1]https://github.com/tabulapdf/tabula-java/wiki/Using-the-comm...

[2]https://narayanansiyer.com/Tabula/tabula/


I think the Project Naptha extension by the folks that wrote this library will do that, no? https://projectnaptha.com/

Not sure if it only reads at those coordinates vs. OCRing the whole thing (for example if you were legally prohibited from OCRing content outside a certain coordinate space), but it is selectable.


You could simply pipe an area screenshot to tesseract, discard the input image and get the tesseract output, am I wrong?


That sounds like a valid approach. Any idea what tools I could use to define the area and get the screenshot?


You possibly have one installed. Mine comes with my desktop (Xfce), and gives me a GUI and a CLI to take screenshots of the full desktop, any window, or a particular area defined by crosshairs.

There's a very popular and minimalist CLI called scrot that I think would be ideal... well, scratch that, I did a search and our question has already been asked and answered:

https://askubuntu.com/questions/280475/how-can-instantaneous...

https://stackoverflow.com/questions/21497447/ocr-on-a-screen...


If I remember correctly, I did it with the ImageMagick "import" command. I found I had to add a wide white border, as Tesseract got confused near the edges of the image (this was over 10 years ago though).


I'm not sure if there's a non-GUI interface for it, but zathura does this for PDFs.


Commercial or open source? PDFTron can do it, but they’re not an open source project.


I prefer something I can install locally (doesn't need to be open source). I'm trying to extract text from a PDF at a certain position; the PDF is indeed text, not an image, so OCR isn't strictly needed.

The goal is to draw a box using a GUI, then use those coordinates to extract text from several homogeneous pages.
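One detail worth sketching for this: a box drawn on a rendered page is in pixels with the origin at the top-left, while PDF user space is in points (1/72 inch) with the origin at the bottom-left. The snippet below converts the box once and reuses it on every page. The word tuples `(x0, y0, x1, y1, text)` are a hypothetical stand-in for whatever your PDF library returns, and some libraries already use a top-left origin, so check before flipping the y-axis.

```python
def gui_box_to_pdf(box_px, dpi, page_height_pt):
    """Convert (left, top, right, bottom) in pixels, top-left origin,
    to (x0, y0, x1, y1) in PDF points, bottom-left origin."""
    scale = 72.0 / dpi  # PDF points per rendered pixel
    left, top, right, bottom = box_px
    return (left * scale,
            page_height_pt - bottom * scale,
            right * scale,
            page_height_pt - top * scale)


def text_in_box(words, box_pt):
    """Join the words whose centre falls inside box_pt, in input order."""
    x0, y0, x1, y1 = box_pt
    hits = []
    for wx0, wy0, wx1, wy1, text in words:
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            hits.append(text)
    return " ".join(hits)


# Box drawn on a page rendered at 144 dpi; US Letter is 792 pt tall.
box_pt = gui_box_to_pdf((144, 288, 432, 576), dpi=144, page_height_pt=792)
print(box_pt)  # (72.0, 504.0, 216.0, 648.0)

# Hypothetical word boxes for two pages with the same layout.
pages = [
    [(80, 600, 120, 612, "Total:"), (130, 600, 170, 612, "42.00"),
     (80, 100, 120, 112, "Page")],
    [(80, 600, 120, 612, "Total:"), (130, 600, 170, 612, "17.50")],
]
results = [text_in_box(words, box_pt) for words in pages]
print(results)  # ['Total: 42.00', 'Total: 17.50']
```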

I also have a different goal of trying to interpret structure of a PDF that has visual structure (headers, sections and subsections all numbered). But that seems to lend itself to some sort of text parsing.
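For the numbered-headings part, a first pass at the text parsing might be a regex over the extracted lines. This is a rough sketch that assumes headings start the line with a number like `2` or `2.1.3`; `HEADING_RE` and `parse_outline` are illustrative names, and body lines that happen to start with a number (e.g. "3 apples ...") will be false positives unless you add another signal such as font size.

```python
import re

# A dotted section number, an optional trailing dot, whitespace, then the title.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def parse_outline(lines):
    """Return (depth, number, title) for each line that looks like a numbered heading."""
    outline = []
    for line in lines:
        match = HEADING_RE.match(line.strip())
        if match:
            number, title = match.groups()
            outline.append((number.count(".") + 1, number, title))
    return outline


lines = [
    "1 Introduction",
    "Some body text.",
    "1.1 Scope",
    "2. Methods",
    "2.1.3 Details",
    "In 2019 we measured something.",
]
outline = parse_outline(lines)
for depth, number, title in outline:
    print("  " * (depth - 1) + number, title)
```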


> I also have a different goal of trying to interpret structure of a PDF that has visual structure (headers, sections and subsections all numbered). But that seems to lend itself to some sort of text parsing.

Some reading here: https://stackoverflow.com/questions/53219016/detecting-secti...


PDFTron provides an SDK and isn't really meant as a plug-and-play end-user application. But it can accomplish what you're looking for.

Here's how to extract text from a PDF based on coordinates (this explains how to do it on the web, but it's also possible on other platforms):

https://groups.google.com/d/msg/pdfnet-webviewer/h2W3VksbQUI...

Here's how to extract a PDF's logical structure:

https://www.pdftron.com/documentation/samples/#logicalstruct...


PDF.js, then filter the output. Or Par.sr with the right input module configuration.


Do you have a static IP at home? How does your cloudflare setup work?


I've done something similar. You firewall your home network to all IPs other than Cloudflare's. You can use a Cloudflare-provided certificate for HTTPS: they will MITM and use a trusted cert for outward connections. You can update Cloudflare DNS records via their API, and the typical dynamic DNS tools work fine. It works well.

I've always been unable to pull this off completely as I always want a way to SSH into my home network - but maybe there is a better way I can pull off this sort of 'break glass' functionality.
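If you'd rather not use a dynamic DNS tool, the DNS update mentioned above can be sketched with the standard library. This assumes Cloudflare's v4 API with a bearer API token; the zone ID, record ID, token, hostname and IP below are placeholders, and `build_dns_update` is just an illustrative helper that builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_dns_update(zone_id, record_id, api_token, hostname, ip, proxied=True):
    """Build (but do not send) a request that points an A record at `ip`."""
    body = {
        "type": "A",
        "name": hostname,
        "content": ip,
        "ttl": 1,            # 1 means "automatic" on Cloudflare
        "proxied": proxied,  # keep traffic going through Cloudflare
    }
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own.
req = build_dns_update("ZONE_ID", "RECORD_ID", "TOKEN",
                       "home.example.com", "203.0.113.7")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

You would run this from cron (or similar) whenever your home IP changes, after looking up the current public address by whatever means you trust.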


> I always want a way to SSH into my home network

Guacamole (sorta) gives me that. If CloudFlare or nginx or Guacamole have problems then I'm hosed... but I work from home so remote access isn't a huge concern.

And I've got nothing terribly "household critical" at home, just the PiHole needs to be running to keep everyone happy. I do wish that PiHole had an HA solution. I've been tempted to set up a pfSense / pfBlockerNG HA pair but that's a lot of overhead just for DNS.


> I do wish that PiHole had a HA solution

You could run 2 Pi’s, or a Pi and a container on another always-on machine, for example. Then just point your router's primary DNS to the Pi and the secondary to the other instance.


That's not a terrible solution. I've just been looking at possibly forwarding SSH over WebSocket - then I can put that behind CloudFlare. Latency would however suffer.


> ... want a way to SSH into my home network ...

IMO, using a Tor hidden service is a (damn near) perfect solution for this.


aren't jitter and latency still major problems with this approach? plus connection resets, though maybe long-lived flows are more reliable than I remember, and I suppose you could do multipath (if Tor doesn't handle that already, not sure.)

have you made it work? my Tor career ended in college after running an exit node - no visits from the FBI, just got auto-klined from every IRC server since I was on the list of proxies.


Does anyone have a recommendation for a non-software dev environment? E.g. user task workflow management. I'm looking for a fully fledged product and/or an API backend.


Does something like Zapier fit your use cases?


Zapier looks like an app trigger. I'm looking for more of a task management system.


Check out the BPM tools others have mentioned (Activiti, Camunda, etc...)


I would venture to say this is an April Fools' joke.

