Anyone know where such websites get their data? I'm looking for an API for Netflix that would let me list all movies that have specific subtitles available.
Netflix.com/browse/subtitles only shows a limited number of movies.
How does one approach companies about process-improvement consulting? I have a product for the specific industry I work in, and it needs to be adopted company-wide because it touches every department.
Ideally, a company would let me spend a few days to understand their processes, and I'd tell them how they can improve.
I would do the discovery phase for free, but I'm not sure how to phrase it. I fear using the word "free", as I feel it changes the dynamics and would set the expectation of a cheap product down the line.
1) Work your network. Our first few customers were personal connections or connections of a strategic business angel of ours. Identify the ones that most feel the pain you're solving. I'm talking about specific roles in these companies, not the companies as an entity. For example, imagine you're building a tool that improves collaboration between marketers and devs, find out who needs this more: the CMO? the CTO? VP Eng? a project manager? Find your champion and deeply understand his pains.
2) Even if your technology solves a company-wide process, I wouldn't sell it like this at the beginning. Touching many departments is a pain in the ass for your champion: they need to convince a lot of people, etc., so it's unlikely they'll do it. Optimize for your champion's pains and find a reduced version of your product that solves them.
3) Engage your champions in the product discovery and make them feel as if they're also defining the features (but don't build exactly what they ask for!!). Once they're excited about it, start selling them the grand vision so they can sell it internally and involve more departments.
The tesseract-cli (and so I'm sure the library also) will give you hOCR output, which is an HTML format that gives you the text, with bounding boxes around paragraphs and individual characters.
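To make the bounding-box idea concrete, here is a rough sketch (not a definitive implementation) of pulling word boxes out of hOCR using only the Python standard library. It assumes the output was produced with something like `tesseract page.png out hocr` and that words appear as `ocrx_word` spans with a `bbox x0 y0 x1 y1` entry in their `title` attribute. The names `HocrWordParser`, `words_in_box` and the `SAMPLE` markup are made up for illustration.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._depth = 0     # spans nested inside the open word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._bbox is not None:
            self._depth += 1  # nested span inside a word (e.g. per-character info)
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._bbox is None:
            return
        if self._depth:
            self._depth -= 1
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None


def words_in_box(hocr, box):
    """Return the words whose bbox lies entirely inside box = (x0, y0, x1, y1)."""
    parser = HocrWordParser()
    parser.feed(hocr)
    bx0, by0, bx1, by1 = box
    return [
        text
        for text, (x0, y0, x1, y1) in parser.words
        if x0 >= bx0 and y0 >= by0 and x1 <= bx1 and y1 <= by1
    ]


# Hypothetical hOCR fragment, trimmed to the parts that matter here.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 600 800'>
 <span class='ocr_line' title='bbox 30 90 300 120'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Invoice</span>
  <span class='ocrx_word' title='bbox 110 92 180 116; x_wconf 93'>12345</span>
 </span>
 <span class='ocr_line' title='bbox 30 400 300 430'>
  <span class='ocrx_word' title='bbox 36 402 120 426; x_wconf 90'>Total</span>
 </span>
</div>
"""
```

With this, `words_in_box(SAMPLE, (0, 80, 400, 130))` keeps only the words on the first line. Note that this still OCRs the whole page and filters afterwards; it does not restrict what Tesseract reads.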
It is open source and runs on Java. You can also extract the areas of interest in the PDF and run it via the command line[1]. You can get more details if required on my blog[2].
I think the Project Naptha extension by the folks that wrote this library will do that, no?
https://projectnaptha.com/
Not sure if it only reads at those coordinates vs. OCRing the whole thing (for example if you were legally prohibited from OCRing content outside a certain coordinate space), but it is selectable.
You possibly have one installed. Mine comes with my desktop (Xfce), and gives me a GUI and a CLI to take screenshots of the full desktop, any window, or a particular area defined by crosshairs.
There's a very popular and minimalist CLI called scrot that I think would be ideal... well, scratch that: I did a search and our question has already been asked and answered:
If I remember correctly, I did it with the ImageMagick "import" command. I found I had to add a wide white border, as Tesseract got confused near the edges of the image (this was over 10 years ago though).
I prefer something I can install locally (doesn't need to be open source). I'm trying to extract text from a PDF at a certain position, the PDF is indeed text not an image so OCR isn't strictly needed.
The goal is to draw a box using GUI, then use those coordinates to extract text from several homogeneous pages.
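For the "same box on several pages" part, one possible approach (a sketch, assuming poppler's `pdftotext` is installed locally) is to call it once per page with its crop options `-x`, `-y`, `-W` and `-H`. The coordinates are in pixels at the rendering resolution (72 DPI by default), measured from the top-left corner of the page. The helper names `crop_command` and `extract_box` are hypothetical, and this does not cover drawing the box in a GUI; the numbers would have to come from whatever viewer you use.

```python
import subprocess


def crop_command(pdf_path, page, x, y, width, height):
    """Build a pdftotext call that reads one rectangle from one page.

    Writing to "-" sends the extracted text to stdout.
    """
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),    # first and last page: just this one
        "-x", str(x), "-y", str(y),          # top-left corner of the crop area
        "-W", str(width), "-H", str(height), # size of the crop area
        pdf_path, "-",
    ]


def extract_box(pdf_path, pages, box):
    """Run the same crop on each page and return {page_number: text}."""
    x, y, width, height = box
    results = {}
    for page in pages:
        proc = subprocess.run(
            crop_command(pdf_path, page, x, y, width, height),
            capture_output=True, text=True, check=True,
        )
        results[page] = proc.stdout.strip()
    return results
```

Something like `extract_box("report.pdf", range(1, 6), (50, 100, 200, 40))` would then pull the same region from pages 1 to 5, where `report.pdf` and the box values are placeholders.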
I also have a different goal of trying to interpret structure of a PDF that has visual structure (headers, sections and subsections all numbered). But that seems to lend itself to some sort of text parsing.
I've done similar. You firewall your home network off from all IPs other than Cloudflare's. You can use a Cloudflare-provided certificate for HTTPS - they will MITM and use a trusted cert for outward connections. You can update Cloudflare DNS records via their API - the typical dynamic DNS tools work fine. It works well.
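The allowlist logic behind that firewall rule can be sketched with Python's `ipaddress` module. The three CIDR blocks below are only an illustrative subset of Cloudflare's published IPv4 ranges and may be out of date; in practice you would load the current list that Cloudflare publishes and feed it to your firewall. `SAMPLE_CLOUDFLARE_RANGES`, `parse_ranges` and `is_allowed` are made-up names.

```python
import ipaddress

# Illustrative subset only (assumption: may be stale or incomplete).
# Use the current list published by Cloudflare for a real setup.
SAMPLE_CLOUDFLARE_RANGES = [
    "173.245.48.0/20",
    "103.21.244.0/22",
    "104.16.0.0/13",
]


def parse_ranges(lines):
    """Turn CIDR strings (one per line, blanks ignored) into network objects."""
    return [ipaddress.ip_network(line.strip()) for line in lines if line.strip()]


def is_allowed(source_ip, networks):
    """True if source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)
```

The firewall itself does the real enforcement; a check like this is mostly useful for generating or sanity-testing the rules before you lock yourself out.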
I've always been unable to pull this off completely as I always want a way to SSH into my home network - but maybe there is a better way I can pull off this sort of 'break glass' functionality.
Guacamole (sorta) gives me that. If CloudFlare or nginx or Guacamole have problems then I'm hosed... but I work from home so remote access isn't a huge concern.
And I've got nothing terribly "household critical" at home, just the PiHole needs to be running to keep everyone happy. I do wish that PiHole had an HA solution. I've been tempted to set up a pfSense / pfBlockerNG HA pair but that's a lot of overhead just for DNS.
You could run two Pis, or a Pi and a container on another always-on machine, for example. Then just point your router's primary DNS to the Pi and the secondary to the other instance.
That's not a terrible solution. I've just been looking at possibly forwarding SSH over WebSocket - then I can put that behind CloudFlare. Latency would however suffer.
aren't jitter and latency still major problems with this approach? plus connection resets, though maybe long-lived flows are more reliable than I remember, and I suppose you could do multipath (if Tor doesn't handle that already, not sure.)
have you made it work? my Tor career ended in college after running an exit node - no visits from the FBI, just got auto-klined from every IRC server since I was on the list of proxies.
Does anyone have a recommendation for a non-software dev environment? e.g. user task workflow management.
I'm looking for a fully fledged product and/or an API backend.