NewDimension's comments

NewDimension · on May 25, 2020

Anyone know where such websites get their data? I'm looking for a Netflix API that would let me list all movies that have specific subtitles available. Netflix.com/browse/subtitles only shows a limited number of movies.

pcmaffey · on May 25, 2020

Netflix closed their API some years ago. Only a few grandfathered sites still have access.

E.g., I use instantwatcher.com.

kodablah · on May 25, 2020

They undoubtedly maintain scrapers/indexers. In the absence of an API, you can scrape the aggregators yourself.

NewDimension · on Jan 16, 2020

How does one approach companies for consulting when it's about process improvement? I have a product for a specific industry I'm part of, and it needs to be adopted company-wide because it touches all departments. Ideally, a company would let me spend a few days understanding their processes, and I'd tell them how they can improve.

I would do the discovery phase for free, but I'm not sure how to phrase it. I'm wary of using the word "free", as I feel it changes the dynamics and would set expectations of a cheap product down the line.

ericmarcos · on Jan 17, 2020

1) Work your network. Our first few customers were personal connections or connections of a strategic business angel of ours. Identify the ones who most feel the pain you're solving. I'm talking about specific roles in these companies, not the companies as entities. For example, imagine you're building a tool that improves collaboration between marketers and devs: find out who needs it most. The CMO? The CTO? VP Eng? A project manager? Find your champion and deeply understand his pains.

2) Even if your technology improves a company-wide process, I wouldn't sell it that way at the beginning. Touching many departments is a pain in the ass for your champion: it means convincing a lot of people, etc., so it's unlikely they'll do it. Optimize for your champion's pains and find a reduced version of your product that solves them.

3) Engage your champions in product discovery and make them feel like they're also defining the features (but don't build exactly what they ask for!!). Once they're excited about it, start selling them the grand vision so they can sell it internally and involve more departments.

NewDimension · on Dec 20, 2019

Somewhat off-topic: do you know of a library that would let me select an area of a PDF through a GUI and only read the text within those coordinates?

ncallaway · on Dec 20, 2019

The tesseract CLI (and so, I'm sure, the library too) can give you hOCR output, an HTML-based format that gives you the text along with bounding boxes around paragraphs, lines, and words (and individual characters, if enabled).

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line...

It's not quite what you want, but you could probably filter the output by the selected region and get what you need pretty quickly.
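A minimal sketch of that filtering step in Python, standard library only. It assumes Tesseract's usual hOCR layout, where each word is a span with class ocrx_word and a title such as "bbox x0 y0 x1 y1; x_wconf 95" (pixels, origin at the top-left), and it keeps only words that lie entirely inside the selected rectangle. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Tesseract writes word boxes as title="bbox x0 y0 x1 y1; x_wconf NN" (pixels, top-left origin).
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._depth = 0    # spans nested inside the current word (e.g. character boxes)
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._bbox is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 0
            self._text = []

    def handle_endtag(self, tag):
        if tag != "span" or self._bbox is None:
            return
        if self._depth:
            self._depth -= 1
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def words_in_region(hocr_text, region):
    """Return the words whose box lies entirely inside region = (x0, y0, x1, y1)."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    rx0, ry0, rx1, ry1 = region
    return [
        text
        for text, (x0, y0, x1, y1) in parser.words
        if x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1
    ]
```

You'd produce the input with something like "tesseract page.png page hocr" and pass the resulting page.hocr text plus the rectangle from your GUI. Requiring full containment (rather than any overlap) avoids picking up half-clipped words at the edge of the selection.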

narayanans · on Dec 21, 2019

Try Tabula [0].

It's open source and runs on Java. You can also extract the areas of interest in the PDF and run it from the command line [1]. There are more details on my blog if you need them [2].

[0] https://tabula.technology/

[1] https://github.com/tabulapdf/tabula-java/wiki/Using-the-comm...

[2] https://narayanansiyer.com/Tabula/tabula/
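If you go the tabula-java route, here is a hedged Python sketch of driving its command line from a script: it builds the call with the --area (top,left,bottom,right, in PDF points), --pages and --format options and runs it via subprocess. The jar path is a placeholder for whichever tabula-java release you download, the function names are hypothetical, and it's worth checking the option names against the command-line wiki page for your version.

```python
import subprocess


def tabula_area_command(jar_path, pdf_path, area, pages="all", fmt="CSV"):
    """Build a tabula-java command that only analyzes one rectangle on each page.

    area is (top, left, bottom, right) in PDF points, the order --area expects.
    """
    top, left, bottom, right = area
    return [
        "java", "-jar", jar_path,
        "--area", f"{top},{left},{bottom},{right}",
        "--pages", pages,
        "--format", fmt,
        pdf_path,
    ]


def extract_area(jar_path, pdf_path, area, pages="all"):
    """Run tabula-java and return whatever it prints (CSV by default)."""
    cmd = tabula_area_command(jar_path, pdf_path, area, pages)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Since Tabula is built for tables, text inside the area comes back as CSV cells; with pages="all" the same rectangle is applied to every page, which suits a set of identically laid-out pages.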

sailfast · on Dec 20, 2019

I think the Project Naptha extension by the folks that wrote this library will do that, no? https://projectnaptha.com/

I'm not sure whether it only reads within those coordinates or OCRs the whole thing (which would matter if, for example, you were legally prohibited from OCRing content outside a certain region), but the text is selectable.

severine · on Dec 20, 2019

You could simply pipe an area screenshot to tesseract, discard the input image, and keep the tesseract output. Am I wrong?

NewDimension · on Dec 20, 2019

That sounds like a valid approach. Any idea what tools I could use to define the area and take the screenshot?

severine · on Dec 20, 2019

You may already have one installed. Mine comes with my desktop (Xfce) and gives me a GUI and a CLI to take screenshots of the full desktop, any window, or a particular area defined by crosshairs.

There's a very popular, minimalist CLI called scrot that I think would be ideal... well, scratch that: I did a search, and your question has already been asked and answered:

https://askubuntu.com/questions/280475/how-can-instantaneous...

https://stackoverflow.com/questions/21497447/ocr-on-a-screen...

mkl · on Dec 20, 2019

If I remember correctly, I did it with the ImageMagick "import" command. I found I had to add a wide white border, as Tesseract got confused near the edges of the image (this was over 10 years ago though).
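A rough Python sketch of the screenshot route, chaining the tools mentioned in this thread: ImageMagick's import lets you drag a rectangle on an X11 screen, convert pads the capture with a white border (the 20 px default is only a guess at "wide"), and tesseract's special "stdout" output base prints the recognized text. The images live in a temporary directory, so they're discarded afterwards. Function names are hypothetical; on ImageMagick 7 installs, the same operations may be exposed as magick rather than convert.

```python
import subprocess
import tempfile
from pathlib import Path


def ocr_commands(raw_png, padded_png, border_px=20):
    """Commands for: select a screen area, pad it with white, OCR it to stdout."""
    return [
        # ImageMagick 'import' with no -window option lets you drag a rectangle (X11 only).
        ["import", raw_png],
        # Pad with white so Tesseract isn't confused by text touching the image edges.
        ["convert", raw_png, "-bordercolor", "white", "-border", str(border_px), padded_png],
        # The output base "stdout" makes tesseract print the text instead of writing a file.
        ["tesseract", padded_png, "stdout"],
    ]


def ocr_screen_area(border_px=20):
    """Capture a screen region, OCR it, and discard the images afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = str(Path(tmp) / "region.png")
        padded = str(Path(tmp) / "padded.png")
        *prepare, ocr = ocr_commands(raw, padded, border_px)
        for cmd in prepare:
            subprocess.run(cmd, check=True)
        return subprocess.run(ocr, capture_output=True, text=True, check=True).stdout
```

Calling ocr_screen_area() blocks until you finish dragging the selection, then returns the text as a string.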

mdtusz · on Dec 20, 2019

I'm not sure if there's a non-GUI interface for it, but zathura does this for PDFs.

jjohansson · on Dec 20, 2019

Commercial or open source? PDFTron can do it, but they’re not an open source project.

NewDimension · on Dec 20, 2019

I'd prefer something I can install locally (it doesn't need to be open source). I'm trying to extract text from a certain position in a PDF; the PDF contains real text rather than images, so OCR isn't strictly needed.

The goal is to draw a box in a GUI, then use those coordinates to extract text from several homogeneous pages.
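Since the PDF already contains text, one way to sketch this in Python is with the third-party pdfplumber library (an assumption; it isn't mentioned elsewhere in this thread). Its page.extract_words() returns each word with x0, top, x1 and bottom in PDF points measured from the top-left corner, so the same box can be applied to every page. The box is assumed to come from whatever GUI you draw it in, already converted to points, and the function names are hypothetical. pdfplumber also has page.within_bbox(...) and page.crop(...) if you'd rather let it do the clipping.

```python
def words_in_bbox(words, bbox):
    """Keep the text of words lying fully inside bbox = (x0, top, x1, bottom).

    Each word is a dict with "text", "x0", "top", "x1" and "bottom" keys, which is
    the shape pdfplumber's page.extract_words() returns (PDF points, top-left origin).
    """
    bx0, btop, bx1, bbottom = bbox
    return [
        w["text"]
        for w in words
        if w["x0"] >= bx0 and w["top"] >= btop and w["x1"] <= bx1 and w["bottom"] <= bbottom
    ]


def extract_region(pdf_path, bbox):
    """Apply the same box to every page of a text-based PDF; one string per page."""
    import pdfplumber  # third-party (pip install pdfplumber); imported lazily here

    with pdfplumber.open(pdf_path) as pdf:
        return [" ".join(words_in_bbox(page.extract_words(), bbox)) for page in pdf.pages]
```

Keeping the filter as a plain function over word dicts makes it easy to test, and to reuse if you switch to another extractor that reports word boxes.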

I also have a different goal of trying to interpret the structure of a PDF that has visual structure (headers, sections, and subsections, all numbered). But that seems to lend itself to some sort of text parsing.
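For the numbered-structure part, a hedged heuristic sketch in Python: treat any line that starts with a number like 2, 2.1 or 2.1.3 as a heading, use the number of dot-separated parts as its depth, and attach every other non-empty line as body text under the most recent heading. It assumes the text has already been extracted line by line, and it will misfire on body lines that happen to start with a number (years, quantities, numbered list items), so real documents would likely need extra checks such as font size. The function name and dict layout are hypothetical.

```python
import re

# A heading is a line starting with "2", "2.1", "2.1.3" (optionally "2.1.") followed by a title.
HEADING_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def parse_outline(lines):
    """Build a nested outline from numbered headings; other lines become body text."""
    root = {"number": "", "title": "", "children": [], "body": []}
    stack = [root]  # chain of currently open headings, root first
    for line in lines:
        match = HEADING_RE.match(line)
        if match:
            number, title = match.groups()
            depth = number.count(".") + 1
            node = {"number": number, "title": title.strip(), "children": [], "body": []}
            # Close headings at this depth or deeper, then attach to the nearest open parent.
            while len(stack) > depth:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line.strip():
            stack[-1]["body"].append(line.strip())
    return root
```

Using the numbering rather than visual cues keeps this independent of fonts and layout, at the cost of trusting that the numbers are consistent.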

severine · on Dec 20, 2019

> I also have a different goal of trying to interpret structure of a PDF that has visual structure (headers, sections and subsections all numbered). But that seems to lend itself to some sort of text parsing.

Some reading here: https://stackoverflow.com/questions/53219016/detecting-secti...

jjohansson · on Dec 20, 2019

PDFTron provides an SDK and isn't really meant as a plug-and-play end-user application. But it can accomplish what you're looking for.

Here's how to extract text from a PDF based on coordinates (this explains how to do it on web, but it's also possible using other platforms):

https://groups.google.com/d/msg/pdfnet-webviewer/h2W3VksbQUI...

Here's how to extract a PDF's logical structure:

https://www.pdftron.com/documentation/samples/#logicalstruct...

pierre · on Dec 20, 2019

PDF.js and filtering the output. Or Par.sr with the right input module configuration.

NewDimension · on Oct 13, 2019

Do you have a static IP at home? How does your Cloudflare setup work?

bpye · on Oct 13, 2019

I've done something similar. You firewall your home network so it only accepts connections from Cloudflare's IPs. You can use a Cloudflare-provided certificate for HTTPS; they will MITM the traffic and use a trusted cert for outward connections. You can update Cloudflare DNS records via their API, and the typical dynamic DNS tools work fine. It works well.
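A minimal Python sketch of the two moving parts, standard library only: checking whether an address falls inside Cloudflare's published IPv4 ranges (the same list you'd turn into firewall allow rules; Cloudflare publishes it at https://www.cloudflare.com/ips-v4), and a dynamic DNS update that PUTs a new A record through Cloudflare's v4 API. The token, zone ID and record ID are placeholders you'd get from your Cloudflare account, proxied=True keeps traffic flowing through Cloudflare so the firewall rule still holds, and ttl=1 is Cloudflare's value for "automatic". Function names are hypothetical, and in practice the allowlist belongs in the firewall itself rather than in Python.

```python
import ipaddress
import json
import urllib.request

CF_IPV4_LIST = "https://www.cloudflare.com/ips-v4"


def fetch_cloudflare_ranges(url=CF_IPV4_LIST):
    """Download Cloudflare's published IPv4 ranges (one CIDR per line)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode()
    return [ipaddress.ip_network(line.strip()) for line in text.splitlines() if line.strip()]


def is_from_cloudflare(client_ip, ranges):
    """True if client_ip falls inside any allowed range (IPv6 never matches IPv4 ranges)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ranges)


def update_a_record(token, zone_id, record_id, name, ip, proxied=True):
    """Point an existing A record at the current home IP via the Cloudflare v4 API."""
    body = json.dumps(
        {"type": "A", "name": name, "content": ip, "ttl": 1, "proxied": proxied}
    ).encode()
    req = urllib.request.Request(
        f"https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["success"]
```

Run the update from cron whenever your public IP changes; the ranges are worth re-fetching periodically rather than hard-coding, since Cloudflare occasionally changes them.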

I've never been able to pull this off completely, as I always want a way to SSH into my home network, but maybe there's a better way to get that sort of 'break glass' functionality.

tbyehl · on Oct 13, 2019

> I always want a way to SSH into my home network

Guacamole (sort of) gives me that. If Cloudflare, nginx, or Guacamole has problems, then I'm hosed... but I work from home, so remote access isn't a huge concern.

And I've got nothing terribly "household critical" at home; just the PiHole needs to be running to keep everyone happy. I do wish that PiHole had an HA solution. I've been tempted to set up a pfSense / pfBlockerNG HA pair, but that's a lot of overhead just for DNS.

rovr138 · on Oct 13, 2019

> I do wish that PiHole had an HA solution

You could run two Pis, or a Pi plus a container on another always-on machine, for example. Then just point your router’s primary DNS at the Pi and its secondary at the other instance.

bpye · on Oct 13, 2019

That's not a terrible solution. I've just been looking at possibly forwarding SSH over WebSocket; then I could put it behind Cloudflare. Latency would suffer, however.

jlgaddis · on Oct 13, 2019

> ... want a way to SSH into my home network ...

IMO, using a Tor hidden service is a (damn near) perfect solution for this.

sterlind · on Oct 13, 2019

Aren't jitter and latency still major problems with this approach? Plus connection resets, though maybe long-lived flows are more reliable than I remember, and I suppose you could do multipath (if Tor doesn't handle that already; not sure).

Have you made it work? My Tor career ended in college after running an exit node: no visits from the FBI, I just got auto-klined from every IRC server since I was on the list of proxies.

NewDimension · on May 14, 2018

Does anyone have a recommendation for a non-software dev environment, e.g. user task workflow management? I'm looking for a fully fledged product and/or an API backend.

tedmiston · on May 14, 2018

Does something like Zapier fit your use cases?

NewDimension · on May 15, 2018

Zapier looks like an app trigger. I'm looking for more of a task management system.

BMarkmann · on May 15, 2018

Check out the BPM tools others have mentioned (Activiti, Camunda, etc.).

NewDimension · on March 31, 2015

I would venture to say this is an April Fools' joke.