Hacker News | mtigas's comments

1) We’re not actually using Python for OpenCV, just ruby-opencv and possibly some bindings in Java/JRuby. (I think Python’s in the build instructions due to a numpy dependency in OpenCV. Though that might be specific to using Homebrew on OS X. Definitely looking into it soon.)

2) No plans at the moment, though that's an awesome idea.


In fact, we’re working on an auto-detection feature at this very moment! :D


For my blog, static files (anything stored in an app's "static" directory, basically[1]) and the like are handled transparently through django-storages' S3 support [2]. (The STATICFILES_STORAGE option in settings.) If you’ve used Django's staticfiles framework before, it’s pretty much plug-and-play.

For more dynamic file storage (say, using FileField or ImageField in a model), I believe django-storages would work, too. (Make sure you configure django-storages with the DEFAULT_FILE_STORAGE option set to S3 also.)
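To make the two options concrete, here's a minimal settings.py sketch. The bucket name and keys are placeholders, and this assumes django-storages' s3boto backend is installed; see the django-storages docs [2] for the full set of AWS_* settings.

```python
# settings.py fragment: hypothetical values, assuming django-storages'
# s3boto backend is available on the Python path.
AWS_ACCESS_KEY_ID = "..."
AWS_SECRET_ACCESS_KEY = "..."
AWS_STORAGE_BUCKET_NAME = "my-site-bucket"

# Serve collected staticfiles from S3:
STATICFILES_STORAGE = "storages.backends.s3boto.S3BotoStorage"

# Also route FileField/ImageField uploads to S3:
DEFAULT_FILE_STORAGE = "storages.backends.s3boto.S3BotoStorage"
```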

Assuming you’re managing your site via a local dev server (or a server that "hosts" the "hot type" version of the site), any time you "upload" a file to your local server, it'll actually upload to S3 (and any calls to "field.url" will actually map to the S3 URL). Not sure how well it'll work in all use cases: I haven't actually used FileField or ImageField myself in the django-medusa+django-storages combination, but I have used both separately so I’m fairly sure this is possible.

This is a pretty darn good question though, so I’ll likely make a follow-up blogpost with a more comprehensive walkthrough regarding handling staticfiles and FileField/ImageField. Sometime in the near future.

Multiple sites/subdomains is a bit more complicated. I’d say you should probably use separate Django instances for each and render them separately. (For S3, you’d need to use separate buckets, anyway.) If they need to share data, you can configure multiple Django settings.py configurations for each site but still use the same source tree and local database. (See the Django sites framework: [3])
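As a rough illustration of the "shared source tree, separate settings" approach, here's a hypothetical settings module for a second site. All of the module and bucket names here are made up; the only real knobs are SITE_ID (the sites framework [3]) and the per-site S3 bucket.

```python
# Hypothetical settings_site2.py: a second site sharing the same source
# tree and local database, rendered and deployed separately.
from settings import *  # the main site's settings module (assumed name)

SITE_ID = 2  # Django sites framework [3]
AWS_STORAGE_BUCKET_NAME = "example-site2-bucket"  # one S3 bucket per site
ROOT_URLCONF = "site2.urls"  # hypothetical per-site URLconf
```

You'd then render each site by pointing DJANGO_SETTINGS_MODULE at the appropriate settings file.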

[1]: https://docs.djangoproject.com/en/dev/howto/static-files/
[2]: http://django-storages.readthedocs.org/en/latest/backends/am...
[3]: https://docs.djangoproject.com/en/1.4/ref/contrib/sites/


What if you don’t have control over your hosting environment? (Or don’t have an application hosting environment to work with? Or don’t want to provision one?)

What if you don’t want to use your own infrastructure? (See Ars Technica’s WWDC liveblog[1], which polls JSON files in the same directory that appeared to be periodically updated during the event by some software a reporter was using. Presumably they did it this way because the feature is short-lived, super-high-traffic (likely thousands of concurrent users), and should be as low-latency as possible given the nature of the event.)

Not to knock Varnish, because I use it on plenty of larger things and love it. I just think that there are use cases where you can rationalize not even having an application server to cache in front of.

[1] https://s3.amazonaws.com/liveblogs/wwdc-keynote-2012/index.h...


It was an itch I wanted to scratch. I’d been tempted to convert my entire (Django-based) blog over to something like Hyde[1], but wanted a bit more flexibility than the framework provided.

In most applications I work on, I lovingly use and abuse Memcached, Redis, and Varnish. If you’re working on an application that warrants using a live website and the whole application server shebang, then yes, I’d agree with you.

But for something like my blog and other non-dynamic websites which don’t update very much at all, I’m not sure if I see a pressing need for an application server. My blog previously ran varnish-nginx-uwsgi-django, but I was moving off of a VPS and it was the last thing left on that server. I got curious.

In the case of something like the L.A. Times’ Data Desk[2] projects (they use their own django-bakery app), if some views are very expensive/slow to generate, you can offload the work from the application server and do it in advance. (This makes sense if you want to just render everything out on a fast workstation or if you have a local database of several hundred gigabytes that you don’t want a live server querying to crunch the data.) It’s not out of the question to pregenerate HTML pages, JSON for visualizations, and simple image files (generated in PIL).

In any case: it’s not so much a question of “high traffic apps” as the tradeoff between (computation cost + server maintenance cost) and (how server-side dynamic the app is, or how frequently it updates). Most people don’t want to configure and maintain an app server (with cache layers and all) for a simple app, and those that do often seem to have uptime issues the moment they get any legitimate traffic: see [3].

So:

* I decided I didn’t want to maintain an app server for my blog, and my historical average for updates is about once every four weeks (or even less frequently).
* People seemed to be big fans of Jekyll/Hyde, Movable Type’s static publishing mode[4], WP Super Cache, etc.
* I felt a Django-friendly analogue to those would be cool.
* Like any developer tinkering with their own blog, there didn’t have to be a point.

[1]: http://pypi.python.org/pypi/hyde/
[2]: http://datadesk.latimes.com/
[3]: http://inessential.com/2011/03/16/a_plea_for_baked_weblogs
[4]: http://daringfireball.net/linked/2011/03/18/brent-baked


I'm wondering, what kind of dynamic features were you interested in for a blog that you update once per month? Why not just edit static files on S3?


I'm actually coming up on ten years of having the same blog: waaaayyyyy back when I started, I was editing all my pages manually. (Though blog posts didn’t have permalinks, it was just a growing massive "list page" that I’d break off and paginate every so often.) It started to become a pain in situations where I wanted to modify some basic aspect of every page: I’ve consistently gotten the itch to either re-architect or re-design my blog annually.

You really can’t beat a system that uses templates.

I'd tried my own JavaScript-based content system in the past (where everything is based on one HTML page and JS loads the page content), but that adds a bit more client-side complexity (not to mention hurting search engine reachability).

I think the ability to regenerate an entire 500+ page static HTML site is pretty powerful and useful. (Also: who wants to manually update date-based URL paths every time there’s a new thing? http://v3.mike.tig.as/blog/2012/06/30/ http://v3.mike.tig.as/blog/2012/06/ http://v3.mike.tig.as/blog/2012/ etc.)
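Those date-based archive paths are exactly the kind of thing a generator can enumerate mechanically instead of a human updating them by hand. A tiny sketch (not django-medusa's actual code; the function name and URL scheme are just illustrations matching the paths above):

```python
from datetime import date

def archive_paths(d):
    # Expand one post date into its day/month/year archive URLs.
    return [
        "/blog/%04d/%02d/%02d/" % (d.year, d.month, d.day),
        "/blog/%04d/%02d/" % (d.year, d.month),
        "/blog/%04d/" % d.year,
    ]

archive_paths(date(2012, 6, 30))
# → ['/blog/2012/06/30/', '/blog/2012/06/', '/blog/2012/']
```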

EDIT: As to your first question wrt "dynamic features I wanted": CMS and full control over my site's behavior and templates. It’s Django-based, too, so I can theoretically extend it with any features as necessary. (I also have plenty of "non-published" content that I can view on the local, "hot type" development server of my site, but the "renderer" file is only configured to upload blog posts marked as "live". I find that feature pretty useful.)


I see how you can use this :) It's actually a pretty great idea for some scenarios so kudos on that!

TBH, I still think Django is overkill for a static site; there are templating systems and frameworks that would be far lighter (Flask + Jinja?), especially given that you really don't need any dynamic behavior at all.

Still, great work on creating and releasing a tool that makes your (and possibly others..) life easier!


It’s very wget-like, due to the use of the Django HTTP test client — just slightly more elegant due to the programmatic definition of what gets scraped/rendered and the addition of the "direct to S3" backend, which allows arbitrary mimetypes.
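The wget-like "crawl a list of URLs, write files" idea can be sketched like this. This is illustrative only, not django-medusa's actual code: render() here is a stand-in for what would really be a Django test-client GET (django.test.Client().get(path)).

```python
import os

def render(path):
    # Stand-in for rendering `path` through the Django HTTP test client.
    return "<html>rendered content for %s</html>" % path

def bake(paths, outdir):
    """Write each rendered path to a file under outdir, wget-style."""
    written = []
    for path in paths:
        target = os.path.join(outdir, path.lstrip("/"))
        if path.endswith("/"):
            # Directory-style URL: write an index.html inside it.
            target = os.path.join(target, "index.html")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(render(path))
        written.append(target)
    return written
```

A "direct to S3" backend would do the same walk but push each rendered body to a bucket (with an explicit mimetype) instead of writing local files.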

Glad you like it.


It's still a regular linearly-scaled chart, so I don't think it's too confusing.

Funny enough, I plugged this into Google Docs to see if I could re-scale it and give you a comparison of the given chart vs. a 0-based, 5-unit-interval chart, and it looks like Google Docs is actually what they used to generate the chart.

https://docs.google.com/spreadsheet/ccc?key=0Ag2TNlAslc4GdFd...

Apparently the automatic scaling gives you the 8-based scale you see in their post. (And apparently you can't manually re-scale a Google Docs chart.) Using "stack" mode (I don't actually know what this option does for a single-series chart like this) gives you a 0-based, 10-unit-interval chart. (Again, not sure if you can adjust this to any other unit-scale.)

Slightly different scale, but I don't think it's a significant difference (though the "television" portion does appear slightly smaller in the automatic scale).

Charts like these are always susceptible to scale-skew and perception, and even simple charts (like this) can be debated on those subjective measures.


Note: The AT&T patent in question is actually 775 rather than 755: http://www.google.com/patents?id=UOMZAAAAEBAJ&dq=4555775

This was posted a few years ago and that also resulted in a fairly interesting HN thread: https://news.ycombinator.com/item?id=607335


Haven't looked at the SPDY spec[1] too closely, but I think each side of the SPDY (or underlying TCP) connection would be able to idle-disconnect after a timeout or during a high-load situation. (i.e. to prevent idle connections from consuming ports/file descriptors)

So in the case you quoted, the server would also be able to explicitly tell the browser to start a new connection later. (It's not just a browser-to-server signal.)

Generally, most HTTP 1.1 (keepalive-aware) servers have a default timeout for those "persistent" connections[2][3] so this isn't actually a new problem specific to SPDY.
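For instance, in nginx this is a one-line directive [2] (the value here is arbitrary; nginx's default is 75s):

```nginx
# Close keep-alive connections that stay idle for more than 30 seconds.
keepalive_timeout 30s;
```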

(Aside: simply leaving open an idle TCP connection for later re-use doesn't necessarily imply that idle users will "DDOS" a server. Depending on the server software and OS, the cost-per-socket is low enough that many idle connections aren't actually a problem until you get to port and file descriptor limits — which, again, is already well-dealt with in plenty of other HTTP/TCP applications by using timeouts.)

[1]: http://www.chromium.org/spdy/spdy-protocol [2]: http://wiki.nginx.org/HttpCoreModule#keepalive_timeout [3]: https://httpd.apache.org/docs/2.2/mod/core.html#keepalivetim...


This is the biggest glaring bug in the app right now.[1] It doesn't always occur (~30% of the time) when backgrounding to another app or when the phone is manually locked, but it happens more regularly (>75%) when the phone idle-sleeps.

I’m trying to nail this down, but it’ll likely take me some time to find a real fix.

[1]: https://github.com/mtigas/iOS-OnionBrowser/issues/2

