More

charukiewicz · on Aug 17, 2023

I've been working on a web application that has a bar code code scanning component and I will say that while the apparent performance of this library is impressive, I am not a fan of this pricing model, and that alone pushes me to use alternative libraries [1]. As an aside, the library I'm thinking of is based on zxing-js and seems to have perfectly acceptable performance scanning the bar codes in your OSS comparison [2] using the demo implementation [3] (it scans bar codes despite the name being "html5-qrcode").

For a library like this, the only really appealing pricing model is a flat fee that buys unlimited use of the version I buy in perpetuity (even if that fee is several thousand dollars). I'm not, however, going to pay monthly forever for code that (1) you wrote once and is now static, (2) that you aren't hosting, and (3) that I need to integrate into my application myself. That feels like paying my car manufacturer a monthly subscription for heated seats [4].

If I need support I will happily pay you a support fee. Moreover, I cannot take on the liability of my app's bar code scanner ceasing working because you decide to close your doors and then the license won't reactivate.

[1]: https://github.com/mebjas/html5-qrcode

[2]: https://strich.io/comparison-with-oss.html

[3]: https://blog.minhazav.dev/research/html5-qrcode

[4]: https://www.theverge.com/2022/7/12/23204950/bmw-subscription...

alex_suzuki · on Aug 17, 2023

Fair enough. The library you are using is based on ZXing and if it works well for you, then great. I did not manage to scan those damaged codes using ZXing but perhaps I did not fully explore all possible tweaks.

It's a bit disingenuous to claim that I wrote the code once and it is now static (I wish!), but I see that you are not a fan of aubscription models and I understand that.

charukiewicz · on Aug 17, 2023

> It's a bit disingenuous to claim that I wrote the code once and it is now static

Unless the library code were in some way self-updating or had some server side component, I think it's disingenuous to call the library code embedded in my app anything other than static. I assume by "not static" you are referring to decision to continue improving it, which I think is great and affords you the ability to sell me a newer version.

alex_suzuki · on Aug 17, 2023

The problem is that unlike „server-side computing“ the web browser and also mobile platforms like iOS and Android move pretty fast. Things change, and things break. Having a „stable version“ is great until a browser vendor decides to update/break an API an the library breaks. Now you have feature updates and hotfixes for an infinite number of „stable“ versions.

charukiewicz · on Aug 17, 2023

> If you want a much clearer separation between frontend and backend of a site, especially in terms of contributors/teams, then it might not be the right tool.

The point to take away from htmx and hypermedia more broadly is that there isn't a clean separation between the front end and back end of a site. The concept of completely separate front and back end teams building SPAs/JSON APIs has been a very costly development that has brought substantial complexity that is wholly unnecessary for most applications.

Your banking/project management/todo list/budgeting/insurance/education/whatever app almost certainly doesn't need to be an SPA and would be developed faster or for less money if it leaned into server side rendering (SSR) and used a library like htmx or Stimulus to enhance the user experience as needed.

charukiewicz · on June 20, 2023

Correct. Elm's publicity issues stem from how closed off the leadership is. There's Evan (creator) and a handful of trusted others that have a huge amount of sway over how the language and, arguably more importantly, the core libraries develop. They also get special access to write libraries that rely on native JS code (something you can't do as of 0.19 even locally in your own projects).

Elm is a great language and I've written extensively[1] about how nice it is to use in production. But if you want to get involved in the community, it feels like there's not much to get involved in. The fact that even on the Elm Discourse[2], posts auto-lock after 10 days means that the forum is now pretty dead compared to what it used to be in 2018-2019.

[1] https://charukiewi.cz/posts/elm/

[2] https://discourse.elm-lang.org/

charukiewicz · on June 20, 2023

> Librem 5 fails #5

Where's the Librem 5 antenna? Can you link to reports of reception issues stemming from this?

charukiewicz · on May 23, 2023

The default settings append a number at the end, so that's 2.7e+14. Moreover, there's a separator character that may or may not be there, and may or may not be changed from the default. Furthermore, I don't know why the "jargon" list isn't included by default, which adds 8,800 relatively common words (e.g. born, advice, engine, perspective). So with the digit at the end and all other default settings, that's 11000^4*10 = 1.46e+17.

charukiewicz · on March 21, 2021

> If you're a software engineer, it won't make a difference, just send two queries to the DB and combine the results before sending them off.

This isn't true if you have certain use cases in your application, pagination being one of them. There's no simple way to implement pagination when you have two or more queries that return and unknown number of results and you're tasked with maintaining a consistent order across page changes.

If you make a single query do all the work, it's easy to implement paging in any number of ways, the most performant being to filter by the last id the client saw. If your users aren't likely to paginate too far into the data, LIMIT and OFFSET will work fine as well.

charukiewicz · on March 21, 2021

Author here. Between this and your other response, where you expound on the same point, I think you're being far too hand wavy about what causes performance issues. The number of joins alone doesn't have much to do with the performance characteristics of any query.

What's more important for performance of queries on larger data sets than the number of joins is that there are indexes and that the query is written in a way that can utilize indexes to avoid multiplicative slowdowns. The reason the UNION query is fast is because the query on either side effectively utilizes the indexes so that the database engine can limit the number of rows immediately, rather than filter them out after multiplying them all together. I can expand this schema to have a UNION query with two 10-table joins and it would still perform better than the 7 table query.

I think someone new to SQL is likely to read your statement and think "okay joins are slow so I guess I should avoid joins". This is not true and this belief that joins are slow leads people down the path that ends at "SQL just doesn't scale" and "let's build a complicated Redis-backed caching layer".

SQL performance is a complex topic. The point of our post was to illustrate that a UNION query can simplify how your join your tables and allow you to write constituent queries that have better performance characteristics. Morphing this into "the number of joins is smaller so the performance is better" is just incorrect.

kroolik · on March 21, 2021

I think it's good to note in the article that the second query is slow because the RDBMS doesn't use indexes (and which ones). Currently, the text is hand waving the problem and moves on.

If the article had, instead, listed indexes, shown they were used in simple cases, shown they weren't used in the second query, dug into why they weren't (maybe they were but it was still hella slow) - that would be a ton of value!

slt2021 · on March 21, 2021

Sorry, if I was hand-wavy. I was trying to give other people a simple framework that I use myself (Big O calculated as a number of rows in each table). ALthough I dont know how many rows you have in each table, Big O frameworks still works, because cartesian of tables is being dominated by two huge different datasets (customers's orders being joined to employees' orders). The Union simply calcualtes them separately, rather than doing cartesian of two completely different datasets that represent different entities.

Well written predicates and indexes can help, as well as poorly written predicates make it worse. So there is balance. This is not a shortcut or a silver bullet, it is a trade-off being made. More indexes->faster selects and slower updates/inserts. One bad index=>failed insert and possible losing customer data (happened to me once)

I agree with you that "SQL performance is a complex topic." and one should definitely study query Execution Plan to understand the bottlenecks and make optimization decisions

aidos · on March 21, 2021

Sql queries never really adhere to that simple Big O analysis though. Sure, if you had zero indexes so every operation was nested sequential scans, then it’s Cartesian, but that’s never the case. Most often you get really fast index lookups, by design.

slt2021 · on March 21, 2021

I invite you to share execution plans of queries #2 and #4 with the public so that people can decide what is actually slowing down, whether itnis realy cartesian or anything else

charukiewicz · on March 20, 2021

Author of the original article here. Temporary tables are different than using WITH (which are common table expressions, or CTEs). In many database engines, can make a temporary table that will persist for a single session. The syntax is the same as table creation, it just starts with CREATE TEMPORARY TABLE ....

More info in the PostgreSQL docs [1]:

> If specified, the table is created as a temporary table. Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below). Existing permanent tables with the same name are not visible to the current session while the temporary table exists, unless they are referenced with schema-qualified names. Any indexes created on a temporary table are automatically temporary as well.

[1]: https://www.postgresql.org/docs/13/sql-createtable.html

ewalk153 · on March 21, 2021

If this is a database for reporting, using a temporary table is probably fine and a union all wouldn’t concern me.

On Mysql, using a union all creates a temp table which can perform catastrophically under database load. I’ve seen a union all query with zero rows in the second half render a database server unresponsive when the database was under high load, causing a service disruption. We ended rewriting the union all query as two database fetches and have not seen a single problem in that area since.

I was shocked by this union all behavior, but it is apparently a well known thing on MySQL.

I can’t speak to Postgres behavior for this kind of query.

jeffbee · on March 21, 2021

Yeah, you _really_ have to watch out for this. I once spent months chasing down a serious but rare performance problem in a large-scale mysql 5.6 deployment, which was eventually root-caused to kernel slab reclaim pressure, caused by very high XFS metadata mutation rate, caused by MySQL creating an on-disk temporary ISAM file for every query with a union, which was most of the traffic.

charukiewicz · on March 21, 2021

In the past we worked on a system that used MySQL 8. We used UNION (not UNION ALL, but I assume it doesn't matter) in several places, applying it to improve performance as we described in the article. There were definitely cases in the system where one side of the UNION would return zero rows, but we never ran into any of the types of issues you're describing.

charukiewicz · on March 20, 2021

Author here. This doesn't give the correct results. It produces meal_items that have both customer_id and employee_id. Here's an excerpt (the full result set is thousands of rows, as opposed to the expected 45):

    id |label       |price|employee_id|customer_id|
    ---|------------|-----|-----------|-----------|
    ...
    344|Poke        | 4.18|       3772|      13204|
    344|Poke        | 4.18|       3313|      13204|
    344|Poke        | 4.18|       2320|      13204|
    344|Poke        | 4.18|        632|      13204|
    344|Poke        | 4.18|       4264|      13204|
    344|Poke        | 4.18|        699|      13204|
    344|Poke        | 4.18|       1070|      13204|
    344|Poke        | 4.18|       3022|      13204|
    344|Poke        | 4.18|       1501|      13204|
    344|Poke        | 4.18|        808|      13204|
    344|Poke        | 4.18|       2793|      13204|
    344|Poke        | 4.18|       1660|      13204|
    344|Poke        | 4.18|        932|      13204|
    ...

To be clear, there are ways to write this query without UNION that have both good performance and give the correct results, but they're very fiddly and harder to reason about that just writing the two comparatively simple queries and then mashing the results together.

balfirevic · on March 20, 2021

> Author here. This doesn't give the correct results.

Right you are! :) Should have given it more thought before posting. Thanks for jumping in.

charukiewicz · on March 20, 2021

Author here, you are indeed correct that Query #2's final join can be an INNER join. However, I just tested it against our test data set and it makes no impact on the performance.

tech2 · on March 21, 2021

Ah well, thanks for humouring me. Good thing those query optimisers do the right thing on occasion.