If you can't tell then I'm not sure what more needs to be said. I took a look through the commit history and it was glaringly obvious to me.
To trust something like data storage to vibe-coded nonsense is incredibly irresponsible. To promote it is even more so. I'm just surprised you can't tell, too.
I don't know about trash, but this post, this repo and even their comments on this thread are blatantly written by an AI. If you still need to ask for evidence, consider that you might be AI-blind.
This is not a vibe-coded project; it was developed by understanding the SQLite code. Have you ever looked into the examples? Have you checked the code? Now my post got flagged. And if I use AI to understand code, then what is wrong with that? What is the use of AI? To make a person more productive, right?
You've been copying and pasting directly from Claude to reply to comments that ask how this works. You also realise you've been caught and are now replying in a completely different style.
I’d imagine there’s an extremely long tail of features and quirks that will take time to iron out even after SQL compatibility is achieved. Looks like it’s still missing some important features like savepoints (!!!), window functions, and ATTACH DATABASE.
I’d be more excited, and imagine it would be more marketable, if it focused instead on being simply an embedded SQL DB that allows multiple writers (for example), or some other use case where SQLite falls short. DuckDB is an example: SQLite, but for OLAP.
I stumbled on the lock page myself when I was experimenting with writing a SQLite VFS. It's been years since I abandoned the project, so I don't recall much, including why I was using the sqlitePager, but I do recall the lock page being one of the first things I found: I needed to skip sending page 262145 (with 4096-byte pages) to the pager when attempting to write the metadata for a 1TB database.
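For anyone curious where 262145 comes from: the lock page is simply the page containing SQLite's PENDING_BYTE offset, 1 GiB into the file, so the page number falls out of the page size. A quick sketch of the arithmetic:

    # The lock page holds SQLite's file locks and must never contain data.
    PENDING_BYTE = 0x40000000  # 1 GiB

    def lock_page(page_size: int) -> int:
        return PENDING_BYTE // page_size + 1  # SQLite pages are numbered from 1

    print(lock_page(4096))   # 262145, the page skipped above
    print(lock_page(65536))  # 16385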
I'm surprised they didn't have any extreme tests with a lot of data that would've found this earlier. Though achieving the reliability and test coverage of sqlite is a tough task. Does make the beta label very appropriate.
I think that async support for multiple readers (and writers) is part of the reason for the separate Turso library in Rust over the C fork (libSQL). I also wouldn't be surprised if baking in better support for replication was a design goal as well. Being file-format compatible with SQLite is really useful, too.
In the end, the company is a distributed DBaaS provider using a SQLite interface to do so... this furthers that goal... being SQLite compatible in terms of final file structure just eases backup/recovery and duplication options.
I think being able to self-host either in an open/free and commercial/paid setting is also going to be important to potential users and customers... I'm not going to comment on the marketing spin, as it is definitely that.
You're only file format compatible if you don't use any of the Turso extensions.
Just like STRICT tables, as soon as you use an unsupported feature in your schema, your database becomes incompatible.
With STRICT tables you needed to upgrade SQLite tools.
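To make the coupling concrete, a minimal Python sketch (the exact error text from older tools is from memory):

    import sqlite3

    print(sqlite3.sqlite_version)  # the linked SQLite must be >= 3.37.0 for STRICT

    conn = sqlite3.connect("strict_demo.db")
    # Older SQLite tools opening this file later fail with something like
    # "malformed database schema" because they can't parse STRICT.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT) STRICT")
    conn.execute("INSERT INTO t (name) VALUES ('example')")
    conn.commit()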
But if you use something from Turso you're placing yourself outside the ecosystem: the SQLite CLI no longer works, Litestream doesn't work, sqlite_rsync doesn't work, recovery tools don't work, SQLite UIs don't work.
Turso has no qualms with splitting the ecosystem. They consider themselves the next evolution of SQLite. The question is do you want to be part of it?
Maybe. I don't think having parallel divergence is inherently bad. DuckDB doesn't replace SQLite...
But depending on the need, I'm probably more inclined to reach for something like Turso today than Firebird if I want something I can embed and connect to a server to sync against, for example.
IIRC, multiple writers are supported in SQLite; their writes will just be serialized. What you don't have is concurrent writes. But, given SQLite writes are so fast, in practice it's not really a big deal.
If you haven't used SQLite in a real project with heavy writes, I'd say give it a try. SQLite is WAY more powerful than people tend to think.
I've been using a SQLite alternative to avoid dependencies on a native library. It's a Go application that uses a native Go SQLite reimplementation so I can create platform-specific binaries that include all dependencies. Makes installation easier and more reliable.
modernc.org/sqlite is upstream SQLite, compiled to Go using ccgo. Spiritually similar to, say, a WASM build of SQLite. Not a separate reimplementation.
I’ve been using it locally and with their hosted offering for a while now, and it’s rock solid, except when I make super deeply nested joins, which overflow something. But other than that it’s super fast and cheap; I haven’t needed more than the free tier for a bunch of stuff I host on Cloudflare Workers.
Fuck, my wife got a notice that she would have to increase her iCloud storage, so last week I began the process of ordering a backup of all her pictures so I could get them off iCloud and organized on some drives at home. We got 12 zips of the pictures along with CSVs and some metadata, and I literally just finished iterating on the script to sort them into year-based folders and convert all the HEIC shit into JPG. It's running literally right now.
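For anyone doing the same, a rough sketch of that kind of script (pillow-heif and the year-from-mtime heuristic are assumptions; the iCloud CSVs often carry better dates):

    # pip install pillow pillow-heif
    from datetime import datetime
    from pathlib import Path
    from PIL import Image
    from pillow_heif import register_heif_opener

    register_heif_opener()  # lets Pillow open .heic files

    SRC, DST = Path("unzipped"), Path("sorted")

    for heic in SRC.rglob("*.[Hh][Ee][Ii][Cc]"):
        year = datetime.fromtimestamp(heic.stat().st_mtime).strftime("%Y")
        out = DST / year
        out.mkdir(parents=True, exist_ok=True)
        Image.open(heic).convert("RGB").save(out / (heic.stem + ".jpg"), quality=90)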
It’s too bad tech these days seems so determined to take away this kind of configurability in the name of “we know better”. There’s so much to be said for software that can last so long, as opposed to the constant treadmill of forced updates.
Fuck gnome eternally for destroying gtk and fuck Wayland.
This is disingenuous and probably was written this way for HN cred and clicks. Sqlite's test suite simulates just about every kind of failure you can imagine - this document is worth reading if you have any doubts: https://www.sqlite.org/atomiccommit.html
> Sqlite's test suite simulates just about every kind of failure you can imagine
The page you link even mentions scenarios they know about that do happen and that they still assume won't happen. So even sqlite doesn't make anywhere near as strong a claim as you make.
> SQLite assumes that the operating system will buffer writes and that a write request will return before data has actually been stored in the mass storage device. SQLite further assumes that write operations will be reordered by the operating system. For this reason, SQLite does a "flush" or "fsync" operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.
There was a time that Oracle databases used raw disk partitions to minimize the influence of the OS in what happens between memory and storage. It was more for multiple instances looking at the same SCSI device (Oracle Parallel Server).
> So even sqlite doesn't make anywhere near as strong a claim as you make.
And? If you write to a disk and later this disk is missing, you don't have durability. SQLite cannot automatically commit your writes to a satellite for durability against a species-ending event on Earth, and hence its "durability" has limits, exactly as spelled out by them.
You're arguing a strawman, and I pointed at a specific example. Sticking with my specific example: they could probe for this behavior or this OS version and crash immediately, telling the user to update their OS. Instead it seems they acknowledge this issue exists and hope it doesn't happen. Which, hey, everybody does, but that's not the claim OP was making.
It’s not really a library’s job to cover all bases like you’re suggesting. They outline the failure scenarios fairly well, and users are expected to take note.
If you're talking to a 100KB SQLite database file this kind of thing is likely unnecessary, just opening and closing a connection for each query is probably fine.
If you're querying a multi-GB SQLite database there are things like per-connection caches that may benefit from a connection pool.
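A toy sketch of the kind of pool that helps there (sizes and pragmas are arbitrary):

    import queue
    import sqlite3

    class SQLitePool:
        """Keep warm connections around so their page caches and statement
        caches survive across queries, instead of reopening each time."""

        def __init__(self, path: str, size: int = 4):
            self._q = queue.Queue(maxsize=size)
            for _ in range(size):
                conn = sqlite3.connect(path, check_same_thread=False)
                conn.execute("PRAGMA cache_size = -64000")  # ~64 MB cache, arbitrary
                self._q.put(conn)

        def query(self, sql: str, params=()):
            conn = self._q.get()
            try:
                return conn.execute(sql, params).fetchall()
            finally:
                self._q.put(conn)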
> What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
Basically yes - aiosqlite works by opening each connection in a dedicated thread and then sending async queries to it and waiting for a response that gets sent to a Future.
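A minimal usage sketch of that pattern:

    import asyncio
    import aiosqlite  # pip install aiosqlite

    async def main():
        # The connection lives in its own thread; each await hands a statement
        # to that thread and suspends until the result comes back on a Future.
        async with aiosqlite.connect("app.db") as db:
            await db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
            await db.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
            await db.commit()
            async with db.execute("SELECT v FROM kv WHERE k = 'a'") as cursor:
                print(await cursor.fetchone())

    asyncio.run(main())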
That's even crazier - so you're using asyncio because you have a ton of slow network-bound stuff - but for your database access you are running every sqlite connection in its own thread and just managing those threads via the asyncio event loop?
Thread pooling for databases, whether network based, or disk based, is common. A lot of times it will be baked into your client, so the fact that you think it’s crazy means you’ve only dealt with clients that did this for you.
For really large data sets, you can query and wait a few minutes before getting a result. Do you really want to await that?
SQLite doesn't have a separate server process; it does all of the work for queries in your process. So it's intrinsically CPU-heavy, and it needs threads to avoid blocking the event loop.
One way to look at it is that with a client-server database and an async client library, you have a thread pool in the database server process to do the heavy lifting, and async clients talk to it via TCP. With SQLite, you have that "server" thread pool in the same process instead, and async "clients" talk to it via in-process communication.
Python's asyncio is single threaded. If you didn't send them into a different thread, the entire event loop would block, and it would degenerate to a fully synchronous single threaded program with additional overhead.
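A minimal sketch of the difference (asyncio.to_thread is the stdlib shortcut for exactly this):

    import asyncio
    import sqlite3

    def slow_query(path: str, sql: str):
        # Plain blocking sqlite3 call; run directly, it would stall the event loop.
        with sqlite3.connect(path) as conn:
            return conn.execute(sql).fetchall()

    async def main():
        # Offload to a worker thread so other coroutines keep running meanwhile.
        rows = await asyncio.to_thread(slow_query, "app.db", "SELECT 1")
        print(rows)

    asyncio.run(main())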
In which case SQLite is probably the wrong tool for the job, and you should be using Postgres or MySQL, which are actually designed from the ground up for lots of concurrent connections.
SQLite is amazing. I love SQLite. But I love it for single-user single-machine scenarios. Not multi-user. Not over a network.
I didn't say anything about concurrent access. SQLite with WAL mode is fine for that these days for dozens of concurrent readers/writers (OK only one writer gets to write at a time, but if your writes queue for 1-2ms who cares?) - if you're dealing with hundreds or thousands over a network then yeah, use a server-based database engine.
Multi GB is tiny, but that doesn't make SQLite magically better at large queries of multi GB databases. That's why DuckDB has been getting more popular.
Sqlite != DuckDB... two totally different DB types. One is a row-based database, the other column-based. Both run different workloads and both can handle extremely heavy workloads.
Yes, that's the point I'm making. If SQLite didn't ever struggle with databases in the GB ranges, then there wouldn't be much call to replace it with DuckDB. The fact that there's significant value in an OLAP RDBMS suggests that SQLite is falling short.
The problem is not SQLite struggling with databases in GB range. It does that with ease. OLAP requires a different database structure, namely column storage (preferably with compacting / compression / and other algorithms).
That is DuckDB's selling point. You want data analysis, you go DuckDB. You want OLTP, you go SQLite. Or combine both if you need both.
Even Postgres struggles with OLAP workloads, and that is why we have solutions like TimescaleDB, a Postgres plugin. That ironically uses Postgres rows but then packs column data into row fields.
That does not mean that Postgres is flawed when working with big data. Same with SQLite... Different data has different needs, and it has nothing to do with database sizes.
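To make the split concrete, a small sketch (assumes `pip install duckdb`; DuckDB's sqlite extension can even scan the SQLite file directly):

    import sqlite3
    import duckdb

    # SQLite: row store, good at OLTP-style point reads and writes.
    lite = sqlite3.connect("orders.db")
    lite.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)")
    lite.execute("INSERT INTO orders (amount) VALUES (9.99)")
    lite.commit()

    # DuckDB: column store, good at OLAP-style scans and aggregations.
    duck = duckdb.connect()
    duck.execute("INSTALL sqlite")
    duck.execute("LOAD sqlite")
    print(duck.execute(
        "SELECT sum(amount) FROM sqlite_scan('orders.db', 'orders')"
    ).fetchall())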
I always had trouble having multiple processes get write access to the SQLite file. For example, if I have a Node.js backend working with that file and I try to access the file with a different tool (adminer, for example), it fails ("file in use" or something like that). Should it work? I don't know if I'm doing something wrong, but this is my experience across multiple projects.
- sqlite is a bit like a RWLocked database: either any number of readers xor exactly one writer (and no readers)
- but with WAL mode enabled readers and writers (mostly) don't block each other, i.e. you can have any number of readers and up to one writer (so normally you want WAL mode if there is any concurrent access)
- if a transaction (including an implicit one from a single command without "begin", or e.g. upgrading from a read to a write transaction) is taking too long because a different process's write transaction is blocking it, SQLITE_BUSY might be returned.
- in addition, file locks might be used by SQL bindings or similar to prevent multi-application access; normally you wouldn't expect that, but given that sqlite had an OPEN_EXCLUSIVE option in the past (which should be ignored by halfway modern implementations of it) I wouldn't be surprised to find it.
- your file system might also prevent concurrent access to sqlite db files; this is a super obscure niche case, but I have seen it once (in a shared server, network filesystem(??) context, probably because sqlite really doesn't like network filesystems, which often have unreliable implementations of some of the primitives sqlite needs for proper synchronization)
as other comments pointed out enabling WAL mode will (probably) fix your issues
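Concretely, a minimal sketch of what each process should set (values are arbitrary):

    import sqlite3

    conn = sqlite3.connect("shared.db")
    # WAL is persistent: once set, it sticks to the database file.
    conn.execute("PRAGMA journal_mode = WAL")
    # busy_timeout is per-connection: set it on every open so writers wait
    # instead of failing immediately with "database is locked".
    conn.execute("PRAGMA busy_timeout = 5000")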
Your throughput will be much worse than a single process, but it's possible, and sometimes convenient. Maybe something in your stack is trying to hold open a writable connection in both processes?
Those are a nasty trap. The solution is non-obvious: you have to use BEGIN IMMEDIATE on any transaction that performs at least one write: https://simonwillison.net/tags/sqlite-busy/
This is correct, and one of the things I tell anybody who is considering using SQLite to watch out for. The busy timeout and deferred write transactions interact in a really non-intuitive way, and you have to use BEGIN IMMEDIATE for any transaction that performs any writes at all; otherwise SQLite gives up and throws an error without waiting if another transaction is writing when your transaction attempts to upgrade from a read to a write.
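A minimal Python sketch of the safe pattern (the table and the timeout value are made up):

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit; manage transactions by hand
    conn.execute("PRAGMA busy_timeout = 5000")

    # BEGIN IMMEDIATE grabs the write lock up front, so the busy timeout
    # applies here rather than erroring later on the read-to-write upgrade.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("UPDATE counters SET n = n + 1 WHERE id = ?", (1,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise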
Thanks for the direction. I thought SQLite was limited in how multiple processes can access the db files, but now I see the problem is on my end. Btw. I'm a fan of your AI/LLM articles, thanks for your awesome work.
An average human being can produce around 650MB of text during a whole working lifetime when doing nothing but writing text 4 hours per weekday without any interruptions.
Saying multi gigabyte databases for single user usage is the norm feels insane to me.
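For what it's worth, the 650MB figure roughly checks out back-of-the-envelope (typing rate and career length are assumptions):

    wpm = 50                  # words per minute, assumed
    chars_per_word = 6        # including the trailing space, assumed
    hours_per_day = 4
    weekdays_per_year = 250
    years = 45                # working lifetime, assumed

    total = wpm * chars_per_word * 60 * hours_per_day * weekdays_per_year * years
    print(f"{total / 1e6:.0f} MB")  # ~810 MB at these rates; slower typing lands near 650 MB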
SQLite is a great database for organizing data in desktop applications, including both productivity software and even video games. It's certainly not at all unreasonable for those use cases to have files that are in the low GB and I would much rather use SQLite to process that data instead of bundling MySQL or Postgres into my application.
It's a bit like reinventing the wheel, since solving all the problems that come with network access is precisely why those databases exist, and it's work they've already done.
asyncpg is a nice python library for postgres.
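For example, its built-in pool keeps a bounded number of server connections while many coroutines share them (DSN and pool sizes are made up):

    import asyncio
    import asyncpg  # pip install asyncpg

    async def main():
        pool = await asyncpg.create_pool(
            dsn="postgresql://localhost/appdb", min_size=2, max_size=20
        )
        async with pool.acquire() as conn:
            print(await conn.fetchval("SELECT 1"))
        await pool.close()

    asyncio.run(main())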
I think postgres releasing a nice linkable, "serverless" library would be pretty amazing, to make the need for abusing sqlite like this (I do it too) go away.
Postgres has really not solved problems that come with being a networked server and will collapse under concurrent connections far before you start to feel it with SQLite. 5000 concurrent connections will already start to deadlock your Postgres server; each new connection in Postgres is a new Postgres process and the state for the connection needs to be written to various internal tracking tables. It has a huge amount of overhead; connection pooling in PG is required and often the total system has a rather low fixed limit compared to idk, writing 200 lines of python code or whatever and getting orders of magnitude more connections out of a single machine.
A connection definitely has overhead in PG, but "5000 concurrent connections will already start to deadlock your Postgres server" is bogus. People completely routinely run with more connections.
Check the throughput graphs from this blog post from 2020 (for improvements I made to connection scalability):
That's for read-mostly work. If you do write very intensely, you're going to see more contention earlier. But that's way way worse with sqlite, due to its single writer model.
Yeah, I think I'm conflating our fear of >5000 connections for our Postgres workload (read-write that is quite write heavy) with our SQLite workload, which is 99.9% read.
The way our SQLite workload works is that we have a pool of hundreds of read connections per DB file, and a single writer thread per DB file that keeps the DB up to date via CDC from Postgres; basically using SQLite as a secondary index "scale out" over data primarily written to Postgres. Because we're piping Postgres replication slot -> SQLite, we don't suffer any writer concurrency and throughput is fine to keep up with the change rate so far. Our biggest bottleneck is reading the replication slot on the Postgres side into Kafka with Debezium.
Postgres will shit itself without a connection pooling proxy server like PGBouncer if you try even like 5000 concurrent connections, because Postgres spawns a UNIX process per inbound connection. There’s much more overhead per connection in Postgres than SQLite!
Likewise MySQL will shit itself with just a couple hundred connections unless you have a massive instance size. We use AWS' RDS proxy in front for a similar solution. I've spent way too many hours tuning pool sizes, resolving connection pinning issues...
A connection pool is absolutely a best practice. One of the biggest benefits is managing a cache of prepared statements, the page cache, etc. Maybe you have temp tables or temp triggers too.
Even better is to have separate pools for the writer connection and readers in WAL mode. Then you can cache write relevant statements only once. I am skeptical about a dedicated thread per call because that seems like it would add a bunch of latency.
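Roughly, the separate-pools idea in Python (sizes and pragmas are arbitrary):

    import queue
    import sqlite3
    import threading

    DB = "app.db"

    # One writer connection: SQLite serializes writers anyway, so guard it
    # with a lock and cache write-relevant statements on it once.
    writer = sqlite3.connect(DB, check_same_thread=False)
    writer.execute("PRAGMA journal_mode = WAL")
    writer.execute("PRAGMA busy_timeout = 5000")
    writer_lock = threading.Lock()

    # A pool of read-only connections; in WAL mode readers don't block the writer.
    readers = queue.Queue()
    for _ in range(8):
        conn = sqlite3.connect(f"file:{DB}?mode=ro", uri=True, check_same_thread=False)
        conn.execute("PRAGMA busy_timeout = 5000")
        readers.put(conn)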
It is not that strange when you consider the history. You see, as we started to move away from generated HTML into rich browser applications, we started to need minimal direct DBMS features to serve the rich application. At first, a few functions were exposed as "REST APIs". But soon enough those few features turned into full-on DBMSes, resulting in a DBMS in front of a DBMS. But then people, rightfully, started asking: "Why are we putting a DBMS in front of a DBMS?"
The trouble is that nobody took a step back and asked: "Can we simply use the backing DBMS?" Instead, they trudged forward with "Let's get rid of the backing DBMS and embed the database engine into our own DBMS!" And since SQLite is a convenient database engine...
Is it? It was designed for the damage-control system on naval combat vessels. I have no idea what it does on a naval vessel, but I imagine there is a certain level of safety required.