krilovsky's comments

krilovsky · on Aug 25, 2024

You're assuming here that mmap is only used for writing, where TFA is actually describing a read-only scenario, in which case EIO is synchronous as the read can't be completed.

As for the triviality of writing a SIGBUS handler correctly, that is an oversimplification at best. I/O errors aren't always fatal, even in the write case, and handling SIGBUS in the way you describe wouldn't work when e.g. you're also out of file descriptors, or when the cause of SIGBUS isn't due to an I/O error. So what works for 95% of your usecases doesn't necessarily apply to the 95% of other people's usecases.

jcalvinowens · on Aug 25, 2024

The point is the same for reads: the vast majority of usecases just immediately abort() when a read fails. Writing byzantine fault logic to deal with broken storage media is like trying to recover from SIGSEGV, it's almost never a good idea.

> I/O errors aren't always fatal, even in the write case

Linux will not return -EIO unless the disk is in an unrecoverable state. Generally the assumption is that userspace will treat -EIO as fatal, so the kernel won't return it unless it's truly hosed. Sometimes the error is specific to a file, but that's the far less common case in practice.

> e.g. you're also out of file descriptors,

ENFILE is easy to deal with in a fatal path, by closing stdin so fd #0 can be reused (you're about to call abort(), you don't need it anymore). Try again :)

> or when the cause of SIGBUS isn't due to an I/O error.

It's either -EIO, or it's I/O beyond EOF. The second thing is a bug equivalent to a buffer overrun. That's synchronous, you can handle it just like you handle SIGSEGV if you want to emit more debugging or even write byzantine recovery logic.

krilovsky · on Aug 25, 2024

> Generally the assumption is that userspace will treat -EIO as fatal

A single bad disk doesn't make the situation fatal (unless that's the only disk in your system, in which case you're not even guaranteed to have your signal handler code in memory).

> ENFILE is easy to deal with in a fatal path, by closing stdin so fd #0 can be reused

That's assuming you have stdin open, which again, may work for 95% of your usecases, but isn't universal.

> It's either -EIO, or it's writing beyond EOF

That's an unfounded statement. A quick search of the kernel code will show that there are other reasons for getting a SIGBUS, which are unrelated to mmap (non-disk hardware failures, certain CPU exceptions, to name a few). So yeah, if you know that apart from the disk (or filesystem, at any rate) your hardware is in order, and that the only reason for SIGBUS could be a failed I/O through a memory mapped file, and you know that all of the code in your process is well behaved, writing a SIGBUS handler that terminates the process with a message indicating an mmap I/O error might be reasonable, but that's not the reality for every process, and likely not even 95% of processes.

Regardless, my main point wasn't that lack of file descriptors makes your suggestion problematic, but that your description of it as trivial is an oversimplification at best. mmap has its uses (as does writing a SIGBUS handler to deal with errors), but that doesn't mean that it doesn't have issues. Highlighting them doesn't mean that plain read/write are perfect and free from issues either, and certainly code that isn't ready to deal with EIO will have a bad time when a VFS operation fails. But there are cases where making I/O explicit is better, and I'm not sure why you seem to be making blanket statements that trivialise the issues with mmap.

jcalvinowens · on Aug 25, 2024

> A single bad disk doesn't make the situation fatal (unless that's the only disk in your system, in which case you're not even guaranteed to have your signal handler code in memory).

Yes it does. Your point about signal handlers is why I'm right, that's beyond the point where you can expect the machine to function in a sane way. Trying to recover is often actively harmful.

> That's assuming you have stdin open, which again, may work for 95% of your usecases, but isn't universal.

If you've hit EMFILE, you absolutely have some FD which you can sacrifice to collect debug info, is my point. If you don't you can reserve one a priori, this isn't that hard to deal with.
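For illustration, the "reserve one a priori" idea could be sketched roughly like this. It is a minimal sketch, assuming a POSIX system; `reserve_fd` and `open_with_reserve` are hypothetical names, not anything from the thread or from a library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper state: a spare descriptor held for the fatal path. */
static int spare_fd = -1;

/* Call once at startup, before descriptors can run out. 0 on success. */
int reserve_fd(void)
{
    spare_fd = open("/dev/null", O_RDONLY);
    return spare_fd < 0 ? -1 : 0;
}

/* In the fatal path: if open() fails for lack of descriptors, give up the
 * spare one and retry once. Only open() and close() are called here, and
 * both are async-signal-safe. */
int open_with_reserve(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0 && (errno == EMFILE || errno == ENFILE) && spare_fd >= 0) {
        close(spare_fd);
        spare_fd = -1;
        fd = open(path, flags);
    }
    return fd;
}
```

A handler would then call something like `open_with_reserve("/proc/self/maps", O_RDONLY)` before dumping debug info. Note that in the ENFILE (system-wide) case the retry can still lose the race to another process.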

> writing a SIGBUS handler that terminates the process with a message indicating an mmap I/O error might be reasonable, but that's not the reality for every process, and likely not even 95% of processes.

You're completely wrong here: you've invented an ambiguity that does not exist. Take a look at the manpage for sigaction(), and you'll see that all the non-I/O cases you mention are independently identifiable via members of the siginfo_t struct passed to your SIGBUS handler (just like the I/O cases).

> but that your description of it as trivial is an oversimplification at best.

I'm not oversimplifying: you're spewing unfounded FUD about the mmap() interface, and I'm telling you that none of these details matter for 95% of usecases.

krilovsky · on Aug 25, 2024

> Yes it does. Your point about signal handlers is why I'm right

If it's not a single disk system, not necessarily. To give you a concrete example: a process that writes logs to a disk dedicated for log collection can simply ignore an EIO/ENOSPC if logging isn't its main task. It can't easily recover from a SIGBUS in that scenario though.
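A rough sketch of what that explicit-I/O logging case might look like, assuming plain `write()` on an already open descriptor. `log_line` and `dropped_lines` are hypothetical names used only for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical counter of log lines lost to a full or failing log disk. */
static unsigned long dropped_lines;

/* Best-effort logger: returns true if the whole line was written, false if
 * it was dropped. The failure is reported at the call site, so the caller
 * can simply carry on with its main task. */
bool log_line(int fd, const char *line)
{
    size_t left = strlen(line);
    while (left > 0) {
        ssize_t n = write(fd, line, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the write */
            if (errno == EIO || errno == ENOSPC)
                dropped_lines++;     /* tolerated: count it and move on */
            return false;
        }
        line += n;
        left -= (size_t)n;
    }
    return true;
}
```

On Linux, writing to `/dev/full` fails with ENOSPC, which is a convenient way to exercise the drop path.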

> If you've hit EMFILE, you absolutely have some FD which you can sacrifice to collect debug info, is my point. If you don't you can reserve one a priori, this isn't that hard to deal with.

I'm not sure why you keep sticking to this example, when I already said that it was just an example of another detail that you need to take into account when implementing a SIGBUS handler. Sure, you can open /proc/self/maps a-priori and side-step the issue, but that's another detail that you need to take into account (and that you didn't mention until I brought it up). I never said that it was hard, only that writing a proper handler that deals with the edge cases isn't as trivial as you claim.

> you've invented an ambiguity that does not exist [...] you'll see that all the non-I/O cases you mention are independently identifiable via members of the siginfo_t struct

I'm not sure what's the ambiguity that you're claiming that I've invented. Yes, some of the specific examples that I gave (specifically CPU exceptions) are identifiable if you already know the details, but not all of them: non-disk faults can still result in SIGBUS with BUS_ADRERR, so that alone isn't enough to identify EIO errors or EOF coming from memory-mapped files, and I know that from personal experience debugging SIGBUS crashes.

> you're spewing unfounded FUD about the mmap() interface

I don't know where this is coming from. I never said that using mmap is bad or that it's impossible to write a SIGBUS handler to output debug info before crashing. I merely pointed out that it's not necessarily trivial, as there are details that should be taken care of, and that it may not in fact be suitable for 95% of usecases as you claimed.

You have a mental model of an ideal system which either can't recover from I/O errors, or doesn't get SIGBUS for reasons other than EIO or reading beyond EOF. I'm trying to tell you that not every system is like that, and that while mmap is useful, there are cases where explicit I/O is better suited for the task, and that your 95% might not be everyone's 95%. If you see FUD in simple facts, then I'm sorry, but I see no point in continuing this discussion.

jcalvinowens · on Aug 25, 2024

> If it's not a single disk system, not necessarily.

Again, you miss the point. 95%+ of Linux systems are single disk. That's the expected case.

>> If you've hit EMFILE

> I'm not sure why you keep sticking to this example

You brought this up initially, saying it was difficult to handle. I'm demonstrating that you're wrong, it's actually quite trivial to handle. Handwaving about "edge cases" is FUD, if you have some specific point to make then make it.

> I'm not sure what's the ambiguity that you're claiming that I've invented.

You claimed it wasn't possible to be sure SIGBUS is from an I/O error. That's wrong.

> non-disk faults can still result in SIGBUS with BUS_ADRERR, so that alone isn't enough to identify EIO errors or EOF coming from memory-mapped files

Wrong. You can resolve that ambiguity from the cited address and si_errno etc. Try it next time.

> I'm trying to tell you that not every system is like that.

The fact you think I need to be told that is amusing. You're completely missing the point.

Let me try one more time:

Don't make things hard when they don't have to be. 95% of the time, they don't have to be. Saying "no, this is actually really hard, and you need to care about these normally irrelevant things" without first acknowledging the simple case is FUD in my book.

krilovsky · on Aug 25, 2024

> Again, you miss the point. 95%+ of Linux systems are single disk. That's the expected case.

I specifically added ENOSPC as an example that's relevant on single disk systems as well.

Regardless, I thought we were talking about 95% of usecases in relation to implementations, not runtime systems, but even if we're talking about runtime systems, I'm not sure where you're pulling that 95% number from (or why you felt the need to add a plus sign this time around). That may be true for personal computers, but most Linux systems are servers, which generally aren't deployed in a single disk configuration.

> You brought this up initially, saying it was difficult to handle

I didn't say anything about difficulty. I only said that it wasn't trivial as you made it out to be, which isn't the same thing. Also, when I initially brought it up all I said was that in the FD exhaustion case it wouldn't work in the way you described in the comment that I responded to.

> You claimed it wasn't possible to be sure SIGBUS is from an I/O error. That's wrong.

I didn't. All I said in response to your claim of "It's either -EIO, or it's writing beyond EOF" was that there are other reasons for getting a SIGBUS. Moreover, I actually said (in the same paragraph), that if you know that a SIGBUS is caused by an I/O error, and that all of the code in your process is well-behaved (and by that I meant that terminating it with an abort() wouldn't cause side-effects due to e.g. atexit() handlers not running), using mmap with a SIGBUS handler might be reasonable.

> Wrong. You can resolve that ambiguity from the cited address. Try it next time.

First you claimed that I invented an ambiguity that doesn't exist, and that SIGBUS causes can be identifiable if I just read the sigaction(7) manpage. Now you say that there is an ambiguity, but that it can be resolved using the address, so which is it? [0]

I never said that using mmap is impossible, or even hard (and definitely not "this is actually really hard"). I actually agreed that in some cases it might be reasonable to do it with a SIGBUS handler. All I did say was that it isn't trivial to deal with errors, and that the 95% figure might be true for your usecases, but that it doesn't necessarily apply to other people's usecases.

The only one who said that something was "hard" during this discussion was you.

I get it, it's easier to attack the strawman rather than respond to my comments. I'm just not sure why you think it has anything to do with what I said.

[0] EDIT: I now see that you edited the sentence I quoted to say "from the cited address and si_errno etc.". It might surprise you to learn that si_errno is almost never set in Linux (the manpage is actually explicit about it with "si_errno is generally unused in Linux"), and definitely not in mmap-related SIGBUS coming from memory mapped files. I have no idea why you added this remark telling me that I should try it, when you clearly didn't.

jcalvinowens · on Aug 25, 2024

> I have no idea why you added this remark telling me that I should try it, when you clearly didn't.

You are hilariously hostile here, I don't get it. si_errno is the second field in the struct after si_signo, saying "si_errno etc." is obviously in reference to the rest of the fields in the structure...

krilovsky · on Aug 25, 2024

> You are hilariously hostile here, I don't get it.

I apologise if it came out hostile. That was not my intention. I was in a bit of a hurry when I made the edit, and I was just trying to expand my comment in response to your edit, and explain that non-I/O and non-disk SIGBUS errors sometimes look exactly like disk and filesystem errors that return EIO (not just signum being SIGBUS, but also si_code being set to BUS_ADRERR, etc.), so looking at the siginfo_t fields alone wouldn't be enough to disambiguate.

Then there's the address field, which can probably be used in combination with parsing /proc/self/maps, but my point in that comment was that the information on the manpage alone wouldn't have helped people trying to implement a handler correctly.
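To make the address-based check concrete, here is a minimal sketch, assuming Linux and a single tracked mapping whose range the process records itself instead of parsing /proc/self/maps. All function and variable names are hypothetical. It only answers "did the fault land inside our mapping while we were touching it"; it does not tell EIO apart from an access beyond EOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the single file mapping we care about. */
static const volatile unsigned char *map_base;
static size_t map_len;
static sigjmp_buf bus_recover;
static volatile sig_atomic_t bus_armed;

/* Map `len` bytes of `path` read-only and remember the range. 0 on success. */
int track_file_mapping(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    map_base = p;
    map_len = len;
    return 0;
}

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)map_base;
    /* Ours only if we were touching the mapping and the address is inside it. */
    if (bus_armed && info->si_code == BUS_ADRERR &&
        addr >= base && addr - base < map_len)
        siglongjmp(bus_recover, 1);
    /* Anything else: fall back to the default action (terminate). */
    signal(sig, SIG_DFL);
    raise(sig);
}

int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Returns 0 and stores the byte, or -1 if the access raised SIGBUS. */
int read_mapped_byte(size_t off, unsigned char *out)
{
    if (off >= map_len)
        return -1;
    if (sigsetjmp(bus_recover, 1) != 0) {
        bus_armed = 0;
        return -1;
    }
    bus_armed = 1;
    *out = map_base[off];
    bus_armed = 0;
    return 0;
}
```

Mapping two pages of a one-byte file and reading from the second page is one way to trigger the handler. A real program with several mappings would need a table of ranges, and whether jumping out of the handler is wise at all is exactly what the thread is arguing about.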

In any case, I already described a scenario where crashing would be the wrong thing to do IMO, which you seemed to ignore. Even in scenarios where crashing is reasonable, I'm sure there's a solution for every edge case that I would bring up, but I never said that it was impossible, so I'm not sure why asking me to list every possible edge case is relevant when my point was just that there are edge cases, and that you'd need to consider them (and they would be different for different apps), thus making an implementation not trivial. That doesn't mean that it's necessarily difficult, just that it might be a more complex solution when compared to dealing with a failing VFS operation.

As it seems that we've reached an impasse, I'll just say that simplicity depends on the context and is sometimes a matter of personal taste. I don't have anything against mmap, and I was only trying to argue that there's a trade-off, but you are of course free to disagree and use mmap everywhere if that works for you.

I don't think I have anything more to add to what I already said, and I'm sorry again if you felt personally attacked, or that I had something against mmap and trying to spread FUD.

jcalvinowens · on Aug 25, 2024

> but most Linux systems are servers, which generally aren't deployed in a single disk configuration.

You are incorrect about that: most Linux servers in the world have one disk. Most servers are not storage servers.

> I didn't say anything about difficulty. I only said that it wasn't trivial as you made it out to be

...and I demonstrated by counterexample that you're wrong, it is trivial. If you think I'm missing some detail, you are free to explain it. You're just handwaving.

> First you claimed that I invented an ambiguity that doesn't exist, and that SIGBUS causes can be identifiable if I just read the sigaction(7) manpage. Now you say that there is an ambiguity, but that it can be resolved using the address, so which is it?

Both, obviously? If you only look at signo there's an "ambiguity", but with the rest of siginfo_t the "ambiguity" ceases to exist. There is no case where you cannot unambiguously handle -EIO in a mmap via SIGBUS.

You claimed that you could only use SIGBUS with mmap if you were sure there were no other sources of SIGBUS. Quoting you directly:

> So yeah, if you know that apart from the disk (or filesystem, at any rate) your hardware is in order, and that the only reason for SIGBUS could be a failed I/O through a memory mapped file, and you know that all of the code in your process is well behaved, writing a SIGBUS handler that terminates the process with a message indicating an mmap I/O error might be reasonable

That statement is completely wrong: you can always tell whether it came from the mmap or something else, by looking at the siginfo_t fields.

> and by that I meant that terminating it with an abort() wouldn't cause side-effects due to e.g. atexit() handlers not running

Any system that breaks if atexit() handlers don't run is fundamentally broken by design. There are a dozen reasons the process can die without running those.

> All I did say was that it isn't trivial to deal with errors

Yes, and that statement is wrong. Most of the time it is trivial, because you just call abort(). There is no possibly simpler error handling than printing a message and calling abort(). For 95% of the workloads running across the world on Linux, that is entirely sufficient.

It is very unusual to try to recover from I/O error, and most programmers who try are really shooting themselves in the foot without realizing it.

You're free to disagree obviously, but I'm directly refuting the points you're making. Calling it a "strawman" makes you look really really silly.

krilovsky · on Aug 24, 2024

Yes, on POSIX systems you'd get a SIGBUS if the I/O fails or if there's no available physical memory to back the mapping.

krilovsky · on June 20, 2023

This is definitely not the first RISC-V SBC at this size (the Sipeed Nezha SBC[0] launched over two years ago based on the Allwinner D1, and the ARIES FIVEBerry[1] launched almost a month ago based on the Renesas RZ/Five). It's not even the first SBC with that specific SoC, as StarFive (the company behind the JH7110 SoC used by this SBC) launched the VisionFive 2 SBC[2] on KickStarter back in September, and Pine64 had the STAR64 since last year as well.

As for PoE support, the presence of the 4-pin header on the board suggests that it's optional, and requires the help of something like the PoE+ HAT[3], same as on the VisionFive 2 and the RPi.

[0] https://www.indiegogo.com/projects/nezha-your-first-64bit-ri...

[1] https://www.aries-embedded.com/evaluation-kit/cpu/rzfive-ren...

[2] https://www.starfivetech.com/en/site/boards

[3] https://www.raspberrypi.com/products/poe-plus-hat/

brucehoult · on June 21, 2023

If you want to talk about customers actually receiving stuff, rather than announcements or taking preorders, then the timing is:

- Nezha: late June / early July 2021

- VisionFive 2: February 2023

- Star64: May 2023

The PineTab-V also uses the JH7110. It was supposed to ship late May at the same time as the (very similar) A55-based PineTab2, but according to the company they found something they wanted to fix before shipping. Hopefully soon! My VF2 that arrived in February was supposed to have been in November, and the Star64 was scheduled to ship in December, so slips of a few months are just in the nature of the industry, especially at this time.

krilovsky · on June 21, 2023

No argument there, though I'm not sure why you feel that the delivery date is the relevant metric in the context of the Milk-V Mars which has only been announced and isn't even available for preorder yet.

brucehoult · on June 21, 2023

Because the delay between announcement or order-taking and receiving the thing can be highly variable, and having it in my hands and being able to use it is what is important to me.

The biggest delays are usually with a new SoC. Once the SoC is available, building yet one more circuit board is usually a pretty straightforward exercise, at least for the competent.

krilovsky · on Oct 9, 2021

Note: while it is based on MariaDB, it replaces InnoDB (the MariaDB/MySQL storage engine) with MyRocks (which is based on RocksDB), and as a consequence it is missing some features (such as foreign keys[1]) that prevent it from being usable in many applications.

[1] https://github.com/facebook/mysql-5.6/wiki/MyRocks-limitatio...

krilovsky · on April 29, 2020

From a cursory look, this is one of the most unsafe pieces of code I have seen, with complete disregard to memory alignment requirements and the lifetime of temporary objects passed as arguments to functions.

Definitely don't use this in production code.

jedimastert · on April 29, 2020

From the FAQ[0]

> Can it be used in Production?

> It might be better to try Cello out on a hobby project first. Cello does aim to be production ready, but because it is a hack it has its fair share of oddities and pitfalls, and if you are working in a team, or to a deadline, there is much better tooling, support and community for languages such as C++.

[0]: http://libcello.org/home

asveikau · on April 29, 2020

The author has already made it clear they consider "memory allocation by unaligned offset into temporary char[] cast into a struct pointer" to be a valid strategy, so frankly I'm not very interested in their opinions on whether it's production ready.

I've seen it on HN before, whenever this project gets mentioned. People who don't know much about C confuse it for a really cool thing you can do with C, as if it's just another legit library that you can pick up and use. It's a lot of undefined behavior. People have enough problems writing safe C as it is, and on top of this complaint about alignment and misuse of temporaries, this thing makes the problem worse in other ways too, removing the few safeguards that exist by treating everything as void* for instance.

krilovsky · on April 29, 2020

Well it shouldn't be used for toy projects either with all of that hackery that it pulls (even if you don't use atomics, violating alignment requirements alone means that your code can never make use of vector instructions).
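To illustrate the alignment point, a small sketch of the usual safe pattern: check alignment with `alignof`, and move structs in and out of raw byte buffers with `memcpy` instead of casting an arbitrary `char` offset to a struct pointer. `struct header` and the helper names are hypothetical and are not taken from the library under discussion.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata struct, standing in for a "fat pointer" header. */
struct header {
    size_t len;
    double scale;
};

/* Nonzero if p is suitably aligned to be used as a struct header pointer. */
int is_aligned_for_header(const void *p)
{
    return (uintptr_t)p % alignof(struct header) == 0;
}

/* Copy a header out of any byte offset without forming an unaligned pointer. */
struct header load_header(const unsigned char *bytes)
{
    struct header h;
    memcpy(&h, bytes, sizeof h);
    return h;
}

/* Copy a header into any byte offset, aligned or not. */
void store_header(unsigned char *bytes, const struct header *h)
{
    memcpy(bytes, h, sizeof *h);
}
```

Compilers typically turn these fixed-size `memcpy` calls into plain loads and stores, so the defined-behaviour version usually costs nothing.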

The new code on GitHub makes it clear that this library is not about fat pointers, but about writing unsafe C in another language, powered by undefined behaviour and the glory of the C preprocessor.

If you want to have safe arrays in C, just use a proper library for that instead of going the route of abusing undefined behaviour. There are plenty of libraries to choose from: https://github.com/search?l=C&q=vector+array&type=Repositori... (not all of them are serious projects, so you should be careful when you pick one. I personally have been using this one: https://github.com/iscgar/cvec2 because I like the easy interface it provides and especially the fact that type safety is a top priority, but others may have different tastes and priorities).

krilovsky · on April 19, 2020

I don't know what the best practices are now, but it used to be best practice to blow the CFG_AES_Only eFUSE when using bitstream protection, which prevents the loading of a bitstream which isn't authenticated, and thus foils this attack. If a manufacturer went to the trouble of encrypting the FPGA but then allowed loading of plaintext bitstreams they probably didn't really understand what they were doing.

Nokinside · on April 19, 2020

This attack breaks the encrypted and authenticated bitstream.

I thought that the title "A Full Break of the Bitstream Encryption of Xilinx 7-Series FPGAs" would give some information even for those who don't want to read the article before commenting. :)

krilovsky · on April 19, 2020

While I understand that without the proper context (knowing a bit about bitstream protection in the Xilinx 7-Series FPGAs) my comment may seem a bit obscure, I did read the paper.

As the sibling comment mentions, the attack requires programming a plaintext bitstream in order to perform the readout of the WBSTAR register after the automatic reset caused by the HMAC authentication failure. Blowing the CFG_AES_Only eFUSE prevents the loading of that plaintext readout bitstream and the first stage of the attack is thus foiled (preventing the second stage of the attack from taking place as well).

Nokinside · on April 19, 2020

That was the first attack. How about the second attack where they show how to encrypt a bitstream?

krilovsky · on April 19, 2020

See my reply in the sibling comment thread. Basically, the second attack is not possible without the first succeeding.

teraflop · on April 19, 2020

As the paper explains, the attack requires alternately tampering with the encrypted bitstream (to write one word of the decrypted data at a time to a non-volatile register) and then resetting the FPGA and loading a separate, attacker-created, unencrypted bitstream to read that register's contents.

I don't know enough about Xilinx FPGAs to definitively say whether setting the fuse that OP mentions would prevent the attack, but it seems plausible.

Nokinside · on April 19, 2020

Attack can be used to encrypt bitstreams also.

>3.4 Attack 2: Breaking Authenticity

>Therefore the attacker can encrypt an arbitrary bitstream by means of the FPGA as a decryption oracle. The valid HMAC tag can also be created by the attacker, as the HMAC key is part of the encrypted bitstream. Hence, the attacker can set his own HMAC key inside the encrypted bitstream and calculate the corresponding valid tag. Thus, the attacker is capable of creating a valid encrypted bitstream, meaning the authenticity of the bitstream is broken as well

krilovsky · on April 19, 2020

> 3.4 Attack 2: Breaking Authenticity

> With the first attack, the FPGA can be used to decrypt arbitrary blocks. Hence, it can also be seen as a decryption oracle. Thus, we can also use this oracle to encrypt a bitstream, as shown by Rizzo and Duong in [41], and generate a valid HMAC tag

This requires the first stage of the attack to succeed. If it fails and the FPGA cannot be used as a decryption oracle, there's no way to generate a valid encrypted bitstream with the technique outlined in the paper.

krilovsky · on Feb 12, 2020

Calling this open source is a bit of a stretch. At the heart of this phone there is a huge proprietary dependency (the Adafruit FONA module).

taneq · on Feb 12, 2020

It's hard to get any open source software finished, what with all the goat farming.

(https://pics.me.me/i-thought-using-loops-was-cheating-so-pro...)

pratio · on Feb 12, 2020

I was about to post the same link

pratio · on Feb 12, 2020

It counts as open source in my book; maybe you will create an open source alternative for the hardware. This sort of gatekeeping is maybe the reason why hobbyists avoid HN.

krilovsky · on Feb 12, 2020

Huh? I didn't say that wasn't a cool project. On the contrary, I think this is a great project even if it was all closed source.

Maybe I should have clarified that I just meant to highlight the fact it's almost impossible to understand what many of the devices around us are doing because of the dependency on fundamental building blocks which are closed source and have no open source alternatives.

I'm sorry if my comment came out offensive in any way. That wasn't my intention.

pratio · on Feb 12, 2020

I understand if that wasn’t your intention, but I saw a cool project on HN and you were gatekeeping. It seems the author open sourced everything they knew. Your comment didn’t mention any alternatives to open hardware or such. It felt condescending, something cool and hacked up.

I didn’t like the tone, hence the comment.

hellotomyrars · on Feb 12, 2020

I think your comment is pretty fair and not an indictment of the project. It's 100% worth mentioning.

clSTophEjUdRanu · on Feb 12, 2020

The problem with these open source phone projects is that all the modems use closed source drivers (I think).

Do you know a path forward to do this entirely with open source?

krilovsky · on Feb 12, 2020

Unfortunately I don't know of any project in the area of open source basebands. There was OsmocomBB for GSM, which used the TI Calypso chipset, but I'm not aware of any such efforts for 3G/4G.

fsh · on Feb 12, 2020

This module is basically just a 3G baseband chip on a board. By your definition, no circuit that contains a chip would be open source.

_-___________-_ · on Feb 12, 2020

"just a 3G baseband chip" does hide an enormous amount of complexity. The point is just that this design contains both non-open-source hardware and non-open-source software (running on the 3G baseband chip). It's still mighty cool!

krilovsky · on Feb 12, 2020

It's very cool indeed, even if it was all closed source :)

krilovsky · on Feb 12, 2020

Except it's not "just" a 3G baseband chip. That chip[0] runs a closed source software that even contains a Lua interpreter. That's a proprietary binary blob that I don't know what it does, so this is not something that I would call "open source" even if the controller is open source.

[0] https://simcom.ee/modules/wcdma-hspa/sim5320/

saalweachter · on Feb 12, 2020

Boy howdy, you are not going to like to hear what happens when those radio signals leave your cellular telephone.

krilovsky · on Dec 23, 2019

No, it should be easy to get the entry point since it's defined in the ELF file header. The author simply tried to look for the `_start` symbol, which failed because the binary doesn't contain any symbol information (also the entry point doesn't have to be named `_start`, it's just a convention).
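As a rough sketch of reading the entry point straight from the header, assuming Linux's `<elf.h>` and a file whose byte order matches the host (`elf_entry_point` is a hypothetical helper name):

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads e_entry from the ELF header of `path`. Returns 0 on success and
 * -1 if the file can't be read or isn't a 32-bit or 64-bit ELF file.
 * Assumes the file's byte order matches the host's. */
int elf_entry_point(const char *path, uint64_t *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_CLASS] == ELFCLASS64 && n == sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    return -1;
}
```

No symbol table is involved, so this works on stripped binaries too. `readelf -h` prints the same value as "Entry point address".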

krilovsky · on Dec 2, 2019

I'm not the author. I'm just fascinated with size optimisations and I thought that it was an interesting achievement. But yes, the simplest optimisation is to not write code that you don't need to.

I've had my fair share of x86 assembly programming, and I used to do all sorts of tricks back then as well (I vaguely remember calling DOS interrupts just to get multiple registers into the state I wanted them when speed didn't matter as much as size), but I don't think that I'd have been able to cram a working chess program into such a small number of bytes.

ksaj · on Dec 3, 2019

Agreed. It's smaller than a boot sector, yet does a whole lot more.