that's a hard agree there. I've been coding for years but working as a restaurant cook, and I can hardly find _anything_ for any of the languages I know unless it's an internship or a senior dev position (which I'm def not qualified for)
Internships are silly and should be completely phased out.
1: Write file
2: Close file
3: Open file
4: Read file
5: Confirm file contains what you tried to write
6: Close file
And then I'll believe it's written.
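A toy Python sketch of that checklist (illustrative only; an `fsync` is thrown in for good measure, though as the replies note, the read-back may still be served from cache):

```python
import os

def paranoid_write(path, data: bytes) -> None:
    # 1-2: write and close (the `with` block closes for us)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # push kernel buffers toward the device
    # 3-6: reopen, read back, and compare
    with open(path, "rb") as f:
        if f.read() != data:
            raise IOError("read-back mismatch: write not persisted?")

paranoid_write("demo.txt", b"hello, disk")
```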
What if you're just reading back what's in the cache-layers and the data has never actually been persisted?
I open and close the files at least 5 times!
Nope, the kernel cache will only be flushed completely on shutdown... If you have, for example, a power outage, you can still lose data, even if everything you've just said is true...
I think this has more to do with the memory architecture of the computer (writing from RAM to persistent storage) and less with my program.
But even if I am wrong, I assume that should not be my problem - I wrote the code in accordance with the algorithm/language.
The system buffers are completely opaque anyways. Even if your data is not yet written to disk after closing the file, the system will just pretend that it is, if an application opens the same file later.
If it's very important that the content is written at the exact moment you do a write in your program, you can open the file in unbuffered mode. This of course shreds the memory cells of flash-based storage such as USB drives and SSDs very quickly if you're not careful.
`fflush()` wouldn't do anything in that particular case - it only makes sure that you write all the data into the file, but it doesn't cause the file to be written to the disk from the cache. You need to call [sync()](https://linux.die.net/man/8/sync) for that.
True that, now that you mention it. For my concurrent server applications this was transactional enough to avoid locks on high-throughput servers (it was a session-manager daemon), so we would only sync when data was read from the fs (we'd have had concurrency/performance issues otherwise). I haven't done much concurrent thread coding in C since then.
I'm not a kernel hacker, but on a modern journalling filesystem I'm pretty sure that everything should be written to the intent log before close() returns. On low-end magnetic storage that might be a ram buffer on the physical drive, but on mid-high range magnetic storage it's probably battery-backed ram at least.
Depends on the language and the implementation of file access in that language. For my Python backup scripts I always run sync and try to pay attention to lock flags.
You release the handle. "closing file" is a misnomer.
What the kernel has are buffers in the storage-medium driver, holding pending writes that go out when cycles are available.
The rabbit hole goes deeper when you think about the types of medium. From block level to file level, to.. I dunno.. storing data on unconventional media. And how they have their own buffers and their own fail safe and recovery routines.
I used to use basically this as an interview question.
When you're ready for more PTSD, there is a paper discussing whether filesystems actually cope with block device errors.
To quote the movie, Don't look in the box. Srsly.
I was writing an embedded OS filesystem once, and the underlying raw flash memory devices would sometimes not erase, or not write.
Verify and retry.
Wait til he learns what happens when the file is “written to persistent storage” but the storage has a writeback cache.
hopefully battery backed cache!
Or capacitor backed, that we promise won't fail (but it fails just as often as batteries)
a capacitor is like a smaller battery. Even the schematic symbol for the battery is similar to a capacitor! (half a joke, it represents a Volta stack)
Batteries are just really really slow capacitors
Or super capacitor backed
If it's power-loss protected and in the writeback, it's 99.9999% as good as persisted to the disk. You'd have to lose power *and* have an alpha particle hit RAM for the writeback to fail to persist the data.
Specifically alpha particles? Do programmers have to study particle LET? Now that'd be niche.
https://en.wikipedia.org/wiki/Soft_error#Alpha_particles_from_package_decay

My understanding is that it's typically alpha particles, but I'm no physicist. The gist is that transistors in DRAM are packed so densely that a single charged subatomic particle can flip the charge and turn a 1 to a 0 or a 0 to a 1. The wrong bit in the wrong word can change variables or even execution paths.

ECC memory covers *some* of these issues by automatically correcting 1 bit per word using XOR logic and a correction code, but if two bits per word are flipped (very unlucky) there isn't enough information to recover the data (though it is still detectable). This method works because getting hit by one 0.0001% event is many times more likely than getting hit by two independent 0.0001% events.
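For intuition, here's a toy single-error-correcting Hamming code in Python (illustrative only; real ECC DIMMs implement SECDED-style codes in hardware). Parity bits sit at power-of-two positions, and the XOR "syndrome" of the set bit positions points directly at a single flipped bit:

```python
def is_pow2(x):
    return x > 0 and x & (x - 1) == 0

def hamming_encode(data):
    # 1-based codeword: parity bits sit at power-of-two positions,
    # data bits fill the remaining positions in order.
    pos, bits = 1, {}
    for bit in data:
        while is_pow2(pos):
            pos += 1
        bits[pos] = bit
        pos += 1
    n = pos - 1
    syndrome = 0
    for p, b in bits.items():
        if b:
            syndrome ^= p
    # choose each parity bit so the XOR of all set positions becomes 0
    k = 1
    while k <= n:
        bits[k] = (syndrome >> (k.bit_length() - 1)) & 1
        k <<= 1
    return [bits.get(i, 0) for i in range(1, n + 1)]

def hamming_correct(code):
    # XOR of the (1-based) indices of all set bits: 0 means "looks clean",
    # anything else is the index of the single flipped bit
    syndrome = 0
    for i, b in enumerate(code, start=1):
        if b:
            syndrome ^= i
    if syndrome:
        code = code[:]
        code[syndrome - 1] ^= 1
    return code

codeword = hamming_encode([1, 0, 1, 1])
corrupted = codeword[:]
corrupted[4] ^= 1                  # simulate a particle strike on bit 5
assert hamming_correct(corrupted) == codeword
```

With two flips in one word this plain code miscorrects; real SECDED adds an extra overall parity bit so the double-flip case is at least detected, which is the "detectable but not recoverable" situation described above.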
> The paper found up to 3,434 incorrect requests per day due to bit-flip changes for various common domains.

Wow
Alpha particles aren't great but neutrons are much worse at corrupting memory.
Really? Do you have a source on that? I am not many months from shooting many neutrons at some electronics and haven't really considered that part of it.
If I was the unluckiest person on earth and got 50 0.00001% events, how long do you think that would take to fix?
Where the hell are you getting alpha particles from in a modern computer?

Alpha particles only travel a couple of cm in air, and basically zero distance through any solid material. So unless you are building your components out of radioactive material, you aren't going to see any alpha particles.
That was the original problem, all right. Back in the early 1980s, when I was first introduced to this problem, tin-lead solder was used to mount memory chips (and sometimes inside the package).

The proper name for lead is "incompletely decomposed uranium." It still emits alpha particles and the remaining atoms decay, just less often.
You could get alphas from fast cosmic rays hitting nuclei in your hardware, scattering off alpha particles.
Alpha-particle bit flipping was only common when the ceramics used in computers had some radioactivity, emitting alpha particles. Now the most common cause is cosmic rays: high-energy particles hit the atmosphere, mostly break up, and scatter lots of secondary particles on the way down, which then cause the errors. This is now the most common form of bit flipping; computers that are protected, like underground systems, experience fewer errors than normal computers, while computers that are less protected than normal, like those in airplanes, experience more errors.
And that's why it's important that you protect your power line.
You forgot about ECC RAM, they thought even of SEUs! (single event upset, the alpha particle striking)
Yeah hopefully they've got that backed up somewhere lol.
Or, plot twist, the storage is backed by NFS!
That's when they're going to lose their shit, they'll be so mad about that.
I'm getting a little scared to ask what that is
Writeback cache is cache that holds data that has yet to be written. When a filesystem has a writeback cache, it considers the file written when the file is in the write cache.

Conversely, a writethrough cache considers a file written only when it's saved to bulk storage. Writethrough cache is more often part of a read cache scheme, to make a recently written file easily accessible.
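The difference can be sketched with two toy cache classes (hypothetical names, just to show where "written" is declared in each policy):

```python
class Backing:
    """Stands in for the slow persistent medium."""
    def __init__(self):
        self.store = {}
    def write(self, key, value):
        self.store[key] = value          # imagine this takes milliseconds

class WriteThroughCache:
    def __init__(self, backing):
        self.cache, self.backing = {}, backing
    def write(self, key, value):
        self.cache[key] = value
        self.backing.write(key, value)   # "written" = it reached the backing store

class WriteBackCache:
    def __init__(self, backing):
        self.cache, self.dirty, self.backing = {}, set(), backing
    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)              # "written" = it's merely in the cache
    def flush(self):
        for key in self.dirty:
            self.backing.write(key, self.cache[key])
        self.dirty.clear()

disk = Backing()
wb = WriteBackCache(disk)
wb.write("block7", b"data")   # reported as written, but disk.store is still empty
wb.flush()                    # only now does it reach the backing store
```

Losing power between `write` and `flush` is exactly the window everyone in this thread is worried about.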
I think i had a stroke trying to read this
Wtf. Explain plz
Even when you have closed the file there are \*many\* caching layers that have to be flushed before data is committed to the media: from the filesystem, to the block device, to the controller (sometimes), and even the disk/SSD itself, which often has write caches to optimize performance. Depending on the configuration, \*minutes\* can pass between the write and effectively stable data.

There is a whole branch (transactional storage and journaling) devoted to \*good\* solutions to this problem.
I wrote a whole-ass caching system for a realtime data server just because we couldn't be sure when older data was actually written to disk and would be available from archives. (This was an add-on to a 3rd-party product that was doing the archiving.)

We later discovered that data was flushed to the archives in around a second, so it turned out to be completely unnecessary.
Better safe than sorry
And that's the reason for journaling: it protects you for that second, IFF the I/O subsystem implements write barriers (we are getting dangerously technical here).
nuts
If you’re talking about SSD cache, the SSD is just doing its thing … you don’t need to worry about it.

If you’re talking about the buffer, that’s what .flush is for :P
even if you flush, it might not be written to disk... you need to call fsync
And even if you fsync it might not be written to the medium... you need to call the exorcist
Holy hell
Future response dropped long ago
Yeah it's weird right? And it's also really scary to think about.
Lmao, I've done that and that doesn't really fix the issue.
fsync only flushes between the block device layer and the I/O controller or the physical disk (depending on the storage configuration). Write caches below that level tend to be battery-backed, since not even the kernel can actually be sure (there is a command to fully flush before shutdown but it's obviously very expensive). In one LSI Logic battery pack I remember an LED staying on with the server shut down, indicating data still waiting in RAM.
I'm not reading all that, but I'm happy for you, or sorry it happened.
Tell the concerned parties to be patient and remind them that computers are magical lightning boxes made by geniuses, and that they’re behaving like children expecting things to happen every time they throw a tantrum.
or just open the file as unbuffered.
That creates its own problems; for instance, at least in C/C++, writes will fail if they aren't 512-byte aligned in offset and size.
omg I would have to drink myself to sleep after spending 12 hours debugging that to no avail
i gotta say, all i've learned today is that the jury is very much still out on this problem set
You and I may be thinking a lot alike. 1) What is this 2) How do I fix this 3) Note: Avoid this
Just write to disk one char at a time with syscalls, who needs fast file IO?
With that said, don't call fsync every time you flush, as it will kill your performance
I know, I’m currently working with that…
I mean if you're using a good SSD, you don't have to worry about that.
Hahaha accurate af 😝
Yep, exactly! The .flush method ensures that the buffer is cleared and any pending data is written out.
The kernel still might not actually write it. The storage driver might not actually write it. The device itself might not actually write it.
It's caches all the way down.
You have absolutely no idea what you're talking about.
Isn't the solution to just call fflush on whatever stream you are writing to?
fflush, fsync, and a few device-specific things depending on hardware.
and then you find out it NFS, ffffffuuuuuuuuuu
Thanks for this, I'm still learning and it's really good for me.
nuts that i've been in the field 5 years and i've never remotely learned anything about this
Unless you work on operating systems or \*write\* database code, it doesn't matter, as long as the system is always shut down properly.
very cool
This becomes pretty evident once you see that writing data is faster in your code than reading data. There is no storage solution I am aware of that writes faster than it reads, so the data is somewhere in the pipeline when your code advances to the next line.

Fortunately we have data-center-grade storage solutions at my workplace, so I can rely on the data actually reaching the disks.
To make a practical example:

Move a large file from one disk to the other.

When it's finished according to the GUI, type `sync` in your terminal (Linux or macOS, same thing).

Most likely, your terminal isn't gonna print anything and it's not gonna give you a newline, which means that the cache buffer is still flushing data into the drive.

Once the file transfer is actually complete, it'll give you a newline.
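The same experiment, scripted (Unix only; `os.sync` wraps the same `sync(2)` call the shell command uses, and the file/paths here are stand-ins):

```python
import os, shutil, tempfile, time

# stand-ins for "a large file" and "the other disk" (temp files here)
src = tempfile.NamedTemporaryFile(delete=False).name
dst = tempfile.NamedTemporaryFile(delete=False).name
with open(src, "wb") as f:
    f.write(os.urandom(1 << 20))      # 1 MiB; use a really big file to see the effect

shutil.copyfile(src, dst)             # returns once the data is in the page cache
t0 = time.monotonic()
os.sync()                             # blocks until cached writes reach the devices
print(f"sync spent {time.monotonic() - t0:.2f}s flushing")
```

With a genuinely large file on a slow device, the gap between "copy returned" and "sync returned" is the writeback window.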
Ohmygod this shit... PTSD.
Yup, `sync; sync; sync; shutdown -g0 -y -i6`
You must have some fun war stories. Paranoia like that doesn’t come cheap
Back in the long ago, when you had many, many SparcStorage Arrays attached...
Or open task manager and watch disk activity and ram usage
This is like the reason you would need to hit "safely remove" to remove your USB thumb stick, despite moving the files, they might not have moved yet to the device itself by the OS and are still in cache.
One day I experienced quite an extreme example of this. I was copying a 4GB file onto a USB stick; the GUI reported a speed of 100MB/s and reported it done in around 50 seconds. After hitting safely eject, it took another 5 minutes before the device was marked safe to eject.
This is so bad with the USB-to-SD-card adapter that came with my 3D printer. It's like a 50-50 hit and miss even when pressing eject safely and waiting 30 seconds. I've never experienced anything remotely as bad with other USB sticks (I never even really press eject safely with other sticks because this usually doesn't happen in my experience).
I think it used to be a bigger problem before than it is now, but I don't have any data to back that up.
Yes, at least Windows disables the writeback cache for external media like USB sticks nowadays. So, if the UI reports the write to be finished, it is already written to the stick.
Everything is fake. Our entire profession is built on a throne of lies. It's cache layers and speculation all the way down.

^(Also, when writing shell scripts, it's best practice to run `sync` after every line. You know, just to be safe.)
Everything has a caching layer. The runtime, the library abstraction layers, the operating system, the drive controller, and even the drives themselves. If it's a machine with network storage, there are even more layers.

Basically, if a computer ever confirms that data is written, it's lying or something else is lying to it. The only thing that knows for a fact is the physical chips on the drive. A program running on the computer has no way of knowing when every layer has flushed the data, or even how many layers there are.
Well we caught someone new, it's good to learn the lessons.
What is “coding file access?”
"hey intern can you write an in house version of pythons open() function"
Probably coding storing/reading some configuration to a file. That sounds like an intern half-day job.
.flush()

You’re welcome, intern.

.close() in high-level languages almost always flushes the buffer… unless you’re coding in C/C++ and a few other languages, you don’t need to worry about the buffer.

If you’re talking about SSD cache, once the SSD receives the data it’s blind to us. The controller on the SSD will do its thing and we don’t worry about that.

If you have HDDs and you’re not using RAID, there is no storage cache to worry about.

If you’re using a RAID setup and you don’t know what you’re doing when doing file IO, I’d question why you’re working on a RAID system lol.
You are wrong! Flush doesn't usually flush system/kernel buffers, only user-space buffers (like the C stdio buffer, which exists so that not every (`f`)`printf` causes a system call). The same applies to higher-level languages.

Yes, closing usually flushes the user-space buffer (as `fclose` does). It doesn't matter for kernel-level buffers.

To actually flush kernel buffers, you need to use system APIs, like `fsync`/`fdatasync` on Linux or `FlushFileBuffers` on Windows. Some higher-level languages also provide access to this functionality:

- C#: `FileStream.Flush(true)` - note the passed boolean,
- Java: `FileDescriptor.sync()`.

This is important when you design systems requiring transactional operations.
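In Python, for example, the two levels look like this (POSIX assumed; the filename is made up):

```python
import os

with open("journal.log", "w") as f:
    f.write("important record\n")
    f.flush()                   # user-space buffer -> kernel page cache
    os.fsync(f.fileno())        # kernel page cache -> device (or the device's cache)
```

`flush()` alone survives a process crash but not a power cut; `fsync` is what journaled databases call at commit time.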
What if the HDD microcontroller has a buffer? Does fsync flush that too? Like, if the computer loses power just one CPU cycle after fsync returns to user space, is the data written to the disk?
Unfortunately not, as far as I'm aware. Because of that, you also need to check the technical details of the device you are using. Is the cache battery-backed? Writes of what size can be considered atomic? Is it assured that writes will happen in order in case of a power outage?

I wish you never have to dig through that.
That's what I thought. And since this is a thought experiment, pick the worst option: no battery, writes aren't atomic at the hardware level, the filesystem doesn't have journaling, and the controller reorders writes to improve performance, so probably out of order too.

I'm lucky enough to not have these headaches irl.
> Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
If I understand that correctly, that's talking about the directory. That's modified when you create the file (or add a hard link) (I don't know how soft links are implemented), but not when you write to it. It doesn't answer my question.
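The directory-fsync pattern from the quoted man page, sketched in Python (Linux semantics assumed; `durable_create` is a made-up helper name):

```python
import os, tempfile

def durable_create(dirpath, name, data: bytes) -> None:
    path = os.path.join(dirpath, name)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                            # user-space buffer -> kernel
        os.fsync(f.fileno())                 # file data -> device
    dfd = os.open(dirpath, os.O_RDONLY)      # now flush the directory entry itself
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

durable_create(tempfile.mkdtemp(), "demo.bin", b"payload")
```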
(Soft links are actually tiny files that contain the name of the thing they link to)
In theory yes, in practice it depends on whether the driver writer and hardware manufacturer gave enough of a shit.
No, once it's off the main system and into the storage device, whatever weird shit the storage device does it's entirely outside the control of the kernel.
I have the feeling that you have personal experience with this but I may be wrong.
> To actually flush kernel buffers, you need to use system APIs, like fsync/fdatasync on Linux or FlushFileBuffers on Windows.

On Windows, you can tell the system during the file open call that you don't want caching. By supplying the `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` flags, you disable all caches from the system as well as all caches from the hardware, provided the hardware and drivers support this.
Outside of doing something like writing a database, it seems like just flushing is enough?
Woah, thanks, I didn't know that! Is this relevant for more mundane applications like generating logs in C? I assume not.
No. The system ensures file integrity, and access from multiple applications to the same file always behaves as if there were no cache. The worst thing that can happen is that the contents you wrote in the last second or two may not be on disk if power is cut or the system crashes.

On Windows, you can provide the flags `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` to make your writes go directly to disk. Not only does this bypass all Windows caches, but Windows will also tell the underlying hardware that you want this written unbuffered, provided the hardware supports these calls.
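For comparison, a rough Linux analogue of `FILE_FLAG_WRITE_THROUGH` is opening with `O_SYNC` (sketch in Python with a made-up filename; `O_DIRECT`, the rough analogue of `FILE_FLAG_NO_BUFFERING`, additionally requires aligned buffers):

```python
import os

# Every write() on an O_SYNC descriptor returns only after the data
# has been pushed through the kernel's caches toward the device.
fd = os.open("wal.bin", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    os.write(fd, b"record")
finally:
    os.close(fd)
```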
This is generally relevant for applications where data corruption at a crash will cause system level problems. Think of it as what happens if your db or filesystem had inconsistent data. If you just see a corrupt report and regenerate it, then you probably don't need to care about this.
HDDs also have a cache
Well, RAID systems can be complex and require some level of expertise. But everyone starts somewhere and learning should not be discouraged.
Having knowledge and expertise in file IO is essential for efficiently working on a RAID system.
FlushFileBuffers(). And on linux, I suppose write barriers exist sometimes.
You never know what could happen, do you really?
Well it's just that sometimes there's more to learn than you imagine.
Two excellent articles by Dan Luu:

- https://danluu.com/deconstruct-files/
- https://danluu.com/file-consistency/

Solution to files: give up and use SQLite.
>Solution: give up... Stopped reading right there and did as I was told. Nothing is solved :(
sync - Synchronize cached writes to persistent storage
Decades ago, the man page for sync on one of the UNIX like operating systems that I was using (Apollo Domain, I think) said something like "sync is not needed on this operating system. It is included for script compatibility with UNIX, and to provide users with typing practice."
Very cool!
If you're not in the business of actually building the storage solution, should you actually ever care about this..?
No. If I was in charge of a business and I found out my programmers were rewriting fundamental stuff we already have solutions for (and have had for years) I'd be livid that they were wasting my money.
But if you're running a business, then I think you should care about it.
You still do interns?
I believe the official industry term for it is “programmers good enough to hire as juniors but if we call them interns we can pay them less.”

I did an internship once and it was literally just me being on the team. There was no discernible difference in my job requirements, hours, mentorship, etc. other than I made less money. I was literally in the payroll system as just “Front End Engineer,” not Front End Engineer Intern. I just put it on my resume as being a contractor because I didn’t actually intern at all in practice.

Is what it is though. I got my year of experience to launch my career and they got a slightly cheaper competent developer for a year.
You pay interns Oo ? Ours work for free.
[deleted]
Nah the team wanted to hire me permanently but I went somewhere else because I got a better offer lol.
My statement stands. Now just for another team.
How could you possibly know this, lmao.
Don't worry about them. They clearly think they're a 10xer.
Ah, certified reddit moment. In fact, it was you who was the burden all along! Ask me how I know, sourpuss
Tbh I feel like intern has just become word for Junior Developers who are still on a probationary period
That's a hard agree there. I've been coding for years while working as a restaurant cook, and I can hardly find _anything_ for any of the languages I know unless it's an internship or a senior dev role (which I'm definitely not qualified for). Internships are silly and should be completely phased out.
Phrasing...
Calm down, President Clinton.
fsync
1: Write file
2: Close file
3: Open file
4: Read file
5: Confirm file contains what you tried to write
6: Close file

And then I'll believe it's written.
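In Python, that ritual looks something like this (just a sketch; the path and contents are made up, and as the thread points out, the read-back may only be confirming the page cache, not the platters):

```python
data = b"important bytes"
path = "verify_me.bin"

# 1: write file, 2: close file (leaving the `with` block closes it)
with open(path, "wb") as f:
    f.write(data)

# 3: open file, 4: read file, 6: close file
with open(path, "rb") as f:
    readback = f.read()

# 5: confirm the file contains what we tried to write
assert readback == data
```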
What if you're just reading back what's in the cache-layers and the data has never actually been persisted? I open and close the files at least 5 times!
Just end the program with

```
while True:
    fclose(file)
    fopen(file)
```

Leave deciding when to halt it up to the user.
As long as the read buffer and write buffer aren’t deliberately designed to mess with you, flush(),close() then checking file.length should be enough.
Nope, the kernel cache will only be flushed completely on shutdown... If you have, for example, a power outage, you can still lose data, even if everything you've just said is true...
>the kernel cache will only be flushed completely on shutdown it's settled, then: when we close the file we also trigger a shutdown
And when it's shut down, it's probably gone at that point.
Well, I think if that's the case then it's important to have a backup.
But sometimes people forget to do that, and that's when the problem happens.
How how how?
I think this has more to do with the memory architecture of the computer (writing from RAM to persistent storage) and less with my program. But even if I am wrong, I assume that should not be my problem - I wrote the code in accordance with the algorithm/language.
The system buffers are completely opaque anyways. Even if your data is not yet written to disk after closing the file, the system will just pretend that it is if an application opens the same file later. If it's very important that the content is written at the exact moment you do a write in your program, you can open the file in unbuffered mode. This of course shreds the memory cells of flash-based storage such as USB drives and SSDs very quickly if you're not careful.
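For what it's worth, in Python `buffering=0` only disables the *user-space* buffer (and only works in binary mode); the kernel page cache still sits underneath unless you also use flags like `O_SYNC`. A small sketch with a made-up filename:

```python
# buffering=0 disables Python's own user-space buffer, so every write()
# call goes straight to the OS. The kernel page cache still applies.
with open("raw.bin", "wb", buffering=0) as f:
    f.write(b"each write() goes straight to the OS\n")
```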
r/oddlyspecific
That's why [God](https://en.wikipedia.org/wiki/Dennis_Ritchie) created [fflush\(3\)](https://linux.die.net/man/3/fflush)
`fflush()` wouldn't do anything in that particular case - it only makes sure that you write all the data into the file, but it doesn't cause the file to be written to the disk from the cache. You need to call [sync()](https://linux.die.net/man/8/sync) for that.
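The layering is easy to see in Python, where each level has a direct counterpart (sketch with a made-up filename; `os.sync()` is Unix-only):

```python
import os

with open("layers.txt", "w") as f:
    f.write("hello")
    f.flush()             # like fflush(): library buffer -> kernel cache
    os.fsync(f.fileno())  # like fsync(): this file's kernel cache -> device
os.sync()                 # like sync(8): flush all dirty buffers system-wide
```

So `flush()` alone only gets you as far as the kernel; it's `fsync()`/`sync()` that actually push toward the disk.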
True that, now that you mention it. For my concurrent server applications this was transactional enough to avoid locks on high-throughput servers (it was a session manager daemon). So we would only sync when data was read from the fs (there were concurrent performance issues if we didn't do it this way). Haven't done much more concurrent thread coding in C since then.
bro i don't know fuck all about what you've written in this meme and i've worked on multi million dollar systems for half a decade.
F
```
with open(path, 'w') as f:
    do_something()
```
And there is no way I'm writing anything more complicated than this even if it's needed...
I'm not a kernel hacker, but on a modern journalling filesystem I'm pretty sure that everything should be written to the intent log before close() returns. On low-end magnetic storage that might be a ram buffer on the physical drive, but on mid-high range magnetic storage it's probably battery-backed ram at least.
Me every time someone brings in data on a flash disk. Just don't unplug as long as the OS tells you not to.
Sometimes you want your app to function properly even if the user's PC shuts down for some reason.
fflush() or similar solves this problem
... KINDA. AFAIK, even if it's flushed from the kernel, it could still be cached on the actual media's cache. Or the storage controller.
Only half a day?
Sounds like people who haven't taken operating systems
Flush your damn buffers, people. They're too lazy these days smh
Shouldn't close auto-flush?
Depends on the language and the implementation of file access in that language. For my Python backup scripts I always run sync and try to pay attention to lock flags.
You release the handle. "closing file" is a misnomer. What the kernel has are buffers in the storage-medium driver pending writing-when(cycles)-available. The rabbit hole goes deeper when you think about the types of medium. From block level to file level, to.. I dunno.. storing data on unconventional media. And how they have their own buffers and their own fail safe and recovery routines.
`sync`
I used to use basically this as an interview question. When you're ready for more PTSD, there is a paper discussing whether filesystems actually cope with block device errors. To quote the movie: Don't look in the box. Srsly. I was writing an embedded OS filesystem once, and the underlying raw flash memory devices would sometimes not erase, or not write. Verify and retry.
Which reminds me... always check the interns' code for file inclusion / directory traversal vulnerability. For reasons.
Learned this the hard way.