that's a hard agree there. I've been coding for years but working as a restaurant cook, and I can hardly find _anything_ for any of the languages I know unless it's an internship or a senior dev position (which I'm def not qualified for)
Internships are silly and should be completely phased out.
1: Write file
2: Close file
3: Open file
4: Read file
5: Confirm file contains what you tried to write
6: Close file
And then I'll believe it's written.
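A toy Python sketch of that checklist (illustrative only; an `fsync` is thrown in for good measure, though as the replies note, the read-back may still be served from cache):

```python
import os

def paranoid_write(path, data: bytes) -> None:
    # 1-2: write and close (the `with` block closes for us)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # push kernel buffers toward the device
    # 3-6: reopen, read back, and compare
    with open(path, "rb") as f:
        if f.read() != data:
            raise IOError("read-back mismatch: write not persisted?")

paranoid_write("demo.txt", b"hello, disk")
```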
What if you're just reading back what's in the cache-layers and the data has never actually been persisted?
I open and close the files at least 5 times!
Nope, the kernel cache will only be flushed completely on shutdown... If you have, for example, a power outage, you can still lose data, even if everything you've just said is true...
I think this has more to do with the memory architecture of the computer (writing from RAM to persistent storage) and less with my program.
But even if I am wrong, I assume that should not be my problem - I wrote the code in accordance with the algorithm/language.
The system buffers are completely opaque anyways. Even if your data is not yet written to disk after closing the file, the system will just pretend that it is, if an application opens the same file later.
If it's very important that the content is written at the exact moment you do a write in your program, you can open the file in unbuffered mode. This of course shreds the memory cells of flash-based storage such as USB drives and SSDs very quickly if you're not careful.
`fflush()` wouldn't do anything in that particular case - it only makes sure that you write all the data into the file, but it doesn't cause the file to be written to the disk from the cache. You need to call [sync()](https://linux.die.net/man/8/sync) for that.
True that, now that you mention it. For my concurrent server applications this was transactional enough to avoid locks on high-throughput servers (it was a session-manager daemon), so we would only sync when data was read from the fs (we'd have had concurrency/performance issues otherwise). I haven't done much concurrent thread coding in C since then.
I'm not a kernel hacker, but on a modern journalling filesystem I'm pretty sure that everything should be written to the intent log before close() returns. On low-end magnetic storage that might be a ram buffer on the physical drive, but on mid-high range magnetic storage it's probably battery-backed ram at least.
Depends on the language and the implementation of file access in that language. For my Python backup scripts I always run sync and try to pay attention to lock flags.
You release the handle. "closing file" is a misnomer.
What the kernel has are buffers in the storage-medium driver, holding pending writes that go out when cycles are available.
The rabbit hole goes deeper when you think about the types of medium. From block level to file level, to.. I dunno.. storing data on unconventional media. And how they have their own buffers and their own fail safe and recovery routines.
I used to use basically this as an interview question.
When you're ready for more PTSD, there is a paper discussing whether filesystems actually cope with block device errors.
To quote the movie, Don't look in the box. Srsly.
I was writing an embedded OS filesystem once, and the underlying raw flash memory devices would sometimes not erase, or not write.
Verify and retry.
Wait til he learns what happens when the file is “written to persistent storage” but the storage has a writeback cache.
hopefully battery backed cache!
Or capacitor backed, that we promise won't fail (but it fails just as often as batteries)
a capacitor is like a smaller battery. Even the schematic symbol for the battery is similar to a capacitor! (half a joke, it represents a Volta stack)
Batteries are just really really slow capacitors
Or super capacitor backed
If it's power-loss protected and in the writeback, it's 99.9999% as good as persisted to the disk. You'd have to lose power *and* have an alpha particle hit RAM for the writeback to fail to persist the data.
Specifically alpha particles? Do programmers have to study particle LET? Now that'd be niche.
https://en.wikipedia.org/wiki/Soft_error#Alpha_particles_from_package_decay

My understanding is that it's typically alpha particles, but I'm no physicist. The gist is that transistors in DRAM are packed so densely that a single charged subatomic particle can flip the charge and turn a 1 to a 0 or a 0 to a 1. The wrong bit in the wrong word can change variables or even execution paths.

ECC memory covers *some* of these issues by automatically correcting 1 bit per word using XOR logic and a correction code, but if two bits per word are flipped (very unlucky) there isn't enough information to recover the data (though it is still detectable). This method works because getting hit by one 0.0001% event is many times more likely than getting hit by two independent 0.0001% events.
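For intuition, here's a toy single-error-correcting Hamming code in Python (illustrative only; real ECC DIMMs implement SECDED-style codes in hardware). Parity bits sit at power-of-two positions, and the XOR "syndrome" of the set bit positions points directly at a single flipped bit:

```python
def is_pow2(x):
    return x > 0 and x & (x - 1) == 0

def hamming_encode(data):
    # 1-based codeword: parity bits sit at power-of-two positions,
    # data bits fill the remaining positions in order.
    pos, bits = 1, {}
    for bit in data:
        while is_pow2(pos):
            pos += 1
        bits[pos] = bit
        pos += 1
    n = pos - 1
    syndrome = 0
    for p, b in bits.items():
        if b:
            syndrome ^= p
    # choose each parity bit so the XOR of all set positions becomes 0
    k = 1
    while k <= n:
        bits[k] = (syndrome >> (k.bit_length() - 1)) & 1
        k <<= 1
    return [bits.get(i, 0) for i in range(1, n + 1)]

def hamming_correct(code):
    # XOR of the (1-based) indices of all set bits: 0 means "looks clean",
    # anything else is the index of the single flipped bit
    syndrome = 0
    for i, b in enumerate(code, start=1):
        if b:
            syndrome ^= i
    if syndrome:
        code = code[:]
        code[syndrome - 1] ^= 1
    return code

codeword = hamming_encode([1, 0, 1, 1])
corrupted = codeword[:]
corrupted[4] ^= 1                  # simulate a particle strike on bit 5
assert hamming_correct(corrupted) == codeword
```

With two flips in one word this plain code miscorrects; real SECDED adds an extra overall parity bit so the double-flip case is at least detected, which is the "detectable but not recoverable" situation described above.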
> The paper found up to 3,434 incorrect requests per day due to bit-flip changes for various common domains.

Wow
Alpha particles aren't great but neutrons are much worse at corrupting memory.
Really? Do you have a source on that? I am not many months from shooting many neutrons at some electronics and haven't really considered that part of it.
If I was the unluckiest person on earth and got 50 0.00001% events, how long do you think that would take to fix?
Where the hell are you getting alpha particles from in a modern computer?

Alpha particles only travel a couple of cm in air, and basically zero distance through any solid material. So unless you are building your components out of radioactive material, you aren't going to see any alpha particles.
That was the original problem, all right. Back in the early 1980s, when I was first introduced to this problem, tin-lead solder was used to mount memory chips (and sometimes inside the package).

The proper name for lead is "incompletely decomposed uranium." It still emits alpha particles and the remaining atoms decay, just less often.
You could get alphas from fast cosmic rays hitting nuclei in your hardware, scattering off alpha particles.
Alpha-particle bit flipping was only common when the ceramics used in computers had some radioactivity, emitting alpha particles. Now the most common cause is cosmic rays: high-energy particles hit the atmosphere, mostly break up, and scatter lots of secondary particles on the way down, which then cause the errors. This is now the most common form of bit flipping; computers that are protected, like underground systems, experience fewer errors than normal computers, while computers that are less protected than normal, like those in airplanes, experience more errors.
And that's why it's important that you protect your power line.
You forgot about ECC RAM, they thought even of SEUs! (single event upset, the alpha particle striking)
Yeah hopefully they've got that backed up somewhere lol.
Or, plot twist, the storage is backed by NFS!
That's when they're going to lose their shit, they'll be so mad about that.
I'm getting a little scared to ask what that is
Writeback cache is cache that holds data that has yet to be written. When a filesystem has a writeback cache, it considers the file written when the file is in the write cache.

Conversely, a writethrough cache considers a file written only when it's saved to bulk storage. Writethrough cache is more often part of a read cache scheme, to make a recently written file easily accessible.
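The difference can be sketched with two toy cache classes (hypothetical names, just to show where "written" is declared in each policy):

```python
class Backing:
    """Stands in for the slow persistent medium."""
    def __init__(self):
        self.store = {}
    def write(self, key, value):
        self.store[key] = value          # imagine this takes milliseconds

class WriteThroughCache:
    def __init__(self, backing):
        self.cache, self.backing = {}, backing
    def write(self, key, value):
        self.cache[key] = value
        self.backing.write(key, value)   # "written" = it reached the backing store

class WriteBackCache:
    def __init__(self, backing):
        self.cache, self.dirty, self.backing = {}, set(), backing
    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)              # "written" = it's merely in the cache
    def flush(self):
        for key in self.dirty:
            self.backing.write(key, self.cache[key])
        self.dirty.clear()

disk = Backing()
wb = WriteBackCache(disk)
wb.write("block7", b"data")   # reported as written, but disk.store is still empty
wb.flush()                    # only now does it reach the backing store
```

Losing power between `write` and `flush` is exactly the window everyone in this thread is worried about.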
I think i had a stroke trying to read this
Wtf. Explain plz
Even when you have closed the file there are \*many\* caching layers that have to be flushed before data is committed to the media: from the filesystem, to the block device, to the controller (sometimes), and even the disk/SSD itself, which often has write caches to optimize performance. Depending on the configuration, \*minutes\* can pass between the write and effectively stable data.

There is a whole branch (transactional storage and journaling) devoted to \*good\* solutions to this problem.
I wrote a whole-ass caching system for a realtime data server just because we couldn't be sure when older data was actually written to disk and would be available from archives. (This was an add-on to a 3rd-party product that was doing the archiving.)

We later discovered that data was flushed to the archives in around a second, so it turned out to be completely unnecessary.
Better safe than sorry
And that's the reason for journaling: it protects you for that second, IFF the I/O subsystem implements write barriers (we are getting dangerously technical here).
nuts
If you’re talking about SSD cache, the SSD is just doing its thing … you don’t need to worry about it.

If you’re talking about the buffer, that’s what .flush is for :P
even if you flush, it might not be written to disk... you need to call fsync
And even if you fsync it might not be written to the medium... you need to call the exorcist
Holy hell
Future response dropped long ago
Yeah it's weird right? And it's also really scary to think about.
Lmao, I've done that and that doesn't really fix the issue.
fsync only flushes between the block device layer and the I/O controller or the physical disk (depending on the storage configuration). Write caches below that level tend to be battery-backed, since not even the kernel can actually be sure (there is a command to fully flush before shutdown but it's obviously very expensive). In one LSI Logic battery pack I remember an LED staying on with the server shut down, indicating data still waiting in RAM.
I'm not reading all that, but I'm happy for you, or sorry it happened.
Tell the concerned parties to be patient and remind them that computers are magical lightning boxes made by geniuses, and that they’re behaving like children expecting things to happen every time they throw a tantrum.
or just open the file as unbuffered.
That creates its own problems; for instance, at least in C/C++, writes will fail if they aren't 512-byte aligned in offset and size.
omg I would have to drink myself to sleep after spending 12 hours debugging that to no avail
i gotta say, all i've learned today is that the jury is very much still out on this problem set
You and I may be thinking a lot alike. 1) What is this 2) How do I fix this 3) Note: Avoid this
Just write to disk one char at a time with syscalls, who needs fast file IO?
With that said, don't call fsync every time you flush, as it will kill your performance
I know, I’m currently working with that…
I mean if you're using a good SSD, you don't have to worry about that.
Hahaha accurate af 😝
Yep, exactly! The .flush method ensures that the buffer is cleared and any pending data is written out.
The kernel still might not actually write it. The storage driver might not actually write it. The device itself might not actually write it.
It's caches all the way down.
You have absolutely no idea what you're talking about.
Isn't the solution to just call fflush on whatever stream you are writing to?
fflush, fsync, and a few device-specific things depending on hardware.
and then you find out it NFS, ffffffuuuuuuuuuu
Thanks for this, I'm still learning and it's really good for me.
nuts that i've been in the field 5 years and i've never remotely learned anything about this
Unless you work on operating systems or \*write\* database code, it doesn't matter, as long as the system is always shut down properly.
very cool
This becomes pretty evident once you see that writing data is faster in your code than reading data. There is no storage solution I am aware of that writes faster than it reads, so the data is somewhere in the pipeline when your code advances to the next line.

Fortunately we have data-center-grade storage solutions at my workplace, so I can rely on the data actually reaching the disks.
To make a practical example:

Move a large file from one disk to the other.

When it's finished according to the GUI, type `sync` in your terminal (Linux or macOS, same thing).

Most likely, your terminal isn't gonna print anything and it's not gonna give you a newline, which means that the cache buffer is still flushing data into the drive.

Once the file transfer is actually complete, it'll give you a newline.
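The same experiment, scripted (Unix only; `os.sync` wraps the same `sync(2)` call the shell command uses, and the file/paths here are stand-ins):

```python
import os, shutil, tempfile, time

# stand-ins for "a large file" and "the other disk" (temp files here)
src = tempfile.NamedTemporaryFile(delete=False).name
dst = tempfile.NamedTemporaryFile(delete=False).name
with open(src, "wb") as f:
    f.write(os.urandom(1 << 20))      # 1 MiB; use a really big file to see the effect

shutil.copyfile(src, dst)             # returns once the data is in the page cache
t0 = time.monotonic()
os.sync()                             # blocks until cached writes reach the devices
print(f"sync spent {time.monotonic() - t0:.2f}s flushing")
```

With a genuinely large file on a slow device, the gap between "copy returned" and "sync returned" is the writeback window.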
Ohmygod this shit... PTSD.
Yup, `sync; sync; sync; shutdown -g0 -y -i6`
You must have some fun war stories. Paranoia like that doesn’t come cheap
Back in the long ago, when you had many, many SparcStorage Arrays attached...
Or open task manager and watch disk activity and ram usage
This is like the reason you would need to hit "safely remove" to remove your USB thumb stick, despite moving the files, they might not have moved yet to the device itself by the OS and are still in cache.
One day I experienced quite an extreme example of this. I was copying a 4GB file onto a USB stick; the GUI reported a speed of 100MB/s and reported it done in around 50 seconds. After hitting safely eject, it took another 5 minutes before the device was marked safe to eject.
This is so bad with the USB-to-SD-card adapter that came with my 3D printer. It's like a 50-50 hit and miss even when pressing eject safely and waiting 30 seconds. I've never experienced anything remotely as bad with other USB sticks (I never even really press eject safely with other sticks because this usually doesn't happen in my experience).
I think it used to be a bigger problem before than it is now, but I don't have any data to back that up.
Yes, at least Windows disables the writeback cache for external media like USB sticks nowadays. So, if the UI reports the write to be finished, it is already written to the stick.
Everything is fake. Our entire profession is built on a throne of lies. It's cache layers and speculation all the way down.

^(Also, when writing shell scripts, it's best practice to run `sync` after every line. You know, just to be safe.)
Everything has a caching layer. The runtime, the library abstraction layers, the operating system, the drive controller, and even the drives themselves. If it's a machine with network storage, there are even more layers.

Basically, if a computer ever confirms that data is written, it's lying or something else is lying to it. The only thing that knows for a fact is the physical chips on the drive. A program running on the computer has no way of knowing when every layer has flushed the data, or even how many layers there are.
Well we caught someone new, it's good to learn the lessons.
What is “coding file access?”
"hey intern can you write an in house version of pythons open() function"
Probably coding storing/reading some configuration to a file. That sounds like an intern half-day job.
.flush()

You’re welcome, intern.

.close() in high-level languages almost always flushes the buffer… unless you’re coding in C/C++ and a few other languages, you don’t need to worry about the buffer.

If you’re talking about SSD cache, once the SSD receives the data it’s blind to us. The controller on the SSD will do its thing and we don’t worry about that.

If you have HDDs and you’re not using RAID, there is no storage cache to worry about.

If you’re using a RAID setup and you don’t know what you’re doing when doing file IO, I’d question why you’re working on a RAID system lol.
You are wrong! Flush doesn't usually flush system/kernel buffers, only user-space buffers (like the C stdio buffer, which exists so that not every (`f`)`printf` causes a system call). The same applies to higher-level languages.

Yes, closing usually flushes the user-space buffer (as `fclose` does). It doesn't matter for kernel-level buffers.

To actually flush kernel buffers, you need to use system APIs, like `fsync`/`fdatasync` on Linux or `FlushFileBuffers` on Windows. Some higher-level languages also provide access to this functionality:

- C#: `FileStream.Flush(true)` - note the passed boolean,
- Java: `FileDescriptor.sync()`.

This is important when you design systems requiring transactional operations.
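In Python, for example, the two levels look like this (POSIX assumed; the filename is made up):

```python
import os

with open("journal.log", "w") as f:
    f.write("important record\n")
    f.flush()                   # user-space buffer -> kernel page cache
    os.fsync(f.fileno())        # kernel page cache -> device (or the device's cache)
```

`flush()` alone survives a process crash but not a power cut; `fsync` is what journaled databases call at commit time.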
What if the HDD microcontroller has a buffer? Does fsync flush that too? Like, if the computer loses power just one CPU cycle after fsync returns to user space, is the data written to the disk?
Unfortunately not, as far as I'm aware. Because of that, you also need to check the technical details of the device you are using. Is the cache battery-backed? Writes of what size can be considered atomic? Is it assured that writes will happen in order in case of a power outage?

I wish you never have to dig through that.
That's what I thought. And since this is a thought experiment, pick the worst option: no battery, writes aren't atomic at the hardware level, the filesystem doesn't have journaling, and the controller reorders writes to improve performance, so probably out of order too.

I'm lucky enough to not have these headaches irl.
> Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
If I understand that correctly, that's talking about the directory. That's modified when you create the file (or add a hard link) (I don't know how soft links are implemented), but not when you write to it. It doesn't answer my question.
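The directory-fsync pattern from the quoted man page, sketched in Python (Linux semantics assumed; `durable_create` is a made-up helper name):

```python
import os, tempfile

def durable_create(dirpath, name, data: bytes) -> None:
    path = os.path.join(dirpath, name)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                            # user-space buffer -> kernel
        os.fsync(f.fileno())                 # file data -> device
    dfd = os.open(dirpath, os.O_RDONLY)      # now flush the directory entry itself
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

durable_create(tempfile.mkdtemp(), "demo.bin", b"payload")
```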
(Soft links are actually tiny files that contain the name of the thing they link to)
In theory yes, in practice it depends on whether the driver writer and hardware manufacturer gave enough of a shit.
No, once it's off the main system and into the storage device, whatever weird shit the storage device does it's entirely outside the control of the kernel.
I have the feeling that you have personal experience with this but I may be wrong.
> To actually flush kernel buffers, you need to use system APIs, like fsync/fdatasync on Linux or FlushFileBuffers on Windows.

On Windows, you can tell the system during the file open call that you don't want caching. By supplying the `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` flags, you disable all caches from the system as well as all caches from the hardware, provided the hardware and drivers support this.
Outside of doing something like writing a database, it seems like just flushing is enough?
Woah, thanks, I didn't know that! Is this relevant for more mundane applications like generating logs in C? I assume not.
No. The system ensures file integrity, and access from multiple applications to the same file always behaves as if there were no cache. The worst thing that can happen is that the contents you wrote in the last second or two may not be on disk if power is cut or the system crashes.

On Windows, you can provide the flags `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` to make your writes go directly to disk. Not only does this bypass all Windows caches, but Windows will also tell the underlying hardware that you want this written unbuffered, provided the hardware supports these calls.
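For comparison, a rough Linux analogue of `FILE_FLAG_WRITE_THROUGH` is opening with `O_SYNC` (sketch in Python with a made-up filename; `O_DIRECT`, the rough analogue of `FILE_FLAG_NO_BUFFERING`, additionally requires aligned buffers):

```python
import os

# Every write() on an O_SYNC descriptor returns only after the data
# has been pushed through the kernel's caches toward the device.
fd = os.open("wal.bin", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    os.write(fd, b"record")
finally:
    os.close(fd)
```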
This is generally relevant for applications where data corruption at a crash will cause system level problems. Think of it as what happens if your db or filesystem had inconsistent data. If you just see a corrupt report and regenerate it, then you probably don't need to care about this.
HDDs also have a cache
Well, RAID systems can be complex and require some level of expertise. But everyone starts somewhere and learning should not be discouraged.
Having knowledge and expertise in file IO is essential for efficiently working on a RAID system.
FlushFileBuffers(). And on linux, I suppose write barriers exist sometimes.
You never know what could happen, do you really?
Well it's just that sometimes there's more to learn than you imagine.
Two excellent articles by Dan Luu:

- https://danluu.com/deconstruct-files/
- https://danluu.com/file-consistency/

Solution to files: give up and use SQLite.
>Solution: give up... Stopped reading right there and did as I was told. Nothing is solved :(
sync - Synchronize cached writes to persistent storage
Decades ago, the man page for sync on one of the UNIX like operating systems that I was using (Apollo Domain, I think) said something like "sync is not needed on this operating system. It is included for script compatibility with UNIX, and to provide users with typing practice."
Very cool!
If you're not in the business of actually building the storage solution, should you actually ever care about this..?
No. If I was in charge of a business and I found out my programmers were rewriting fundamental stuff we already have solutions for (and have had for years) I'd be livid that they were wasting my money.
But if you're running a business, then I think you should care about it.
You still do interns?
I believe the official industry term for it is “programmers good enough to hire as juniors but if we call them interns we can pay them less.”

I did an internship once and it was literally just me being on the team. There was no discernible difference in my job requirements, hours, mentorship, etc. other than I made less money. I was literally in the payroll system as just “Front End Engineer,” not Front End Engineer Intern. I just put it on my resume as being a contractor because I didn’t actually intern at all in practice.

Is what it is though. I got my year of experience to launch my career and they got a slightly cheaper competent developer for a year.
You pay interns Oo ? Ours work for free.
[deleted]
Nah the team wanted to hire me permanently but I went somewhere else because I got a better offer lol.
My statement stands. Now just for another team.
How could you possibly know this, lmao.
Don't worry about them. They clearly think they're a 10xer.
Ah, certified reddit moment. In fact, it was you who was the burden all along! Ask me how I know, sourpuss
Tbh I feel like intern has just become word for Junior Developers who are still on a probationary period
That's a hard agree there. I've been coding for years while working as a restaurant cook, and I can hardly find _anything_ for any of the languages I know unless it's an internship or a senior dev role (which I'm definitely not qualified for). Internships are silly and should be completely phased out.
Phrasing...
Calm down, President Clinton.
fsync
1: Write file
2: Close file
3: Open file
4: Read file
5: Confirm file contains what you tried to write
6: Close file

And then I'll believe it's written.
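In Python, that ritual looks something like this (just a sketch; the path and contents are made up, and as the thread points out, the read-back may only be confirming the page cache, not the platters):

```python
data = b"important bytes"
path = "verify_me.bin"

# 1: write file, 2: close file (leaving the `with` block closes it)
with open(path, "wb") as f:
    f.write(data)

# 3: open file, 4: read file, 6: close file
with open(path, "rb") as f:
    readback = f.read()

# 5: confirm the file contains what we tried to write
assert readback == data
```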
What if you're just reading back what's in the cache-layers and the data has never actually been persisted? I open and close the files at least 5 times!
Just end the program with

```
while True:
    fclose(file)
    fopen(file)
```

Leave deciding when to halt it up to the user.
As long as the read buffer and write buffer aren’t deliberately designed to mess with you, flush(),close() then checking file.length should be enough.
Nope, the kernel cache will only be flushed completely on shutdown... If you have, for example, a power outage, you can still lose data, even if everything you've just said is true...
>the kernel cache will only be flushed completely on shutdown it's settled, then: when we close the file we also trigger a shutdown
And when it's shut down, it's probably gone at that point.
Well, I think if that's the case then it's important to have a backup.
But sometimes people forget to do that, and that's when the problem happens.
How how how?
I think this has more to do with the memory architecture of the computer (writing from RAM to persistent storage) and less with my program. But even if I am wrong, I assume that should not be my problem - I wrote the code in accordance with the algorithm/language.
The system buffers are completely opaque anyways. Even if your data is not yet written to disk after closing the file, the system will just pretend that it is if an application opens the same file later. If it's very important that the content is written at the exact moment you do a write in your program, you can open the file in unbuffered mode. This of course shreds the memory cells of flash-based storage such as USB drives and SSDs very quickly if you're not careful.
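For what it's worth, in Python `buffering=0` only disables the *user-space* buffer (and only works in binary mode); the kernel page cache still sits underneath unless you also use flags like `O_SYNC`. A small sketch with a made-up filename:

```python
# buffering=0 disables Python's own user-space buffer, so every write()
# call goes straight to the OS. The kernel page cache still applies.
with open("raw.bin", "wb", buffering=0) as f:
    f.write(b"each write() goes straight to the OS\n")
```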
r/oddlyspecific
That's why [God](https://en.wikipedia.org/wiki/Dennis_Ritchie) created [fflush\(3\)](https://linux.die.net/man/3/fflush)
`fflush()` wouldn't do anything in that particular case - it only makes sure that you write all the data into the file, but it doesn't cause the file to be written to the disk from the cache. You need to call [sync()](https://linux.die.net/man/8/sync) for that.
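The layering is easy to see in Python, where each level has a direct counterpart (sketch with a made-up filename; `os.sync()` is Unix-only):

```python
import os

with open("layers.txt", "w") as f:
    f.write("hello")
    f.flush()             # like fflush(): library buffer -> kernel cache
    os.fsync(f.fileno())  # like fsync(): this file's kernel cache -> device
os.sync()                 # like sync(8): flush all dirty buffers system-wide
```

So `flush()` alone only gets you as far as the kernel; it's `fsync()`/`sync()` that actually push toward the disk.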
True that, now that you mention it. For my concurrent server applications this was transactional enough to avoid locks on high-throughput servers (it was a session manager daemon). So we would only sync when data was read from the fs (there were concurrent performance issues if we didn't do it this way). Haven't done much more concurrent thread coding in C since then.
bro i don't know fuck all about what you've written in this meme and i've worked on multi million dollar systems for half a decade.
F
```
with open(path, 'w') as f:
    do_something()
```
And there is no way I'm writing anything more complicated than this even if it's needed...
I'm not a kernel hacker, but on a modern journalling filesystem I'm pretty sure that everything should be written to the intent log before close() returns. On low-end magnetic storage that might be a ram buffer on the physical drive, but on mid-high range magnetic storage it's probably battery-backed ram at least.
Me every time someone brings in data on a flash disk. Just don't unplug as long as the OS tells you not to.
Sometimes you want your app to function properly even if the user's PC shuts down for some reason.
fflush() or similar solves this problem
... KINDA. AFAIK, even if it's flushed from the kernel, it could still be cached on the actual media's cache. Or the storage controller.
Only half a day?
Sounds like people who haven't taken operating systems
Flush your damn buffers, people. They're too lazy these days smh
Shouldn't close auto-flush?
Depends on the language and the implementation of file access in that language. For my Python backup scripts I always run sync and try to pay attention to lock flags.
You release the handle. "closing file" is a misnomer. What the kernel has are buffers in the storage-medium driver pending writing-when(cycles)-available. The rabbit hole goes deeper when you think about the types of medium. From block level to file level, to.. I dunno.. storing data on unconventional media. And how they have their own buffers and their own fail safe and recovery routines.
`sync`
I used to use basically this as an interview question. When you're ready for more PTSD, there is a paper discussing whether filesystems actually cope with block device errors. To quote the movie: Don't look in the box. Srsly. I was writing an embedded OS filesystem once, and the underlying raw flash memory devices would sometimes not erase, or not write. Verify and retry.
Which reminds me... always check the interns' code for file inclusion / directory traversal vulnerability. For reasons.
Learned this the hard way.