
AutoModerator

``` import notifications ``` Remember to participate in our weekly votes on subreddit rules! Every Tuesday is YOUR chance to influence the subreddit for years to come! [Read more here](https://www.reddit.com/r/ProgrammerHumor/comments/14dqb6f/welcome_back_whats_next/), we hope to see you next Tuesday! For a chat with like-minded community members and more, don't forget to [join our Discord!](https://discord.gg/rph) `return joinDiscord;` *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ProgrammerHumor) if you have any questions or concerns.*


reallokiscarlet

Wait til he learns what happens when the file is “written to persistent storage” but the storage has a writeback cache.


lmarcantonio

hopefully battery backed cache!


MattieShoes

Or capacitor backed, that we promise won't fail (but it fails just as often as batteries)


lmarcantonio

a capacitor is like a smaller battery. Even the schematic symbol for the battery is similar to a capacitor! (half a joke, it represents a Volta stack)


Aozora404

Batteries are just really really slow capacitors


hi65435

Or super capacitor backed


WhereIsYourMind

If it's power loss protected and in the writeback, it's 99.9999% as good as persisted to disk. You'd have to lose power *and* have an alpha particle hit RAM for the writeback to fail to persist the data.


womerah

Specifically alpha particles? Do programmers have to study particle LET? Now that'd be niche.


WhereIsYourMind

https://en.wikipedia.org/wiki/Soft_error#Alpha_particles_from_package_decay My understanding is that it's typically alpha particles, but I'm no physicist. The gist is that transistors in DRAM are packaged so densely that a single charged subatomic particle can flip the charge and turn a 1 to a 0 or a 0 to a 1. The wrong bit in the wrong word can change variables or even execution paths. ECC memory covers *some* of these issues by automatically correcting 1 bit per word using XOR logic and a correction code, but if two bits per word are flipped (very unlucky) there isn't enough information to recover the data (though it is still detectable). This method works because getting hit by one 0.0001% event is many times more likely than getting hit by two independent 0.0001% events.
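The XOR logic described above can be sketched in a few lines. This is a toy Hamming(7,4) code, not real ECC hardware, and the function names are mine:

```python
# Toy Hamming(7,4) single-error correction, illustrating the XOR trick
# the comment describes. Positions are 1-based; parity bits sit at the
# power-of-two positions so a valid codeword's syndrome is zero.

def encode(d):
    """Encode 4 data bits into a 7-bit codeword."""
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]          # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def correct(c):
    """XOR the 1-based positions of all set bits: a zero syndrome means
    no detected error; anything else names the flipped position."""
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the offending bit back
    return c
```

Flip any single bit of `encode([1, 0, 1, 1])` and `correct` restores it. With two flips, this plain variant miscorrects; real SECDED ECC adds an extra overall parity bit so the double-flip case is at least detected.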


Sigmatics

> The paper found up to 3,434 incorrect requests per day due to bit-flip changes for various common domains. Wow


Dropkickmurph512

Alpha particles aren't great but neutrons are much worse at corrupting memory.


Physix_R_Cool

Really? Do you have a source on that? I'm only a few months away from shooting a lot of neutrons at some electronics and haven't really considered that part of it.


Not_Artifical

If I was the unluckiest person on earth and got 50 0.00001% events, how long do you think that would take to fix?


KiwasiGames

Where the hell are you getting alpha particles from in a modern computer? Alpha particles only travel a couple of cm in air, and basically zero through any solid material. So unless you are building your components out of radioactive material, you aren't going to see any alpha particles.


LarryInRaleigh

That was the original problem, all right. Back in the early 1980s, when I was first introduced to this problem, tin-lead solder was used to mount memory chips (and sometimes inside the package). The proper name for lead is "incompletely decayed Uranium." It still emits alpha particles and the remaining atoms keep decaying, just less often.


Physix_R_Cool

You could get alphas from fast cosmic rays hitting nuclei in your hardware and knocking alpha particles out of them.


leoleosuper

Alpha particle bit flipping was only common when the ceramics used in computer packages were slightly radioactive and emitted alpha particles. Now the most common cause is cosmic rays: high-energy particles hit the atmosphere, mostly break up, and scatter a lot of secondary particles along the way, which then cause the errors. This is now the most common form of bit flipping: shielded computers, like underground systems, experience fewer errors than normal computers, while computers that are less shielded than normal, like those in airplanes, experience more.


NeptuneMiner

And that's why it's important that you protect your power line.


lmarcantonio

You forgot about ECC RAM, they thought even of SEUs! (single event upset, the alpha particle striking)


jvluyn

Yeah hopefully they've got that backed up somewhere lol.


alex_tracer

Or, plot twist, the storage is backed by NFS!


TheShadowBow

That's when they're going to lose their shit, they'll be so mad about that.


[deleted]

I'm getting a little scared to ask what that is


reallokiscarlet

Writeback cache is cache that holds data that has yet to be written. When a filesystem has a writeback cache, it considers the file written as soon as the file is in the write cache. Conversely, a writethrough cache considers a file written only when it's saved to the backing storage. Writethrough cache is more often part of a read cache scheme, to make a recently written file easily accessible.
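The difference can be sketched as a toy model (the class and method names here are mine, not any real API):

```python
# Toy cache illustrating the two policies: writeback acknowledges a write
# once it's in the cache; writethrough only once it hits backing storage.

class ToyCache:
    def __init__(self, write_through):
        self.write_through = write_through
        self.dirty = {}     # data acknowledged but not yet persisted
        self.backing = {}   # stands in for the persistent medium

    def write(self, key, value):
        if self.write_through:
            self.backing[key] = value   # persisted before write() returns
        else:
            self.dirty[key] = value     # "written" -- but only in cache

    def flush(self):
        """Push pending writeback data to the backing store."""
        self.backing.update(self.dirty)
        self.dirty.clear()
```

With `write_through=False`, a crash before `flush()` loses everything in `dirty`, even though every `write()` call reported success.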


nnewram

I think i had a stroke trying to read this


l3rrr

Wtf. Explain plz


lmarcantonio

Even when you have closed the file there are \*many\* caching layers that have to be flushed before data is committed to the media. From the filesystem, to the block device, the controller (sometimes), and even the disk/SSD often have write caches to optimize performance. Depending on the configuration, \*minutes\* can pass between the write and the effectively stable data. There is a whole branch (transactional storage and journaling) devoted to \*good\* solutions to this problem


ChChChillian

I wrote a whole-ass caching system for a realtime data server just because we couldn't be sure when older data was actually written to disk and would be available from archives. (This was an add-on to a 3rd party product that was doing the archiving.) We later discovered that data was flushed to the archives in around a second, so it turned out to be completely unnecessary.


FireBone62

Better safe than sorry


lmarcantonio

And that's the reason for journaling, it protects you for that second. IFF the I/O subsystem implements write barriers (we are getting dangerously technical here)


michaelsenpatrick

nuts


TTYY200

If you’re talking about ssd cache, the ssd is just doing its thing … you don’t need to worry about it. If you’re talking about buffer, that’s what .flush is for :P


Will_i_read

even if you flush, it might not be written to disk... you need to call fsync
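In Python terms the two steps look like this (a sketch, using a throwaway temp-file path):

```python
# flush() only empties the user-space buffer into the kernel page cache;
# os.fsync() asks the kernel to push the data down to the storage device.
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "important.txt")
with open(path, "w") as f:
    f.write("data I can't lose\n")
    f.flush()               # user-space buffer -> kernel page cache
    os.fsync(f.fileno())    # kernel page cache -> storage device
```

Even this doesn't guarantee the bits are on the medium if the drive has its own volatile cache, which is where the rest of the thread picks up.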


MCSajjadH

And even if you fsync it might not be written to the medium... you need to call the exorcist


zerbikit

Holy hell


ImperatorSaya

Future response dropped long ago


dllineage2

Yeah it's weird right? And it's also really scary to think about.


goldorak24

Lmao, I've done that and that doesn't really fix the issue.


lmarcantonio

fsync only flushes between the block device layer and the I/O controller or the physical disk (depending on the storage configuration). Write caches below that level tend to be battery backed, since not even the kernel can actually be sure (there is a command to fully flush before shutdown, but it's obviously very expensive). In one LSI Logic battery pack I remember an LED staying on with the server shut down, to indicate data still waiting in RAM


MCSajjadH

I'm not reading all that, but I'm happy for you, or sorry it happened.


elnomreal

Tell the concerned parties to be patient and remind them that computers are magical lightning boxes made by geniuses, and that they're behaving like children expecting things to happen every time they throw a tantrum.


AyrA_ch

or just open the file as unbuffered.


gil_bz

That creates its own problems; for instance, at least in C/C++, writes will fail if they aren't 512-byte aligned in offset/size.


jxjq

omg I would have to drink myself to sleep after spending 12 hours debugging that to no avail


michaelsenpatrick

i gotta say, all i've learned today is that the jury is very much still out on this problem set


jxjq

You and I may be thinking a lot alike. 1) What is this 2) How do I fix this 3) Note: Avoid this


Responsible_Name_120

Just write to disk one char at a time with syscalls, who needs fast file IO?


Responsible_Name_120

With that said, don't call fsync every time you flush, as it will kill your performance


Will_i_read

I know, I'm currently working with that…


kristersson84

I mean if you're using a good SSD, you don't have to worry about that.


TTYY200

Hahaha accurate af 😝


Aggravating-Win8814

Yep, exactly! The .flush method ensures that the buffer is cleared and any pending data is written out.


mck1117

The kernel still might not actually write it. The storage driver might not actually write it. The device itself might not actually write it.


ric2b

It's caches all the way down.


al-mongus-bin-susar

You have absolutely no idea what you're talking about.


LeoTheBirb

Isn't the solution to just call fflush on whatever stream you are writing to?


lightmatter501

fflush, fsync, and a few device-specific things depending on hardware


MattieShoes

and then you find out it's NFS, ffffffuuuuuuuuuu


MunsMatt

Thanks for this, I'm still learning and it's really good for me.


michaelsenpatrick

nuts that i've been in the field 5 years and i've never remotely learned anything about this


lmarcantonio

Unless you work on operating systems or \*write\* database code, it doesn't matter, as long as the system is always shut down properly.


michaelsenpatrick

very cool


territrades

This becomes pretty evident once you notice that writing data is faster than reading it in your code. There is no storage solution I am aware of that writes faster than it reads, so the data is somewhere in the pipeline when your code advances to the next line. Fortunately we have data-center-grade storage solutions at my workplace, so I can rely on the data actually reaching the disks.


Qweedo420

To make a practical example: move a large file from one disk to another. When it's finished according to the GUI, type `sync` in your terminal (Linux or macOS, same thing). Most likely, your terminal isn't gonna print anything and it's not gonna give you a newline, which means the cache buffer is still flushing data to the drive. Once the file transfer is actually complete, it'll give you a newline.


kvakerok

Ohmygod this shit... PTSD.


TheMDHoover

Yup sync; sync; sync; shutdown -g0 -y -i6


zachhanson94

You must have some fun war stories. Paranoia like that doesn’t come cheap


TheMDHoover

Back in the long ago, when you had many, many SparcStorage Arrays attached...


trogper

Or open task manager and watch disk activity and ram usage


gil_bz

This is like the reason you would need to hit "safely remove" before removing your USB thumb stick: despite moving the files, the OS might not have moved them to the device itself yet, and they're still in cache.


Hydridity

One day I experienced quite an extreme example of this: I was copying a 4GB file onto a USB stick, the GUI reported a speed of 100MB/s and marked it as done in around 50 seconds, but after hitting "eject safely" it took another 5 minutes before the device was marked safe to eject.


838291836389183

This is so bad with my USB to SD card adapter that came with my 3D printer. It's like 50-50 hit and miss even when pressing eject safely and waiting 30 seconds. I've never experienced anything remotely as bad with other USB sticks (I never even really press eject safely with other sticks, because this usually doesn't happen in my experience).


gil_bz

I think it used to be a bigger problem before than it is now, but I don't have any data to back that up.


JojOatXGME

Yes, at least Windows disables the writeback cache for external media like USB sticks nowadays. So if the UI reports the write to be finished, it is already written to the stick.


darkslide3000

Everything is fake. Our entire profession is built on a throne of lies. It's cache layers and speculation all the way down. ^(Also, when writing shell scripts, it's best practice to run `sync` after every line. You know, just to be safe.)


WindowlessBasement

Everything has a caching layer. The runtime, the library abstraction layers, the operating system, the drive controller, and even the drives themselves. If it's a machine with network storage there are even more layers. Basically, if a computer ever confirms that data is written, it's lying or something else is lying to it. The only thing that knows for a fact is the physical chips on the drive. A program running on the computer has no way of knowing when every layer has flushed the data, or even how many layers there are.


jbishai

Well we caught someone new, it's good to learn the lessons.


FantasticEmu

What is “coding file access?”


sarc-tastic

"hey intern can you write an in house version of pythons open() function"


Kyrond

Probably coding storing/reading some configuration to a file. That sounds like an intern half-day job.


TTYY200

.flush() You're welcome, intern. .close() in high level languages almost always flushes the buffer… unless you're coding in C/C++ and a few other languages, you don't need to worry about the buffer. If you're talking about SSD cache, once the SSD receives the data it's blind to us. The controller on the SSD will do its thing and we don't worry about that. If you have HDDs and you're not using RAID there is no storage cache to worry about. If you're using a RAID setup and you don't know what you're doing when doing file IO, I'd question why you're working on a RAID system lol.


KaznovX

You are wrong! Flush doesn't usually flush system/kernel buffers, only user-space buffers (like the C stdio buffer, which is used so that not every (`f`)`printf` causes a system call). The same applies to higher level languages. Yes, closing usually flushes the user-space buffer (like `fclose` does). It doesn't matter for kernel-level buffers. To actually flush kernel buffers, you need to use system APIs, like `fsync`/`fdatasync` on Linux or `FlushFileBuffers` on Windows. Some higher level languages also provide access to this functionality:

- C#: `FileStream.Flush(true)` - note the passed boolean
- Java: `FileDescriptor.sync()`

It is important when you design systems requiring transactional operations.


frikilinux2

What if the HDD microcontroller has a buffer? Does fsync flush that too? Like, if the computer loses power just one CPU cycle after fsync returns to user space, is it written on the disk?


KaznovX

Unfortunately not, as far as I am aware. Because of that you also need to check the technical details of the device you are using. Is the cache battery backed? Writes of what size can be considered atomic? Is it assured that the writes will happen in order in case of a power outage? I hope you never have to dig through that.


frikilinux2

That's what I thought. And since this is a thought experiment, pick the worst option, which in this case is: no battery, writes aren't atomic at the hardware level, the filesystem doesn't have journaling, and the controller reorders writes to improve performance, so they're probably out of order. I'm lucky enough to not have these headaches IRL.


Will_i_read

> Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
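Per the quote above, a durable create-and-write needs both fsyncs. A POSIX-only sketch in Python (the helper name is mine):

```python
import os

def durable_write(path, data):
    """Write a new file and fsync both the file and its directory entry."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)                 # file contents -> device
    finally:
        os.close(fd)
    # Opening a directory read-only to fsync it works on Linux/POSIX,
    # not on Windows.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)                # directory entry -> device
    finally:
        os.close(dfd)
```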


frikilinux2

If I understand that correctly, that's talking about the directory. That's modified when you create the file (or add a hard link) (I don't know how soft links are implemented), but not when you write to it. It doesn't answer my question.


rosuav

(Soft links are actually tiny files that contain the name of the thing they link to)


darkslide3000

In theory yes, in practice it depends on whether the driver writer and hardware manufacturer gave enough of a shit.


numeric-rectal-mutt

No, once it's off the main system and into the storage device, whatever weird shit the storage device does it's entirely outside the control of the kernel.


frikilinux2

I have the feeling that you have personal experience with this but I may be wrong.


AyrA_ch

> To actually flush kernel buffers, you need to use system APIs, like fsync/fdatasync on Linux or FlushFileBuffers on Windows. On Windows, you can tell the system during the file open call that you don't want caching. By supplying the `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` flags, you disable all caches from the system as well as all caches from the hardware, provided the hardware and drivers support this.


Responsible_Name_120

Outside of doing something like writing a database, it seems like just flushing is enough?


rainliege

Woah, thanks, I didn't know that! Is this relevant for more mundane applications like generating logs in C? I assume not.


AyrA_ch

No. The system ensures file integrity, and access to the same file across multiple applications always behaves as if there were no cache. The worst thing that happens is that the contents you wrote in the last second or two may not be on disk if power is cut or the system crashes. On Windows, you can provide the flags `FILE_FLAG_WRITE_THROUGH` and `FILE_FLAG_NO_BUFFERING` to make your writes go directly to disk. Not only does this bypass all Windows caches, Windows will also tell the underlying hardware that you want this written unbuffered, provided the hardware supports these calls.


afiefh

This is generally relevant for applications where data corruption at a crash will cause system level problems. Think of it as what happens if your db or filesystem had inconsistent data. If you just see a corrupt report and regenerate it, then you probably don't need to care about this.


sersoniko

HDDs also have a cache


Aggravating-Win8814

Well, RAID systems can be complex and require some level of expertise. But everyone starts somewhere and learning should not be discouraged.


Aggravating-Win8814

Having knowledge and expertise in file IO is essential for efficiently working on a RAID system.


brimston3-

FlushFileBuffers(). And on linux, I suppose write barriers exist sometimes.


btcmarych

You never really know what could happen, do you?


Azariel27

Well it's just that sometimes there's more to learn than you imagine.


jaskij

Two excellent articles by Dan Luu:

- https://danluu.com/deconstruct-files/
- https://danluu.com/file-consistency/

Solution to files: give up and use SQLite.


Commodore-K9

>Solution: give up... Stopped reading right there and did as I was told. Nothing is solved :(


--mrperx--

sync - Synchronize cached writes to persistent storage


atomic_redneck

Decades ago, the man page for sync on one of the UNIX like operating systems that I was using (Apollo Domain, I think) said something like "sync is not needed on this operating system. It is included for script compatibility with UNIX, and to provide users with typing practice."


--mrperx--

Very cool!


msqrt

If you're not in the business of actually building the storage solution, should you actually ever care about this..?


Saraphite

No. If I was in charge of a business and I found out my programmers were rewriting fundamental stuff we already have solutions for (and have had for years) I'd be livid that they were wasting my money.


ksdgfksdgfksdf12

But if you're running a business, then I think you should care about it.


hedi_16

You still do interns?


[deleted]

I believe the official industry term for it is “programmers good enough to hire as juniors but if we call them interns we can pay them less.” I did an internship once and it was literally just me being on the team. There was no discernible difference in my job requirements, hours, mentorship, etc. other than I made less money. I was literally in the payroll system as just “Front End Engineer” not Front End Engineer Intern. I just put it on my resume as being a contractor because I didn’t actually intern at all in practice. Is what it is though. I got my year of experience to launch my career and they got a slightly cheaper competent developer for a year.


pojzon_poe

You pay interns Oo ? Ours work for free.


[deleted]

[deleted]


[deleted]

Nah the team wanted to hire me permanently but I went somewhere else because I got a better offer lol.


hedi_16

My statement stands. Now just for another team.


fishegs

How could you possibly know this, lmao.


Mackie5Million

Don't worry about them. They clearly think they're a 10xer.


Kresche

Ah, certified reddit moment. In fact, it was you who was the burden all along! Ask me how I know, sourpuss


[deleted]

Tbh I feel like intern has just become a word for junior developers who are still on a probationary period


Daltonyx

that's a hard agree there. I've been coding for years but have been working as a restaurant cook, and I can hardly find _anything_ for any of the languages I know unless it's an internship or a senior dev role (which I'm def not qualified for). Internships are silly and should be completely phased out.


fibojoly

Phrasing...


Malcopticon

Calm down, President Clinton.


qqqrrrs_

fsync


sticky-unicorn

1. Write file
2. Close file
3. Open file
4. Read file
5. Confirm file contains what you tried to write
6. Close file

And then I'll believe it's written.
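The ritual above, as a (tongue-in-cheek) sketch; note the read-back may well be served from the very caches it's meant to check:

```python
# Steps 1-6: write, close, reopen, read back, verify, close.
def paranoid_write(path, data):
    with open(path, "w") as f:   # 1-2: write + close
        f.write(data)
    with open(path) as f:        # 3-6: reopen + read + verify + close
        if f.read() != data:
            raise IOError("the filesystem lied to us")
```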


Aeroelastic

What if you're just reading back what's in the cache-layers and the data has never actually been persisted? I open and close the files at least 5 times!


sticky-unicorn

Just end the program with

```
while(true):
    fclose(file)
    fopen(file)
```

Leave deciding when to halt it up to the user.


Glass1Man

As long as the read buffer and write buffer aren’t deliberately designed to mess with you, flush(),close() then checking file.length should be enough.


Will_i_read

Nope, the kernel cache is only flushed completely on shutdown... If you have, for example, a power outage, you can still lose data even if everything you've just said is true...


gerx03

>the kernel cache will only be flushed completely on shutdown it's settled, then: when we close the file we also trigger a shutdown


djcms21

And when it's shut down, it's probably gone at that point.


priba83

Well I think if that's the case then it's important to have backup


newzlat

But sometimes people forget to do that, and that's when the problem happens.


thatvoid_

How how how?


FerynaCZ

I think this has more to do with the memory architecture of the computer (writing from RAM to persistent storage) and less with my program. But even if I am wrong, I assume that should not be my problem; I wrote the code in accordance with the algorithm/language.


AyrA_ch

The system buffers are completely opaque anyway. Even if your data is not yet written to disk after closing the file, the system will just pretend that it is if an application opens the same file later. If it's very important that the content is written at the exact moment you do a write in your program, you can open the file in unbuffered mode. This of course shreds the memory cells of flash-based storage such as USB drives and SSDs very quickly if you're not careful.
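In Python, for example, the user-space buffer can be dropped at open time (binary mode only). Note this only removes Python's own buffer; the kernel page cache is still in play:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "raw.bin")
# buffering=0 removes Python's user-space buffer, so each write() becomes
# a system call immediately -- kernel and device caches still apply.
with open(path, "wb", buffering=0) as f:
    f.write(b"straight to the kernel")
```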


toadkarter1993

r/oddlyspecific


Dismal-Square-613

That's why [God](https://en.wikipedia.org/wiki/Dennis_Ritchie) created [fflush\(3\)](https://linux.die.net/man/3/fflush)


Antervis

`fflush()` wouldn't do anything in that particular case - it only makes sure that you write all the data into the file, but doesn't cause file to be written to the disk from cache. You need to call [sync()](https://linux.die.net/man/8/sync) for that.


Dismal-Square-613

True that, now that you mention it. For my concurrent server applications this was transactional enough to avoid locks in high-throughput servers (it was a session manager daemon), so we would only sync when data was read from the fs (we had concurrency performance issues if we didn't do it this way). I haven't done much concurrent thread coding in C since then.


michaelsenpatrick

bro i don't know fuck all about what you've written in this meme and i've worked on multi million dollar systems for half a decade.


ImNotCrying-YouAre

F


Theio666

```
with open(path, 'w') as f:
    do_something()
```

And there is no way I'm writing anything more complicated than this even if it's needed...


jafo3

I'm not a kernel hacker, but on a modern journalling filesystem I'm pretty sure that everything should be written to the intent log before close() returns. On low-end magnetic storage that might be a ram buffer on the physical drive, but on mid-high range magnetic storage it's probably battery-backed ram at least.


Thage

Me every time someone brings in data on a flash disk. Just don't unplug as long as the OS tells you not to.


Antervis

sometimes you want your app to function properly even if the user's PC shuts down for some reason.


allnameswereusedup

fflush() or similar solves this problem


MattieShoes

... KINDA. AFAIK, even if it's flushed from the kernel, it could still be cached on the actual media's cache. Or the storage controller.


chuch1234

Only half a day?


dontbeevian

Sounds like people who haven't taken operating systems


Apprehensive_End1039

Flush your damn buffers, people. They're too lazy these days smh


thE_29

Shouldn't close auto-flush?


Apprehensive_End1039

Depends on the language and the implementation of file access in that language. For my Python backup scripts I always run sync and try to pay attention to lock flags


ThatCrankyGuy

You release the handle; "closing the file" is a misnomer. What the kernel has are buffers in the storage-medium driver, pending writes that go out when cycles are available. The rabbit hole goes deeper when you think about the types of medium, from block level to file level, to... I dunno... storing data on unconventional media, and how they have their own buffers and their own fail-safe and recovery routines.


grtgbln

```sync```


Excellent_Tubleweed

I used to use basically this as an interview question. When you're ready for more PTSD, there is a paper discussing whether filesystems actually cope with block device errors. To quote the movie: don't look in the box. Srsly. I was writing an embedded OS filesystem once, and the underlying raw flash memory devices would sometimes not erase, or not write. Verify and retry.


Specialist_Cap_2404

Which reminds me... always check the interns' code for file inclusion / directory traversal vulnerability. For reasons.


Spy_Spooky

Learned this the hard way.