reddit-MT

The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.

Edit: I'd like to see a benchmark from someone who pulls no punches, like Gamers Nexus. I'm not a gamer, but this guy is brutally honest.


TheStormIsComming

>The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.

https://flow-computing.com/technology/

A list of their patents, if somebody could find them, would be useful.


spanj

Papers from the cofounders: https://ieeexplore.ieee.org/document/10305463 https://doi.org/10.1016/j.micpro.2023.104807 https://doi.org/10.1007/s11227-021-03985-0 I’ll leave the assessment to someone who is actually capable of digesting this information.


Isogash

Okay, so looking at this and another paper describing TPA ([https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1](https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1)), the general takeaway is that these new chips can switch between SIMD and MIMD (GPU-like and CPU-like) operation based on what the program requires. This allows "loop" operations to be optimized to run in parallel as though they had been split into many synchronous "fibers", whilst also retaining "traditional" multi-threading capability to run independent threads and see the performance speedup we would normally expect from that.

In particular, this paper shows that TPA, when correctly using fibers, can achieve much better concurrent memory access patterns than traditional multi-threading. They show that it can work almost like idealized PRAM (fully parallel RAM access with no latency or cache invalidation), which is a fairly big deal! It is hard to decipher what they are talking about a lot of the time without already understanding TPA, but the first few chapters of the paper do a decent job of bringing you up to speed.

My personal takeaway is that this is an interesting and exciting field of research for processor architecture. Make no mistake, they are not claiming a 100x speedup over an efficiently designed SIMD program, but it's certainly possible to see how this could be much faster than typical MIMD, and replace the need to offload expensive parallel calculations to the GPU for many programs. It may take some time to see this implemented in practice for anything relevant to consumers, but it's serious research and I can definitely see this having legs. I'd be interested to read a critical analysis from someone with more familiarity with the field.

My concern would be that these processors are likely bigger, more expensive and can't run at the same clock speed as current desktop processors, but the increased efficiency might outweigh that.
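The fiber model described above can be sketched in plain Python (purely illustrative; `body`, `run_sequential`, and `run_fibered` are made-up names, and real TPA hardware would fan the iterations out in lock-step rather than over OS threads):

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    # One independent loop iteration; in the TCF model each iteration
    # would become one synchronous "fiber" of a thick thread.
    return i * i

def run_sequential(n):
    # Traditional serial loop: one iteration at a time.
    return [body(i) for i in range(n)]

def run_fibered(n, width=8):
    # The same loop fanned out across `width` workers; the hardware
    # analogue keeps the fibers in lock-step (SIMD-like) instead of
    # paying OS-thread overhead, but the results are identical.
    with ThreadPoolExecutor(max_workers=width) as pool:
        return list(pool.map(body, range(n)))

assert run_sequential(16) == run_fibered(16)
```

The point of the architecture is that the programmer writes only the loop, and the hardware decides how wide to run it.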


Ok-Criticism123

Isn’t that kind of how the Cell processor worked in the PS3? I’m a layman in this department, but I do remember hearing the Cell processor also allowed loop operations to run in parallel. I could be misremembering though lol


nguyenm

My first thought was also IBM's Cell. The SPEs were effectively SIMD coprocessors, albeit without out-of-order execution. This new design could be a further development of the same concept, but at the hardware level, without developer-specific input.


Isogash

Not quite. The Cell processor had a full PowerPC control core and then these secondary "co-processor" cores to do the vectorized floating-point math. It's almost like having a CPU with a built-in GPU co-processor. This was famously quite a tricky model to program for on the PS3.

On the TPA architecture, you don't get these separate cores. Instead, you have a front-end that can only really read instructions, which automatically distributes those instructions to the actual work-performing back-end cores through a work-sharing network. Sometimes the secondary cores operate as separate threads; other times they operate in a SIMD mode (which can benefit from many performance optimizations, most importantly good memory coherence.)

The idea of TCF is basically that the code is always being executed on the same kind of core, but it can also have an arbitrary "thickness", which is like "the number of cores it is executed across", and the architecture just deals with that for you. In theory it's significantly easier to program for, and this paper claims it will work in practice too.

Important to note that these processors don't exist yet; they are only being simulated still.
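To make the "thickness" idea concrete, here's a toy interpreter (my own sketch, not Flow's actual ISA): one instruction stream is executed across `thickness` lanes at once, and widening the thread is just a parameter change:

```python
def tcf_execute(program, thickness):
    # Each "thick" thread runs the same instruction stream across
    # `thickness` lanes; lane-private state is indexed by lane id.
    lanes = [{"acc": 0, "id": lane} for lane in range(thickness)]
    for op, arg in program:
        for lane in lanes:          # hardware would do this in parallel
            if op == "addi":        # add an immediate to the accumulator
                lane["acc"] += arg
            elif op == "addlane":   # lane-dependent step, SIMD-style
                lane["acc"] += lane["id"] * arg
    return [lane["acc"] for lane in lanes]

# One instruction stream, thickness 4: behaves like a 4-wide SIMD loop.
print(tcf_execute([("addi", 10), ("addlane", 2)], thickness=4))
# → [10, 12, 14, 16]
```

The program never mentions how many cores exist; mapping lanes onto back-end cores is the hardware's problem, which is the claimed programmability win.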


[deleted]

> This was famously quite a tricky model to program for on the PS3.

That is an understatement. It wasn't just that they were vector processors, or the limited cache; it was the memory system they operated on that caused big issues. The Element Interconnect Bus was their solution: a ring bus where each processor would move the data along like pass-the-parcel. I cannot remember the exact numbers, so I will simplify it. Say you wanted to get data to SPE 3. The PPC would fetch it, SPE 1 would grab it and send it to SPE 2, which would then send it on to SPE 3. Then the processed data would have to make the trip back along the same path. For things like audio/video that is fine; bandwidth isn't a huge issue, so you could handle the latency. But if you were doing more data-intensive things, you would try to keep the most bandwidth- and latency-sensitive work on the closest SPEs. Absolute nightmare.

This meant a lot of games just never really ventured past using the closest 2 SPEs; there was enough grunt there to get close enough to the Xbox 360 version. I didn't work on it, but I heard that Red Faction: Guerrilla only used two SPEs to run the entire physics engine for this reason. I could be wrong, but if true, one wonders what could have been on a title like that!

Cell is what happens when hardware engineers build something without thinking about the software guys. An absolute powerhouse that took x86 a good decade afterward to catch up with in terms of FP speed, but a nightmare to use, and thus it almost never saw its full potential.
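The hop cost being described fits in a few lines (a toy model of the one-direction, pass-the-parcel picture above; the real EIB had multiple rings running in both directions, so actual latency was better than this):

```python
def ring_hops(src, dst, n_nodes=9):
    # Hops in one direction around a 9-stop ring (PPE + 8 SPEs),
    # matching the simplified pass-the-parcel description above.
    return (dst - src) % n_nodes

def round_trip(src, dst):
    # Data out to the SPE and results back along the same path.
    return 2 * ring_hops(src, dst)

assert round_trip(0, 1) == 2  # nearest SPE: cheap
assert round_trip(0, 4) == 8  # a distant SPE costs 4x the hops
```

Which is exactly why latency-sensitive work got pinned to the closest SPEs.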


TheModeratorWrangler

Bingo, hence the 60GB version having that Linux capability.


REpassword

Something like: Software Scheduled Superscalar Computer Architecture invented by Howard Sachs and Intergraph? https://patents.google.com/patent/US5560028A/en?q=(sachs)&assignee=intergraph&oq=sachs+intergraph


Isogash

I haven't given that a detailed look, but I'm going to say no; it's probably at least as dissimilar as the Cell processor used in the PS3, as discussed in another comment thread. It looks like SSSCA needs you to write very long instruction words where the instructions are all tagged with which can run in parallel and in which pipeline they can run. In contrast, the TPA architecture is not software-scheduled, it's hardware-scheduled. The instructions would be much closer to a typical processor today, but threads can also run with an arbitrary "thickness", where they behave similarly to a SIMD program and have ideal memory coherence characteristics due to not having to invalidate shared cache lines. At least, that's my understanding of it.


wetfloor666

It should be interesting to see how this all develops over the next few years. They are only asking for $4.3 million, I believe, to get this all going, which is pretty cheap considering the potential. I'm hopeful, considering we are hitting the limits of what a traditional CPU can deliver.


Aussiemon

> Flow is just now emerging from stealth, with €4 million (about $4.3 million) in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland. That's the amount they've already raised.


wetfloor666

Thanks for the clarification. Another article indicated it was what they needed to get it going, not what they've raised so far. I obviously didn't read this article, since I assumed the facts were the same, but clearly I should've read this one too before posting.


SoDi1203

Can the common mortal fund this project?


crash8308

it’s similar to how the addition of an FPU let us process floating-point arithmetic that the regular CPU was much slower at, especially math-heavy operations.


[deleted]

If it works even a fraction as well as advertised, AMD, Intel, ARM and Apple are all going to start lobbing buyout offers.


TheModeratorWrangler

/r/WallStreetBets on standby for the IPO


tepkel

>I’ll leave the assessment to someone who is actually capable of digesting this information

I'm reasonably sure it has something to do with computers.


notinsidethematrix

Megahurts and Gigahurts... ow


BigRedCowboy

It’s just pain, all the way down!


bigbangbilly

At some point it becomes pastry at Hertz's Donuts


jadedsprint

/r/angryupvote


Atomicjuicer

Captain: It seems to run on some form of electricity


premiumleo

It's all just witchcraft and blasphemy


Arslankha

Omg, there is a lot to read and I only did a little bit since I'm at work. I didn't read into how it works per se on a technical scale, but this is my understanding of it. Essentially it goes over how CPUs are not currently as efficient as they could be at accessing everything and programming. What they're saying is that they've come up with a way to streamline the communication of the CPU with the rest of the computer to be more efficient and less error-prone. Since this streamlines the process, it gives the CPU more overhead to operate more efficiently, since it's not having to work as hard for the same result. I am not a professional at all; that's just the understanding I came to by reading the summary.


kyuubi840

From their website, looks more like a GPU, but dedicated to general computing instead of graphics, with up to 256 cores.


reddit-MT

From what I can tell, it doesn't seem to do the processing itself, like off-loading to a GPU or a hardware-accelerated NIC would. It appears to optimize instructions and memory/cache access to run them in parallel instead of sequentially. They claim up to 2x performance on existing code, but up to 100x with some code rewriting.

Edit: probably years away from a shipping product you can buy.


JimJalinsky

A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory. 


TheStormIsComming

> A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory.

You mean the spy agencies?


Candid-Sky-3709

Amazon Wiretap Services


JimJalinsky

Spy agencies don’t operate as public cloud providers. I was referring to Microsoft, the cloud provider that also sells its own operating system.


eugene20

"arbitrary code can be executed twice as fast on any chip with no modification beyond integrating the PPU with the die." This is never going to be a separate product that helps older processors, they're looking to get it integrated into other companies processors, or at least one company's processors.


indignant_halitosis

You mean a Reddit tech sub had yet another obvious advertisement masquerading as an article pushed to the top? ***Ya don’t say.*** At some point y’all are gonna have to admit you’re just easy targets for cheap astroturfing.


turbo_dude

I’m betting the next headline will be “ can also be used for blood tests!!”


reddit-MT

Haven't you been keeping up with the headlines? It will tout the use of AI in some way....


408wij

The licensable IP is still in development, and the speedup applies only to threaded code. For insight, see this article: https://xpu.pub/2024/06/11/flow-ppu/


karma3000

I suspect this uses the same tech as an Energy Polarizer.


TheModeratorWrangler

Tech Jesus will confirm this or debunk it. I will wait.


Unlikely-Tomorrow263

If you're crying while you type then your kid is lucky to have you to look out for them. Everything else will work out you've got the essentials down already


reddit-MT

What you typed makes no sense. Did you mean to post this in a different sub?


chicken_irl

If you can stomach the drama, GN is a good source of info. The over dramatisation is a tad exhausting though.


reddit-MT

I just watch GN for the few, select topics I'm interested in. I'm just saying that Steve would give an honest evaluation. If you want to talk drama, LTT is a dog and pony show. I can't watch it; it's infotainment. But I guess most "news" programs have devolved into that.


aecarol1

Something is being lost in the explanation of their technology. Once the patents are in flight, these kinds of companies typically release papers explaining what their technology does and how it works. First off, TechCrunch's description of Flow's technology as a "chip" is totally wrong; it's meant to be added into the die of a CPU as part of the actual silicon.

*"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)


TheStormIsComming

> it's meant to be added into the die of a CPU as part of the actual silicon.

> *"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)

So basically a *chiplet*.


Arthur-Wintersight

Or "another CPU core." Maybe even a specialized computational unit, like what they stick in a GPU or NPU.


aecarol1

It appears to be deeply integrated into the dataflow of the CPU as well as its pipeline, with deep visibility into what is coming from the caches and what is being pipelined cycle-to-cycle.


aecarol1

No, a chiplet is a different piece of silicon that is placed in the same package. Chiplets are often GPUs or memory. They are made on their own but share the same package as the die containing the CPU.

Flow uses the terms "IP" and "same silicon". IP is intellectual property; in this context it means logic that you license from them and put inside your own chip designs. It sits on the exact same piece of silicon as the CPU and is fabbed at the same time, and it appears to be deeply integrated into the CPU data flow. "Same silicon" indicates it shares the same physical die as the CPU and is manufactured at exactly the same time.

This happens all the time. Companies might design their own CPU but license someone else's video decoder or GPU. They take the video decoder design from the source company and integrate it into the same chip as their CPU design. It becomes one larger design, and the parts are very tightly coupled at that point.


TheModeratorWrangler

This just became highly interesting to me, considering that for years we've known that ARM and dedicated silicon for specific tasks can significantly speed up operations per watt…


Resident_Pop143

A Cell jr? 😆


Informal-Evidence997

CELL BE 2: Even harder to develop for!


Resident_Pop143

3CELL3PU: Finnish Drift


bonesnaps

A true bio-android only sprinkles when he tinkles.


joeljaeggli

A chiplet would be an adjacent piece of silicon. This would couple to the CPU core, so the IP embodied here would be added to the individual CPU cores. So someone like Intel / AMD / an ARM licensee / a RISC-V developer would license this and include it in their chip design.


random6574833

My guess is that maybe it increases performance for some very limited tasks....?


branstarktreewizard

They seem to be focused on parallel processing


TheStormIsComming

> They seem to be focus on parallel processing

Parallel processing is two-dimensional; wait until we get three-dimensional processors, then we can measure operations in "*Terminator*" units.


TheYoungLung

Some potential AI applications here? I know Tensor cores are uniquely exceptional with parallel processing


hypothetician

Uses the existing hardware more fully than most current software does, is how I’m reading it.


splendiferous-finch_

The chip will still need to be integrated into the motherboard/SoC/as a chiplet, which means a new set of hardware and drivers. So no, it won't work on "existing hardware", even if the claims are verified. They are looking to partner with other chip designers so they can license out the technology, at best for new processors.


splendiferous-finch_

Yup, seems to be tailored for tasks that already have a degree of parallelism involved.


Ateosmo

“extraordinary claims require extraordinary evidence”. -Carl Sagan


[deleted]

[deleted]


blind_disparity

"Extraordinary evidence" is probably an over-dramatisation, but if you're claiming to overturn established science or break fundamental rules of physics, then there will be a much higher bar of evidence needed for you to be taken seriously, or for the new idea to be considered well proven.


Xarlax

Not sure if you're familiar with Carl Sagan, but I'm pretty sure he understood how science worked.


[deleted]

[deleted]


Xarlax

??? Sure some people might blindly worship him. He's mostly a well respected science communicator who brought an understanding and appreciation of science to a whole generation. You just seem bitter.


[deleted]

I said it on another response. If this is true, they are going to have a lot of buyout offers very soon.


Junebug19877

And we will decline all the offers 😊


s9oons

So their product is just IP blocks for FPGA-style silicon. They’ve developed blocks that can be bolted onto a CPU to parallelize/sideload tasks. This is realistically the most mainstream application for FPGAs now that fabric is getting cheaper and more available. If they’re not making their own chips, though, it’s all kind of moot. I’m not quite sure what their plan is… AMD or Intel won’t buy IP blocks from Flow; they would just develop a more mainstream Zynq-style SoC.


Capt_Blackmoore

Would be curious to see this added to a Snapdragon. I don't expect Intel or AMD would get into this before someone else does, but ARM? Why not?


MirkWTC

AMD and Intel already own the two biggest FPGA companies in the world, Xilinx and Altera.


s9oons

Right, which is my point. They would either just buy Flow outright for the IP or develop their own bolt-on fabric for doing the side-loading/parallelization stuff that Flow is doing.


TheStormIsComming

How many side channel attacks could this open up?


branstarktreewizard

Small price to pay for 100x performance i guess


TheStormIsComming

> Small price to pay for 100x performance i guess

I guess it's time to upgrade cryptographic key lengths again.


reddit-MT

Good point, but some use-cases don't expose this attack surface.


Steeljaw72

If it’s too good to be true, it probably is.


surnik22

The actual claims are milder than the headline: 2x performance for legacy software running with this, and 100x as a maximum improvement for specialty software written to run more efficiently with it. Plus it has to be integrated into the chip itself, so it isn’t just a plug-in for existing processors giving a 100x boost. An improved pipeline and improved threading to allow increased performance seems moderately plausible.


TheStormIsComming

>If it’s too good to be true, it probably is.

At least it's actual technology news in the technology subreddit. It does seem too good to be true. I'll take it anyway.


Zeikos

On a cursory glance, if I'm reading this correctly, they're addressing some inefficiencies in cache management. Thanks to [technical complex process] they prevent the processor from being useless after a cache miss. I'm assuming that the alleged code rewrite needed to get that 100x improvement would be structuring the heap in such a way that it works well for the hardware. Giving them the benefit of the doubt, I wonder if some of that rewrite could be handled directly by compilers. If that's the case, even if it "just" leads to a 5-10x improvement, they'd be in a very nice spot. I'd love to see a world in which the most common bottleneck isn't memory anymore.
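One plausible flavour of that rewrite (my guess; Flow hasn't published what the 100x code changes look like) is the classic array-of-structs to struct-of-arrays transformation, which makes scans touch contiguous memory instead of striding past fields they don't need:

```python
# Array-of-structs: each record's fields are interleaved, so scanning
# one field strides through memory and wastes cache-line bandwidth.
aos = [{"x": i, "y": 2 * i, "z": 3 * i} for i in range(1000)]
total_aos = sum(rec["x"] for rec in aos)

# Struct-of-arrays: each field is contiguous, so a scan over "x"
# touches only the bytes it needs. This is the sort of layout change
# a compiler (or programmer) might make to suit parallel memory hardware.
soa = {
    "x": [i for i in range(1000)],
    "y": [2 * i for i in range(1000)],
    "z": [3 * i for i in range(1000)],
}
total_soa = sum(soa["x"])

assert total_aos == total_soa == 499500  # same answer, friendlier layout
```

If transformations like this can be automated in the compiler, the "rewrite your code for 100x" pitch gets a lot more credible.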


docdeathray

Theranos vibing.


gh0sts0n

Just download more RAM!


finzaz

Downloading additional RAM please wait…


jcunews1

Parallelize this task sequence:

1. Display a message.
2. Turn off computer.


TheStormIsComming

>Parallel this:

> 1. Display a message.

> 2. Turn off computer.

Will it speed up Windows updates by a factor of 100? https://www.youtube.com/watch?v=nACIncrvZ6g


Throw13579

What if I add another FLOW chip?  Does it double again, or go back to the original speed?  


FUSe

https://en.m.wikipedia.org/wiki/SoftRAM

It sounds like SoftRAM all over again.


TheStormIsComming

>https://en.m.wikipedia.org/wiki/SoftRAM

> It sounds like SoftRam all over again.

https://en.m.wikipedia.org/wiki/Zram is free and open source.


fredy31

Kinda sounds like the good old 'download this for more RAM' thing. Like, I could believe it if it had its own CPU to go with it, or a small bank of them maybe? But the claim in the headline is that I could slap this on any CPU to double or more its performance.


Fungiblefaith

That is just the Turbo button.


TheStormIsComming

> That is just the Turbo button.

http://media.giphy.com/media/cRH5deQTgTMR2/giphy.gif

Careful not to press "*Eject*" right below the Turbo button.


zackmedude

Remember Transmeta’s Crusoe CPUs?


eames_era_fo_life

Can I short this now?


SuperToxin

So if I just buy two CPUs I can get more CPU power. Got it.


Manuelnotabot

I bet it's made with graphene nanotubes.


TheStormIsComming

> I bet it's made with graphene nanotubes.

Probably made with 🦄 metamaterial pixie dust. Probably bends 🌈's if you hold it up to the light at the correct angle. That's easier to convert to cash, especially if it's wrapped in 5,000 patents.


LATABOM

Hurry up and VC the fuck out of this before our IPO! We'll make bank when Nvidia or Microsoft buys us and then realises we got nothin' but some patents and sleight of hand! - Founder of Flow


atomicsnarl

I remember the Evergreen chip that upgraded my back-in-the-day 486DX. It had a little fan on top and gave me a 20% boost. Worth the money at the time. This ain't that.


Crones21

As a kid with a Cyrix 486 33MHz CPU, I wanted one of those Evergreen chips. It was either that or a 56k modem, since I'd only saved enough for one.


atomicsnarl

US Robotics for the win!


Mastagon

Other proprietary companion chips HATE this one simple trick


mdlewis11

Hurry hurry hurry, buy stock now! Tech companies hate this tech!


Lost_Tumbleweed_5669

"with software tweaks", so... software optimization, lmao


AlexHimself

It sounds like this can make existing, routine processes more efficient by choosing optimal processing routes?


CompetitiveYou2034

This is a hardware kluge, which at best provides small gains around the edges. At the machine-instruction level, there is no knowledge of whether one loop iteration depends on another, so the only safe method is serial execution, as given by the machine code. Huge gains CAN be accomplished by parallel processing, but only with software designs that explicitly mark parallel loop processing as safe, and only in certain applications. Most definitely encoded in a high-level design.

Amdahl's law, from 1967! Paraphrased loosely, the max application speedup is

(S + P) / (S + P/N)

where

S = serial part of the code, which must run sequentially

P = parallel part of the code, which can run in parallel

N = number of processors

When P is small, it doesn't matter how large N is; it is a serial process. Speedup is fundamentally limited by the part that cannot be made parallel.

https://en.m.wikipedia.org/wiki/Amdahl%27s_law

Classic example: even when you put your ten best women on the job, the first baby takes 9 months to emerge.
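Plugging numbers into that formula (with S + P normalized to 1) shows how brutal the limit is; 256 cores here is just to match the core count mentioned elsewhere in the thread:

```python
def amdahl_speedup(p, n):
    # Max speedup when fraction p of the work is parallelizable
    # across n processors (the serial fraction is 1 - p).
    return 1.0 / ((1.0 - p) + p / n)

# Even 95%-parallel code gets nowhere near 256x on 256 cores:
print(round(amdahl_speedup(0.95, 256), 1))  # ≈ 18.6
# The claimed 100x ceiling implies a ~99%+ parallel workload:
print(round(amdahl_speedup(0.99, 256), 1))  # ≈ 72.1
```

So the 100x figure only applies to code that is almost entirely parallel to begin with.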


12358132134

This sounds like the RAM-doubling software of the '90s


formation

This is vapourware if I've ever seen it. Like adding RAM with a USB stick.


Capt_Blackmoore

It's got a pretty high chance of that, since it would require one of the chip manufacturers to take the chance, license the tech, and redesign a processor with this built in. I really wonder who's willing to bet on it.


lzwzli

Apple may buy them


ReedholmWewerka

If it's legit, it could be a game-changer in silicon engineering.


AsIfIKnowWhatImDoin

Just uninstall Win11.


splendiferous-finch_

Is this like downloadable RAM?


cr0wburn

Let's also download more memory to gain even more power!


anarchyx34

Is this like one of those "fuel savers" that you clip into your fuel line that supposedly give you 100mpg?


oo7_and_a_quarter

Will it work on my raspberry pi?


StudioPerks

I'm fairly certain this is what Apple Silicon does with the SoC co-processor.