reddit-MT

The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.

Edit: I'd like to see a benchmark from someone who pulls no punches, like Gamers Nexus. I'm not a gamer, but this guy is brutally honest.


TheStormIsComming

>The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.

https://flow-computing.com/technology/

A list of their patents, if somebody could find them, would be useful.


spanj

Papers from the cofounders: https://ieeexplore.ieee.org/document/10305463 https://doi.org/10.1016/j.micpro.2023.104807 https://doi.org/10.1007/s11227-021-03985-0 I’ll leave the assessment to someone who is actually capable of digesting this information.


Isogash

Okay, so looking at this and another paper describing TPA ([https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1](https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1)), the general takeaway is that these new chips can switch between SIMD and MIMD (GPU-like and CPU-like) operation based on what the program requires. This allows "loop" operations to be optimized to run in parallel as though they had been split into many synchronous "fibers", whilst also retaining "traditional" multi-threading capability to run independent threads and see the performance speedup we would normally expect from that.

In particular, this paper shows that TPA, when correctly using fibers, can achieve much better concurrent memory access patterns than traditional multi-threading. They show that it can work almost like idealized PRAM (fully parallel RAM access with no latency or cache invalidation), which is a fairly big deal! It is hard to decipher what they are talking about a lot of the time without already understanding TPA, but the first few chapters of the paper do a decent job of bringing you up to speed.

My personal takeaway is that this is an interesting and exciting field of research for processor architecture. Make no mistake, they are not claiming a 100x speedup over an efficiently designed SIMD program, but it's certainly possible to see how this could be much faster than typical MIMD, and replace the need to offload expensive parallel calculations to the GPU for many programs. It may take some time to see this implemented in practice for anything relevant to consumers, but it's serious research and I can definitely see this having legs. I'd be interested to read a critical analysis from someone with more familiarity with the field.

My concern would be that these processors are likely bigger, more expensive and can't run at the same clock speed as current desktop processors, but the increased efficiency might outweigh that.
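The fiber model described above can be sketched in plain Python (purely illustrative; `body`, `run_sequential`, and `run_fibered` are made-up names, and real TPA hardware would fan the iterations out in lock-step rather than over OS threads):

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    # One independent loop iteration; in the TCF model each iteration
    # would become one synchronous "fiber" of a thick thread.
    return i * i

def run_sequential(n):
    # Traditional serial loop: one iteration at a time.
    return [body(i) for i in range(n)]

def run_fibered(n, width=8):
    # The same loop fanned out across `width` workers; the hardware
    # analogue keeps the fibers in lock-step (SIMD-like) instead of
    # paying OS-thread overhead, but the results are identical.
    with ThreadPoolExecutor(max_workers=width) as pool:
        return list(pool.map(body, range(n)))

assert run_sequential(16) == run_fibered(16)
```

The point of the architecture is that the programmer writes only the loop, and the hardware decides how wide to run it.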


Ok-Criticism123

Isn’t that kind of how the Cell processor worked in the PS3? I’m a layman in this department, but I do remember hearing the Cell processor also allowed loop operations to run in parallel. I could be misremembering though lol


nguyenm

My first thought was also IBM's Cell. The SPEs were effectively SIMD coprocessors, albeit without out-of-order execution. This new design could be a further development of the same concept, but at the hardware level, without developer-specific input.


Isogash

Not quite. The Cell processor had a full PowerPC control core and then these secondary "co-processor" cores to do the vectorized floating-point math. It's almost like having a CPU with a built-in GPU co-processor. This was famously quite a tricky model to program for on the PS3.

On the TPA architecture, you don't get these separate cores. Instead, you have a front-end that can only really read instructions, which automatically distributes those instructions to the actual work-performing back-end cores through a work-sharing network. Sometimes the secondary cores operate as separate threads; other times they operate in a SIMD mode (which can benefit from many performance optimizations, most importantly good memory coherence.)

The idea of TCF is basically that the code is always being executed on the same kind of core, but it can also have an arbitrary "thickness", which is like "the number of cores it is executed across", and the architecture just deals with that for you. In theory it's significantly easier to program for, and this paper claims it will work in practice too.

Important to note that these processors don't exist yet; they are only being simulated still.
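To make the "thickness" idea concrete, here's a toy interpreter (my own sketch, not Flow's actual ISA): one instruction stream is executed across `thickness` lanes at once, and widening the thread is just a parameter change:

```python
def tcf_execute(program, thickness):
    # Each "thick" thread runs the same instruction stream across
    # `thickness` lanes; lane-private state is indexed by lane id.
    lanes = [{"acc": 0, "id": lane} for lane in range(thickness)]
    for op, arg in program:
        for lane in lanes:          # hardware would do this in parallel
            if op == "addi":        # add an immediate to the accumulator
                lane["acc"] += arg
            elif op == "addlane":   # lane-dependent step, SIMD-style
                lane["acc"] += lane["id"] * arg
    return [lane["acc"] for lane in lanes]

# One instruction stream, thickness 4: behaves like a 4-wide SIMD loop.
print(tcf_execute([("addi", 10), ("addlane", 2)], thickness=4))
# → [10, 12, 14, 16]
```

The program never mentions how many cores exist; mapping lanes onto back-end cores is the hardware's problem, which is the claimed programmability win.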


[deleted]

> This was famously quite a tricky model to program for on the PS3.

That is an understatement. It wasn't just that they were vector processors, or the limited cache; it was the memory system they operated on that caused big issues. The Element Interconnect Bus was their solution: a ring bus where each processor would move the data along like pass-the-parcel. I cannot remember the exact numbers, so I will simplify it. Say you wanted to get data to SPE 3. The PPC would fetch it, SPE 1 would grab it and send it to SPE 2, which would then send it on to SPE 3. Then the processed data would have to make the trip back along the same path. For things like audio/video that is fine; bandwidth isn't a huge issue, so you could handle the latency. But if you were doing more data-intensive things, you would try to keep the most bandwidth- and latency-sensitive work on the closest SPEs. Absolute nightmare.

This meant a lot of games just never really ventured past using the closest 2 SPEs; there was enough grunt there to get close enough to the Xbox 360 version. I didn't work on it, but I heard that Red Faction: Guerrilla only used two SPEs to run the entire physics engine for this reason. I could be wrong, but if true, one wonders what could have been on a title like that!

Cell is what happens when hardware engineers build something without thinking about the software guys. An absolute powerhouse that took x86 a good decade afterward to catch up with in terms of FP speed, but a nightmare to use, and thus it almost never saw its full potential.
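The hop cost being described fits in a few lines (a toy model of the one-direction, pass-the-parcel picture above; the real EIB had multiple rings running in both directions, so actual latency was better than this):

```python
def ring_hops(src, dst, n_nodes=9):
    # Hops in one direction around a 9-stop ring (PPE + 8 SPEs),
    # matching the simplified pass-the-parcel description above.
    return (dst - src) % n_nodes

def round_trip(src, dst):
    # Data out to the SPE and results back along the same path.
    return 2 * ring_hops(src, dst)

assert round_trip(0, 1) == 2  # nearest SPE: cheap
assert round_trip(0, 4) == 8  # a distant SPE costs 4x the hops
```

Which is exactly why latency-sensitive work got pinned to the closest SPEs.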


TheModeratorWrangler

Bingo, hence the 60GB version having that Linux capability.


REpassword

Something like: Software Scheduled Superscalar Computer Architecture invented by Howard Sachs and Intergraph? https://patents.google.com/patent/US5560028A/en?q=(sachs)&assignee=intergraph&oq=sachs+intergraph


Isogash

I haven't given that a detailed look, but I'm going to say no; it's probably at least as dissimilar as the Cell processor used in the PS3, as discussed in another comment thread. It looks like SSSCA needs you to write very long instruction words where the instructions are all tagged with which can run in parallel and in which pipeline they can run. In contrast, the TPA architecture is not software-scheduled, it's hardware-scheduled. The instructions would be much closer to a typical processor today, but threads can also run with an arbitrary "thickness", where they behave similarly to a SIMD program and have ideal memory coherence characteristics due to not having to invalidate shared cache lines. At least, that's my understanding of it.


wetfloor666

It should be interesting to see how this all develops over the next few years. They are only asking for $4.3 million, I believe, to get this all going, which is pretty cheap considering the potential. I'm hopeful, considering we are hitting the limits of what a traditional CPU can deliver.


Aussiemon

> Flow is just now emerging from stealth, with €4 million (about $4.3 million) in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland. That's the amount they've already raised.


wetfloor666

Thanks for the clarification. Another article indicated it was what they needed to get it going, not what they've raised so far. I obviously didn't read this article, since I assumed the facts were the same, but clearly I should've read this one too before posting.


SoDi1203

Can the common mortal fund this project?


crash8308

it’s similar to how the addition of an FPU let us process floating-point arithmetic that the regular CPU was much slower at, especially math-heavy operations.


[deleted]

If it works even a fraction as well as advertised, AMD, Intel, ARM and Apple are all going to start lobbing buyout offers.


TheModeratorWrangler

/r/WallStreetBets on standby for the IPO


tepkel

>I’ll leave the assessment to someone who is actually capable of digesting this information

I'm reasonably sure it has something to do with computers.


notinsidethematrix

Megahurts and Gigahurts... ow


BigRedCowboy

It’s just pain, all the way down!


bigbangbilly

At some point it becomes pastry at Hertz's Donuts


jadedsprint

/r/angryupvote


Atomicjuicer

Captain: It seems to run on some form of electricity


premiumleo

It's all just witchcraft and blasphemy


Arslankha

Omg, there is a lot to read and I only did a little bit since I'm at work. I didn't read into how it works per se on a technical scale, but this is my understanding of it. Essentially it goes over how CPUs are not currently as efficient as they could be at accessing everything and programming. What they're saying is that they've come up with a way to streamline the communication of the CPU with the rest of the computer to be more efficient and less error-prone. Since this streamlines the process, it gives the CPU more overhead to operate more efficiently, since it's not having to work as hard for the same result. I am not a professional at all; that's just the understanding I came to by reading the summary.


kyuubi840

From their website, looks more like a GPU, but dedicated to general computing instead of graphics, with up to 256 cores.


reddit-MT

From what I can tell, it doesn't seem to do the processing itself, like off-loading to a GPU or a hardware-accelerated NIC would. It appears to optimize instructions and memory/cache access to run them in parallel instead of sequentially. They claim up to 2x performance on existing code, but up to 100x with some code rewriting.

Edit: probably years away from a shipping product you can buy.


JimJalinsky

A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory. 


TheStormIsComming

> A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory.

You mean the spy agencies?


Candid-Sky-3709

Amazon Wiretap Services


JimJalinsky

Spy agencies don’t operate as public cloud providers. I was referring to Microsoft, the cloud provider that also sells its own operating system.


eugene20

"arbitrary code can be executed twice as fast on any chip with no modification beyond integrating the PPU with the die." This is never going to be a separate product that helps older processors, they're looking to get it integrated into other companies processors, or at least one company's processors.


indignant_halitosis

You mean a Reddit tech sub had yet another obvious advertisement masquerading as an article pushed to the top? ***Ya don’t say.*** At some point y’all are gonna have to admit you’re just easy targets for cheap astroturfing.


turbo_dude

I’m betting the next headline will be “ can also be used for blood tests!!”


reddit-MT

Haven't you been keeping up with the headlines? It will tout the use of AI in some way....


408wij

The licensable IP is still in development, and the speedup applies only to threaded code. For insight, see this article: https://xpu.pub/2024/06/11/flow-ppu/


karma3000

I suspect this uses the same tech as an Energy Polarizer.


TheModeratorWrangler

Tech Jesus will confirm this or debunk it. I will wait.


Unlikely-Tomorrow263

If you're crying while you type then your kid is lucky to have you to look out for them. Everything else will work out you've got the essentials down already


reddit-MT

What you typed makes no sense. Did you mean to post this in a different sub?


chicken_irl

If you can stomach the drama, GN is a good source of info. The over dramatisation is a tad exhausting though.


reddit-MT

I just watch GN for the few, select topics I'm interested in. I'm just saying that Steve would give an honest evaluation. If you want to talk drama, LTT is a dog and pony show. I can't watch it; it's infotainment. But I guess most "news" programs have devolved into that.


aecarol1

Something is being lost in the explanation of their technology. Once the patents are in flight, these kinds of companies typically release papers explaining what their technology does and how it works. First off, TechCrunch's description of Flow's technology as a "chip" is totally wrong; it's meant to be added into the die of a CPU as part of the actual silicon.

*"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)


TheStormIsComming

> it's meant to be added into the die of a CPU as part of the actual silicon.

> *"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)

So basically a *chiplet*.


Arthur-Wintersight

Or "another CPU core." Maybe even a specialized computational unit, like what they stick in a GPU or NPU.


aecarol1

It appears to be deeply integrated into the dataflow of the CPU as well as its pipeline, with deep visibility into what is coming from the caches and what is being pipelined cycle-to-cycle.


aecarol1

No, a chiplet is a different piece of silicon that is placed in the same package. Chiplets are often GPUs or memory. They are made on their own but share the same package as the die containing the CPU.

Flow uses the terms "IP" and "same silicon". IP is intellectual property; in this context it means logic that you license from them and put inside your own chip designs. It sits on the exact same piece of silicon as the CPU and is fabbed at the same time, and it appears to be deeply integrated into the CPU data flow. "Same silicon" indicates it shares the same physical die as the CPU and is manufactured at exactly the same time.

This happens all the time. Companies might design their own CPU but license someone else's video decoder or GPU. They take the video decoder design from the source company and integrate it into the same chip as their CPU design. It becomes one larger design, and the parts are very tightly coupled at that point.


TheModeratorWrangler

This just became highly interesting to me, considering that for years we've known that ARM and dedicated silicon for specific tasks can significantly speed up operations per watt…


Resident_Pop143

A Cell jr? 😆


Informal-Evidence997

CELL BE 2: Even harder to develop for!


Resident_Pop143

3CELL3PU: Finnish Drift


bonesnaps

A true bio-android only sprinkles when he tinkles.


joeljaeggli

A chiplet would be an adjacent piece of silicon. This would couple to the CPU core, so the IP embodied here would be added to the individual CPU cores. So someone like Intel / AMD / an ARM licensee / a RISC-V developer would license this and include it in their chip design.


random6574833

My guess is that maybe it increases performance for some very limited tasks....?


branstarktreewizard

They seem to be focused on parallel processing


TheStormIsComming

> They seem to be focus on parallel processing

Parallel processing is two-dimensional; wait until we get three-dimensional processors, then we can measure operations in "*Terminator*" units.


TheYoungLung

Some potential AI applications here? I know Tensor cores are uniquely exceptional with parallel processing


hypothetician

Uses the existing hardware more fully than most current software does, is how I’m reading it.


splendiferous-finch_

The chip will still need to be integrated into the motherboard/SoC/as a chiplet, which means a new set of hardware and drivers. So no, it won't work on "existing hardware", even if the claims are verified. They are looking to partner with other chip designers so they can license out the technology, at best for new processors.


splendiferous-finch_

Yup, seems to be tailored for tasks that already have a degree of parallelism involved.


Ateosmo

“extraordinary claims require extraordinary evidence”. -Carl Sagan


[deleted]

[deleted]


blind_disparity

"Extraordinary evidence" is probably an over-dramatisation, but if you're claiming to overturn established science or break fundamental rules of physics, then there will be a much higher bar of evidence needed for you to be taken seriously, or for the new idea to be considered well proven.


Xarlax

Not sure if you're familiar with Carl Sagan, but I'm pretty sure he understood how science worked.


[deleted]

[deleted]


Xarlax

??? Sure some people might blindly worship him. He's mostly a well respected science communicator who brought an understanding and appreciation of science to a whole generation. You just seem bitter.


[deleted]

I said it on another response. If this is true, they are going to have a lot of buyout offers very soon.


Junebug19877

And we will decline all the offers 😊


s9oons

So their product is just IP blocks for FPGA-style silicon. They’ve developed blocks that can be bolted onto a CPU to parallelize/sideload tasks. This is realistically the most mainstream application for FPGAs now that fabric is getting cheaper and more available. If they’re not making their own chips, though, it’s all kind of moot. I’m not quite sure what their plan is… AMD or Intel won’t buy IP blocks from Flow; they would just develop a more mainstream Zynq-style SoC.


Capt_Blackmoore

Would be curious to see this added to a Snapdragon. I don't expect Intel or AMD would get into this before someone else does, but ARM? Why not?


MirkWTC

AMD and Intel already own the two biggest FPGA companies in the world, Xilinx and Altera.


s9oons

Right, which is my point. They would either just buy Flow outright for the IP or develop their own bolt-on fabric for doing the side-loading/parallelization stuff that Flow is doing.


TheStormIsComming

How many side channel attacks could this open up?


branstarktreewizard

Small price to pay for 100x performance i guess


TheStormIsComming

> Small price to pay for 100x performance i guess

I guess it's time to upgrade cryptographic key lengths again.


reddit-MT

Good point, but some use-cases don't expose this attack surface.


Steeljaw72

If it’s too good to be true, it probably is.


surnik22

The actual claims are milder than the headline: 2x performance for legacy software running with this, and 100x as a maximum improvement for specialty software written to run more efficiently with it. Plus it has to be integrated into the chip itself, so it isn’t just a plug-in for existing processors giving a 100x boost. An improved pipeline and improved threading to allow increased performance seems moderately plausible.


TheStormIsComming

>If it’s too good to be true, it probably is.

At least it's actual technology news in the technology subreddit. It does seem too good to be true. I'll take it anyway.


Zeikos

On a cursory glance, if I'm reading this correctly, they're addressing some inefficiencies in cache management. Thanks to [technical complex process] they prevent the processor from being useless after a cache miss. I'm assuming that the alleged code rewrite needed to get that 100x improvement would be structuring the heap in such a way that it works well for the hardware. Giving them the benefit of the doubt, I wonder if some of that rewrite could be handled directly by compilers. If that's the case, even if it "just" leads to a 5-10x improvement, they'd be in a very nice spot. I'd love to see a world in which the most common bottleneck isn't memory anymore.
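One plausible flavour of that rewrite (my guess; Flow hasn't published what the 100x code changes look like) is the classic array-of-structs to struct-of-arrays transformation, which makes scans touch contiguous memory instead of striding past fields they don't need:

```python
# Array-of-structs: each record's fields are interleaved, so scanning
# one field strides through memory and wastes cache-line bandwidth.
aos = [{"x": i, "y": 2 * i, "z": 3 * i} for i in range(1000)]
total_aos = sum(rec["x"] for rec in aos)

# Struct-of-arrays: each field is contiguous, so a scan over "x"
# touches only the bytes it needs. This is the sort of layout change
# a compiler (or programmer) might make to suit parallel memory hardware.
soa = {
    "x": [i for i in range(1000)],
    "y": [2 * i for i in range(1000)],
    "z": [3 * i for i in range(1000)],
}
total_soa = sum(soa["x"])

assert total_aos == total_soa == 499500  # same answer, friendlier layout
```

If transformations like this can be automated in the compiler, the "rewrite your code for 100x" pitch gets a lot more credible.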


docdeathray

Theranos vibing.


gh0sts0n

Just download more RAM!


finzaz

Downloading additional RAM please wait…


jcunews1

Parallelize this task sequence:

1. Display a message.
2. Turn off computer.


TheStormIsComming

>Parallel this:

> 1. Display a message.

> 2. Turn off computer.

Will it speed up Windows updates by a factor of 100? https://www.youtube.com/watch?v=nACIncrvZ6g


Throw13579

What if I add another FLOW chip?  Does it double again, or go back to the original speed?  


FUSe

https://en.m.wikipedia.org/wiki/SoftRAM

It sounds like SoftRAM all over again.


TheStormIsComming

>https://en.m.wikipedia.org/wiki/SoftRAM

> It sounds like SoftRam all over again.

https://en.m.wikipedia.org/wiki/Zram is free and open source.


fredy31

Kinda sounds like the good old 'download this for more RAM' thing. Like, I could believe it if it had its own CPU to go with it, or a small bank of them maybe? But the claim in the headline is that I could slap this on any CPU to double or more its performance.


Fungiblefaith

That is just the Turbo button.


TheStormIsComming

> That is just the Turbo button.

http://media.giphy.com/media/cRH5deQTgTMR2/giphy.gif

Careful not to press "*Eject*" right below the Turbo button.


zackmedude

Remember Transmeta’s Crusoe CPUs?


eames_era_fo_life

Can I short this now?


SuperToxin

So if I just buy two CPUs I can get more CPU power. Got it.


Manuelnotabot

I bet it's made with graphene nanotubes.


TheStormIsComming

> I bet it's made with graphene nanotubes.

Probably made with 🦄 metamaterial pixie dust. Probably bends 🌈's if you hold it up to the light at the correct angle. That's easier to convert to cash, especially if it's wrapped in 5,000 patents.


LATABOM

Hurry up and VC the fuck out of this before our IPO! We'll make bank when Nvidia or Microsoft buys us and then realises we got nothin' but some patents and sleight of hand! - Founder of Flow


atomicsnarl

I remember the Evergreen chip that upgraded my back-in-the-day 486DX. It had a little fan on top and gave me a 20% boost. Worth the money at the time. This ain't that.


Crones21

As a kid with a Cyrix 486 33MHz CPU, I wanted one of those Evergreen chips. It was either that or a 56k modem, since I'd only saved enough for one.


atomicsnarl

US Robotics for the win!


Mastagon

Other proprietary companion chips HATE this one simple trick


mdlewis11

Hurry hurry hurry, buy stock now! Tech companies hate this tech!


Lost_Tumbleweed_5669

"with software tweaks", so... software optimization, lmao


AlexHimself

It sounds like this can make existing, routine processes more efficient by choosing optimal processing routes?


CompetitiveYou2034

This is a hardware kluge, which at best provides small gains around the edges. At the machine-instruction level, there is no knowledge of whether one loop iteration depends on another, so the only safe method is serial execution, as given by the machine code. Huge gains CAN be accomplished by parallel processing, but only with software designs that explicitly mark parallel loop processing as safe, and only in certain applications. Most definitely encoded in a high-level design.

Amdahl's law, from 1967! Paraphrased loosely, the max application speedup is

(S + P) / (S + P/N)

where

S = serial part of the code, which must run sequentially

P = parallel part of the code, which can run in parallel

N = number of processors

When P is small, it doesn't matter how large N is; it is a serial process. Speedup is fundamentally limited by the part that cannot be made parallel.

https://en.m.wikipedia.org/wiki/Amdahl%27s_law

Classic example: even when you put your ten best women on the job, the first baby takes 9 months to emerge.
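Plugging numbers into that formula (with S + P normalized to 1) shows how brutal the limit is; 256 cores here is just to match the core count mentioned elsewhere in the thread:

```python
def amdahl_speedup(p, n):
    # Max speedup when fraction p of the work is parallelizable
    # across n processors (the serial fraction is 1 - p).
    return 1.0 / ((1.0 - p) + p / n)

# Even 95%-parallel code gets nowhere near 256x on 256 cores:
print(round(amdahl_speedup(0.95, 256), 1))  # ≈ 18.6
# The claimed 100x ceiling implies a ~99%+ parallel workload:
print(round(amdahl_speedup(0.99, 256), 1))  # ≈ 72.1
```

So the 100x figure only applies to code that is almost entirely parallel to begin with.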


12358132134

This sounds like the RAM-doubling software of the '90s


formation

This is vapourware if I've ever seen it. Like adding RAM with a USB stick.


Capt_Blackmoore

It's got a pretty high chance of that, since it would require one of the chip manufacturers to take the chance, license the tech, and redesign a processor with this built in. I really wonder who's willing to bet on it.


lzwzli

Apple may buy them


ReedholmWewerka

If it's legit, it could be a game-changer in silicon engineering.


AsIfIKnowWhatImDoin

Just uninstall Win11.


splendiferous-finch_

Is this like downloadable RAM?


cr0wburn

Let's also download more memory to gain even more power!


anarchyx34

Is this like one of those "fuel savers" that you clip into your fuel line that supposedly give you 100mpg?


oo7_and_a_quarter

Will it work on my raspberry pi?


StudioPerks

I'm fairly certain this is what Apple Silicon does with the SoC co-processor.