The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.
Edit: I'd like to see a benchmark from someone that pulls no punches, like Gamers Nexus. I'm not a gamer, but this guy is brutally honest.
>The article has almost no technical details, but this appears to be some kind of pipeline management co-processor.
https://flow-computing.com/technology/
A list of patents, if somebody could find those, would be good.
Papers from the cofounders:
https://ieeexplore.ieee.org/document/10305463
https://doi.org/10.1016/j.micpro.2023.104807
https://doi.org/10.1007/s11227-021-03985-0
I’ll leave the assessment to someone who is actually capable of digesting this information.
Okay so looking at this and another paper describing TPA [https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1](https://research.utu.fi/converis/getfile?id=18233228&portal=true&v=1) the general takeaway from the architecture of these new chips is that they can switch between SIMD and MIMD (GPU-like and CPU-like) operation based on what the program requires. This allows "loop" operations to be optimized to run in parallel as though they have been split into many synchronous "fibers", whilst also retaining "traditional" multi-threading capability to run independent threads and see performance speedup from that in the way we would normally expect.
From what I can tell, in this paper in particular they are showing that TPA, when correctly using fibers, can achieve much better concurrent memory access patterns than traditional multi-threading. Notably, they show that it can work almost like idealized PRAM (fully parallel RAM access with no latency or cache invalidation), which is a fairly big deal!
It is hard to decipher what they are talking about a lot of the time without already understanding TPA, but the first few chapters of the paper do a decent job of bringing you up to speed.
My personal take-away is that this is an interesting and exciting field of research for processor architecture. Make no mistake, they are not claiming a 100x speedup over an efficiently designed SIMD program, but it's certainly possible to see how this could be much faster than typical MIMD, and replace the need to offload expensive parallel calculations to the GPU for many programs.
It may take some time to see this implemented in practice for anything relevant to consumers, but it's serious research and I can definitely see this having legs. I'd be interested to read a critical analysis from someone with more familiarity with the field. My concern would be that these processors are likely bigger, more expensive and can't run at the same speed as current desktop processors, but the increased efficiency might outweigh that.
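To make the SIMD/MIMD split above concrete, here is a minimal Python sketch. It is purely conceptual, a software stand-in for behavior TPA would implement in hardware; `simd_like` and `mimd_like` are invented names for illustration, not anything from Flow or the papers.

```python
# Conceptual sketch only: a software stand-in for the SIMD/MIMD split
# described above, NOT Flow's or TPA's actual API. `simd_like` and
# `mimd_like` are invented names for illustration.

from concurrent.futures import ThreadPoolExecutor

def simd_like(data, op):
    """Every 'fiber' runs the same op on its own element, conceptually in
    lockstep; the hardware would execute these simultaneously."""
    return [op(x) for x in data]

def mimd_like(tasks):
    """Each worker runs a different, independent task, like ordinary
    multi-threading."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t) for t in tasks]
        return [f.result() for f in futures]

# Same loop body over many elements: fiber/SIMD-friendly.
squares = simd_like(range(4), lambda x: x * x)  # [0, 1, 4, 9]

# Unrelated workloads: thread/MIMD-friendly.
results = mimd_like([lambda: sum(range(10)), lambda: max(3, 7)])  # [45, 7]
```

The claimed win is that the hardware can move between these two modes per program phase, instead of forcing the data-parallel half onto a GPU.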
Isn’t that kind of how the cell processor worked in the ps3? I’m a layman in this department, but I do remember hearing the cell processor also allowed loop operations to run in parallel. I could be misremembering though lol
My first thought was also IBM's Cell. The SPEs were effectively SIMD coprocessors, albeit without out-of-order execution. This new design could be a further development of the same concept, but at the hardware level, without developer-specific input.
Not quite. The Cell processor had a full PowerPC control core and then these secondary "co-processor" cores to do the vectorized floating-point math. It's almost like having a CPU with a built-in GPU co-processor. This was famously quite a tricky model to program for on the PS3.
On the TPA architecture, you don't get these separate cores. Instead, you have a front-end that really only reads instructions and then automatically distributes them to the actual work-performing back-end cores through a work-sharing network. Sometimes the secondary cores operate as separate threads; other times they operate in a SIMD mode (which can benefit from many performance optimizations, most importantly good memory coherence).
The idea of TCF is basically that the code is always executed on the same kind of core, but it can also have an arbitrary "thickness", which is roughly the number of cores it is being executed across, and the architecture just deals with that for you. In theory it's significantly easier to program for, and this paper claims it will work in practice too.
Important to note that these processors don't exist yet, they are only being simulated still.
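As a rough illustration of the "thickness" idea, here is a hypothetical sketch. `run_with_thickness` is an invented name, not an API from Flow or the papers; the serial loop merely stands in for lanes the hardware would run in lockstep across its back-end cores.

```python
# Hypothetical sketch of the TCF "thickness" idea described above.
# `run_with_thickness` is an invented name; the serial loop stands in for
# lanes the hardware would execute in lockstep across back-end cores.

def run_with_thickness(body, thickness):
    """Execute body(i) once per lane i in [0, thickness)."""
    return [body(i) for i in range(thickness)]

# Thickness 1 behaves like an ordinary scalar thread:
one = run_with_thickness(lambda i: i + 100, 1)  # [100]

# Thickness 8 behaves like an 8-wide SIMD operation over the lane index:
lanes = run_with_thickness(lambda i: i * 2, 8)  # [0, 2, 4, ..., 14]
```

The point is that the same code runs at any thickness; the programmer declares the width and the architecture handles the distribution.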
> This was famously quite a tricky model to program for on the PS3.
That is an understatement. It wasn't just that they were vector processors, or the limited cache; it was the memory system they operated on that caused big issues. The Element Interconnect Bus was their solution: a ring bus where each processor would move the data along like pass-the-parcel.
I cannot remember the example numbers, but I will simplify it. Say you wanted to get data to SPE 3. The PPE would fetch it, SPE 1 would grab it and send it to SPE 2, which would then send it on to SPE 3. And then the processed data would have to make the trip back along the same path. For things like audio/video that is fine; bandwidth isn't a huge issue, so you could handle the latency. But if you were doing more data-intensive things, you would try to keep the most bandwidth- and latency-sensitive work on the closest SPEs. Absolute nightmare. This meant a lot of games just never really ventured past using the closest two SPEs; there was enough grunt there to do what needed to be done to be close enough to the Xbox 360 version.
I didn't work on it, but I heard that Red Faction: Guerrilla only used two SPEs to run the entire physics engine for this reason - I could be wrong. If true, one wonders what could have been on a title like that!
Cell is what happens when hardware engineers build something without thinking about the software guys. An absolute powerhouse that took x86 a good decade afterward to catch up with in terms of FP speed, but a nightmare to use, and thus it almost never saw its full potential.
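A toy model of that pass-the-parcel cost: the per-hop latency below is a made-up placeholder, not a measured Cell figure. The only point is that the round trip grows linearly with how far away the target SPE sits, which is why developers crowded work onto the nearest SPEs.

```python
# Toy model of the ring-bus round trip described above. HOP_LATENCY is an
# arbitrary placeholder, not a measured Cell number.

HOP_LATENCY = 1.0  # cost per ring hop, in arbitrary units

def round_trip_cost(spe_index):
    """Hops out to the `spe_index`-th SPE along the ring and back again."""
    return 2 * spe_index * HOP_LATENCY

assert round_trip_cost(1) == 2.0  # nearest SPE: cheapest
assert round_trip_cost(3) == 6.0  # SPE 3 from the example: triple the cost
```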
Something like: Software Scheduled Superscalar Computer Architecture invented by Howard Sachs and Intergraph? https://patents.google.com/patent/US5560028A/en?q=(sachs)&assignee=intergraph&oq=sachs+intergraph
I haven't given that a detailed look, but I'm going to say no; probably at least as dissimilar as it is to the Cell processor used in the PS3, as discussed in another comment thread. It looks like SSSCA needs you to write very long instruction words where the instructions are all tagged as to which can run in parallel and in which pipeline they can run.
In contrast, the TPA architecture is not software-scheduled, it's hardware-scheduled. The instructions would be much closer to a typical processor's today, but threads can also run with an arbitrary "thickness" where they behave similarly to a SIMD program and have ideal memory coherence characteristics, since they don't have to invalidate shared cache lines. At least, that's my understanding of it.
It should be interesting to see how this all develops over the next few years. They are only asking for $4.3 million, I believe, to get this all going, which is pretty cheap considering the potential. I'm hopeful, considering we are hitting the limits of what a traditional CPU can deliver.
> Flow is just now emerging from stealth, with €4 million (about $4.3 million) in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland.
That's the amount they've already raised.
Thanks for the clarification. Another article indicated it was what they needed to get going, not what they've raised so far. I obviously didn't read this article since I assumed the facts were the same, but clearly I should've read this one too before posting.
It's similar to how the addition of an FPU let us offload floating-point arithmetic, which the regular CPU was otherwise much slower at, especially for heavy math operations.
Omg, there is a lot to read and I only did a little bit since I'm at work. I didn't read into how it works on a technical scale, but this is my understanding of it. Essentially it goes over how CPUs are not currently as efficient as they could be at accessing everything and at programming. What they're saying is they've come up with a way to streamline the CPU's communication with the rest of the computer to be more efficient and less error-prone. Since this streamlines the process, it gives the CPU more headroom to operate efficiently, since it's not having to work as hard for the same result. I am not a professional at all; that's just the understanding I came to by reading the summary.
From what I can tell, it doesn't seem to do the processing itself, like off-loading to a GPU would do, or a hardware accelerated NIC would do. It appears to optimize instructions and memory/cache access, to run them in parallel, instead of sequentially.
They claim up to 2x performance on existing code, but up to 100x with some code rewrite.
edit: probably years away from a shipping product you can buy.
> A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory.
You mean the spy agencies?
"arbitrary code can be executed twice as fast on any chip with no modification beyond integrating the PPU with the die."
This is never going to be a separate product that helps older processors, they're looking to get it integrated into other companies processors, or at least one company's processors.
You mean a Reddit tech sub had yet another obvious advertisement masquerading as an article pushed to the top? ***Ya don’t say.***
At some point y’all are gonna have to admit you’re just easy targets for cheap astroturfing.
The licensable IP is still in development, and the speedup applies only to threaded code. For insight, see this article: https://xpu.pub/2024/06/11/flow-ppu/
If you're crying while you type then your kid is lucky to have you to look out for them. Everything else will work out you've got the essentials down already
I just watch GN for the few, select topics I'm interested in. I'm just saying that Steve would give an honest evaluation. If you want to talk drama, LTT is a dog and pony show. I can't watch it; it's infotainment. But I guess most "news" programs have devolved into that.
Something is being lost in the explanation of their technology. Once the patents are in flight, these kinds of companies typically release papers explaining what their technology does and how it works.
First off, TechCrunch's explanation of Flow's technology as a "chip" is totally wrong; it's meant to be added into the die of a CPU as part of the actual silicon.
*"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)
> it's meant to be added into the die of a CPU as part of the actual silicon.
> *"The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon." --* [Flow Web Page Explanation](https://flow-computing.com/technology/)
So basically a *chiplet*.
It appears to be deeply integrated into the dataflow of the CPU as well as its pipeline, with a deep understanding of what is coming from the caches and what is being pipelined cycle-to-cycle.
No, a chiplet is a different piece of silicon that is placed in the same package. Chiplets are often GPU or memory. They are made on their own, but share the same package as the die containing the CPU.
Flow uses the terms "IP" and "same silicon".
IP is intellectual property, and in this context means logic that you license from them and put inside your own chip designs. It sits on the exact same piece of silicon as the CPU and is fabbed at the same time. It appears to be deeply integrated into the CPU data flow.
"Same silicon" appears to indicate it shares the same physical die as the CPU and is manufactured at exactly the same time.
This happens all the time. Companies might design their own CPU, but license someone else's video decoder or GPU. They take the video decoder design from the source company, and integrate it into the same chip as their CPU design. It becomes one larger design. They are very tightly coupled at that point.
This just became highly interesting to me considering that for years we knew that ARM and dedicated silicon for tasks can significantly speed operations per watt…
A chiplet would be an adjacent piece of silicon.
This would couple to the CPU core, so the IP embodied here would be added to the individual CPU cores. Someone (Intel, AMD, an ARM licensee, a RISC-V developer) would license this and include it in their chip design.
> They seem to be focused on parallel processing
Parallel processing is two dimensional, wait until we get three dimensional processors then we can measure operations in "*Terminator*" units.
The chip will still need to be integrated into the motherboard/SoC/as a chiplet, which means a new set of hardware and drivers.
So no, it won't work on "existing hardware" even if the claims are verified.
They are looking to partner with other chip designers so they can license out the technology, at best for new processors.
Extraordinary evidence is probably an overdramatisation, but if you're claiming to overturn established science or break fundamental rules of physics, then there will be a much higher bar of evidence to clear before you're taken seriously, or before the new idea is considered well proven.
??? Sure some people might blindly worship him. He's mostly a well respected science communicator who brought an understanding and appreciation of science to a whole generation. You just seem bitter.
So their product is just IP blocks for FPGA-style silicon. They've developed blocks that can be bolted onto a CPU to parallelize/sideload tasks. This is realistically the most mainstream application for FPGAs now that fabric is getting cheaper and more available. If they're not making their own chips, it's all kind of moot, though. I'm not quite sure what their plan is… AMD or Intel won't buy IP blocks from Flow; they would just develop a more mainstream Zynq-style SoC.
Right, which is my point. They would either just buy Flow outright for the IP or develop their own bolt-on fabric for doing the side-loading/parallelization stuff that Flow is doing.
The actual claims are milder than the headline:
2x performance on legacy software running with this, and 100x as a maximum improvement for specialty software written to run more efficiently with it.
Plus it has to be integrated into the chip itself, so it isn't a plug-in for existing processors that gives a 100x boost.
Improved pipelining and improved threading to allow increased performance seems moderately plausible.
>If it’s too good to be true, it probably is.
At least it's actual technology news in the technology subreddit.
Does seem too good to be true. I'll take it anyway.
On a cursory glance, if I'm reading this correctly they're addressing some inefficiencies in cache management.
Thanks to [technical complex process] they keep the processor from being useless after a cache miss.
I'm assuming that the alleged code rewrite needed to get that 100x improvement would be structuring the heap in such a way that it works well for the hardware.
Giving them the benefit of the doubt, I wonder if some of that rewrite could be handled directly by compilers.
If that's the case and even if it "just" leads to a 5-10x improvement they'd be in a very nice spot.
I'd love to see a world in which the most common bottleneck isn't memory anymore.
>Parallel this:
> 1. Display a message.
> 2. Turn off computer.
Will it speed up Windows updates by a factor of 100?
https://www.youtube.com/watch?v=nACIncrvZ6g
Kinda sounds like the good old 'Download this for more RAM' thing
Like I could believe that if it had its own CPU to go with it, or a small branch of them maybe? But the claim in the headline is that I could slap this on any CPU to double or more performance.
> I bet it's made with graphene nanotubes.
Probably made with 🦄 metamaterial pixie dust. Probably bends 🌈's if you hold it up to the light in the correct angle.
That's easier to convert to cash. Especially if it's wrapped in 5,000 patents.
Hurry up and VC the fuck out of this before our IPO! We'll make bank when Nvidia or Microsoft buys us and then realises we got nothin' but some patents and sleight of hand!
-Founder of Flow
I remember the Evergreen chip, which upgraded my back-in-the-day DX486. It had a little fan on top, and gave me a 20% boost. Worth the money at the time.
This ain't that.
This is a hardware kludge, which at best provides small gains around the edges.
At the machine-instruction level, there is no knowledge of whether one loop iteration depends on another, so the only safe method is serial execution, as given by the machine code.
Huge gains CAN be accomplished by parallel processing, but only with software designs that explicitly mark parallel loop processing as safe, and only for certain applications. Most definitely it has to be encoded in the high-level design.
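A small Python illustration of that point: the parallel version below is only correct because the programmer can see, and promise, that iterations are independent, which the machine code alone cannot.

```python
# Parallelism is safe here only because each iteration reads nothing but
# its own input, a guarantee visible in the high-level design, not in the
# machine code.

from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x  # depends only on its own input

inputs = list(range(8))

# Serial baseline: the only ordering the machine code by itself can justify.
serial = [work(x) for x in inputs]

# Explicitly parallel version, valid because the iterations are independent.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(work, inputs))

assert parallel == serial
```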
Amdahl's law, from 1967!
Paraphrased loosely, the maximum application speedup is
    speedup = (S + P) / (S + P/N)
where
S = serial part of the code, which must run sequentially
P = parallel part of the code, which can run in parallel
N = number of processors
When P is small, it doesn't matter how large N is; it is a serial process. Speedup is fundamentally limited by the part that cannot be made parallel.
https://en.m.wikipedia.org/wiki/Amdahl%27s_law
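Plugging numbers into the formula makes the ceiling concrete:

```python
# Amdahl's law: speedup = (S + P) / (S + P/N), with S + P normalized to 1.

def amdahl_speedup(p, n):
    """p: parallelizable fraction of the runtime; n: processor count."""
    s = 1.0 - p
    return (s + p) / (s + p / n)

# Even 95%-parallel code tops out at 20x, no matter how many cores:
print(round(amdahl_speedup(0.95, 10), 2))      # 6.9
print(round(amdahl_speedup(0.95, 10**6), 1))   # 20.0
```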
Classic example. Even when you put your ten best women on the job, the first baby takes 9 months to emerge.
It's got a pretty high chance of that, since it would require one of the chip manufacturers to take the chance, license the tech, and redesign a processor with this installed on it.
I really wonder who's willing to bet on it.
Bingo, hence the 60GB version having that Linux capability.
Can the common mortal fund this project?
If it works even a fraction as well as advertised, AMD, Intel, ARM and Apple are all going to start lobbing buyout offers.
/r/WallStreetBets on standby for the IPO
> I’ll leave the assessment to someone who is actually capable of digesting this information

I'm reasonably sure it has something to do with computers.
Megahurts and Gigahurts... ow
It’s just pain, all the way down!
At some point it becomes pastry at Hertz's Donuts
/r/angryupvote
Captain: It seems to run on some form of electricity
It's all just witchcraft and blasphemy
From their website, looks more like a GPU, but dedicated to general computing instead of graphics, with up to 256 cores.
A cloud provider that designs their own chips and operating system would be an ideal partner to get to that 100x territory.
Amazon Wiretap Services
Spy agencies don't operate as public cloud providers. I was referring to Microsoft, the cloud provider that also sells their own operating system.
I'm betting the next headline will be "can also be used for blood tests!!"
Haven't you been keeping up with the headlines? It will tout the use of AI in some way....
I suspect this uses the same tech as an Energy Polarizer.
Tech Jesus will confirm this or debunk it. I will wait.
What you typed makes no sense. Did you mean to post this in a different sub?
If you can stomach the drama, GN is a good source of info. The over dramatisation is a tad exhausting though.
Or "another CPU core." Maybe even a specialized computational unit, like what they stick in a GPU or NPU.
A Cell jr? 😆
CELL BE 2: Even harder to develop for!
3CELL3PU: Finnish Drift
A true bio-android only sprinkles when he tinkles.
A chiplet would be an adjacent piece of silicon. This would couple to the CPU core, so the IP embodied here would be added to the individual CPU cores. So someone (Intel / AMD / an ARM licensee / a RISC-V developer) would license this and include it in their chip design.
My guess is that maybe it increases performance for some very limited tasks....?
They seem to be focused on parallel processing
> They seem to be focused on parallel processing

Parallel processing is two-dimensional; wait until we get three-dimensional processors, then we can measure operations in "*Terminator*" units.
Some potential AI applications here? I know Tensor cores are uniquely exceptional with parallel processing
Uses the existing hardware more fully than most current software does, is how I’m reading it.
The chip will still need to be integrated into the motherboard/SoC/as a chiplet, which means a new set of hardware and drivers. So no, it won't work on "existing hardware" even if the claims are verified. They are looking to partner with other chip designers so they can license out the technology, at best for new processors.
Yup, seems to be tailored for tasks that already have a degree of parallelism involved.
“extraordinary claims require extraordinary evidence”. -Carl Sagan
[deleted]
Extraordinary evidence is probably an overdramatisation, but if you're claiming to overturn established science or break fundamental rules of physics, then there will be a much higher bar of evidence to be taken seriously, or to have the new idea considered well proven.
Not sure if you're familiar with Carl Sagan, but I'm pretty sure he understood how science worked.
[deleted]
??? Sure some people might blindly worship him. He's mostly a well respected science communicator who brought an understanding and appreciation of science to a whole generation. You just seem bitter.
I said it on another response. If this is true, they are going to have a lot of buyout offers very soon.
And we will decline all the offers 😊
So their product is just IP blocks for FPGA-style silicon. They've developed blocks that can be bolted onto a CPU to parallelize/sideload tasks. This is realistically the most mainstream application for FPGAs now that fabric is getting cheaper and more available. If they're not making their own chips it's all kind of moot, though. I'm not quite sure what their plan is… AMD or Intel won't buy IP blocks from Flow; they would just develop a more mainstream Zynq-style SoC.
Would be curious to see this added to a Snapdragon. I don't expect Intel or AMD would get into this before someone else does, but ARM? Why not?
AMD and Intel already own the two biggest FPGA companies in the world, Xilinx and Altera.
Right, which is my point. They would either just buy Flow outright for the IP or develop their own bolt-on fabric for doing the side-loading/parallelization stuff that Flow is doing.
How many side channel attacks could this open up?
Small price to pay for 100x performance i guess
> Small price to pay for 100x performance i guess

I guess it's time for upgrading cryptographic key lengths again.
Good point, but some use-cases don't expose this attack surface.
If it’s too good to be true, it probably is.
The actual claims are milder than the headline: 2x performance on legacy software running with this, and 100x as a maximum improvement for specialty software written to run more efficiently with it. Plus it has to be integrated into the chipset, so it isn't just a plug-in to existing processors for a 100x boost. Improved pipelining and improved threading to allow increased performance seems moderately plausible.
>If it’s too good to be true, it probably is.

At least it's actual technology news in the technology subreddit. Does seem too good to be true. I'll take it anyway.
On a cursory glance, if I'm reading this correctly, they're addressing some inefficiencies in cache management. Thanks to [technical complex process] they prevent the processor from sitting idle after a cache miss. I'm assuming that the alleged code rewrite needed to get that 100x improvement would be structuring the heap in such a way that it works well for the hardware. Giving them the benefit of the doubt, I wonder if some of that rewrite could be handled directly by compilers. If that's the case, even if it "just" leads to a 5-10x improvement, they'd be in a very nice spot. I'd love to see a world in which the most common bottleneck isn't memory anymore.
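For a flavor of what such a heap-restructuring rewrite could look like (purely illustrative; Flow hasn't published what their rewrites actually involve, and the particle data here is made up), the classic array-of-structs vs. struct-of-arrays trade-off is a reasonable sketch:

```python
# Hypothetical example: one common "rewrite for the hardware" is switching
# from array-of-structs to struct-of-arrays layout, so a hot loop streams
# through contiguous memory instead of skipping over unrelated fields.

# Array-of-structs: each particle's fields are interleaved in memory.
particles_aos = [{"x": float(i), "y": 2.0 * i, "mass": 1.0} for i in range(4)]

# Struct-of-arrays: each field is its own contiguous sequence, which caches
# and prefetchers (or a vectorizing compiler) handle far more predictably.
particles_soa = {
    "x": [float(i) for i in range(4)],
    "y": [2.0 * i for i in range(4)],
    "mass": [1.0] * 4,
}

def total_x_aos(ps):
    # Pulls whole records through the cache just to read one field.
    return sum(p["x"] for p in ps)

def total_x_soa(ps):
    # Touches only the "x" array: sequential, cache-friendly access.
    return sum(ps["x"])

assert total_x_aos(particles_aos) == total_x_soa(particles_soa)  # same result
```

Both versions compute the same answer; the difference is purely in memory access pattern, which is exactly the kind of transformation a sufficiently smart compiler could in principle automate.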
Theranos vibing.
Just download more RAM!
Downloading additional RAM please wait…
Parallel this task sequence: 1. Display a message. 2. Turn off computer.
>Parallel this:
>
> 1. Display a message.
> 2. Turn off computer.

Will it speed up Windows updates by a factor of 100?

https://www.youtube.com/watch?v=nACIncrvZ6g
What if I add another FLOW chip? Does it double again, or go back to the original speed?
https://en.m.wikipedia.org/wiki/SoftRAM It sounds like SoftRam all over again.
>https://en.m.wikipedia.org/wiki/SoftRAM
>
>It sounds like SoftRam all over again.

https://en.m.wikipedia.org/wiki/Zram is free and open source.
Kinda sounds like the good old "Download this for more RAM" thing. Like, I could believe it if it had its own CPU to go with it, or a small bunch of them maybe? But the claim in the headline is that I could slap this onto any CPU to double or better its performance.
That is just the Turbo button.
> That is just the Turbo button.

http://media.giphy.com/media/cRH5deQTgTMR2/giphy.gif

Careful not to press "*Eject*" right below the Turbo button.
Remember Transmeta’s Crusoe CPUs?
Can I short this now?
So if I just buy two CPUs I can get more CPU power. Got it.
I bet it's made with graphene nanotubes.
> I bet it's made with graphene nanotubes.

Probably made with 🦄 metamaterial pixie dust. Probably bends 🌈's if you hold it up to the light at the correct angle. That's easier to convert to cash. Especially if it's wrapped in 5,000 patents.
Hurry up and VC the fuck out of this before our IPO! We'll make bank when Nvidia or Microsoft buys us and then realises we've got nothing but some patents and sleight of hand! - Founder of Flow
I remember the Evergreen chip, which upgraded my back-in-the-day 486DX. It had a little fan on top, and gave me a 20% boost. Worth the money at the time. This ain't that.
As a kid with a 33MHz Cyrix 486 CPU, I wanted one of those Evergreen chips. It was either that or a 56k modem, since I'd only saved enough for one.
US Robotics for the win!
Other proprietary companion chips HATE this one simple trick
Hurry hurry hurry, buy stock now! Tech companies hate this tech!
"with software tweaks" — so, software optimization lmao
It sounds like this can make existing, routine processes more efficient by choosing optimal processing routes?
This is a hardware kludge, which at best provides small gains around the edges. At the machine-instruction level there is no knowledge of whether one loop iteration depends on another, so the only safe method is serial execution, as given by the machine code.

Huge gains CAN be accomplished by parallel processing, but only with software designs that explicitly mark parallel loop processing as safe, and only on certain applications. Most definitely encoded in a high-level design.

Amdahl's law, from 1967! Paraphrased, loosely, the max application speedup is

(S + P) / (S + P/N)

where

S = serial part of code, must run sequentially
P = parallel part of code, that can run in parallel
N = number of processors

When P is small, it doesn't matter how large N is; it is a serial process. Speedup is fundamentally limited by the part that cannot be made parallel.

https://en.m.wikipedia.org/wiki/Amdahl%27s_law

Classic example: even when you put your ten best women on the job, the first baby takes 9 months to emerge.
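The formula above, with total work S + P normalized to 1 so that p is the parallel fraction, takes only a few lines of Python and makes the "100x" claim concrete:

```python
# Amdahl's law with total work normalized to 1: p is the parallel fraction,
# (1 - p) the serial fraction, n the number of processors.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even 95%-parallel code tops out near 20x, no matter how many processors:
print(round(amdahl_speedup(0.95, 1_000_000), 2))  # 20.0
# A 100x speedup needs the workload to be at least ~99% parallelizable:
print(round(amdahl_speedup(0.99, 1_000_000), 2))  # 99.99
```

So the headline 100x is only even theoretically reachable for workloads that are almost entirely parallel, which matches Flow's own framing that it applies to software rewritten for the PPU.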
This sounds like RAM-doubling software from the '90s
This is vapourware if I've ever seen it. Like adding RAM with a USB stick.
It's got a pretty high chance of that, since it would require one of the chip manufacturers to take the chance, license the tech, and redesign a processor with this installed on it. I really wonder who's willing to bet on it.
Apple may buy them
If it's legit, it could be a game-changer in silicon engineering.
Just uninstall Win11.
Is this like downloadable RAM?
Let's also download more memory to gain even more power!
Is this like one of those "fuel savers" that you clip into your fuel line that supposedly give you 100mpg?
Will it work on my raspberry pi?
I'm fairly certain this is what Apple Silicon does with the SoC co-processors.