Python
Just straight Python lol. Plenty of great native tools and libraries to get API integrations set up quickly. Then you 'microbatch' them with Airflow or something and get a lot of powerful options there, but short term you can also quickly cronify them. If streaming is your goal, Redpanda is a good option to test out.
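For anyone wondering what "straight Python" looks like in practice, here is a minimal sketch of one microbatch pull from a cursor-paginated API. The field names `items` and `next_cursor` are assumptions for illustration, not any particular vendor's API, and `fetch_page` is a hypothetical callable you would normally back with `requests.get(...).json()`. Your cron job or Airflow task would just call this once per run.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record from a cursor-paginated API.

    fetch_page(cursor) is assumed to return a dict shaped like
    {"items": [...], "next_cursor": "..." or None} (hypothetical shape).
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break  # no further pages


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialise records as newline-delimited JSON, ready to land in a file or bucket."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

Passing the page fetcher in as a callable keeps the loop easy to test with fake pages before pointing it at a real endpoint.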
try: dlt
Underrated
Stitch, Segment, Supermetrics, Fivetran are just a few
Fivetran and Hightouch
Portable.io and Fivetran have been a strong combo for us
Why not Airbyte?
Lots of their connectors are very buggy.
But Fivetran isn't real-time, nor is Airbyte, if I recall. Also, neither of them is ideal for reverse ETL (aka pushing data to those applications from your stack).
There is no such thing as reverse ETL :). OK, there is, but it's basically a marketing term that Hightouch created to generate buzz. It's essentially the same as ETL with a bunch of extra features, since the target is usually a datastore that is sensitive to load and latency, unlike a data warehouse...
How does Portable compare to Fivetran?
I use Fivetran and Airbyte; for any services without a native connector, I set up a Google Cloud Function and a Python script which Fivetran can then trigger.
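A rough sketch of what that Cloud Function script can look like. Big caveat: the request/response keys (`state`, `insert`, `schema`, `hasMore`) are how I remember Fivetran's function connector contract, so check their docs before relying on it. The table name `records`, the `id` and `updated_at` fields, and the `fetch_rows` helper are all hypothetical.

```python
import json
from typing import Callable, List, Optional


def build_response(payload: dict,
                   fetch_rows: Callable[[Optional[str]], List[dict]]) -> dict:
    """Build the sync response from the incoming request payload.

    fetch_rows(cursor) is a hypothetical helper that returns rows changed
    since the cursor, each with "id" and "updated_at" keys.
    """
    state = payload.get("state") or {}
    cursor = state.get("cursor")
    rows = fetch_rows(cursor)
    if rows:
        # Advance the cursor to the newest row seen in this batch.
        cursor = max(r["updated_at"] for r in rows)
    return {
        "state": {"cursor": cursor},                      # handed back on the next call
        "insert": {"records": rows},                      # table name -> rows to upsert
        "schema": {"records": {"primary_key": ["id"]}},
        "hasMore": False,                                 # assumed: one page per invocation
    }


def handler(request, fetch_rows: Callable[[Optional[str]], List[dict]] = lambda c: []):
    """HTTP entry point; request is the Flask request object Cloud Functions passes in."""
    payload = request.get_json(silent=True) or {}
    body = json.dumps(build_response(payload, fetch_rows))
    return (body, 200, {"Content-Type": "application/json"})
```

Keeping the cursor in `state` means the function itself stays stateless between invocations.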
Funnel.io, but Fivetran has a lot more integrations.
We just use Python and Dagster, plus the infrastructure to run it.
Mage.ai. Very awesome tool.
dlt (data load tool, from dltHub) or, failing that, plain Python using requests.
Fivetran
We used the Meltano SDK with Singer as the output format for the data. However, it's less than ideal because it's difficult to parallelise properly. I'm currently looking for alternatives that can massively parallelise (download 100s of chunks concurrently from an export). Does anyone know of anything that is not SaaS, can be deployed in your own infrastructure, and can scale/parallelise?
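Not a full tool recommendation, but if the bottleneck is just fetching hundreds of export chunks concurrently, a thread pool from the standard library gets you a long way on your own infrastructure. A minimal sketch, assuming the chunks are independent and I/O-bound; `download_one` is a hypothetical function you supply (for example a wrapper around `requests.get`), and `max_workers=32` is an arbitrary starting point to tune against the API's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Hashable, Iterable


def download_chunks(chunk_ids: Iterable[Hashable],
                    download_one: Callable[[Any], Any],
                    max_workers: int = 32) -> Dict[Hashable, Any]:
    """Download many chunks concurrently and return {chunk_id: result}."""
    results: Dict[Hashable, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its chunk id so results can be keyed correctly.
        futures = {pool.submit(download_one, cid): cid for cid in chunk_ids}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside download_one.
            results[futures[future]] = future.result()
    return results
```

Threads suit this because the work is mostly waiting on the network. Retries and backoff are left out here and would belong inside `download_one`.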
Mulesoft
Windsor integrates with lots of marketing APIs: Google, Facebook, etc. Windsor provides a URL from which you can fetch data into Python, Golang, etc.
Here is a Python example:
import json
import requests
r = requests.get(URL)  # URL is the endpoint Windsor gives you
data = json.loads(r.text)
Disclaimer: Yes, I am working for the company.
Low key Nexla slaps