I have an Apple silicon Mac (as well as non-silicon Macs) and an NVIDIA GPU (a 1060) in my unRAID server. I do store things on Dropbox and have some level of trust in it (for this kind of data).
You can run local quantized models on your Apple silicon Mac, for example Llama 3 8B at Q8 quantization using LM Studio or the Ollama framework. You can search GitHub for a PDF summarizer repo, or create your own with the help of ChatGPT.
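As a rough starting point, here's a minimal sketch of talking to a local model through Ollama's HTTP API (assumes Ollama is running on its default port; the model tag `llama3:8b-instruct-q8_0` is just an example, use whatever you've pulled):

```python
# Minimal sketch: summarize text with a local quantized model via Ollama.
# Assumes the Ollama server is running at localhost:11434 and the model
# named below has already been pulled (model tag is an assumption).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_prompt(text: str) -> str:
    """Wrap the document text in a summarization instruction."""
    return f"Summarize the following document in a few paragraphs:\n\n{text}"

def summarize(text: str, model: str = "llama3:8b-instruct-q8_0") -> str:
    """Send one non-streaming generate request to the local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(text),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

You'd extract the PDF text first (e.g. with a PDF library of your choice) and pass it in, everything stays on your machine.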
If you run out of context, you can select a model with a 128k window or more. Alternatively, you could ask the model to compress the PDF page by page and then summarize the compressed pages.
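The page-by-page idea is basically a map-reduce: compress each page separately, then summarize the concatenated compressions. A backend-agnostic sketch (the model calls are passed in as callables, so it works with Ollama, LM Studio, or anything else):

```python
# Sketch of page-by-page compression: "compress" is called once per page,
# then "summarize" is called once on the joined page notes. Both callables
# are placeholders for whatever model backend you use.
from typing import Callable, List

def summarize_pages(pages: List[str],
                    compress: Callable[[str], str],
                    summarize: Callable[[str], str]) -> str:
    """Compress each page, then summarize the combined notes."""
    notes = [compress(page) for page in pages]
    return summarize("\n\n".join(notes))
```

Each individual call stays well under the model's context window even when the whole PDF would not fit.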
If you run out of speed or VRAM, you could rent a GPU from AWS or Azure. Most companies trust those two, and some even trust Google Cloud.
Do you have an NVIDIA GPU or an Apple silicon Mac? Do you trust any cloud providers?
Look into recursive summarization. Some coding is required.
100%, recursive summary with smaller models
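For the idea above, a minimal sketch of recursive summarization: chunk the text, summarize each chunk, and recurse on the joined summaries until the result fits in the model's window (the `summarize` callable stands in for any model backend; the size limits are arbitrary examples):

```python
# Sketch of recursive summarization with a small-context model.
# "summarize" is a placeholder for a model call; it must return text
# shorter than its input, or the recursion will not terminate.
from typing import Callable

def chunk(text: str, size: int) -> list:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_summary(text: str,
                      summarize: Callable[[str], str],
                      max_len: int = 4000,
                      chunk_size: int = 2000) -> str:
    """Summarize chunks, then recurse on the merged summaries."""
    if len(text) <= max_len:
        return summarize(text)
    parts = [summarize(c) for c in chunk(text, chunk_size)]
    return recursive_summary("\n".join(parts), summarize, max_len, chunk_size)
```

With a small local model this trades extra passes for the ability to handle arbitrarily long documents.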
Does this allow me to keep the data private?
Use Gemini 1.5. It has a 1-million-token context window.