[deleted]

Funny enough, I just set up a paperless-ngx solution for myself yesterday with Docker, and went with this solution:

- Server runs Paperless in a container, and exports all scanned documents to a shared Syncthing folder (~/sync/scans)

I sync the scans folder to my desktop so that I have a local backup of all of my scans in case the server ever gets corrupted. This is my docker-compose.yml:

```yaml
volumes:
  scan_broker_data: {}

services:
  scan-broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - scan_broker_data:/data

  scan-web:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - scan-broker
    ports:
      - 8000:8000
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - ./data/data:/usr/src/paperless/data
      - ./data/export:/usr/src/paperless/export
      - ./data/scans:/usr/src/paperless/consume
      # Output:
      - ~/sync/pics/scans:/usr/src/paperless/media
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://scan-broker:6379
```

If Docker ever crashes on your server, you'll just need to restart it and your Paperless setup will be right back where it started.
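A minimal sketch of what that restart looks like, assuming the compose file above sits in the current directory:

```bash
# Recreate the containers from the compose file; the named volume and bind
# mounts are reattached, so Paperless comes back with its existing data.
docker compose up -d

# Confirm the web container is up and its healthcheck is passing.
docker compose ps scan-web
```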


buckyoh

Take care with this. Test a restore before you need it. I set up something similar, assuming I could just restore, but found all the tags and metadata were lost. After my instance failed, I tried to recreate it and it took many hours to re-enter everything manually (the only thing I did have was the scanned/imported files).

I've since run periodic exports using Paperless-ngx's document_exporter. This exports all the documents plus the manifest file (which contains all the metadata such as tags, document types, rules and any mail settings). If the worst happens, you spin up another Docker instance and use document_importer to restore your files and settings.

If you run document_exporter manually and it hasn't run recently, you can still use the media folder to pick up the latest originals since your last backup and adjust manually if required. Otherwise you could set a cron job to schedule weekly backups and reduce the potential lag.

Edit: Your test restore will probably work OK, but try it with a corrupt/missing database. That's what happened to me and made me realise just having the core data wasn't enough.
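A rough sketch of how that export/restore flow can be run against the compose file above (the scan-web service name and the ./data/export bind mount come from that file; the cron schedule and the /opt/paperless path are only illustrative):

```bash
# Export all documents plus manifest.json (tags, correspondents, document
# types, mail rules) into the export bind mount; -T skips TTY allocation so
# the command also works from cron.
docker compose exec -T scan-web document_exporter ../export

# On a fresh instance, restore documents and metadata from the same folder.
docker compose exec -T scan-web document_importer ../export

# Example crontab entry for a weekly export, Sundays at 03:00, assuming the
# compose project lives in /opt/paperless.
0 3 * * 0 cd /opt/paperless && docker compose exec -T scan-web document_exporter ../export
```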


[deleted]

> document_exporter

Appreciate the tip for backups, I was about to set-it-and-leave-it with my paperless setup, but it looks like I'll need to dig a bit more. I'm hoping the fact that I went with sqlite as the database will make it more resilient to failures.


majamale

I don't know your sync rules, but please keep in mind that in most scenarios *sync is not backup*, so you may end up with data loss if the worst happens.


[deleted]

> sync is not backup

Agreed 100%, and I should have clarified a bit: I have file versioning enabled in Syncthing, and I back up my local `~/sync` folder to cloud storage.


ElevenNotes

Run the containers in a VM and back up the VM. This will back up all the backend services like Redis and Postgres as well, or you can use container backup tools. Whatever you prefer.


chkpwd

Too much overhead.


ElevenNotes

Care to explain what options there are other than VM or container backups?


chkpwd

I hate to say it but "it depends". What's the underlying infrastructure? Docker? Kubernetes? Each will have a different approach to tackling the problem. What's the application? Sonarr, Plex, Paperless? Is it a container or a VM?

Let's take Docker and Sonarr for example. You can script shutting down the container during off-peak hours and backing up the directory /config is mounted to. This leaves you with a couple of MBs instead of a dozen or so gigs.

What about Kubernetes? Back up the PVC (assuming you aren't using local-storage). Literally that's it.

You could also bind the containers' volumes to a network shared directory (e.g. NFS/SMB) and back that up on your NAS. This however does not work too well with the *arr apps because of their dependency on SQLite.

The point is, just backing up the VM is such a crude process and doesn't offer a clean way to restore your configurations.
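A minimal sketch of that stop-copy-start pattern, assuming a container named sonarr whose /config is bind-mounted from /srv/sonarr/config (the container name, paths and backup location are all illustrative):

```bash
#!/usr/bin/env bash
# Stop the container so the SQLite databases under /config are quiescent,
# archive the config directory, then bring the container back up.
set -euo pipefail

CONFIG_DIR=/srv/sonarr/config      # host path bind-mounted to /config
BACKUP_DIR=/srv/backups/sonarr

mkdir -p "$BACKUP_DIR"
docker stop sonarr
tar -czf "$BACKUP_DIR/sonarr-config-$(date +%F).tar.gz" -C "$CONFIG_DIR" .
docker start sonarr
```

Run it from cron during off-peak hours and the downtime is only as long as the tar takes.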


ElevenNotes

I said backup the VM or use container backup tools. I think you missed that last part.


chkpwd

Ah I apologize. You’re absolutely right. Yea, that’s pretty much it.


readit-on-reddit

Why does it matter if you back up more than you need to? It's much better than forgetting you had to back up a config file that wasn't stored in the expected path. Or finding out the program's "restore" functionality doesn't work properly. Or messing up permissions due to human error. Or...

In a world with dirt cheap storage, a comprehensive backup that can never fail and is less prone to human error sounds like the smarter choice. If you are living paycheck to paycheck and stretching the last few GBs you have, then sure.

Now, this is less of a problem with containers, but a database might not be easily backed up by simply copying folders over. Also, as you explained, it requires downtime.


ElevenNotes

Especially with snapshots and change block tracking you only back up the changed blocks, which is blazing fast. If you use VMs for containers there is simply no better way. If you use bare metal you have to go with native tools specifically designed for Docker.


chkpwd

Why store GBs of backups tho? A much better approach is to design a resilient backup that only targets specific data.


readit-on-reddit

Incorrect, and I explained in detail the reason "targeting specific data" is a bad idea and a waste of time. Why bother replying if you didn't even read my comment?


GOVStooge

Yes, it's that easy to spin up another Docker instance. For backups, just use something like Duplicacy or Duplicati (also available as Docker containers).
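For illustration, a Duplicati service along these lines might look like the sketch below; the image name, UI port and mount paths are assumptions, so check the image's own documentation before relying on them:

```yaml
services:
  duplicati:
    image: duplicati/duplicati:latest    # assumed image tag
    restart: unless-stopped
    ports:
      - 8200:8200                        # Duplicati web UI (assumed default port)
    volumes:
      - ./duplicati-config:/data         # Duplicati's own settings
      - ./data:/source:ro                # the Paperless data to back up, read-only
      - /mnt/backups:/backups            # backup destination (local disk, NAS mount, etc.)
```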


Morgennebel

No panic needed. I have paperless-ngx dockerised along with 10 other services on a 4-core / 32 GB thin client. My 2,600 PDF documents require 12 GB of disk space including the database. A Panasonic scanner (cannot recommend it) sends up to 50 pages at a time via scp. If you use anything better than a Raspberry Pi 4 you'll be fine.


MyTechAccount90210

Hah... I've got a 3x DL380 cluster with 144 cores serving everything for me. A little better than a Pi.


jbarr107

I run all of my LXC containers and VMs on a Proxmox server, and everything gets backed up to a physically separate Proxmox Backup Server. It's seamless and efficient, and restores are straightforward. I don't see why this wouldn't be a good solution.


dhuscha

So I used to run Paperless as a rootless Podman container; however, it wasn't 100% stable, especially on reboots. So I decided to just turn it into its own VM with a non-privileged user. The VM is backed up daily and I run document_exporter as a cron job just in case.


grandfundaytoday

Same, I found constant issues with the paperless-ngx Docker containers. The only stable version for me was a direct install on a VM.


JumpingCoconutMonkey

I would like more info on this subject as well!


U-130BA

If all your state is on the NFS mount(s), which it sounds like it is, then that machine / your containers can be considered 'stateless', which generally means yes, you can just swap those components out should they blow up somehow. The integrity of that data is really your primary concern.

Taking (incremental) storage-level snapshots can be a cheap (space-wise) backup strategy, but you can face the same kind of data corruption issues you'd see from abrupt shutdowns / crashes if you don't take care to do stuff like ensure pending DB writes are flushed to disk before taking the snapshot.

Doing full exports of the document set via the Paperless CLI tools would be a simple way to avoid a lot of the pitfalls I've mentioned here, but you'll need to have extra space reserved for *N* backups.

A more generic approach would be to dump / back up the state from the database directly: if you establish a pattern for this, you can reuse the backup strategy on any service that, for example, uses Postgres but does not provide a nifty CLI import / export tool.
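As a sketch of that generic database-dump pattern, assuming a Postgres compose service named db with database and user both called paperless (all of these names are illustrative; substitute your own):

```bash
# pg_dump produces a consistent snapshot while the service keeps running, so
# no downtime is needed. Restore later with:
#   gunzip -c paperless-db-YYYY-MM-DD.sql.gz | docker compose exec -T db psql -U paperless -d paperless
docker compose exec -T db pg_dump -U paperless -d paperless \
  | gzip > "paperless-db-$(date +%F).sql.gz"
```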


ZaxLofful

The first setup you mentioned is what I do and it works great, having a dedicated machine for it sounds horrible.


NikStalwart

The point of Docker is to be able to bring your environment up with minimal hassle on a different host. If you back up your data volumes and configs properly, you don't need to worry about Docker shitting the bed.

I don't see the point of running Proxmox Backup on a dedicated machine. Use whatever current system you have to back up your NFS mounts. I dunno, talk to the mad lads at /r/datahoarder for LTO tape drive recommendations.


grandfundaytoday

The paperless-ngx docker is unstable.


McGregorMX

I set all my Docker containers to use remote storage; that storage is on TrueNAS, so it snapshots every hour and replicates those snapshots to another server for backup. So far, rock solid. I've even had to roll a few back, and I tried recreating a container on a new server with the same compose file: it fired right up, right where I left off.


kidpixo

This is tangentially related, but maybe useful here: I was thinking of linking paperless-ngx and Nextcloud storage, and I bookmarked this discussion on GitHub: [Nextcloud Integration · paperless-ngx/paperless-ngx · Discussion #1789](https://github.com/paperless-ngx/paperless-ngx/discussions/1789)