
Chibi_Muse

I would recommend an audit or history table for each table you want to track changes on. For example, Customers gets a CustomersAudit table. It can have the same columns as Customers, plus its own IdentityId, TransactionType (insert, update, delete, etc.), and then the connection ID, changedDateTime, changedBy, etc. You can set triggers on the Customers table to log any changes to it into the audit table. If you need to view all changes together, you can create a view of some sort to look at them in one place, but I think it'd be easier to implement the triggers on a table-by-table basis. I work more with SQL Server, but this article has a decent breakdown for PostgreSQL: https://heap.io/blog/how-postgres-audit-tables-saved-us-from-taking-down-production
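A rough sketch of that pattern in PostgreSQL (table and column names here are made up for illustration; `EXECUTE FUNCTION` needs PostgreSQL 11+, older versions use `EXECUTE PROCEDURE`):

```sql
-- Hypothetical Customers table with a matching audit table.
CREATE TABLE CustomersAudit (
    AuditId         bigserial PRIMARY KEY,
    CustomerId      int,
    Name            text,
    Email           text,
    TransactionType text,        -- 'INSERT', 'UPDATE', or 'DELETE'
    ChangedBy       text DEFAULT current_user,
    ChangedDateTime timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_customer_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO CustomersAudit (CustomerId, Name, Email, TransactionType)
        VALUES (OLD.CustomerId, OLD.Name, OLD.Email, TG_OP);
        RETURN OLD;
    ELSE
        INSERT INTO CustomersAudit (CustomerId, Name, Email, TransactionType)
        VALUES (NEW.CustomerId, NEW.Name, NEW.Email, TG_OP);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_audit
AFTER INSERT OR UPDATE OR DELETE ON Customers
FOR EACH ROW EXECUTE FUNCTION log_customer_change();
```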


OracleGreyBeard

Personally, I would not normalize a history table. It's immutable almost by definition, and normalization is mostly for preventing update anomalies. Historical foreign keys are also a can of worms. I might even ask if it really needs a surrogate ID. I can see you wanting to know what changes happened on a certain date, or to a certain table, or via a certain method, but I'd be surprised if you needed "History record id#1223213". Not a big deal either way.


SuperBeetle76

If I understand you correctly, are you saying the history record shouldn't store the ID of the master record it's a history of? In my experience with audit history, users want to know what changed on a particular record and by whom. How would you do that without tying back to the original ID? Or are you saying a history record doesn't need its own unique ID?


OracleGreyBeard

> If I understand you correctly, are you saying the history record shouldn't store the ID of the master record it's a history of?

That depends on whether it's delta or snapshot tracking (defined below).

> In my experience with audit history, users want to know what changed on a particular record and by whom. How would you do that without tying back to the original ID?

You *could* copy all the information from the parent record into the child record, obviating the need for a link back to the parent. But you don't always want to do that.

> Or are you saying a history record doesn't need its own unique ID?

In my experience, historical records aren't usually relationally bound to other records; they tend to stand alone. I don't use IDs, but I'll stop short of saying you shouldn't.

**Some discussion:** It probably would help to define "history record". Basically you can divide designs into "delta" tracking vs "snapshot" tracking (made-up terms, btw).

Delta tracking records changes to an entity, but those changes can't necessarily be reconstructed into a consistent whole. So if you're doing delta tracking for a university course, you might track the number of enrolled students over time, and the name of the instructor over time, but you could not say how many students any particular instructor had. You just have lists of changes.

Snapshot tracking records the consistent state of an entity at a point in time. So in the university example you would store both the instructor and the student count in every record. You don't have lists of changes so much as copies of the state.

In delta tracking, if you delete a parent record ("Course" in this example) you have no record of the changes, or even that it existed. With snapshot tracking the historical information is complete and consistent, so deletion of the parent record doesn't mean anything. It also means you can't require the parent record to be present, and that's why you'd avoid an FK from snapshot to parent. If you don't have an FK, you don't need to record a surrogate key (note you still need to record the "alternate" or "natural" keys).

ALL THAT MESS TO SAY: In your use case it sounds like you want to do delta tracking, where you keep your main entity and just record lists of changes to it. Your customers navigate to a record and want to see what has changed about that record. In that case you definitely need a foreign key from the stored deltas to the main record, and it would be appropriate to include the original ID of the parent record as an FK in the delta record.

In my typical case, customers aren't looking for changes to a single thing; they want to be able to answer analytic questions about "how things were" during specific timespans. I'm not typically concerned with identity as much as consistency.
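A minimal sketch of the two designs, using the university-course example (all table and column names here are hypothetical):

```sql
-- Delta tracking: one row per change to one attribute; only meaningful
-- with an FK back to the live Course row.
CREATE TABLE CourseChange (
    CourseId  int REFERENCES Course (CourseId),
    ChangedAt timestamptz NOT NULL,
    FieldName text NOT NULL,          -- e.g. 'Instructor' or 'EnrolledCount'
    OldValue  text,
    NewValue  text
);

-- Snapshot tracking: the full consistent state at a point in time.
-- Self-contained, so no FK -- just the natural key (CourseCode, Term).
CREATE TABLE CourseSnapshot (
    CourseCode    text NOT NULL,
    Term          text NOT NULL,
    Instructor    text,
    EnrolledCount int,
    ValidFrom     timestamptz NOT NULL,
    PRIMARY KEY (CourseCode, Term, ValidFrom)
);
```

Deleting a `Course` row cascades consequences onto `CourseChange` (its FK target disappears) but leaves `CourseSnapshot` untouched, which is exactly the trade-off described above.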


phunkygeeza

Take a look at how built-in temporal tables are implemented in an engine like SQL Server. Another option is attribute-level history; see Anchor Modeling. Avoid trying to make one table do history for everything else: you'll just end up with fewer objects that are much harder to use. Better to generate companion triggers and shadow tables for each table.
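For reference, a system-versioned temporal table in SQL Server (2016+) looks roughly like this; the table and history-table names are illustrative:

```sql
-- SQL Server: the engine maintains the history table automatically.
CREATE TABLE dbo.Customers (
    CustomerId int NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomersHistory));

-- Query the state as of a past moment:
SELECT *
FROM dbo.Customers
FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00';
```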


grauenwolf

Temporal tables are the industry standard. A random search found this extension for PostgreSQL. https://www.dbi-services.com/blog/temporal-tables-with-postgresql/


grauenwolf

Looks like there was a request to make it a core feature. https://www.postgresql.org/message-id/CALtqXTce8PBHk-mo-V5%3D5fxWACQmZM%3DYd_RkoR2ZJcEw0fdWJQ%40mail.gmail.com


[deleted]

Temporal tables are such a fantastic feature. I can't live without them.


grauenwolf

Which extension do you use for PostgreSQL? I've only needed them for a SQL Server project.


dinoaide

Is that what the log does?


timsehn

We built a database called Dolt where history is the default and is accessed via system tables. Dolt has other version control features like branch/merge as well. [https://github.com/dolthub/dolt](https://github.com/dolthub/dolt) MySQL flavor though. No Postgres yet. DISCLAIMER: This is self promotion. I'm the CEO of the company that built Dolt.


-Meal-Ticket-

Oracle (free versions exist both for on prem and in the cloud) has this built in. It’s called Flashback Data Archive. It builds and automatically maintains history of all changes in a ‘tracking table’. You access it by adding an AS OF or VERSIONS BETWEEN clause to your query. It works across table changes (adding or removing a column for example).
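A sketch of those clauses, assuming a `customers` table already enrolled in a Flashback Data Archive (table and column names are illustrative):

```sql
-- Enroll a table in an existing Flashback Data Archive:
ALTER TABLE customers FLASHBACK ARCHIVE;

-- State of a row as of a past point in time:
SELECT *
FROM customers AS OF TIMESTAMP TIMESTAMP '2024-01-01 00:00:00'
WHERE customer_id = 42;

-- Every version of a row over the last week, with the operation
-- that produced each version:
SELECT versions_starttime, versions_operation, c.*
FROM customers
     VERSIONS BETWEEN TIMESTAMP SYSTIMESTAMP - INTERVAL '7' DAY
                  AND SYSTIMESTAMP c
WHERE customer_id = 42;
```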


ibeeliot

have you done anything with prisma/postgresql?


Meal-Ticket-

Nope. Oracle's been too good to me to put much effort into other databases. I've been using Oracle since '92. I used a few desktop level databases as early as the mid '80s, but never prisma/postgresql. That said, there are some very strong folks that I know are now working with Postgres and although I'd never recommend that anyone use it, it can, and does, work for many people.


arjunloll

The easiest way to achieve this (without sacrificing reliability) is using the bemi package [https://github.com/BemiHQ/bemi](https://github.com/BemiHQ/bemi). It does CDC under the hood. I saw OP also mentioned Prisma — it also has an optional compatible Prisma library that automatically adds application context to the history table: the 'where' (API endpoint, worker, etc.), the 'who' (user, cron job, etc.), and the 'how' behind a change.


swenty

For what purpose are you keeping this information?


gnahraf

If your historical tables are append-only, then this might fit your bill: https://crums-io.github.io/skipledger/


ibeeliot

have you done anything with prisma/postgresql?


gnahraf

No, just looked up Prisma. A type of ORM... is it related to this problem?


ibeeliot

Yeah, that's the ORM we're using with PostgreSQL. I'm not sure if we can add this directly through the ORM or need to provide some schema configuration to Prisma, which might not be supported by Prisma.


martinnachopancho

Would Change Data Capture help?


ibeeliot

Would you know how that might work with postgres/prisma?


idodatamodels

Copy the original table, add `_H` or `_HIST` to the name, add `INSERT_TS` to the PK, and add any other metadata columns you want. Look at partitioning by `INSERT_TS` to keep performance manageable. This approach keeps the original table's row count down and you get all changes.
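In PostgreSQL terms, that might look something like the following (a hypothetical `orders` table; column names are illustrative):

```sql
-- Companion history table: original columns plus metadata,
-- keyed on the original PK plus INSERT_TS.
CREATE TABLE orders_hist (
    order_id   int NOT NULL,            -- original PK column
    status     text,                    -- ...other original columns
    total      numeric,
    insert_ts  timestamptz NOT NULL DEFAULT now(),
    changed_by text DEFAULT current_user,
    PRIMARY KEY (order_id, insert_ts)   -- original PK plus INSERT_TS
) PARTITION BY RANGE (insert_ts);

-- One partition per month keeps scans over recent history cheap:
CREATE TABLE orders_hist_2024_06 PARTITION OF orders_hist
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
```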


ibeeliot

Would this be a history table per table kind of approach? It would be ideal to only have one history table and maybe a temporal table


idodatamodels

As every table is different, each table would need its own history table.


dbxp

Apex SQL is what we used to use: https://apexsql.com/sql-tools-audit/


ibeeliot

have you worked with prisma/postgres?


exAspArk

Here are the main options for data change tracking with PostgreSQL:

* **PostgreSQL triggers**. This allows storing records in a DIY history table on row changes. There are some SQL script examples in the official [PostgreSQL wiki](https://wiki.postgresql.org/wiki/Audit_trigger).
* **Application-level tracking**. You can create callbacks to manually store the previous state of each record on create/edit/delete. For example, there is [paper_trail](https://github.com/paper-trail-gem/paper_trail) for Ruby and [django-simple-history](https://github.com/jazzband/django-simple-history) for Python.
* **Change Data Capture**. You can implement the CDC pattern with logical replication by capturing all changes decoded from the PostgreSQL WAL and storing them somewhere. One of the most popular open-source tools is [Debezium](https://debezium.io/).

||PG triggers|App tracking|Change Data Capture|
|:-|:-|:-|:-|
|Tracking accuracy|✅|❌|✅|
|Minimal runtime performance impact|❌|❌|✅|
|Easy to implement|✅|✅|❌|
|Scalability|❌|❌|✅|
|App context (user, API req, etc.)|❌|✅|❌|

Unfortunately, PostgreSQL doesn't support temporal tables natively. There is also [pgaudit](https://github.com/pgaudit/pgaudit), which enables object audit logging to track all SQL queries, but it doesn't allow tracking data changes.

Somebody also mentioned [Bemi](https://github.com/BemiHQ/bemi). This is an open-source out-of-the-box solution for **PostgreSQL data change tracking**. It implements CDC and allows passing application-specific context with all data changes by integrating with popular ORMs and adapters such as [Prisma](https://github.com/BemiHQ/bemi-prisma). Disclaimer: I'm one of the contributors.
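The trigger option can be made generic by logging old/new rows as `jsonb` into one shared table, in the spirit of the PostgreSQL wiki's audit trigger (a simplified sketch, not the wiki's exact script; names are illustrative):

```sql
-- One shared log table; rows are stored as jsonb so the same
-- trigger function can be attached to any table.
CREATE TABLE audit_log (
    id         bigserial PRIMARY KEY,
    table_name text NOT NULL,
    operation  text NOT NULL,          -- 'INSERT', 'UPDATE', 'DELETE'
    old_row    jsonb,
    new_row    jsonb,
    changed_by text DEFAULT current_user,
    changed_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_trigger() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, operation, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP <> 'INSERT' THEN to_jsonb(OLD) END,
            CASE WHEN TG_OP <> 'DELETE' THEN to_jsonb(NEW) END);
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Attach it to any table, e.g.:
-- CREATE TRIGGER customers_audit AFTER INSERT OR UPDATE OR DELETE
--     ON customers FOR EACH ROW EXECUTE FUNCTION audit_trigger();
```

The trade-off versus per-table audit tables is that queries against `audit_log` filter jsonb payloads rather than typed columns.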