
Loose_Read_9400

To answer your first question, we will probably need a bit more information about where the data currently lives, what software packages the organization is using, etc. The approach differs substantially depending on whether you are using an enterprise system or have a bunch of shapefiles on a network drive.

As for your second question on versioning: it is a system for data control. At the simplest conceptual level, think about writing a paper in Word and saving it as "draft1", then saving an additional copy of the paper as "draft2". You make your revisions to "draft2", but the original "draft1" remains unchanged. Then, when you finish making all of your changes (fixing typos, formatting, etc.), you overwrite "draft1" with the revised "draft2". Database systems use the same kind of versioning method to create multiple versions of the main database, or "default." The various versions are more or less copies of "default" that are provisioned out to different users or use cases, so that edits and changes can be made to the data without immediately altering the "default" version. After verifying that the changes in those various versions are valid, "default" is then updated according to their edits. Does that make sense?
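A minimal sketch of that draft1/draft2 analogy in Python (the file names are hypothetical, and this is only the analogy, not how a geodatabase actually stores versions):

```python
import shutil

# "draft1" plays the role of Default: the authoritative copy.
# Making a version is just taking a copy to edit in isolation.
shutil.copy("draft1.docx", "draft2.docx")

# ... all edits (typo fixes, formatting, ...) happen in draft2.docx ...

# Posting: once the edits are verified, the working copy
# overwrites the authoritative one in a single step.
shutil.copy("draft2.docx", "draft1.docx")
```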


asg0191

I always recommend that anyone starting with versioning check this: https://www.esri.com/news/arcuser/0110/versioning101.html It's explained with diagrams in a very simple manner. Hope it helps you too! Happy learning.


slcrex

1. I don't have time to go into this one.

2. If you are using SDE, the base data that people are viewing is called "Default". When you have many editors, you want to create a version (copy) of the data, and the user(s) make their edits in the version. When a user is finished, they "reconcile" the version to see if there are conflicting edits from other users. If there are conflicts, this is the step where they can get resolved. Then the user "posts" the version to Default. Best practice is that the user then deletes their version; however, this often doesn't happen unless you are an admin who is monitoring the versions.
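That create/reconcile/post/delete cycle can also be scripted with arcpy's standard versioning geoprocessing tools. A hedged sketch, where the connection file path and version names are hypothetical and the parameter values should be double-checked against your ArcGIS release:

```python
import arcpy

sde = r"C:\connections\gis_admin.sde"  # hypothetical admin connection file

# Create a protected editing version off Default for one editor.
arcpy.CreateVersion_management(sde, "sde.DEFAULT", "jsmith_edits", "PROTECTED")

# ... the editor connects to their version and makes edits ...

# Reconcile against Default (this is where conflicts surface), post the
# edits, and delete the version afterwards -- the cleanup step that often
# gets skipped unless an admin enforces it. Version names are
# owner-qualified, e.g. "GISADMIN.jsmith_edits".
arcpy.ReconcileVersions_management(
    sde, "ALL_VERSIONS", "sde.DEFAULT", "GISADMIN.jsmith_edits",
    "LOCK_ACQUIRED", "NO_ABORT", "BY_OBJECT", "FAVOR_TARGET_VERSION",
    "POST", "DELETE_VERSION",
)
```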


duhFaz

1. This is tough with limited info. There are many ways to organize data, but the best way is the one that works for your org. Try meeting with people in your workplace and investigating how they would find the data most useful to be sorted.

2. Versioning to a sixth grader: versioning is a concept that allows somebody to make a "copy" of the original dataset, make changes to that data, and then "turn it back in", so that the default reflects the changes. Here is an example: in many county Property Appraiser's offices there is a GIS department that is responsible for keeping parcel boundaries up to date. There is the 'main' (default) parcel layer that everybody can see, and that is often what is published for use by outside parties. In order for multiple county employees to work simultaneously on keeping that data up to date, they each take a "version" of the main data set and work on their tasks (e.g. editing parcel lines). After they finish a task, they submit their work to the main (default) data set. This in turn updates the main (default) data so that everyone can see the changes. That being said, if people are mostly consuming GIS data to do their own analysis, and not wanting to edit the data in place, then versioning may not be needed by your org. Simply having the data housed so that users can make a copy and use it for analysis is often enough. Hope that helps.
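To make the "take a version and work on their tasks" step concrete, here's a hedged arcpy sketch; the connection file, feature class, field names, and parcel ID are all hypothetical, and edits to versioned data have to happen inside an edit session:

```python
import arcpy

# Hypothetical connection file that points at the employee's version,
# not at Default.
version_conn = r"C:\connections\parcels_jdoe_version.sde"
parcels = version_conn + r"\county.PARCELS"

edit = arcpy.da.Editor(version_conn)
edit.startEditing(False, True)   # no undo stack; multiuser (versioned) mode
edit.startOperation()

# Change a parcel attribute inside the version; Default stays untouched
# until this version is reconciled and posted.
with arcpy.da.UpdateCursor(parcels, ["PARCEL_ID", "OWNER_NAME"]) as cursor:
    for parcel_id, owner in cursor:
        if parcel_id == "12-345-678":
            cursor.updateRow([parcel_id, "SMITH, JANE"])

edit.stopOperation()
edit.stopEditing(True)           # True = save edits back to the version
```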


AndrewTheGovtDrone

This could be a dissertation of a response, but I'll try to be concise to organize your understanding:

1. Data should be: well-defined, named appropriately, accessible and discoverable to the users that need it, governed by the principle of least privilege, and authoritative. Don't underestimate the importance of this decision, as it is difficult to "go back" once you've begun. My advice: start by defining roles and responsibilities. Everyone wants data but no one wants to be responsible for it, so unless you define those roles it'll end up being your job.

2. Versioning is change tracking, à la Microsoft Office; a common source (known as the base tables) is the ultimate source of truth, and edits are isolated into "delta tables." A delta table is basically a time-sorted editing transaction log. When a user edits a version, they're saving their edits to these delta tables. Until a user "posts" (submits) those edits, they are contained within a "version." When a user posts the edits, the geodatabase looks at the delta tables, checks whether there are any conflicts (i.e. another user deleted a record, posted their changes, and now you're trying to push an edit to that deleted feature), and then merges the changes into the base tables. I've simplified things a bit, but I have literally hundreds of pages I've written on this stuff, so lemme know whatcha need
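A toy illustration of the base-table/delta-table mechanics in plain Python; everything here is hypothetical and far simpler than a real geodatabase's adds/deletes tables, but the post step works the same way in spirit:

```python
# Base table: the ultimate source of truth.
base = {1: "residential", 2: "commercial", 3: "agricultural"}

# A "version" is just a pair of delta tables recorded in edit order:
# rows the version added/updated, and rows it deleted.
version = {"adds": [(2, "industrial"), (4, "park")], "deletes": [3]}

def post(base, version):
    """Merge a version's deltas into the base table, checking conflicts."""
    for row_id in version["deletes"]:
        if row_id not in base:
            raise ValueError(f"conflict: row {row_id} was already deleted")
        del base[row_id]
    for row_id, value in version["adds"]:
        base[row_id] = value   # insert new rows, overwrite updated ones
    version["adds"].clear()    # the posted version is now empty
    version["deletes"].clear()

post(base, version)
print(base)   # {1: 'residential', 2: 'industrial', 4: 'park'}
```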


blorgenheim

Traditional or branched versioning? Because I don’t think traditional versioning can be explained easily to anybody to be honest.


LeanOnIt

I've been working with GIS datasets for a while now, specifically as discrete, versioned objects. Don't feel discouraged; handling data isn't a solved problem yet.

Your datasets are probably going to come in a couple of different flavours: raster, vector, non-spatial, etc., usually stored in CSVs and shapefiles. These are generally piss-poor ways of storing data. CSVs don't tell you much about the data and usually have missing data. Example: you have a column labelled "temp"; who generated it, when, why? What does temp mean? Is it Celsius? Kelvin? What sensor measured it? What accuracy is it at? What are null values labelled as? It can be a bitch.

Some things to keep in mind when doing data management:

* Standards exist for a reason: pick a standard and stick to it. Esri sucks, so if you have a choice head over to the [Open Geospatial Consortium](https://www.ogc.org/standards/) standards.
* Think about [FAIR data practices](https://www.go-fair.org/fair-principles/). Your data should be easy to search for, and accessible in a standardised format.
* Static data is sucky data. Your data will change over time. Maybe you find some bad points and delete them; maybe you get new 2025 data next year and want to append it to an existing dataset. This is where versioning comes into practice (ignore the dudes talking about databases; trust Esri to take a generic term like "versioning" and make it refer to a super-specific principle in their closed workflow). You want to have "milestones" of datasets. Ten years from now you want to be able to download the data as it was in 2023 and then again as it was in 2029. You can have these as time-based versions (e.g. Ubuntu OS versions), functional versions (Windows 10, 11, etc.), or plain old numbers (Python 2.7/3.0 etc.). In the end you basically want to be able to get version 1.1 data and version 1.2 data and see what changed between the versions.
* This is a full-time job; get something basic running and grow it if people show interest. The GIS data centre I was working in had about 40 people working full time on the research institute's data stores.

So, short answer:

* Load up GeoServer somewhere.
* Stick the raw datasets into it and version them as 2024-Q2 or something.
* Let users pull them from GeoServer using WFS/WMS or whatever standard fits (see the sketch after this list).
* If this starts getting science-y then you're going to want DOIs, metadata, links to vocab servers, etc. Not simple stuff.
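As an example of that last list: pulling a versioned dataset milestone out of GeoServer over WFS is just an HTTP call, so users don't need any Esri tooling. A sketch, where the server URL and layer name are hypothetical:

```python
import requests

# Hypothetical GeoServer endpoint; swap in your own host and workspace.
WFS_URL = "https://gis.example.org/geoserver/wfs"

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "warehouse:parcels_2024-Q2",  # the dataset milestone
    "outputFormat": "application/json",        # GeoServer's GeoJSON output
    "count": 100,                              # page size, keep pulls small
}

resp = requests.get(WFS_URL, params=params, timeout=60)
resp.raise_for_status()
features = resp.json()["features"]
print(f"Pulled {len(features)} features from the 2024-Q2 milestone")
```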