Next item in the upgrade is addressing some schema changes.
When we started development, we designed a large geodatabase using Visio & UML. Nice. Over the following years, we wrote the ArcMap tools and other applications which use this geodatabase. Still nice. As is the case with most development, some changes must be made to the database model. In most cases, we were good, and updated the Visio UML model, but it's been a few years, and we're pretty sure they are now out of sync.
The question is: How can we find these changes so we can easily update the model so it's back in sync with the geodatabase?Option 1: Manually check the UML against the contents of the geodatabase
Ok, this is not an option because I need to do this in hours, not weeks. And I'm pretty sure nobody on staff would actually do this!
Option 2: Reverse engineer the geodatabase back into UML.
Wishful thinking - this is still not possible.
Option 3: Compare XML Schemas
This is what I went with, and it should be straight-forward right? Just dump the workspace schema from the existing geodatabase, and from a geodatabase generated from the current UML model, and run them through a diff program, and see what's different? Oh were it so simple.
The Coded Value Domain Problem:As part of the workspace schema export, coded value domains are written out, and this caused me some problems. Things may have been easier if we followed ALL geodatabase UML design guidelines. And we do for most things - just not Coded Value Domains. It's not that we don't create the domains in the model, we do. We just don't populate all of them. In our experience we've found that the domain values you get from a client at the beginning of a project are different from those needed at the end. Instead of editing & re-applying the UML model a zillion times (which is a pain, and requires that everything be un-versioned), we simply load the domains out of SQL tables that hold the values.
Anyhow - the problem is that the "real" geodatabase has all the domain values which have been loaded from the tables, and in the "new" geodatabase, the domains are empty, which means I can't compare the XML files directly.
Transforming the XML Export:Since the XML schema has more information than I wanted, I figured I should be able to just extract out the information I was interested in. ( I should say here that I'm not an XML guru by any means).
After a little quality time with
XMLSpy, I cooked up an
xml transform (XSLT) that parses out the information I want - Featureclass name, the fields, their name, type, length, and domain if applicable.
After this, I thought I was good to go. Nope. Apparently the ESRI XML schema exporter writes out the xml in some order other than alphebetical by featureclass name. Thus, the order of the feature classes in the output files are not the same, so a simple diff will not work. XML.com had a
simple example of sorting in xsl, and after sorting the featureclasses by name, and the fields by name I had something to work with.
So, here's what it outputs:

At this point I started looking for a simple diff tool so I could locate the differences between the files. Before I got too far on that, I noticed that the XML from the "live" geodatabase was 89k, and the XML from the geodatabase generated from the model was 59k. Not good.
It turns out that we had a whole bunch of new tables registered with the geodatabase. Part of the system involves the collection of very detailed forest information - measurements of indivudal trees. Once this data is loaded into the system, a bunch of processing is applied to the raw tree values to calculate timber volumes. All the tables used to hold the outputs from these processes were not modeled in the UML. Uggh. :-|
Since we don't have time to spend days entering all these tables into the model, and updating the model is not a priority for our client, it was time to figure out how to "nicely" merge two UML models into one operational Geodatabase...