.. include:: /includes.rst.txt

.. _data_migration_best_practices:

*************************************
Best Practices for EDG Data Migration
*************************************

.. Version: 1.1
.. Date: 2025-03-06

Introduction
============

This document describes best practices for migrating data between EDG servers. Most commonly, this supports promoting asset collections from a Development server to a Testing/Staging server, and finally to a Production server.

In many deployments:

* Ongoing stewardship and editing of operational data occurs in **Prod**
* New ontologies, taxonomies, ADS scripts, and EDG customizations are developed in **Dev**
* Some organizations use **Test/Staging** to validate changes prior to Prod deployment

These same migration approaches can also be used to **reset** Dev or Test by reloading asset collections from Prod (e.g., to start a new development cycle). This can be important when UUIDs appear in URIs, because creating the “same” thing (e.g., a ``Product`` class) on multiple servers can result in different UUID-based URIs.

Several migration approaches are described below. The best choice depends on:

* Whether the target server already has a version of the collection(s)
* Whether **EDG Change History** must be migrated
* Whether you need “replace” or “merge” behavior
* Whether you want repeatable automation or a UI-driven process

All of the approaches described use only out-of-the-box EDG features.

.. note::
   These approaches assume execution by a **System Administrator** with access to the required asset collections on the server.

Approach 1: Send Projects
=========================

The **Send Projects** approach is best when you want to migrate complete EDG projects and optionally include **Change History graphs**.
Send Projects using EDG UI
--------------------------

The *Send Projects to Another Server* feature is available under:

*Server Administration → Send Projects to Another Server*

Key steps:

* Configure the target server URL (for example ``https://testserver.company.com/edg/``)
* Provide target credentials
* Select the asset collections to send (expand the **Repositories** folder)

Best practices:

* **Do not** select the entire Repositories folder (data safety and performance)
* EDG does **not** automatically include collections referenced via *includes* (select the full set explicitly)
* When sending asset collections, select **Also send database triples**
* Prefer **replace** (clear destination) over merge when re-sending a collection:

  * Select: *Clear the destination project of triples before sending triples*

Change History migration:

* A collection’s Change History graph (team graph) may be sent alongside the collection
* Team graph identifiers typically appear as ``.tch`` plus the underlying file type (for example, ``ontology_1.tch.xdb``)

.. note::
   The *Send Projects* feature does not automatically send included collections. Ensure you select the complete dependency set, including required team graphs.

Send Projects using Python script
---------------------------------

The Send Projects feature can be called programmatically via the ``sendProjects`` service. The typical pattern:

* Maintain a JSON file listing the projects/graphs to send
* Invoke a Python script that posts to the source server’s ``/tbl/sendProjects`` endpoint

Example invocation:

* ``python sendProjects.py --url_source ... --url_target ... --sendTriples true --clearGraph true``

.. warning::
   EDG deployed on Tomcat commonly uses URLs that include ``/edg/``, while EDG Studio does not. Take care to use the correct base URL structure when testing locally.

Example ``parameters.json``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: json

   {
       "file-/Repositories/ontology_1.tch.xdb": "true",
       "file-/Repositories/ontology_1.xdb": "true",
       "file-/Repositories/taxonomy_1.tch.xdb": "true",
       "file-/Repositories/taxonomy_1.xdb": "true"
   }

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse
   import json

   import requests
   from requests.auth import HTTPBasicAuth

   parser = argparse.ArgumentParser(description='Send files/asset collections from one EDG server to another.')
   parser.add_argument('--url_source', required=True, help='The source URL, e.g. localhost:8083')
   parser.add_argument('--url_target', required=True, help='The target URL, e.g. localhost:8080')
   parser.add_argument('--source_username', required=True, help='Username for source authentication')
   parser.add_argument('--source_password', required=True, help='Password for source authentication')
   parser.add_argument('--target_username', required=True, help='Username for target authentication')
   parser.add_argument('--target_password', required=True, help='Password for target authentication')
   parser.add_argument('--sendTriples', required=True, help='Send triples (when sending a collection)')
   parser.add_argument('--clearGraph', required=True, help='Clear triples when sending to an existing asset collection')
   args = parser.parse_args()

   # Load the list of projects/graphs to send
   with open('parameters.json', 'r') as file:
       parameters = json.load(file)

   # Credentials embedded in the form are used by the source server to log in
   # to the target server identified by serverURL
   parameters['userName'] = args.target_username
   parameters['password'] = args.target_password
   parameters['serverURL'] = f"{args.url_target}/tbl"
   parameters['sendTriples'] = args.sendTriples
   parameters['clearGraph'] = args.clearGraph

   # The request itself is made to the source server, so it authenticates
   # with the source credentials
   url = f"{args.url_source}/tbl/sendProjects"
   response = requests.post(url, data=parameters,
                            auth=HTTPBasicAuth(args.source_username, args.source_password))

   print("Status Code:", response.status_code)
   print("Response Body:", response.text)

Download script here: :download:`sendProjects_approach1.py`

Approach 2: Export/Import Zip for New Collection Sets
=====================================================

This approach is best when:

* You want to migrate **new** collections to a target environment
* You want to include *included* collections in the same package
* You do **not** need to migrate Change History graphs

Export/Import Zip using EDG UI
------------------------------

Use the Export tab of an asset collection:

* *Export with Includes as a File*
* Choose *Zip File without system graphs*

This produces a ZIP containing each included collection as Turtle (TTL).

Best practices:

* Use the option **without system graphs**
* Be aware: **Change History is not exported** via this feature

Import is performed via:

* **New+ → Import Asset Collection from Trig or Zip File**

Notes:

* Collections that already exist on the target server are typically ignored (only “new” collections are created)
* For initial migrations, a common pattern is to create an **artificial umbrella collection** that includes all collections to be migrated, export that, import it to the target, and then delete the umbrella collection

.. important::
   If Change History migration is required, use **Approach 1 (Send Projects)** or **Approach 3 (Export RDF / Import RDF)**.

Export/Import Asset Collection Zips using Python script
-------------------------------------------------------

Collections can be bulk exported and imported using EDG ZIP APIs:

* Export via ``/datasetZip`` from the source
* Import via the EDG upload service endpoint on the target

Example input file (one collection per line)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   geo
   kennedy_family

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse
   import os

   import requests
   from requests.auth import HTTPBasicAuth


   def export_edg_zip(base_value, base_uri, username, password, includeSystemTriples, excludeSystemGraphs):
       """Export one asset collection (with includes) as a ZIP from the source server."""
       filename = f"{base_value}.zip"
       endpoint = "/datasetZip"
       params = {
           "_base": f"urn:x-evn-master:{base_value}",
           "includeSystemTriples": includeSystemTriples,
           "excludeSystemGraphs": excludeSystemGraphs,
       }
       full_url = f"{base_uri}{endpoint}"
       try:
           response = requests.get(full_url, params=params, auth=HTTPBasicAuth(username, password), timeout=30)
           response.raise_for_status()
       except requests.exceptions.RequestException as e:
           print(f"Error: Failed to make the EDG API request: {e}")
           return
       with open(filename, "wb") as file:
           file.write(response.content)
       print(f'Response saved to "{filename}"')


   def import_zip_file(file_name, base_url, username, password):
       """Upload a previously exported ZIP to the target server."""
       url = f"{base_url}/swp"
       file_path = f"{file_name}.zip"
       if not os.path.exists(file_path):
           print(f"Error: File '{file_path}' not found.")
           return
       with open(file_path, 'rb') as file:
           files = {'filePath': (f"{file_name}.zip", file, 'application/zip; charset=utf-8')}
           data = {
               "_fileUpload": "true",
               "_viewClass": "http://topbraid.org/teamwork#ImportDatasetFileService",
               "trig": "false"
           }
           response = requests.post(
               url,
               files=files,
               data=data,
               auth=HTTPBasicAuth(username, password),
               timeout=30
           )
       if response.status_code == 200:
           print(f"Success: File '{file_name}.zip' uploaded successfully.")
       else:
           print(f"Error: Upload failed with status code {response.status_code}.")
           print("Target server response:", response.text)
       os.remove(file_path)
       print(f"Deleted the file '{file_name}.zip' after import.")


   def process_collections_list(collections_file, url_source, url_target,
                                source_username, source_password,
                                target_username, target_password,
                                includeSystemTriples, excludeSystemGraphs):
       with open(collections_file, 'r') as file:
           collections = [line.strip() for line in file if line.strip()]
       for base_value in collections:
           print(f"Processing collection: {base_value}")
           export_edg_zip(base_value, url_source, source_username, source_password,
                          includeSystemTriples, excludeSystemGraphs)
           import_zip_file(base_value, url_target, target_username, target_password)


   if __name__ == "__main__":
       parser = argparse.ArgumentParser(description="Call EDG ZIP export and import APIs with basic authentication.")
       parser.add_argument("--collections_file", required=True)
       parser.add_argument("--url_source", required=True)
       parser.add_argument("--url_target", required=True)
       parser.add_argument("--source_username", required=True)
       parser.add_argument("--source_password", required=True)
       parser.add_argument("--target_username", required=True)
       parser.add_argument("--target_password", required=True)
       parser.add_argument("--includeSystemTriples", choices=['true', 'false'], default='false')
       parser.add_argument("--excludeSystemGraphs", choices=['true', 'false'], default='true')
       args = parser.parse_args()

       process_collections_list(
           args.collections_file, args.url_source, args.url_target,
           args.source_username, args.source_password,
           args.target_username, args.target_password,
           args.includeSystemTriples == 'true', args.excludeSystemGraphs == 'true'
       )

Download script here: :download:`sendProjects_approach2.py`

.. warning::
   If multiple collections include the same large dependency collection, ZIP export may download that dependency repeatedly.

Approach 3: Export RDF / Import RDF for Incremental Updates
===========================================================

This approach is best when:

* The target already has corresponding collections (same graph IDs)
* You want **incremental updates** or controlled replacement
* You may need to preserve or create Change History, depending on import options

Export RDF / Import RDF via Swagger
-----------------------------------

The Export RDF and Import RDF APIs are visible in the EDG Swagger interface (typically accessible from an asset collection under the **Reports** tab).
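Before scripting a full migration, it can help to reproduce a single Export RDF call outside Swagger. The following stdlib sketch only builds the request URL for inspection (or for pasting into a browser or ``curl``); the endpoint path and parameter names follow the script later in this section, while the hostname and graph ID are illustrative:

```python
from urllib.parse import urlencode

def export_rdf_url(edg_url, graph_id, fmt="turtle", include_inferences=False):
    """Build (but do not send) the Export RDF request URL for one collection."""
    base = f"{edg_url}/tbl/service/{graph_id}/tbs/exportRDFFile"
    # EDG expects lowercase 'true'/'false' strings for boolean parameters
    query = urlencode({"format": fmt, "includeInferences": str(include_inferences).lower()})
    return f"{base}?{query}"

print(export_rdf_url("https://edg.example.com/edg", "geo"))
# https://edg.example.com/edg/tbl/service/geo/tbs/exportRDFFile?format=turtle&includeInferences=false
```

Fetching this URL with valid credentials should return the collection's RDF, which is a quick way to confirm base URL structure and authentication before running the batch script.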
Typical usage pattern:

* Export RDF from a source collection
* Ensure the target has an asset collection with the same graph ID
* Import RDF into the target collection

.. note::
   To capture both additions and deletions during import, prefer import options such as *Replace previous contents* when appropriate.

Export RDF / Import RDF using EDG APIs (Python)
-----------------------------------------------

The following pattern is used:

* Determine each asset collection type
* Create matching collections on the target (if needed)
* Export RDF from the source
* Import RDF into the target

Example input file
^^^^^^^^^^^^^^^^^^

.. code-block:: text

   geography_ontology
   geo

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse

   import requests
   from requests.auth import HTTPBasicAuth


   def asset_collection_type(edg_url, graph_id, username, password):
       """Return the type label of an asset collection on the source server."""
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/assetCollectionType"
       response = requests.post(url, headers={'accept': 'application/json'},
                                auth=HTTPBasicAuth(username, password))
       response.raise_for_status()
       return response.text


   def export_rdf(edg_url, graph_id, username, password, exclude_values_rules,
                  fmt, include_inferences, keep_edg_triples):
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/exportRDFFile"
       params = {
           "excludeValuesRules": str(exclude_values_rules).lower(),
           "format": fmt,
           "includeInferences": str(include_inferences).lower(),
           "keepEDGTriples": str(keep_edg_triples).lower(),
       }
       response = requests.get(url, auth=HTTPBasicAuth(username, password), params=params)
       response.raise_for_status()
       return response.content


   def create_asset_collection(edg_url, graph_id, type_label, username, password):
       url = f"{edg_url}/tbl/service/_/tbs/createAssetCollection"
       params = {'defaultNamespace': 'www.temporary.org', 'id': graph_id,
                 'name': graph_id, 'typeLabel': type_label}
       response = requests.post(url, auth=HTTPBasicAuth(username, password), params=params)
       response.raise_for_status()
       return True


   def import_rdf(edg_url, graph_id, username, password, rdf_content, fmt='turtle'):
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/importRDFFile"
       files = {
           "file": (f"{graph_id}.ttl", rdf_content, "text/turtle"),
           "fileName": (None, f"{graph_id}.ttl"),
           "format": (None, fmt),
       }
       response = requests.post(url, files=files, auth=HTTPBasicAuth(username, password),
                                headers={"accept": "application/json"})
       response.raise_for_status()
       return True


   def process_list(collections_file, url_source, url_target,
                    source_username, source_password,
                    target_username, target_password,
                    exclude_values_rules, fmt, include_inferences, keep_edg_triples):
       with open(collections_file, 'r') as file:
           collections = [line.strip() for line in file if line.strip()]
       for graph_id in collections:
           rdf = export_rdf(url_source, graph_id, source_username, source_password,
                            exclude_values_rules, fmt, include_inferences, keep_edg_triples)
           type_label = asset_collection_type(url_source, graph_id, source_username, source_password)
           create_asset_collection(url_target, graph_id, type_label, target_username, target_password)
           import_rdf(url_target, graph_id, target_username, target_password, rdf, fmt=fmt)
           print(f"Migrated: {graph_id}")


   if __name__ == "__main__":
       parser = argparse.ArgumentParser(description="Export RDF from EDG and import into another EDG instance.")
       parser.add_argument("--collections_file", required=True)
       parser.add_argument("--url_source", required=True)
       parser.add_argument("--url_target", required=True)
       parser.add_argument("--source_username", required=True)
       parser.add_argument("--source_password", required=True)
       parser.add_argument("--target_username", required=True)
       parser.add_argument("--target_password", required=True)
       # argparse's type=bool treats any non-empty string (including "false")
       # as True, so explicit 'true'/'false' choices are used instead
       parser.add_argument("--excludeValuesRules", choices=['true', 'false'], default='true')
       parser.add_argument("--format", type=str, default="turtle")
       parser.add_argument("--includeInferences", choices=['true', 'false'], default='true')
       parser.add_argument("--keepEDGTriples", choices=['true', 'false'], default='true')
       args = parser.parse_args()

       process_list(
           args.collections_file, args.url_source, args.url_target,
           args.source_username, args.source_password,
           args.target_username, args.target_password,
           args.excludeValuesRules == 'true', args.format,
           args.includeInferences == 'true', args.keepEDGTriples == 'true'
       )

Download script here: :download:`sendProjects_approach3.py`

.. warning::
   Included asset collections are not automatically migrated by this approach. Ensure the input list includes all required dependencies.

Approach 4: Git Integration (EDG 8.3+)
======================================

EDG 8.3 and later include improved Git integration that supports migration and promotion workflows. A common pattern is to use:

* A single Git repository with multiple branches (e.g., Dev/Test/Prod)
* Push/pull from EDG to Git, rather than server-to-server transfers

Git Configuration in EDG
------------------------

Configure Git repositories under:

*Product Configuration → Git Integration*

Best practices:

* Configure a separate repository entry per target (e.g., Dev and Prod)
* Add a Git repository password (token-based authentication)
* Configure which EDG users may access each repository

Linking an Asset Collection to Git in EDG
-----------------------------------------

After Git is configured:

* Open the asset collection to migrate
* Use the cloud icon to **Link to File on Git**
* Choose a new file name (e.g., ``geo.ttl``) or select an existing file
* Use **push** to write the collection to Git
* In the target EDG, create a corresponding collection and link it to the same file
* Use **pull** to populate the collection from Git

.. important::
   Only one asset collection can be connected to a given Git file at a time. You can remove the connection via the Git integration instance (Modify → Delete), and the form also shows metadata such as the last push/pull execution.

Determining Changes Between Collections
=======================================

Collections can be compared in EDG or externally.
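As a minimal sketch of external comparison, two Turtle exports can be diffed at the line level. This is a naive approach and the file contents below are illustrative: it only works when both files come from the same serializer, so that unchanged triples serialize to identical lines (blank nodes and formatting differences will appear as spurious changes); an RDF-aware tool or EDG's own comparison features are more robust:

```python
def line_diff(old_text, new_text):
    """Return (additions, deletions) between two deterministic serializations."""
    old_lines = set(old_text.splitlines())
    new_lines = set(new_text.splitlines())
    return sorted(new_lines - old_lines), sorted(old_lines - new_lines)

# Illustrative Turtle fragments standing in for two exports of a collection
old = "ex:geo a edg:Taxonomy .\nex:Europe a ex:Continent .\n"
new = "ex:geo a edg:Taxonomy .\nex:Europe a ex:Continent .\nex:Asia a ex:Continent .\n"

additions, deletions = line_diff(old, new)
print("Added:", additions)      # Added: ['ex:Asia a ex:Continent .']
print("Removed:", deletions)    # Removed: []
```

In practice, ``old`` and ``new`` would be read from the exported ``.ttl`` files rather than inline strings.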
Common validation use cases include:

* Verifying changes before migration
* Confirming a promotion package
* Reviewing additions and deletions between versions

The main approaches are:

* EDG Comparison Report
* EDG Workflow and Workflow Reports
* Git diff in GitHub

Comparison Report
-----------------

The **Comparison Report** is available under the Reports tab of an asset collection.

Typical usage:

* Open the source collection
* Navigate to *Reports → Comparison Report*
* Select the target collection from the dropdown
* Review additions and deletions detected by EDG

This approach is fast and requires no export.

EDG Workflow and Workflow Reports
---------------------------------

A workflow can be used to create a controlled, reviewable change set:

* Export the updated collection as Turtle (TTL)
* In the original collection, create a new workflow (e.g., Basic Workflow)
* Use *Make Changes → Import* within the workflow
* To capture additions and deletions, select *Replace previous contents* during import

Review locations:

* Workflow Reports panel: structured list of detected changes
* Workflow Preview panel: git-style diff view

This approach supports approve/reject governance before committing changes.

Git diff in GitHub
------------------

If you push versions to Git:

* Push the original collection to Git
* Apply updates (or import a new version), then push again
* Use the GitHub diff between commits to review changes

This is especially useful for teams already using Git-based review processes.

.. seealso::
   EDG environment promotion patterns and governance controls are covered in :ref:`dev_test_prod_servers_best_practices`.
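The Git-diff review described above can be rehearsed locally before involving GitHub. The following sketch simulates two EDG pushes of an illustrative ``geo.ttl`` in a throwaway repository and then diffs the two commits; the file name and triple content are hypothetical, and ``git`` must be on the path:

```python
import subprocess
import tempfile

def run(repo, *cmd):
    """Run a git command in the scratch repository and return its stdout."""
    return subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
run(repo, "git", "init", "-q", ".")

# First "push": the original collection
with open(f"{repo}/geo.ttl", "w") as f:
    f.write("ex:Europe a ex:Continent .\n")
run(repo, "git", "add", "geo.ttl")
run(repo, "git", "-c", "user.name=edg", "-c", "user.email=edg@example.com",
    "commit", "-qm", "initial push")

# Second "push": the updated collection
with open(f"{repo}/geo.ttl", "w") as f:
    f.write("ex:Europe a ex:Continent .\nex:Asia a ex:Continent .\n")
run(repo, "git", "add", "geo.ttl")
run(repo, "git", "-c", "user.name=edg", "-c", "user.email=edg@example.com",
    "commit", "-qm", "updated push")

# The change set between the two pushes, as GitHub's compare view would show it
diff = run(repo, "git", "diff", "HEAD~1", "HEAD", "--", "geo.ttl")
print(diff)
```

The printed diff shows ``ex:Asia a ex:Continent .`` as an added line, which is the same additions/deletions view a reviewer would see when comparing the two commits in GitHub.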