.. include:: /includes.rst.txt

.. _data_migration_best_practices:

*************************************
Best Practices for EDG Data Migration
*************************************

.. Version: 1.1
.. Date: 2025-03-06

Introduction
============

This document describes best practices for migrating data between EDG servers. Most commonly, this supports promoting asset collections from a Development server to a Testing/Staging server, and finally to a Production server.

In many deployments:

* Ongoing stewardship and editing of operational data occurs in **Prod**
* New ontologies, taxonomies, ADS scripts, and EDG customizations are developed in **Dev**
* Some organizations use **Test/Staging** to validate changes prior to Prod deployment

These same migration approaches can also be used to **reset** Dev or Test by reloading asset collections from Prod (e.g., to start a new development cycle). This can be important when UUIDs appear in URIs, because creating the “same” thing (e.g., a ``Product`` class) on multiple servers can result in different UUID-based URIs.

Several migration approaches are described below. The best choice depends on:

* Whether the target server already has a version of the collection(s)
* Whether **EDG Change History** must be migrated
* Whether you need “replace” or “merge” behavior
* Whether you want repeatable automation or a UI-driven process

All of the approaches described use only out-of-the-box EDG features.

.. note::
   These approaches assume execution by a **System Administrator** with access to the required asset collections on the server.

Approach 1: Send Projects
=========================

The **Send Projects** approach is best when you want to migrate complete EDG projects and optionally include **Change History graphs**.
Send Projects using EDG UI
--------------------------

The *Send Projects to Another Server* feature is available under:

*Server Administration → Send Projects to Another Server*

Key steps:

* Configure the target server URL (for example ``https://testserver.company.com/edg/``)
* Provide target credentials
* Select the asset collections to send (expand the **Repositories** folder)

Best practices:

* **Do not** select the entire Repositories folder (data safety and performance)
* EDG does **not** automatically include collections referenced via *includes* (select the full set explicitly)
* When sending asset collections, select **Also send database triples**
* Prefer **replace** (clear destination) over merge when re-sending a collection:

  * Select: *Clear the destination project of triples before sending triples*

Change History migration:

* A collection’s Change History graph (team graph) may be sent alongside the collection
* Team graph identifiers typically appear as ``.tch`` plus the underlying file type (for example, ``ontology_1.tch.xdb``)

.. note::
   The *Send Projects* feature does not automatically send included collections. Ensure you select the complete dependency set, including required team graphs.

Send Projects using Python script
---------------------------------

The Send Projects feature can be called programmatically via the ``sendProjects`` service. The typical pattern:

* Maintain a JSON file listing the projects/graphs to send
* Invoke a Python script that posts to the source server’s ``/tbl/sendProjects`` endpoint

Example invocation:

* ``python sendProjects.py --url_source ... --url_target ... --sendTriples true --clearGraph true``

.. warning::
   EDG deployed on Tomcat commonly uses URLs that include ``/edg/``, while EDG Studio does not. Take care to use the correct base URL structure when testing locally.

Example ``parameters.json``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: json

   {
       "file-/Repositories/ontology_1.tch.xdb": "true",
       "file-/Repositories/ontology_1.xdb": "true",
       "file-/Repositories/taxonomy_1.tch.xdb": "true",
       "file-/Repositories/taxonomy_1.xdb": "true"
   }

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse
   import json

   import requests
   from requests.auth import HTTPBasicAuth

   parser = argparse.ArgumentParser(description='Send files/asset collections from one EDG server to another.')
   parser.add_argument('--url_source', required=True, help='The source URL, e.g. localhost:8083')
   parser.add_argument('--url_target', required=True, help='The target URL, e.g. localhost:8080')
   parser.add_argument('--source_username', required=True, help='Username for source authentication')
   parser.add_argument('--source_password', required=True, help='Password for source authentication')
   parser.add_argument('--target_username', required=True, help='Username for target authentication')
   parser.add_argument('--target_password', required=True, help='Password for target authentication')
   parser.add_argument('--sendTriples', required=True, help='Send triples (when sending a collection)')
   parser.add_argument('--clearGraph', required=True, help='Clear triples when sending to an existing asset collection')
   args = parser.parse_args()

   # Load the list of projects/graphs to send
   with open('parameters.json', 'r') as file:
       parameters = json.load(file)

   # Credentials embedded in the form are used by the source server to log in
   # to the target server identified by serverURL
   parameters['userName'] = args.target_username
   parameters['password'] = args.target_password
   parameters['serverURL'] = f"{args.url_target}/tbl"
   parameters['sendTriples'] = args.sendTriples
   parameters['clearGraph'] = args.clearGraph

   # The request itself is made to the source server, so it authenticates
   # with the source credentials
   url = f"{args.url_source}/tbl/sendProjects"
   response = requests.post(url, data=parameters,
                            auth=HTTPBasicAuth(args.source_username, args.source_password))

   print("Status Code:", response.status_code)
   print("Response Body:", response.text)

Download script here: :download:`sendProjects_approach1.py`

Approach 2: Export/Import Zip for New Collection Sets
=====================================================

This approach is best when:

* You want to migrate **new** collections to a target environment
* You want to include *included* collections in the same package
* You do **not** need to migrate Change History graphs

Export/Import Zip using EDG UI
------------------------------

Use the Export tab of an asset collection:

* *Export with Includes as a File*
* Choose *Zip File without system graphs*

This produces a ZIP containing each included collection as Turtle (TTL).

Best practices:

* Use the option **without system graphs**
* Be aware: **Change History is not exported** via this feature

Import is performed via:

* **New+ → Import Asset Collection from Trig or Zip File**

Notes:

* Collections that already exist on the target server are typically ignored (only “new” collections are created)
* For initial migrations, a common pattern is to create an **artificial umbrella collection** that includes all collections to be migrated, export that, import it to the target, and then delete the umbrella collection

.. important::
   If Change History migration is required, use **Approach 1 (Send Projects)** or **Approach 3 (Export RDF / Import RDF)**.

Export/Import Asset Collection Zips using Python script
-------------------------------------------------------

Collections can be bulk exported and imported using EDG ZIP APIs:

* Export via ``/datasetZip`` from the source
* Import via the EDG upload service endpoint on the target

Example input file (one collection per line)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   geo
   kennedy_family

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse
   import os

   import requests
   from requests.auth import HTTPBasicAuth


   def export_edg_zip(base_value, base_uri, username, password, includeSystemTriples, excludeSystemGraphs):
       """Export one asset collection (with includes) as a ZIP from the source server."""
       filename = f"{base_value}.zip"
       endpoint = "/datasetZip"
       params = {
           "_base": f"urn:x-evn-master:{base_value}",
           "includeSystemTriples": includeSystemTriples,
           "excludeSystemGraphs": excludeSystemGraphs,
       }
       full_url = f"{base_uri}{endpoint}"
       try:
           response = requests.get(full_url, params=params, auth=HTTPBasicAuth(username, password), timeout=30)
           response.raise_for_status()
       except requests.exceptions.RequestException as e:
           print(f"Error: Failed to make the EDG API request: {e}")
           return
       with open(filename, "wb") as file:
           file.write(response.content)
       print(f'Response saved to "{filename}"')


   def import_zip_file(file_name, base_url, username, password):
       """Upload a previously exported ZIP to the target server."""
       url = f"{base_url}/swp"
       file_path = f"{file_name}.zip"
       if not os.path.exists(file_path):
           print(f"Error: File '{file_path}' not found.")
           return
       with open(file_path, 'rb') as file:
           files = {'filePath': (f"{file_name}.zip", file, 'application/zip; charset=utf-8')}
           data = {
               "_fileUpload": "true",
               "_viewClass": "http://topbraid.org/teamwork#ImportDatasetFileService",
               "trig": "false"
           }
           response = requests.post(
               url,
               files=files,
               data=data,
               auth=HTTPBasicAuth(username, password),
               timeout=30
           )
       if response.status_code == 200:
           print(f"Success: File '{file_name}.zip' uploaded successfully.")
       else:
           print(f"Error: Upload failed with status code {response.status_code}.")
           print("Target server response:", response.text)
       os.remove(file_path)
       print(f"Deleted the file '{file_name}.zip' after import.")


   def process_collections_list(collections_file, url_source, url_target,
                                source_username, source_password,
                                target_username, target_password,
                                includeSystemTriples, excludeSystemGraphs):
       with open(collections_file, 'r') as file:
           collections = [line.strip() for line in file if line.strip()]
       for base_value in collections:
           print(f"Processing collection: {base_value}")
           export_edg_zip(base_value, url_source, source_username, source_password,
                          includeSystemTriples, excludeSystemGraphs)
           import_zip_file(base_value, url_target, target_username, target_password)


   if __name__ == "__main__":
       parser = argparse.ArgumentParser(description="Call EDG ZIP export and import APIs with basic authentication.")
       parser.add_argument("--collections_file", required=True)
       parser.add_argument("--url_source", required=True)
       parser.add_argument("--url_target", required=True)
       parser.add_argument("--source_username", required=True)
       parser.add_argument("--source_password", required=True)
       parser.add_argument("--target_username", required=True)
       parser.add_argument("--target_password", required=True)
       parser.add_argument("--includeSystemTriples", choices=['true', 'false'], default='false')
       parser.add_argument("--excludeSystemGraphs", choices=['true', 'false'], default='true')
       args = parser.parse_args()

       process_collections_list(
           args.collections_file, args.url_source, args.url_target,
           args.source_username, args.source_password,
           args.target_username, args.target_password,
           args.includeSystemTriples == 'true', args.excludeSystemGraphs == 'true'
       )

Download script here: :download:`sendProjects_approach2.py`

.. warning::
   If multiple collections include the same large dependency collection, ZIP export may download that dependency repeatedly.

Approach 3: Export RDF / Import RDF for Incremental Updates
===========================================================

This approach is best when:

* The target already has corresponding collections (same graph IDs)
* You want **incremental updates** or controlled replacement
* You may need to preserve or create Change History, depending on import options

Export RDF / Import RDF via Swagger
-----------------------------------

The Export RDF and Import RDF APIs are visible in the EDG Swagger interface (typically accessible from an asset collection under the **Reports** tab).
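Before scripting a full migration, it can help to reproduce a single Export RDF call outside Swagger. The following stdlib sketch only builds the request URL for inspection (or for pasting into a browser or ``curl``); the endpoint path and parameter names follow the script later in this section, while the hostname and graph ID are illustrative:

```python
from urllib.parse import urlencode

def export_rdf_url(edg_url, graph_id, fmt="turtle", include_inferences=False):
    """Build (but do not send) the Export RDF request URL for one collection."""
    base = f"{edg_url}/tbl/service/{graph_id}/tbs/exportRDFFile"
    # EDG expects lowercase 'true'/'false' strings for boolean parameters
    query = urlencode({"format": fmt, "includeInferences": str(include_inferences).lower()})
    return f"{base}?{query}"

print(export_rdf_url("https://edg.example.com/edg", "geo"))
# https://edg.example.com/edg/tbl/service/geo/tbs/exportRDFFile?format=turtle&includeInferences=false
```

Fetching this URL with valid credentials should return the collection's RDF, which is a quick way to confirm base URL structure and authentication before running the batch script.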
Typical usage pattern:

* Export RDF from a source collection
* Ensure the target has an asset collection with the same graph ID
* Import RDF into the target collection

.. note::
   To capture both additions and deletions during import, prefer import options such as *Replace previous contents* when appropriate.

Export RDF / Import RDF using EDG APIs (Python)
-----------------------------------------------

The following pattern is used:

* Determine each asset collection type
* Create matching collections on the target (if needed)
* Export RDF from the source
* Import RDF into the target

Example input file
^^^^^^^^^^^^^^^^^^

.. code-block:: text

   geography_ontology
   geo

Example Python script
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import argparse

   import requests
   from requests.auth import HTTPBasicAuth


   def asset_collection_type(edg_url, graph_id, username, password):
       """Return the type label of an asset collection on the source server."""
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/assetCollectionType"
       response = requests.post(url, headers={'accept': 'application/json'},
                                auth=HTTPBasicAuth(username, password))
       response.raise_for_status()
       return response.text


   def export_rdf(edg_url, graph_id, username, password, exclude_values_rules,
                  fmt, include_inferences, keep_edg_triples):
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/exportRDFFile"
       params = {
           "excludeValuesRules": str(exclude_values_rules).lower(),
           "format": fmt,
           "includeInferences": str(include_inferences).lower(),
           "keepEDGTriples": str(keep_edg_triples).lower(),
       }
       response = requests.get(url, auth=HTTPBasicAuth(username, password), params=params)
       response.raise_for_status()
       return response.content


   def create_asset_collection(edg_url, graph_id, type_label, username, password):
       url = f"{edg_url}/tbl/service/_/tbs/createAssetCollection"
       params = {'defaultNamespace': 'www.temporary.org', 'id': graph_id,
                 'name': graph_id, 'typeLabel': type_label}
       response = requests.post(url, auth=HTTPBasicAuth(username, password), params=params)
       response.raise_for_status()
       return True


   def import_rdf(edg_url, graph_id, username, password, rdf_content, fmt='turtle'):
       url = f"{edg_url}/tbl/service/{graph_id}/tbs/importRDFFile"
       files = {
           "file": (f"{graph_id}.ttl", rdf_content, "text/turtle"),
           "fileName": (None, f"{graph_id}.ttl"),
           "format": (None, fmt),
       }
       response = requests.post(url, files=files, auth=HTTPBasicAuth(username, password),
                                headers={"accept": "application/json"})
       response.raise_for_status()
       return True


   def process_list(collections_file, url_source, url_target,
                    source_username, source_password,
                    target_username, target_password,
                    exclude_values_rules, fmt, include_inferences, keep_edg_triples):
       with open(collections_file, 'r') as file:
           collections = [line.strip() for line in file if line.strip()]
       for graph_id in collections:
           rdf = export_rdf(url_source, graph_id, source_username, source_password,
                            exclude_values_rules, fmt, include_inferences, keep_edg_triples)
           type_label = asset_collection_type(url_source, graph_id, source_username, source_password)
           create_asset_collection(url_target, graph_id, type_label, target_username, target_password)
           import_rdf(url_target, graph_id, target_username, target_password, rdf, fmt=fmt)
           print(f"Migrated: {graph_id}")


   if __name__ == "__main__":
       parser = argparse.ArgumentParser(description="Export RDF from EDG and import into another EDG instance.")
       parser.add_argument("--collections_file", required=True)
       parser.add_argument("--url_source", required=True)
       parser.add_argument("--url_target", required=True)
       parser.add_argument("--source_username", required=True)
       parser.add_argument("--source_password", required=True)
       parser.add_argument("--target_username", required=True)
       parser.add_argument("--target_password", required=True)
       # argparse's type=bool treats any non-empty string (including "false")
       # as True, so explicit 'true'/'false' choices are used instead
       parser.add_argument("--excludeValuesRules", choices=['true', 'false'], default='true')
       parser.add_argument("--format", type=str, default="turtle")
       parser.add_argument("--includeInferences", choices=['true', 'false'], default='true')
       parser.add_argument("--keepEDGTriples", choices=['true', 'false'], default='true')
       args = parser.parse_args()

       process_list(
           args.collections_file, args.url_source, args.url_target,
           args.source_username, args.source_password,
           args.target_username, args.target_password,
           args.excludeValuesRules == 'true', args.format,
           args.includeInferences == 'true', args.keepEDGTriples == 'true'
       )

Download script here: :download:`sendProjects_approach3.py`

.. warning::
   Included asset collections are not automatically migrated by this approach. Ensure the input list includes all required dependencies.

Approach 4: Git Integration (EDG 8.3+)
======================================

EDG 8.3 and later include improved Git integration that supports migration and promotion workflows. A common pattern is to use:

* A single Git repository with multiple branches (e.g., Dev/Test/Prod)
* Push/pull from EDG to Git, rather than server-to-server transfers

Git Configuration in EDG
------------------------

Configure Git repositories under:

*Product Configuration → Git Integration*

Best practices:

* Configure a separate repository entry per target (e.g., Dev and Prod)
* Add a Git repository password (token-based authentication)
* Configure which EDG users may access each repository

Linking an Asset Collection to Git in EDG
-----------------------------------------

After Git is configured:

* Open the asset collection to migrate
* Use the cloud icon to **Link to File on Git**
* Choose a new file name (e.g., ``geo.ttl``) or select an existing file
* Use **push** to write the collection to Git
* In the target EDG, create a corresponding collection and link it to the same file
* Use **pull** to populate the collection from Git

.. important::
   Only one asset collection can be connected to a given Git file at a time. You can remove the connection via the Git integration instance (Modify → Delete), and the form also shows metadata such as the last push/pull execution.

Determining Changes Between Collections
=======================================

Collections can be compared in EDG or externally.
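As a minimal sketch of external comparison, two Turtle exports can be diffed at the line level. This is a naive approach and the file contents below are illustrative: it only works when both files come from the same serializer, so that unchanged triples serialize to identical lines (blank nodes and formatting differences will appear as spurious changes); an RDF-aware tool or EDG's own comparison features are more robust:

```python
def line_diff(old_text, new_text):
    """Return (additions, deletions) between two deterministic serializations."""
    old_lines = set(old_text.splitlines())
    new_lines = set(new_text.splitlines())
    return sorted(new_lines - old_lines), sorted(old_lines - new_lines)

# Illustrative Turtle fragments standing in for two exports of a collection
old = "ex:geo a edg:Taxonomy .\nex:Europe a ex:Continent .\n"
new = "ex:geo a edg:Taxonomy .\nex:Europe a ex:Continent .\nex:Asia a ex:Continent .\n"

additions, deletions = line_diff(old, new)
print("Added:", additions)      # Added: ['ex:Asia a ex:Continent .']
print("Removed:", deletions)    # Removed: []
```

In practice, ``old`` and ``new`` would be read from the exported ``.ttl`` files rather than inline strings.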
Common validation use cases include:

* Verifying changes before migration
* Confirming a promotion package
* Reviewing additions and deletions between versions

The main approaches are:

* EDG Comparison Report
* EDG Workflow and Workflow Reports
* Git diff in GitHub

Comparison Report
-----------------

The **Comparison Report** is available under the Reports tab of an asset collection.

Typical usage:

* Open the source collection
* Navigate to *Reports → Comparison Report*
* Select the target collection from the dropdown
* Review additions and deletions detected by EDG

This approach is fast and requires no export.

EDG Workflow and Workflow Reports
---------------------------------

A workflow can be used to create a controlled, reviewable change set:

* Export the updated collection as Turtle (TTL)
* In the original collection, create a new workflow (e.g., Basic Workflow)
* Use *Make Changes → Import* within the workflow
* To capture additions and deletions, select *Replace previous contents* during import

Review locations:

* Workflow Reports panel: structured list of detected changes
* Workflow Preview panel: git-style diff view

This approach supports approve/reject governance before committing changes.

Git diff in GitHub
------------------

If you push versions to Git:

* Push the original collection to Git
* Apply updates (or import a new version), then push again
* Use the GitHub diff between commits to review changes

This is especially useful for teams already using Git-based review processes.

.. seealso::
   EDG environment promotion patterns and governance controls are covered in :ref:`dev_test_prod_servers_best_practices`.
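The Git-diff review described above can be rehearsed locally before involving GitHub. The following sketch simulates two EDG pushes of an illustrative ``geo.ttl`` in a throwaway repository and then diffs the two commits; the file name and triple content are hypothetical, and ``git`` must be on the path:

```python
import subprocess
import tempfile

def run(repo, *cmd):
    """Run a git command in the scratch repository and return its stdout."""
    return subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
run(repo, "git", "init", "-q", ".")

# First "push": the original collection
with open(f"{repo}/geo.ttl", "w") as f:
    f.write("ex:Europe a ex:Continent .\n")
run(repo, "git", "add", "geo.ttl")
run(repo, "git", "-c", "user.name=edg", "-c", "user.email=edg@example.com",
    "commit", "-qm", "initial push")

# Second "push": the updated collection
with open(f"{repo}/geo.ttl", "w") as f:
    f.write("ex:Europe a ex:Continent .\nex:Asia a ex:Continent .\n")
run(repo, "git", "add", "geo.ttl")
run(repo, "git", "-c", "user.name=edg", "-c", "user.email=edg@example.com",
    "commit", "-qm", "updated push")

# The change set between the two pushes, as GitHub's compare view would show it
diff = run(repo, "git", "diff", "HEAD~1", "HEAD", "--", "geo.ttl")
print(diff)
```

The printed diff shows ``ex:Asia a ex:Continent .`` as an added line, which is the same additions/deletions view a reviewer would see when comparing the two commits in GitHub.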