Virtual Metadata Example

A use case for this function might be when you want to access a large amount of customer information at one time, such as profile data, purchasing history, and so on. Storing and replicating all of this information "physically" in a Data Hub model presents potential barriers:
  • The volume of some data is too large or too widely distributed to be contained in a central hub.
  • The data changes frequently and would need to be replicated in the hub too often to be practical.
  • There are business or privacy concerns over sensitive information such as Social Security numbers, payroll figures, pricing data, and so on.

By storing some of the information externally and then accessing it virtually, you are still able to obtain a complete view of your customer while mitigating the concerns of duplicating data in the Data Hub.

The Data Hub Model

You have an existing model containing two types of entities, Customer and CustomerMaster, with a relationship of HasMasterRecord.

Customer entities contain the following information:
  • Customer ID (unique for each customer)
  • Name
  • Date of birth
  • Gold Club status (indicates whether the customer is a member of the Gold Club)
CustomerMaster entities contain the following information:
  • Master ID (unique for each household)
  • Customer ID
  • Name
  • Date of birth

The image below shows a model that is built using this metadata. Notice that George and Martha Washington have separate Customer entities, but they share the same CustomerMaster entity, because the master ID represents a household, not an individual.

The External Tables

We want to link customers to their purchase history, which we will create using three external tables—one for customers, one for purchases, and one for products.

We use the same information for the Customers table as we did to populate the Customer entities:

The Purchases table contains order numbers, product IDs, and purchase dates, and it indicates whether the purchase was made online. It ties this data to the customer ID, which was also used in the Customer table.

The Products table contains product names, model numbers, and descriptions. It ties this data to the product ID, which was also used in the Purchases table.

The Purchases table will serve as a join table that establishes connections between records in the Customers and Products tables. The image below illustrates the relationships among the various fields in the three external tables.

Integrating all of this information would produce the following record for Martha Washington:

Linking Physical and Virtual Data

To create relationships between entities in the Data Hub model and records in the external tables, the Data Hub entities must have properties that reference the external data. To link from the Data Hub model to records in the Customers table there must be a property in the external table that contains the same values that are stored in the entities' CustomerID field. You can see in the following image that the Martha Washington entity has a property called CustRef, which contains the same value as the CustomerID field in the Customers table.

To establish the link between the Customer entity and the external table, we must first create a virtual entity for the Customers table. The instructions for doing this start with Step 5 in the previous topic, "Using Virtual Metadata in Models." The following image shows the completed screen for the new entity type. Notice that "CustomerID" is selected as the Primary Key, which is the property used to link between the virtual and physical entities.

After the virtual entity is added, we need to add a relationship linking that entity to the physical entity already in the model. The instructions for doing this start with Step 14 in the previous topic. The following image shows the completed screen for the new relationship type. Notice that the Source entity ID refers to the CustRef property and the Target entity ID refers to the CustomerID property; this is the link that ties the virtual entity to the physical entity because both properties include the customer ID.

Also, in this example, the relationship between customers is a one-to-one relationship, meaning for every physical entity there should be at most one virtual entity and for each virtual entity there should be at most one physical entity. For one-to-one relationships, as well as one-to-many relationships, there is no need for a true Join table, so we select the Customers table, which is the same table that was used to define the virtual entity.

The metadata for the model now includes a link to the virtual data. Notice the blue star next to the ERP_Customer entity; this denotes that the entity is virtual.

Many-to-Many Relationships

Now that we have linked the virtual customer to the physical customer, we need to link purchases and products as well. The optimal approach is to create a virtual entity for products and create a many-to-many virtual relationship between physical Customers and virtual Products using the Purchases table as a join table. We first need to create a virtual entity for the Products table that uses the ProductID as the Primary Key.

This is followed by adding a new relationship that uses the Purchases table as a join table to achieve a many-to-many relationship; Customer entities are linked with Product entities based on information in the Purchases table. Notice that the CustomerID property is used as the Link ID for the Customer entities and the ProductID property is used as the Link ID for the Product entities; the Purchases table contains both of these properties, which is how it is able to map information from Customer to Product.

The Source entity ID (CustRef) is the property of the Customer entity that will be matched to the Source Link ID column (CustomerID) in the Purchases table. The Target Link ID (ProductID) is the column in the Purchases table that will be matched to the Target entity ID (also ProductID) property of the Products entity. The following diagram illustrates how each field is used to link Customer to Product.

When a query traverses from Customer to Product, it will begin with the property that was selected as the Source entity ID, which in our example is the CustRef property. For Martha Washington the value for this property is C23; this value will be used to find rows in the Purchases table where the Source Link ID (in our case, the CustomerID) has a value of C23.
The value in the Target Link ID (in our case, the ProductID) will be used to look up the corresponding row in the Products table. For each row returned from the Purchases table, the value of the ProductID field will be matched to the Target Entity ID property (in our case, the ProductID) of the Products table. The following diagram illustrates how each field is used throughout the flow.

Advanced Configuration

One alternative to the approach shown here is to use the Purchases table to create the many-to-many relationship between the virtual customer and the virtual product entities. Doing this involves an additional and unnecessary virtual hop from physical customer to virtual customer. The Data Hub is much more efficient at traversing relationships than SQL data sources because it is built upon a graph database; therefore, the extra hop should be avoided if possible.

A second alternative is to create virtual entities for the Purchases table in addition to those for the Products table. This approach would require one-to-many relationships between physical customer entities and virtual purchases entities as well as one-to-many relationships between virtual purchases and virtual products. This is inefficient because it creates unnecessary virtual hops to get from physical customer to virtual product. There is no advantage in representing a purchase as an entity because the relationship can hold all the Purchases fields as properties. The only reason to create an entity for Purchases is if it connects more than two entity types, such as if there were a third foreign key in the table linking purchases to a store.