boxes to fill intype of boxwhat is in the
drop-down menu?
what is this? which data set is it attached to?
what kind of information goes there?
how do we know what to choose?
Entity namefree-formenter, a name, often matching an entity in SEAD that we have not yet created an entity for in this project
Typedrop-down menuData (derived)
SQL Query
Fixed Values
CSV File
Excel File (Pandas)
Excel File (OpenPyXL)
from where are we going to extract the data to transform?
Source Entity
(only present for certain Types above)
drop-down menu(drop down contents
change based on Type)
the incoming data set
System IDpre-setdon’t enter, it fills in self
Public IDfree-formthe name of the ID in SEAD’s system, so for sites, enter site_id
Business Keysfree-formthe list of unique set of columns in the incoming data that are needed to uniquely define each row of data for the data set
Columnsfree-form
Depends Onfree-formused to define dependencies. In some datasets every site has to have a defined site type. the “before you can do X you need to know Y” sorts of information

note: if you chose type “derived” then the dependency is already defined on what you derived it from

If this step is done wrong, the validation will fail
Drop Duplicatescheck boxthis step is how you get a complete list of each unique catagory of data. If we are making a list of all sites in a dataset, How needed this is will depend on the incoming dataset. The list of columns works like Excel’s drop duplicates
Check Functional Dependencycheck boxThis button asks the program to confirm that if one considers only the columns entered in the “drop duplicates” column that three aren’t examples of (for example) a single site with multipile coordinate systems
Drop Empty Rowscheck boxuseful if there were blank rows
Deduplication Columnsfree-formdefine the columns needed to get a unique set (example site name plus national site id )
Colums to Check for Empty Valuesfree-formuse for columns so important that that it doesn’t mater if data exists in other columns
SQL Query (only present if Type = SQL Query)free-form(note: use the )write a query that extracts all the pertanate columns from the data set for the table you are working on
Data (derived) we already have data in an entity we have already defined in Shape Shifter
SQL Query use if you want to use an SQL query to extract data from a data source
Fixed Values
CSV File
Excel File (Pandas)
Excel File (OpenPyXL)

ETL = extracts data, transforms it, and loads it

So for SQL we write a query to extract all of the site related data(for example)

Fixed values: use if you have data on paper, and it needs entering to be machine readable, or it is a data type that doesn’t exist in the incoming dataset, but is required in SEAD