Relating table using synthetic keys
The relationship between genus
and species
relies on the genus
attribute which has a string value. It is therefore
enough to construct a script using the appropriate values which are all present in the original spreadsheet. With country
and region
however it is a different story. A row in the region table must use the country_id
value from the related
row in country
. The problem is that we don't know what that value is before we insert the country
record because the
value is generated by the sequence.
There are two potential solutions to this problem. Using a subquery is the easier method, but is only possible if your data has an alternative key (another attribute or group of attributes besides the primary key that uniquely identify a row). Without an alternative key, the solution is to interleave the related insert statements so that you can query the most recently generated value for the primary key in the parent table.
In both solutions, you still need to prepare the scripts using Excel as you did for genus
and species
. The notes
below will omit the details steps for doing this on the assumption that you can go back to earlier pages if you need to.
Using a subquery
A spreadsheet of country information is available in the sidebar. Use the same procedure for constructing a country
DML script as for species
. Your spreadsheet should look similar to the graphic below.
When you try to run your script the first time, you will probably see some errors such as
1 2 3 |
|
The reason for these is that some country names contain an apostrophe. Oracle interprets this as the terminating single quote for a character string and so the rest of the text string cause the error. If you want to use an apostrophe inside a character string, Oracle requires that you repeat it. Two single quotes in a row are interpreted as one single quote which is part of the string. For example, to insert the name Cote d'Ivoire you would need the following insert statement:
1 2 |
|
Finding and replacing all of the internal apostrophes is difficult after you have constructed the insert statements,
but it is easy while you still only have the data in the spreadsheet. To correct the problem, go back a step and
construct the country
DML script again. This time however, use the standard Excel Replace... option to duplicate
any single quotes in the file:
Remove the TAB characters using a text editor, and your file should now run without errors.
When constructing your region
DML script, you can find the appropriate value of country_id
to use by selecting
it from the database using the country name as an alternative key. In this case, your spreadsheet would be similar
to that shown below. Remember to find and replace any single quote characters with two single quote characters
before you do anything else.
Notice that you will need to reorder the columns in the spreadsheet to construct this script. The syntax of the insert statements uses the subquery form rather than a values clause, but the procedure for preparing the script in Excel is exactly the same as before.
Using sequence values
This method relies on inserting a row for a country, and then immediately inserting all the related regions before going on the the next country. It takes a little more time because you need to handle each of the 272 countries in the file separately; however Excel shortcuts can still be used to make this relatively painless.
The first step is essentially the same as before: you need to create a script that inserts the data for the country
table but this time there are two differences. Firstly you will be inserting the region
statements in between the
rows in this script, and secondly you need to add a blank column before the one containing the beginning of your
insert statements. The reason for this will become clear below.
Once your country
script is finished, make a new copy of the original spreadsheet to prepare the region
statements.
Remember to duplicate any single quote characters.
This time, you only need to use the region
column in your insert statements, but you should keep the country
column for reference. Your spreadsheet should therefore look like this:
Notice that the insert statement uses a values clause, but also references the most recently generated value from the
country_seq
sequence. Notice also that the region_id
column is not required because it will be automatically
populated by a trigger.
Copy the SQL text to all other rows in the spreadsheet as before. Now you will need to copy the block of
statements corresponding to a particular country into the country
DML script. Because of the way that Excel
behaves when you insert a new group of rows, it is most efficient to start from the bottom of the list, so navigate
the the end of your region
data and find the set of regions belonging to Zimbabwe.
Select the last row by clicking the row header. Then hold down the SHIFT key and press the up arrow repeatedly until you have selected all of the rows related to Zimbabwe.
Now you should cut these rows ready to be pasted into the country
script. The quickest way to do this is to press
CTRL+X. When you do this, the values in the spreadsheet will be highlighted with a moving dashed outline, but they
do not disappear - this is usual Excel behaviour.
Navigate to the bottom of your country
script, click in the cell in column A immediately below the insert statement
for Zimbabwe and press CTRL+V. This will paste the region
rows into place as shown below.
Notice that the insert statements all now start in the same column - this is the reason for the extra blank column in the file. This column will be removed at the end of the process.
Notice also that the reference to country_seq.currval
corresponds to the value that has just been inserted
for Zimbabwe. Thus the relationship is maintained.
This operation needs to be repeated for every country in the list (but not for every region, which would take considerably longer...). Go back now to the regions data to see how the transfer of rows works for subsequent countries.
Because they were previously cut, the Zimbabwe regions have now disappeared and the last set now belong to Zambia. Highlight the rows as before and cut them using CTRL+X.
Switch back to the countries script, and this time, right-click on the header of the row containing the insert statement for Zimbabwe as show below. Choosing the Insert cut cells option places the rows for Zambia in the correct position.
Once this operation has been repeated for each country, the script will be ready for tidying up in a text editor.