Relating table using synthetic keys

The relationship between genus and species relies on the genus attribute which has a string value. It is therefore enough to construct a script using the appropriate values which are all present in the original spreadsheet. With country and region however it is a different story. A row in the region table must use the country_id value from the related row in country. The problem is that we don't know what that value is before we insert the country record because the value is generated by the sequence.

There are two potential solutions to this problem. Using a subquery is the easier method, but is only possible if your data has an alternative key (another attribute or group of attributes besides the primary key that uniquely identify a row). Without an alternative key, the solution is to interleave the related insert statements so that you can query the most recently generated value for the primary key in the parent table.

In both solutions, you still need to prepare the scripts using Excel as you did for genus and species. The notes below will omit the details steps for doing this on the assumption that you can go back to earlier pages if you need to.

Using a subquery

A spreadsheet of country information is available in the sidebar. Use the same procedure for constructing a country DML script as for species. Your spreadsheet should look similar to the graphic below.

When you try to run your script the first time, you will probably see some errors such as

    ORA-01756: quoted string not properly terminated

    ORA-00917: missing comma

The reason for these is that some country names contain an apostrophe. Oracle interprets this as the terminating single quote for a character string and so the rest of the text string cause the error. If you want to use an apostrophe inside a character string, Oracle requires that you repeat it. Two single quotes in a row are interpreted as one single quote which is part of the string. For example, to insert the name Cote d'Ivoire you would need the following insert statement:

    INSERT INTO country (country, continent)
    VALUES ('Cote d''Ivoire', 'Africa');

Finding and replacing all of the internal apostrophes is difficult after you have constructed the insert statements, but it is easy while you still only have the data in the spreadsheet. To correct the problem, go back a step and construct the country DML script again. This time however, use the standard Excel Replace... option to duplicate any single quotes in the file:

Remove the TAB characters using a text editor, and your file should now run without errors.

When constructing your region DML script, you can find the appropriate value of country_id to use by selecting it from the database using the country name as an alternative key. In this case, your spreadsheet would be similar to that shown below. Remember to find and replace any single quote characters with two single quote characters before you do anything else.

Notice that you will need to reorder the columns in the spreadsheet to construct this script. The syntax of the insert statements uses the subquery form rather than a values clause, but the procedure for preparing the script in Excel is exactly the same as before.

Using sequence values

This method relies on inserting a row for a country, and then immediately inserting all the related regions before going on the the next country. It takes a little more time because you need to handle each of the 272 countries in the file separately; however Excel shortcuts can still be used to make this relatively painless.

The first step is essentially the same as before: you need to create a script that inserts the data for the country table but this time there are two differences. Firstly you will be inserting the region statements in between the rows in this script, and secondly you need to add a blank column before the one containing the beginning of your insert statements. The reason for this will become clear below.

Once your country script is finished, make a new copy of the original spreadsheet to prepare the region statements. Remember to duplicate any single quote characters.

This time, you only need to use the region column in your insert statements, but you should keep the country column for reference. Your spreadsheet should therefore look like this:

Notice that the insert statement uses a values clause, but also references the most recently generated value from the country_seq sequence. Notice also that the region_id column is not required because it will be automatically populated by a trigger.

Copy the SQL text to all other rows in the spreadsheet as before. Now you will need to copy the block of statements corresponding to a particular country into the country DML script. Because of the way that Excel behaves when you insert a new group of rows, it is most efficient to start from the bottom of the list, so navigate the the end of your region data and find the set of regions belonging to Zimbabwe.

Select the last row by clicking the row header. Then hold down the SHIFT key and press the up arrow repeatedly until you have selected all of the rows related to Zimbabwe.

Now you should cut these rows ready to be pasted into the country script. The quickest way to do this is to press CTRL+X. When you do this, the values in the spreadsheet will be highlighted with a moving dashed outline, but they do not disappear - this is usual Excel behaviour.

Navigate to the bottom of your country script, click in the cell in column A immediately below the insert statement for Zimbabwe and press CTRL+V. This will paste the region rows into place as shown below.

Notice that the insert statements all now start in the same column - this is the reason for the extra blank column in the file. This column will be removed at the end of the process.

Notice also that the reference to country_seq.currval corresponds to the value that has just been inserted for Zimbabwe. Thus the relationship is maintained.

This operation needs to be repeated for every country in the list (but not for every region, which would take considerably longer...). Go back now to the regions data to see how the transfer of rows works for subsequent countries.

Because they were previously cut, the Zimbabwe regions have now disappeared and the last set now belong to Zambia. Highlight the rows as before and cut them using CTRL+X.

Switch back to the countries script, and this time, right-click on the header of the row containing the insert statement for Zimbabwe as show below. Choosing the Insert cut cells option places the rows for Zambia in the correct position.

Once this operation has been repeated for each country, the script will be ready for tidying up in a text editor.