How do I prevent duplicate rows from being loaded into an Oracle table?
I have some large tables (millions of rows). I constantly receive files containing new rows to add to those tables, up to 50 million rows per day. About 0.1% of the rows I receive are duplicates of rows I have already loaded (or are duplicates within the files). I would like to prevent those rows from being loaded into the table.
I am currently using SQL*Loader to get enough performance to cope with my large data volume. If I take the obvious step and add a unique index on the columns that determine whether a row is a duplicate, SQL*Loader starts to fail the entire file that contains a duplicate row, whereas I only want to stop the duplicate row itself from being loaded.
I know that in SQL Server and Sybase I can create a unique index with the 'ignore duplicates' property, and if I then use BCP the duplicate rows (as defined by that index) simply will not be loaded.
Is there any way to achieve the same effect in Oracle?
I do not want to delete the duplicate rows after they have been loaded; it is important to me that they are never loaded in the first place.
What do you mean by "duplicate"? If you have a column that defines a unique row, then you should set up a unique constraint on that column. One typically creates a unique index on this column, which enforces that uniqueness automatically.
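As a minimal sketch of that suggestion, assuming a hypothetical table `big_table` whose column `natural_key` is what makes a row unique (both names are made up for illustration, not taken from the question):

```sql
-- Hypothetical table and column names, for illustration only.
-- Declaring the constraint lets Oracle enforce uniqueness; if no suitable
-- index already exists on the column, Oracle creates a unique index for it.
ALTER TABLE big_table
  ADD CONSTRAINT big_table_uk UNIQUE (natural_key);
```

If several columns together decide whether a row is a duplicate, list them all inside the parentheses.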
Edit: Yes, as commented below, you should set up a "bad" file for SQL*Loader to capture the invalid rows. But I think establishing the unique index is probably a good idea from a data-integrity standpoint.
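A rough sketch of what such a control file might look like, assuming a conventional path load (not direct path), a comma-separated input file, and the same hypothetical `big_table` with columns `natural_key` and `payload`. The file names and the `ERRORS` value are placeholders. In a conventional path load, rows that violate the unique constraint are rejected into the bad file and the load carries on until the `ERRORS` limit is exceeded, so that limit probably needs to be well above the number of duplicates you expect per file:

```
-- Hypothetical file, table and column names; the ERRORS value is arbitrary.
-- DIRECT=FALSE makes the conventional path assumption explicit.
OPTIONS (DIRECT=FALSE, ERRORS=1000000)
LOAD DATA
INFILE 'new_rows.dat'
BADFILE 'new_rows.bad'
APPEND
INTO TABLE big_table
FIELDS TERMINATED BY ','
(natural_key, payload)
```

With a direct path load the behavior differs: duplicate keys are not rejected row by row during the load, so this sketch does not carry over as-is.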