
SSIS package is creating duplicate values

TheFORCE Member Posts: 2,297
I'm working on a personal project and just created my first SSIS package to import flat files into a SQL Server database, but I'm having some difficulty with the results. Here's the issue I'm facing:

1. I have a database DB1 with an empty table1 whose columns are already defined.
2. I have a delimited CSV file (file1) with 7 records.
3. I have an SSIS package that imports the CSV file above.
4. I query the table and see that the records have been imported.

So far everything works well.
5. Now I update file1 and add 2 extra records.
6. File1 now has the 7 existing records + 2 new records.
7. I run the SSIS package again, and the import works.
8. I query the table and see that it now contains 16 records, 14 of which are duplicates (the original 7 each appear twice). This is incorrect.

That means I'm probably missing a step where I check whether a record already exists in table1 and, if it does, skip it, importing only the records that don't already exist. The result I wanted is the 7 existing records plus the 2 new ones, so table1 should now have 9 records.

I've looked around and found that I need to use a Lookup transformation, but I haven't gotten it to work yet; maybe my tasks aren't set up correctly.

Can anyone provide ideas on how to solve this issue, or point me to a link that explains how I can fix it?
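For illustration, here is a minimal sketch of the missing check in Python, using an in-memory SQLite table in place of SQL Server and a single made-up `value` column standing in for the 7 real columns. Only rows not already present in table1 get inserted, which is the same idea the SSIS Lookup's "no match" output implements:

```python
import sqlite3

# Hypothetical stand-in schema: one "value" column instead of the 7 real ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (value TEXT)")

def import_rows(rows):
    """Insert only rows that don't already exist in table1 -- the same
    idea as routing the Lookup's 'no match' output to the destination."""
    for (value,) in rows:
        exists = conn.execute(
            "SELECT 1 FROM table1 WHERE value = ?", (value,)
        ).fetchone()
        if exists is None:  # no match found -> safe to insert
            conn.execute("INSERT INTO table1 (value) VALUES (?)", (value,))
    conn.commit()

# First load: 7 records
first_load = [(f"row{i}",) for i in range(7)]
import_rows(first_load)

# Second load: the same 7 records plus 2 new ones
second_load = first_load + [("row7",), ("row8",)]
import_rows(second_load)

count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(count)  # 9, not 16
```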

Comments

  • TheFORCE Member Posts: 2,297
    Man, it's amazing how the brain works and what a good night's sleep does. During the night I felt like I was solving this issue in my sleep, lol. I woke up this morning and made just a few small modifications to where I was redirecting the duplicate rows and where I was redirecting the non-duplicate rows from the Lookup transformation, and bam bam bam, the desired results were produced with no failures. So on to the next step now.

  • DatabaseHead Member Posts: 2,753
    I usually date-stamp my loads and then grab the max date to keep history. Either that, or use GETDATE() between the effective and end dates if the records are time-stamped.
    i was redirecting the output of duplicate values and where i was redirecting the values of non-duplicate values using the Lookup transformation

    I'm not sure I follow this. It sounds like you changed a setting in the Lookup transformation task to omit records that matched across all 7 columns?

    Just curious.....

    Either way glad you found success.
  • TheFORCE Member Posts: 2,297

    I'm not sure I follow this. It sounds like you changed a setting in the Lookup transformation task to omit records that matched across all 7 columns?

    Just curious.....

    Either way glad you found success.

    Yep, that was the issue. I was directing all records to the table even if they already existed, when I should have been directing only the records that did not exist.

    I need to complete 2 more tasks now: I have the insert down; next I need update and delete.

    From your experience, which should be the next step: is update easier to implement, or is delete easier? The goal is that in the end I can insert, update, or delete a record based on the data in the flat file.
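    As a sketch of that end goal (not the SSIS implementation itself), here is the full insert/update/delete comparison in Python against an in-memory SQLite table. The `id` key and `name` column are made-up stand-ins for whatever business key and payload the flat file actually carries:

```python
import sqlite3

# Hypothetical schema: a business key ("id") plus one payload column ("name").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

# Pretend this dict is the parsed flat file: key -> payload.
file_rows = {1: "alpha", 2: "BETA-updated", 4: "delta"}

table_rows = dict(conn.execute("SELECT id, name FROM table1"))

# Classify each incoming row against the current table contents.
inserts = {k: v for k, v in file_rows.items() if k not in table_rows}
updates = {k: v for k, v in file_rows.items()
           if k in table_rows and table_rows[k] != v}
deletes = [k for k in table_rows if k not in file_rows]

conn.executemany("INSERT INTO table1 VALUES (?, ?)", list(inserts.items()))
conn.executemany("UPDATE table1 SET name = ? WHERE id = ?",
                 [(v, k) for k, v in updates.items()])
conn.executemany("DELETE FROM table1 WHERE id = ?", [(k,) for k in deletes])
conn.commit()

final = dict(conn.execute("SELECT id, name FROM table1"))
print(final)  # {1: 'alpha', 2: 'BETA-updated', 4: 'delta'}
```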
  • DatabaseHead Member Posts: 2,753
    Just curious why you would be deleting. If your insert only pulls in records that don't exist across all 7 columns, then your updates would be based on records that already exist.

    How will the records be updated? What criteria will need to be met? The reason I ask: you could add another column called Status and set a default constraint to automatically flag all new records as A (Active). If an "update" record came in, you could change the status of the original record to D or I (Inactive), and the new record would be flagged Active. In reports, queries, etc., you could just omit D or I records, so I wouldn't think you'd need to worry about deletes. If the table started performing slowly, you could run a SQL Script task to find records flagged D or I and delete them; obviously put that task at the end of the process flow.

    In short, I would handle updates first and worry about deletes last. But honestly, without knowing the objective it's hard to say.
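    A rough sketch of that status-column approach, again in Python with SQLite standing in for SQL Server and hypothetical `id`/`name` columns. The `status` column defaults to 'A' via a default constraint, and an "update" inactivates the old row instead of deleting it:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table1 (
        id     INTEGER,
        name   TEXT,
        status TEXT NOT NULL DEFAULT 'A',   -- the default constraint: new rows are Active
        loaded TEXT NOT NULL                -- date stamp on each load
    )
""")

def load(id_, name):
    """Flag any existing Active row for this key as Inactive, then insert
    the incoming row, which picks up status 'A' from the default."""
    conn.execute(
        "UPDATE table1 SET status = 'I' WHERE id = ? AND status = 'A'", (id_,))
    conn.execute(
        "INSERT INTO table1 (id, name, loaded) VALUES (?, ?, ?)",
        (id_, name, datetime.now(timezone.utc).isoformat()))
    conn.commit()

load(1, "alpha")
load(1, "alpha v2")   # "update": the old row becomes Inactive, history is kept

active = conn.execute(
    "SELECT name FROM table1 WHERE id = 1 AND status = 'A'"
).fetchall()
print(active)  # [('alpha v2',)]
```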