Tuesday, September 25, 2012

Removing duplicate rows from a SQL server DB

I've been working on data imports lately and have run into this issue several times. If you data isn't clean when you import from a spreadsheet or csv or wherever, you can end up with duplicate rows in your tables. This isn't a really big deal if the rows have differing keys, but if ALL the data is identical, including the keys, then how do you delete the extra rows?

It turns out that the answer is pretty simple. Write your query to delete the rows but before you run it, set the rowcount to 1, this will prevent SQL server from performing your query on more than a single row. Don't forget to set rowcount back to 0 afterwards, or all your subsequent operations will only return 1 row which, at least for me, would be really frustrating :)

Here's a nice explanation of the process from Microsoft: http://support.microsoft.com/kb/139444.

No comments: