
Thread: fn_cdc_get_net_changes exponentially slow with high volume

  1. #1
    sjt003 (Junior Member, joined May 2017)

    fn_cdc_get_net_changes exponentially slow with high volume

    Hi,

    I'm having a problem where one of the tables I capture via fn_cdc_get_net_changes is occasionally extremely slow when the volume is high. I understand that higher volume will lead to slower performance, but the slowdown seems exponential.
    For example, 100 rows may process in seconds, while 1,000 may take 20-30 minutes, which makes me suspect a query plan issue.

    I'm using an SSIS data flow with the CDC Source component in "Net with merge" mode. It's always the same table that has the problem. I don't have exact statistics, but one time when it was running long, the reads looked to be in the tens of millions. The cleanup job runs at night and purges all but the last few hours of data. The table typically sees a few hundred thousand updates in a day, though they are generally clumped together at a few different times.
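
    For reference, the call the CDC source issues boils down to something like this; the capture instance name dbo_MyTable is just a placeholder for mine, and SSIS tracks the actual LSN range in its CDC state rather than using min/max:

        DECLARE @from_lsn binary(10), @to_lsn binary(10);

        -- placeholder LSN range; the SSIS CDC state supplies the real one
        SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTable');
        SET @to_lsn   = sys.fn_cdc_get_max_lsn();

        -- "Net with merge" in the CDC Source maps to the 'all with merge'
        -- row filter option of the net-changes function
        SELECT *
        FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, 'all with merge');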

    I didn't want to mess with the function, of course, or turn on Query Store on the CDC database (I'm wary of any additional overhead).

    The table is pretty narrow, with a primary key that consists of 6 columns (vendor design) and a few numeric values. When there is a large volume of updates, it's usually the same rows being updated dozens or a hundred times within seconds. For example, maybe only 100 rows are being updated, but each row gets 50 updates, so there are 5,000 rows in the change table, yet I expect Net mode to give me back only 100 rows.
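
    To put numbers on that, a rough check against the change table looks something like this (the capture-instance table and key column names here are placeholders, not my real schema):

        -- how many change rows pile up per key between cleanups
        SELECT Key1, Key2, Key3, Key4, Key5, Key6, COUNT(*) AS changes_per_key
        FROM cdc.dbo_MyTable_CT
        GROUP BY Key1, Key2, Key3, Key4, Key5, Key6
        ORDER BY changes_per_key DESC;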

    Without providing the table definition just yet or the query plan that I've seen, are there any general recommendations for how to deal with performance in this mode? Should I consider using All instead of Net and building my own process to merge the latest values (roughly as sketched below)? Or would that just create much more work for SSIS, since many more rows would be flowing through?
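
    Something like this is what I had in mind for the All approach; the table and key column names are just placeholders:

        -- pull every change, then keep only the latest image per key
        DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable'),
                @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

        SELECT *
        FROM (
            SELECT c.*,
                   ROW_NUMBER() OVER (PARTITION BY Key1, Key2, Key3, Key4, Key5, Key6
                                      ORDER BY __$start_lsn DESC, __$seqval DESC) AS rn
            FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, 'all') AS c
        ) AS ranked
        WHERE rn = 1;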

    Thanks,
    Scott

  2. #2
    sjt003 (Junior Member, joined May 2017)

    Bad execution plan / statistics

    It looks like it was a statistics problem. I would have a lot of rows in the change table, then several thousand more would come in suddenly with new LSNs, but not enough to trigger an automatic statistics update on the table. My select would then be over the range of those new LSNs, and the query plan would be full of seeks and nested loops because the optimizer didn't think those LSNs existed. I added a step to update statistics on the problem table before every ETL run; it only takes a couple of seconds and eliminates the bad execution plans. I wish SQL Server did a better job of recognizing that I'm frequently filtering on a column and updated statistics more aggressively when new values appear in that column.
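
    In case it helps anyone else, the pre-ETL step is just a one-liner along these lines (the change-table name is a placeholder for my capture instance):

        -- refresh stats on the change table so the optimizer knows about the newest LSN range
        UPDATE STATISTICS cdc.dbo_MyTable_CT;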

    Scott
