Load Ingest Data
Results 1 to 3 of 3

Thread: Bug with the calculation of the __Changed columns in the CDC Source Component in SSIS

  1. #1
    ctownsley is offline Junior Member
    Join Date
    Mar 2013
    Posts
    5
    Rep Power
    0

    Exclamation Bug with the calculation of the __Changed columns in the CDC Source Component in SSIS

    Hello,

    While setting up an SSIS package for processing the CDC changes from the CDC instance database created by the CDC for Oracle service, I discovered the following bug. If the CDC Source is configured with the “Net with Update Mask” CDC Processing mode it is supposed to add the __${column}__Changed columns to the SSIS data flow and populate them based on the bit mask in the __$update_mask column from the CDC table. What we are seeing is that depending on the number of captured columns in the CDC captured table the component is either incorrectly populating these __Changed columns or causing an error to be thrown. It appears that the component is incorrectly parsing the bit mask in the $update_mask column to set the values. Is anybody else seeing this or found fix for this?

    Originally this was occurring on my original table from Oracle which has 82 columns but after doing some testing I am able to recreate this issue with even a simple standard SQL Server table that is CDC enabled (not using the CDC for Oracle service). The following are the steps I used to re-create the issue. The scripts I used are also attached to this post.
    1. In SQL Server, in a CDC enabled database, run the DropCreateAndSeedTable script to create the table, enable it for CDC and seed with an initial row.
    2. Create an SSIS package for doing an initial load. For purposes of testing this bug this package simply needs to have only two CDC Control Tasks on it, one to Mark Initial Load Start and the other to Mark Initial Load End.
    3. Create an SSIS package for doing incremental load. The control flow for the package should simply have two CDC control tasks (to get processing range and mark range processed) with a data flow in between them. In the data flow have a CDC source configured to pull changes for the dbo.TestChangeDataCapture using the "Net with Update mask". After the CDC Source add a CDC Splitter. Add a Data Viewer to the data flow path between the CDC Source and CDC Splitter so you can see the records when the package is ran. You can optionally add other components for the Insert, Update and Delete outputs from the CDC Splitter, however for the purposes of demonstrating this bug they are not necessary and do not seem to impact the results.
    4. Run the initial load package.
    5. Run the UpdateRecord script to update the record in the table, only updating the TestColumn1 and TestColumn2 columns. May need to uncomment/add calls to sys.fn_cdc_is_bit_set to get _Changed flag for TestColumn{n} from the update mask.
    6. Note that the query from the CDC _CT instance table shows entries for pre- and post- UPDATE with only the values for the TestColumn1 and TestColumn2 columns changing and calls to the sys.fn_cdc_is_bit_set function is returning TRUE for only the TestColumn1 and TestColumn2 columns.
    7. Run the incremental load package (will need to run it twice since the first run handles reprocessing from the initial load and will not pick-up the updated record).
    8. Note the results in the data viewer show the following results based on how many TestColumn{n} columns are currently in the table.
    1-6 columns - Everything seems to work fine. Data view shows correct values for all table and CDC columns (update mask, LSN, operation) and only the TestColumn1 and TestColumn2 _Changed columns are TRUE. The value of the __$update_mask is “0x06”

    7 columns - Data viewer shows NULLs for all table and CDC columns, the _Changed columns for TestColumn1 and TestColumn2 are TRUE and for the other _Changed columns are FALSE except for TestColumn7 which is NULL. The value of the __$update_mask is “0x06” The following errors are thrown and the package fails:
    [CDC Source [2]] Error: "Failed processing column __$TestColumn7__Changed. Reason - Index was outside the bounds of the array..".
    [CDC Source [2]] Error: "Problems when trying to get changed records from dbo_TestChangeDataCapture. Reason - Pipeline component has returned HRESULT error code 0xC0047038 from a method call..".
    [CDC Splitter [74]] Error: Microsoft.SqlServer.Dts.Pipeline.ColumnIsNullExcep tion: The column has a null value.
    at Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer.Ch eckStatusAndNull(Int32 columnIndex)
    at Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer.Ge tInt32(Int32 columnIndex)
    at Attunity.SqlServer.CDCSplit.CdcSplitterComponent.P rocessInput(Int32 inputId, PipelineBuffer buffer)
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentH ost.HostProcessInput(IDTSManagedComponentWrapper10 0 wrapper, Int32 inputID, IDTSBuffer100 pDTSBuffer, IntPtr bufferWirePacket)
    8 columns - Data viewer shows correct values for all table and CDC columns, except ALL of the __$<column>__Changed columns are FALSE, inluding those for the TestColumn1 and TestColumn2 which should be TRUE. The value of the __$update_mask is “0x0006”.

    9 columns - Data viewer shows correct values for all table and CDC columns, the __$<column>__Changed columns are all FALSE (including the columns that actually changed), but the TestColumn9 _Changed column is TRUE and this column was not changed in the UPDATE statement. The value of the __$update_mask is “0x0006”.

    10-14 columns - The __$<column>__Changed columns are again all FALSE except for the TestColumn9 and TestColumn10 which are TRUE. The value of the __$update_mask is “0x0006”.
    15 columns - Data viewer shows NULLs for all table and CDC columns, FALSE for all of the _Changed columns except for TestColumn9 and TestColumn10 which are TRUE and TestColumn15 which is NULL. The value of the __$update_mask is “0x0006”. The following errors are thrown and package fails:
    [CDC Source [2]] Error: "Failed processing column __$TestColumn15__Changed. Reason - Index was outside the bounds of the array..".
    [CDC Source [2]] Error: "Problems when trying to get changed records from dbo_TestChangeDataCapture. Reason - Pipeline component has returned HRESULT error code 0xC0047038 from a method call..".
    [CDC Splitter [122]] Error: Microsoft.SqlServer.Dts.Pipeline.ColumnIsNullExcep tion: The column has a null value.
    at Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer.Ch eckStatusAndNull(Int32 columnIndex)
    at Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer.Ge tInt32(Int32 columnIndex)
    at Attunity.SqlServer.CDCSplit.CdcSplitterComponent.P rocessInput(Int32 inputId, PipelineBuffer buffer)
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentH ost.HostProcessInput(IDTSManagedComponentWrapper10 0 wrapper, Int32 inputID, IDTSBuffer100 pDTSBuffer, IntPtr bufferWirePacket)
    16 columns - Data viewer shows correct values for all table and CDC columns, except ALL of the __$<column>__Changed columns are FALSE, including those for the TestColumn1 and TestColumn2 which should be TRUE. The value of the __$update_mask is “0x0006”.
    9. Modify the DropCreateAndSeedTable script to (re-)create table, adding in a new TestColumn{n} column at the end of the table.
    10. Run script to (re-)create table, enable it for CDC and seed with initial row.
    11. Modify incremental load package, updating the CDC Source component of the Data Flow task to refresh the columns from the table (using the Advanced Editor). May need to also update downstream references on the data flow path between this source component and the CDC Splitter component if removing any of the TestColumn{n} columns.
    Update the Data Viewer on the data flow path between the CDC Source and the CDC Splitter to add in any new columns.
    12. Repeat steps 4-8 to test.

    Note that there seems to be a pattern between the number of bits stored in the __$update_mask column and what type of invalid results/error we get.
    Attached Files Attached Files

  2. #2
    Gadi.Farhat is offline Development Manager
    Join Date
    Mar 2009
    Posts
    255
    Rep Power
    11
    Hi,

    There were two issues with the in the CDC Source component when it is configured with the "Net with Update Mask: CDC Processing mode:

    1. If the source table columns number is divisible by 8 (i.e. 8, 16, 24…) an error is reported in runtime.
    2. The additional "_Changed" output columns values are not correct.


    These issues were fixed in SQL Server 2012 SP1.


    Regards,

    Gadi
    To Find Out more About Attunity Technology:
    Attunity
    or:
    Contact Us

  3. #3
    ctownsley is offline Junior Member
    Join Date
    Mar 2013
    Posts
    5
    Rep Power
    0
    Thanks Gadi, that seems to have fixed it. I had thought I already had SP1 installed, but apparently not.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •