Data Replication 2
Results 1 to 5 of 5
Like Tree1Likes
  • 1 Post By stevenguyen

Thread: Distribution and Sort Key

  1. #1
    iamhasib is offline Junior Member
    Join Date
    Dec 2014
    Posts
    7
    Rep Power
    0

    Distribution and Sort Key

    We are loading data from SQL Server to Redshift.

    Would the transfer be faster if we create the target table on redshift first with distribution and sort key?

    So the way it's now is this -

    We're just taking the tables from source SQL DB as is and ingest on redshift. So the table is created on the fly.

    What we thought of doing is -

    create table on redshift, declare distribution sort key and then load...

    Objective is faster transfer and distribution.

  2. #2
    stevenguyen is offline Senior Member
    Join Date
    May 2014
    Posts
    297
    Rep Power
    6
    Quote Originally Posted by iamhasib View Post
    We are loading data from SQL Server to Redshift.

    Would the transfer be faster if we create the target table on redshift first with distribution and sort key?

    So the way it's now is this -

    We're just taking the tables from source SQL DB as is and ingest on redshift. So the table is created on the fly.

    What we thought of doing is -

    create table on redshift, declare distribution sort key and then load...

    Objective is faster transfer and distribution.
    ====

    The way to do it is by changing the provider syntax used.

    Below are instructions on how to set it up.

    Also, Note the following.

    a. With this solution, you can only set one column to be the sort key (you can only set one distribution key in Redshift anyway)
    b. With this solution , The primary key column will be used made both the distribution key and sort key. In case of several primary key columns, the first column will be used.

    Solution instructions:

    The below properties should be added to the connection string text box after checking the ‘Override connection string parameters’ check box.

    $info.query_syntax.create_table=CREATE TABLE ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} ( ${COLUMN_LIST} ) ${DISTRIBUTION_QUERY} ;$info.query_syntax.distribution_query= distkey (${COLUMN_LIST}) sortkey (${COLUMN_LIST}); $info.query_syntax.max_distribution_columns=1;

    Note that you can have distribution style to (all|even|key) and sortkey by replacing the following:

    $info.query_syntax.create_table=CREATE TABLE ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} ( ${COLUMN_LIST} ) ${DISTRIBUTION_QUERY} ;$info.query_syntax.distribution_query= diststyle all sortkey (${COLUMN_LIST}); $info.query_syntax.max_distribution_columns=1;

    More information about CREATE table from Amazon:
    http://docs.aws.amazon.com/redshift/..._examples.html

    Thanks,
    Steve Nguyen
    Last edited by stevenguyen; 05-29-2015 at 10:11 AM.
    iamhasib likes this.

  3. #3
    kad
    kad is offline Junior Member
    Join Date
    Apr 2015
    Posts
    8
    Rep Power
    0
    Quote Originally Posted by stevenguyen View Post
    ====

    The way to do it is by changing the provider syntax used.

    Below are instructions on how to set it up.

    Also, Note the following.

    a. With this solution, you can only set one column to be the sort key (you can only set one distribution key in Redshift anyway)
    b. With this solution , The primary key column will be used made both the distribution key and sort key. In case of several primary key columns, the first column will be used.

    Solution instructions:

    The below properties should be added to the connection string text box after checking the ‘Override connection string parameters’ check box.

    $info.query_syntax.create_table=CREATE TABLE ${QO}${TABLE_OWNER}${QC}.${QO}${TABLE_NAME}${QC} ( ${COLUMN_LIST} ) ${DISTRIBUTION_QUERY} ;$info.query_syntax.distribution_query= distkey (${COLUMN_LIST}) sortkey (${COLUMN_LIST}); $info.query_syntax.max_distribution_columns=1;

    Thanks,
    Steve Nguyen
    Is there a way to set the sort key and distribution key on more specifically, instead of only on the primary key?

  4. #4
    stevenguyen is offline Senior Member
    Join Date
    May 2014
    Posts
    297
    Rep Power
    6
    Quote Originally Posted by kad View Post
    Is there a way to set the sort key and distribution key on more specifically, instead of only on the primary key?
    Hello Kad,

    Yes, you can have more specifically key or sort to anything not just primary key.

    What you need to do is manually create the target table in Redshift to your specific sort key and distribution key.

    Then within Replicate do not put the extra qualifier for creating the distribution and sort key.

    For the Task design / setting make sure that you set the Full Load to "TRUNCATE before loading" instead of the default of "DROP and CREATE table.

  5. #5
    kad
    kad is offline Junior Member
    Join Date
    Apr 2015
    Posts
    8
    Rep Power
    0
    Quote Originally Posted by stevenguyen View Post
    Hello Kad,

    Yes, you can have more specifically key or sort to anything not just primary key.

    What you need to do is manually create the target table in Redshift to your specific sort key and distribution key.

    Then within Replicate do not put the extra qualifier for creating the distribution and sort key.

    For the Task design / setting make sure that you set the Full Load to "TRUNCATE before loading" instead of the default of "DROP and CREATE table.
    Great, I tried that out and it worked perfectly. Thanks!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •