
Thread: High latency in Attunity Replicate on Redshift due to commit time

  1. #1
    stevenguyen (Senior Member, joined May 2014)


    Ans:
    Changing the Redshift cluster type from Dense Compute nodes to Dense Storage nodes (fewer slices) can help reduce commit time.
    ===

    Example:
    We had 2 Dense Compute nodes, each with 32 slices (64 slices in all); by switching to smaller nodes (16 slices) we cut down the commit time. In Redshift:

    - there is only one commit queue per cluster
    - every commit (UPDATE/INSERT/COPY) must queue up one after another
    - whenever a commit occurs, it needs to synchronize the operation across all slices
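
    The single commit queue can be observed directly in the stl_commit_stats system table. The following is a sketch, assuming the documented columns of that table: node -1 is the leader node, and startqueue holds the sentinel '2000-01-01 00:00:00' when a commit never had to queue.

    ```sql
    -- Recent commits on the leader node (node = -1), with the time each
    -- spent queued before commit work started. startqueue is a sentinel
    -- ('2000-01-01 00:00:00') when the commit did not queue at all.
    select xid, queuelen,
           datediff('microsec',
                    decode(startqueue, '2000-01-01 00:00:00', startwork, startqueue),
                    startwork) as commit_queue_micros,
           datediff('microsec', startwork, endtime) as commit_micros
    from stl_commit_stats
    where node = -1
    order by endtime desc
    limit 10;
    ```

    A nonzero commit_queue_micros across consecutive rows means commits are serializing behind one another in the cluster's single commit queue.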

    Run the following query:

    select IQ.*,
           ((IQ.wlm_queue_time::float / IQ.wlm_start_commit_time) * 100)::decimal(5,2) as pct_wlm_queue_time,
           ((IQ.exec_only_time::float / IQ.wlm_start_commit_time) * 100)::decimal(5,2) as pct_exec_only_time,
           ((IQ.commit_queue_time::float / IQ.wlm_start_commit_time) * 100)::decimal(5,2) as pct_commit_queue_time,
           ((IQ.commit_time::float / IQ.wlm_start_commit_time) * 100)::decimal(5,2) as pct_commit_time
    from (
        select trunc(b.starttime) as day,
               d.service_class,
               c.node,
               count(distinct c.xid) as count_commit_xid,
               sum(datediff('microsec', d.service_class_start_time, c.endtime)) as wlm_start_commit_time,
               sum(datediff('microsec', d.queue_start_time, d.queue_end_time)) as wlm_queue_time,
               sum(datediff('microsec', b.starttime, b.endtime)) as exec_only_time,
               sum(datediff('microsec', c.startwork, c.endtime)) as commit_time,
               sum(datediff('microsec', decode(c.startqueue, '2000-01-01 00:00:00', c.startwork, c.startqueue), c.startwork)) as commit_queue_time
        from stl_query b, stl_commit_stats c, stl_wlm_query d
        where b.xid = c.xid
          and b.query = d.query
          and c.xid > 0
          and d.service_class > 4
        group by trunc(b.starttime), d.service_class, c.node
        order by trunc(b.starttime), d.service_class, c.node
    ) IQ;

    The query will produce a result with the following columns:

    day | service_class | node | count_commit_xid | wlm_start_commit_time | wlm_queue_time | exec_only_time | commit_time | commit_queue_time | pct_wlm_queue_time | pct_exec_only_time | pct_commit_queue_time | pct_commit_time

    The columns to pay attention to are:

    - day, the date
    - count_commit_xid, the number of commit transactions for that day
    - node, we want the row with a value of -1, which is the leader node; this is where we will see the commit queue time
    - pct_wlm_queue_time, percentage of time the transactions spent waiting for an available slot (WLM query queue)
    - pct_exec_only_time, percentage of time the statements actually spent executing
    - pct_commit_queue_time, percentage of time the statements spent waiting in the commit queue
    - pct_commit_time, percentage of time spent actually committing

    In the results, watch for large values of pct_commit_queue_time and pct_wlm_queue_time: they mean the queries spent more time waiting than doing actual work.
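
    If pct_wlm_queue_time dominates, a quick follow-up is to compare average queue wait against execution time per WLM service class. This is a sketch against the documented stl_wlm_query table, where total_queue_time and total_exec_time are recorded in microseconds:

    ```sql
    -- Average WLM queue wait vs. execution time per service class
    -- (user queues start at service_class 5).
    select service_class,
           count(*) as query_count,
           avg(total_queue_time) / 1000000.0 as avg_queue_secs,
           avg(total_exec_time)  / 1000000.0 as avg_exec_secs
    from stl_wlm_query
    where service_class > 4
    group by service_class
    order by service_class;
    ```

    A service class where avg_queue_secs rivals avg_exec_secs is a candidate for more WLM slots or concurrency.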

    ======
    Ans:
    Also, it is best to set up the Replicate task as follows:

    1. Disable all control tables (Replication Status, Replication History, and Suspended Tables) that you might have enabled on your Replicate tasks.
    Those tables write one change at a time, by default every 5 seconds (status table) or every 5 minutes (history table), so they can produce many commits; at one commit every 5 seconds, the status table alone adds over 17,000 commits per day.

    2. Regarding the items that AWS support suggested:
    2a. Item (1): your CSV file size should already be configured to 1 GB (the default). If not, set it in the Redshift target database settings -> Advanced tab -> Max file size (MB), and set it to 1024. Do this for all your Redshift target databases. The number of files loaded in a transaction currently cannot be controlled from Replicate.

    2b. If the above steps didn't help:
    Item (2) - Incorporate as many commands as you can into a single transaction. You can achieve that by doing the following: stop the task,
    - then in Task Settings -> Change Processing settings, set the following:

    "longer than" - choose a higher number, say from 1 second to 30 seconds.
    "but less than" - choose a higher number, say from 30 seconds to 120 seconds.
    "Force apply a batch when processing memory exceeds" - choose a higher number, say from 500 MB to 1000 MB.
    Uncheck "Limit the number of changes applied per change processing statement to", if it is checked.

  2. #2
    stevenguyen
    Just for clarification:

    1. If your task is configured to use control tables:

    Disable all control tables (Replication Status, Replication History, and Suspended Tables).

    Those tables write one change at a time, by default every 5 seconds (status table) or every 5 minutes (history table), so they can produce many commits.

    2. Make sure, that your Redshift Endpoint is setup for Max file size:

    If not, set it in the Redshift target database settings -> Advanced tab -> Max file size (MB), and set it to 1024. Do this for all your Redshift target databases.
