Parallel Servers in Oracle Database

      Many of us assume that processing a piece of work in parallel is more effective than performing it sequentially (serially), and for most workloads that holds true. In parallel processing, the work is divided among multiple resources, each resource handles its allocated share, and the results from all of them are merged into one final result. Oracle implements this technique as parallel servers, also called parallel query servers, where the work is divided among multiple resources; the resources here are processes that are capable of working independently. Implementing parallel processing in a machine, however, is not as easy as it is for humans, because the machine must be guided so that each parallel worker coordinates with the others whenever required.

  Let's discuss in depth the parallel processing technique, i.e. parallel servers, implemented in Oracle Database. Before going into detail, we will cover some of the new terms used in this concept.

Parallel Servers     -- Parallel servers are processes that carry out the work allocated to them. They are also called slave processes because they report their final result to the master process. The number of parallel servers allocated by the optimizer depends on the degree of parallelism (DOP) specified for the activity.

Parallel Server Set -- A collection of parallel servers is called a parallel server set. We can have a minimum of 1 and a maximum of 2 parallel server sets per DFO tree, depending on the number of data flow operations (DFOs) under the tree; we will discuss the DFO concept later in this post.

Parallel Groups     -- A collection of parallel server sets is called a parallel group. A parallel group is dedicated to processing one DFO or DFO tree, so if a statement has multiple parallel DFO trees, the optimizer allocates more than one parallel group.

Data Flow Operation -- A data flow operation (DFO) is the unit of parallel work; the DFO tree gives a pictorial representation of how the work will be processed by the optimizer. Parallel groups, parallel server sets and parallel servers are all tied to a DFO tree. A DFO tree in the plan is carried out by at most 2 PX server sets: if there is only one DFO under the tree, only 1 PX server set is used, and Oracle limits the number of PX server sets to 2 per DFO tree. This is why most statements use 2 PX server sets, which means they use DOP*2 PX servers. The number of PX coordinators equals the number of DFO trees, and the number of DFOs under a DFO tree determines the number of parallel server sets.

     Each parallelizer, i.e. DFO tree, cannot use more than 2 parallel server sets. In practice, one DFO tree can have a maximum of 2 parallel server sets, and 2 DFO trees can have 3 or 4; 3 parallel server sets are possible when one of the DFO trees has only one DFO under it.
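The DOP*2 relationship above can be observed from the session statistics. A minimal sketch, assuming a hypothetical table named SALES and that parallel execution is enabled:

```sql
-- Request a DOP of 4 with the PARALLEL hint (table name is an assumption):
SELECT /*+ PARALLEL(s 4) */ COUNT(*)
FROM   sales s;

-- In the same session, V$PQ_SESSTAT shows how many parallel servers the
-- last query actually allocated. With two parallel server sets
-- (producers + consumers), we would expect DOP*2 = 8.
SELECT statistic, last_query
FROM   v$pq_sesstat
WHERE  statistic = 'Servers Allocated';
```

If the statement has only one DFO under its DFO tree, the count would be the DOP itself rather than DOP*2.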


-->PX Coordinator is the master process that guides all the parallel server processes in doing their allocated work. One PX coordinator is allocated for each DFO tree.

-->Hash Join, or any other join shown in the execution plan, means that the optimizer follows that particular method when joining the two tables; we know that Oracle uses hash join, merge join, nested loops, etc. for joining tables.

-->PX SEND and PX RECEIVE mean that data has been sent by one parallel server set and received by another. The word HASH or BROADCAST after PX SEND indicates the distribution method used while transferring rows to the next parallel server set, which we will discuss later in this post.

-->PX BLOCK ITERATOR means that the parallel servers scan the table data in granules, i.e. small portions of data, while PX PARTITION RANGE ALL indicates that they scan whole partitions instead of block granules. The optimizer chooses between block iterator and partition range all based on the object access method, as below.

   

Full table scan                      -- Block iterator

Table access by rowid                -- Partition range all

Index unique scan                    -- Partition range all

Index range scan/skip scan/full scan -- Partition range all

Fast full index scan                 -- Block iterator

Bitmap indexes                       -- Block iterator


The execution plan for a parallel query differs in many ways from a serial plan, as it contains new terms. Let us discuss the explain plan for a parallel query.
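Such a plan can be produced with EXPLAIN PLAN and DBMS_XPLAN. A minimal sketch, assuming hypothetical EMP and DEPT tables:

```sql
-- The hints request parallel execution at DOP 2 for both tables.
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(e 2) PARALLEL(d 2) */ e.ename, d.dname
FROM   emp e JOIN dept d ON e.deptno = d.deptno;

-- DBMS_XPLAN.DISPLAY shows the PX operations along with the
-- TQ, IN-OUT and PQ Distrib columns discussed in this post.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```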




In the above execution plan, we can see two DFOs under the main DFO tree. This can be confirmed by the two PX SEND operations under the PX COORDINATOR, i.e. the number of PX SEND operations equals the number of DFOs. Since we specified a DOP of 2, two parallel server sets are allocated and each has two parallel servers. Under the TQ (table queue) column we have values like Q1,01 Q1,02 Q1,03 and Q1,04, which means 01, 02, 03 and 04 are the table queues under query coordinator Q1.

The values PCWP, PCWC, P->S and P->P in the IN-OUT column mean parallel combined with parent, parallel combined with child, parallel to serial and parallel to parallel respectively.


PCWP --The same parallel servers that worked on the current step also take care of the parent step.

PCWC --The same parallel servers that worked on the current step also take care of the child step.

P->S  --The operation switches from parallel to serial; this happens when the final output is sent from the parallel servers to the query coordinator.

P->P  --The operation goes from parallel to parallel; this happens mostly when producer parallel servers send data to consumer parallel servers.

S->P  --The operation switches from serial to parallel.

SCWP --The same serial process that worked on the current step also takes care of the parent step.

SCWC --The same serial process that worked on the current step also takes care of the child step.


The PQ Distrib column shows how the data is distributed/sent to the consumer parallel server processes. Below are the different values under this column and what they mean.

Broadcast        --Each producer sends all its rows to every parallel server of the consumer set.

Broadcast Local  --Each producer sends all its rows to the consumer parallel servers on the local/same node. This reduces cross-instance traffic in RAC environments.

Hash             --Each producer distributes rows to the different consumer servers based on the output of a hash function applied to the key column used for joining the tables.

Hybrid Hash      --The data is distributed using either the hash or the broadcast method; the choice is deferred until run time.

One Slave        --The data is sent to only one slave process.

Partition Key    --The table partition key determines which consumer parallel server receives which rows; this happens mostly in partition-wise joins.

Range            --The producer sends one range of data to one consumer and different ranges to the other consumers.

Round-Robin      --The data is distributed round-robin between the different consumers.

QC Order         --The data is sent to the query coordinator in a certain order to facilitate sorting.

QC Random        --The data is sent to the query coordinator in random order.
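The distribution method for a join can be influenced with the PQ_DISTRIBUTE hint. A hedged sketch, again assuming hypothetical EMP and DEPT tables (the hint names the inner table of the join):

```sql
-- HASH,HASH: both row sources are hash-distributed on the join key.
SELECT /*+ PARALLEL(8) LEADING(d) USE_HASH(e)
           PQ_DISTRIBUTE(e HASH HASH) */
       d.dname, COUNT(*)
FROM   dept d JOIN emp e ON e.deptno = d.deptno
GROUP  BY d.dname;

-- BROADCAST,NONE: the small driving table is broadcast to every
-- consumer parallel server; the large table is not redistributed.
SELECT /*+ PARALLEL(8) LEADING(d) USE_HASH(e)
           PQ_DISTRIBUTE(e BROADCAST NONE) */
       d.dname, e.ename
FROM   dept d JOIN emp e ON e.deptno = d.deptno;
```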


      Below is the pictorial representation of the commonly used distribution methods, i.e. broadcast and hash distribution.




Even though we expect that processing a query/transaction in parallel gives better performance, there are a few exceptions where we may not get the expected performance; therein lies the disadvantage of parallel servers when they are not used properly.

--- This happens mostly when the execution plan itself is poor, yet we still apply parallelism and expect better performance. When a bad plan is picked, performance will not improve even with parallelism, which is why it is important to tune the query properly before applying parallel techniques. This recommendation applies mostly to parallel techniques in Oracle Database, but the same may hold in other environments as well.

--- Also, as part of a DFO tree, the next operation may need to start before the current operation has completely finished its work, and we can then run short of parallel servers for the next operation because all of them are occupied as the current producers and consumers. In this situation, Oracle picks the parallel server set of either the producer or the consumer for the next operation, which forces the chosen set to park the data it has processed so far in temp. Once that set is freed up by the next operation, it comes back and reads the data from temp. This unnecessary writing to and reading from temp increases the execution time of the transaction. It can be avoided to some extent by helping the optimizer pick the best distribution method.

--- In most cases, stale statistics cause the optimizer to miscalculate cardinality during execution, which in turn makes it pick the wrong distribution method. It is always important to have up-to-date statistics, most importantly in data warehouse databases.
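Statistics are kept current with DBMS_STATS. A minimal sketch; the schema and table names here are assumptions:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',                       -- assumed schema
    tabname          => 'SALES',                       -- assumed table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,   -- let Oracle choose the sample
    cascade          => TRUE);                         -- also gather index statistics
END;
/
```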

--- Low cardinality of the driving table makes the optimizer choose broadcast distribution, while high cardinality leads to hash distribution. With stale statistics, the optimizer may pick hash distribution by estimating a high cardinality for the driving table when it actually has a low cardinality. In some cases this leads to most of the data being distributed to only one parallel server, because the result of the hash function keeps pointing to the same server. Since most of the work is done by one parallel server while the others remain idle most of the time, the execution time is roughly equal to the time taken by the busy server to process its data.

    E.g. when the data is distributed equally between the parallel sets:

    parallel set 1 --- takes 5 seconds to process its data

    parallel set 2 --- takes 5 seconds to process its data

    The final execution time is 5 seconds, as they process the data in parallel.

    When the data is not distributed equally between the parallel sets:

    parallel set 1 --- takes 9 seconds to process its data

    parallel set 2 --- takes 1 second to process its data

    The final execution time is 9 seconds.
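The arithmetic above can be sketched in a few lines: the elapsed time of one parallel step is the maximum of the per-set times, so skew dominates.

```python
def parallel_elapsed(per_set_seconds):
    """Elapsed time of one parallel step: the slowest set decides."""
    return max(per_set_seconds)

# Even distribution: both sets finish together.
even = parallel_elapsed([5, 5])      # 5 seconds

# Skewed distribution: the busy set dominates the elapsed time.
skewed = parallel_elapsed([9, 1])    # 9 seconds

print(even, skewed)
```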



