Thursday, March 29, 2012

Distributed partitioned views

I'm considering distributing a large table which we
now have, for performance reasons.
We've been doing a join on this large table (11 million
rows), and I don't know whether it would help much to
partition the table across four machines.
When you perform a join on a distributed partitioned view,
where does the "work" take place. ie. If the table just
gets reconstructed at the site of the view, and then the
join operation takes place, I might actually *suffer* in
performance because of the network overhead.
Could someone advise me on this? I may be wrong about how
it works, and I'd like to understand it before undertaking
the task of distributing it.
Thanks in advance,
AndrewAndrew,
you will gain the performance enhancement as you imagined -- the join
condition will direct the work to the member table via the view. You can
verify this with the execution plan.
The key is, that your distribution will do this for all your important
joins. You may have different join conditions. If the partitioning column
is not always in the joining key, you will have to engage multiple member
tables. If this happens not two often, it might be acceptable. So here you
have to make the judgment call.
hth
Quentin
"Andrew" <a@.b.com> wrote in message
news:153501c38c0d$783e7b10$a101280a@.phx.gbl...
> I'm considering distributing a large table which we
> now have, for performance reasons.
> We've been doing a join on this large table (11 million
> rows), and I don't know whether it would help much to
> partition the table across four machines.
> When you perform a join on a distributed partitioned view,
> where does the "work" take place. ie. If the table just
> gets reconstructed at the site of the view, and then the
> join operation takes place, I might actually *suffer* in
> performance because of the network overhead.
> Could someone advise me on this? I may be wrong about how
> it works, and I'd like to understand it before undertaking
> the task of distributing it.
> Thanks in advance,
> Andrew
>|||Thanks Quentin,
I'm glad to hear that the work itself gets distributed.
It seems from your answer that the partitioning column
should be the column on which the join is performed for
optimal performance... did I read that right?
Also, I understand you want to keep as much "closely
related" data together as possible. Unfortunately, there
are several tables on which my large table will be joined,
and I'd like to keep them on the host where I create the
distributed view.
Is this going to be a problem? Or is it advisable to
partition the smaller tables as well?
Thanks in advance,
Andrew
>--Original Message--
>Andrew,
>you will gain the performance enhancement as you
imagined -- the join
>condition will direct the work to the member table via
the view. You can
>verify this with the execution plan.
>The key is, that your distribution will do this for all
your important
>joins. You may have different join conditions. If the
partitioning column
>is not always in the joining key, you will have to
engage multiple member
>tables. If this happens not two often, it might be
acceptable. So here you
>have to make the judgment call.
>hth
>Quentin
>
>"Andrew" <a@.b.com> wrote in message
>news:153501c38c0d$783e7b10$a101280a@.phx.gbl...
>> I'm considering distributing a large table which we
>> now have, for performance reasons.
>> We've been doing a join on this large table (11 million
>> rows), and I don't know whether it would help much to
>> partition the table across four machines.
>> When you perform a join on a distributed partitioned
view,
>> where does the "work" take place. ie. If the table
just
>> gets reconstructed at the site of the view, and then
the
>> join operation takes place, I might actually *suffer*
in
>> performance because of the network overhead.
>> Could someone advise me on this? I may be wrong about
how
>> it works, and I'd like to understand it before
undertaking
>> the task of distributing it.
>> Thanks in advance,
>> Andrew
>
>.
>|||Andrew,
> It seems from your answer that the partitioning column
> should be the column on which the join is performed for
> optimal performance... did I read that right?
You are right. Think of your table you want to partition by lastname, where
a-m go to server1 and the rest goes to server2. If you join or search by
firstname, what will sql server do? it of course will have to go into both
server and search the data for every entry of firstname. Had you
partitioned by firstname, the query will go to server1 finding all records
with firstnames with value a-m, and finding all the others in server2.
> Also, I understand you want to keep as much "closely
> related" data together as possible. Unfortunately, there
> are several tables on which my large table will be joined,
> and I'd like to keep them on the host where I create the
> distributed view.
>
I don't see any problem. The way you want use distributed partitioned view
to improve performance is to effectively retrieve and write -- reducing
physical read and load index/data into memory with reduced paging. For
small tables, this does not present a problem and you can keep them in a
convenient db.
> >--Original Message--
> >Andrew,
> >
> >you will gain the performance enhancement as you
> imagined -- the join
> >condition will direct the work to the member table via
> the view. You can
> >verify this with the execution plan.
> >
> >The key is, that your distribution will do this for all
> your important
> >joins. You may have different join conditions. If the
> partitioning column
> >is not always in the joining key, you will have to
> engage multiple member
> >tables. If this happens not two often, it might be
> acceptable. So here you
> >have to make the judgment call.
> >
> >hth
> >
> >Quentin
> >
> >
> >"Andrew" <a@.b.com> wrote in message
> >news:153501c38c0d$783e7b10$a101280a@.phx.gbl...
> >> I'm considering distributing a large table which we
> >> now have, for performance reasons.
> >>
> >> We've been doing a join on this large table (11 million
> >> rows), and I don't know whether it would help much to
> >> partition the table across four machines.
> >>
> >> When you perform a join on a distributed partitioned
> view,
> >> where does the "work" take place. ie. If the table
> just
> >> gets reconstructed at the site of the view, and then
> the
> >> join operation takes place, I might actually *suffer*
> in
> >> performance because of the network overhead.
> >>
> >> Could someone advise me on this? I may be wrong about
> how
> >> it works, and I'd like to understand it before
> undertaking
> >> the task of distributing it.
> >>
> >> Thanks in advance,
> >>
> >> Andrew
> >>
> >
> >
> >.
> >|||Quentin,
Thanks, you've been a great help. I'll get to designing
this thing.
All the best,
Andrew
>--Original Message--
>Andrew,
>> It seems from your answer that the partitioning column
>> should be the column on which the join is performed for
>> optimal performance... did I read that right?
>You are right. Think of your table you want to
partition by lastname, where
>a-m go to server1 and the rest goes to server2. If you
join or search by
>firstname, what will sql server do? it of course will
have to go into both
>server and search the data for every entry of
firstname. Had you
>partitioned by firstname, the query will go to server1
finding all records
>with firstnames with value a-m, and finding all the
others in server2.
>> Also, I understand you want to keep as much "closely
>> related" data together as possible. Unfortunately,
there
>> are several tables on which my large table will be
joined,
>> and I'd like to keep them on the host where I create
the
>> distributed view.
>I don't see any problem. The way you want use
distributed partitioned view
>to improve performance is to effectively retrieve and
write -- reducing
>physical read and load index/data into memory with
reduced paging. For
>small tables, this does not present a problem and you
can keep them in a
>convenient db.
>> >--Original Message--
>> >Andrew,
>> >
>> >you will gain the performance enhancement as you
>> imagined -- the join
>> >condition will direct the work to the member table via
>> the view. You can
>> >verify this with the execution plan.
>> >
>> >The key is, that your distribution will do this for
all
>> your important
>> >joins. You may have different join conditions. If
the
>> partitioning column
>> >is not always in the joining key, you will have to
>> engage multiple member
>> >tables. If this happens not two often, it might be
>> acceptable. So here you
>> >have to make the judgment call.
>> >
>> >hth
>> >
>> >Quentin
>> >
>> >
>> >"Andrew" <a@.b.com> wrote in message
>> >news:153501c38c0d$783e7b10$a101280a@.phx.gbl...
>> >> I'm considering distributing a large table which we
>> >> now have, for performance reasons.
>> >>
>> >> We've been doing a join on this large table (11
million
>> >> rows), and I don't know whether it would help much
to
>> >> partition the table across four machines.
>> >>
>> >> When you perform a join on a distributed partitioned
>> view,
>> >> where does the "work" take place. ie. If the table
>> just
>> >> gets reconstructed at the site of the view, and then
>> the
>> >> join operation takes place, I might actually
*suffer*
>> in
>> >> performance because of the network overhead.
>> >>
>> >> Could someone advise me on this? I may be wrong
about
>> how
>> >> it works, and I'd like to understand it before
>> undertaking
>> >> the task of distributing it.
>> >>
>> >> Thanks in advance,
>> >>
>> >> Andrew
>> >>
>> >
>> >
>> >.
>> >
>
>.
>

No comments:

Post a Comment