Discussion:
Best practices for SQL on NetApp LUNs
xperre van wilrijk
2012-04-20 09:30:31 UTC
Permalink
Hi,

I'm not a storage expert. I just maintain a set of virtualized SQL
servers using NetApp LUNs (iSCSI/SnapDrive/SnapManager, ...).

I noticed that data updates on NetApp are written to free blocks: the
original block is not overwritten but kept, since it is still
referenced by earlier snapshots.

So, may I conclude that fragmentation is inherent to NetApp? May I
conclude that a Windows defrag might cause volumes to run out of
space? May I conclude that (assuming enough free space in the volume)
the chance of less physical IO after a Windows defrag is negligible,
or even that in some cases the number of physical IOs might increase?
May I conclude that Windows will issue fewer IOs, since it thinks the
data is now sequential, but that the resulting number of IOs on
NetApp is unpredictable?

May I conclude that the SQL command SET STATISTICS IO ON does not
tell me the truth about the number of physical reads executed on
NetApp (or any other disk virtualisation/SAN system), only the number
of physical IOs that Windows or SQL Server initiated (things that
have to be done)? Thus it is possible that Windows issues 5 physical
IOs, but the data is in the NetApp cache, so no real physical disk
access takes place.

Anyway, I wonder what defrag means in a RAID setup. Isn't it the case
that 1 byte might be spread over 8 physical disks ... or that 8 bytes
might be spread over 8 physical disks?

When I read https://communities.netapp.com/thread/8226 I start to
wonder whether SQL Server index rebuilds might no longer be best
practice, since they will have the same effect on snapshots as a
Windows defrag. May I conclude that our metroclustered NetApp offers
HA, DR and fast restore, but that we should review our best practices
regarding IO optimisation?

Thanks
Peter
Erland Sommarskog
2012-04-20 22:15:19 UTC
Permalink
Post by xperre van wilrijk
I noticed that data updates on NetApp are written to free blocks: the
original block is not overwritten but kept, since it is still
referenced by earlier snapshots.
So, may I conclude that fragmentation is inherent to NetApp? May I
conclude that a Windows defrag might cause volumes to run out of
space? May I conclude that (assuming enough free space in the volume)
the chance of less physical IO after a Windows defrag is negligible,
or even that in some cases the number of physical IOs might increase?
May I conclude that Windows will issue fewer IOs, since it thinks the
data is now sequential, but that the resulting number of IOs on
NetApp is unpredictable?
I think those are questions you will need to ask in forums devoted
to NetApp. I have no knowledge of the product.
Post by xperre van wilrijk
May I conclude that the SQL command SET STATISTICS IO ON does not
tell me the truth about the number of physical reads executed on
NetApp (or any other disk virtualisation/SAN system), only the number
of physical IOs that Windows or SQL Server initiated (things that
have to be done)? Thus it is possible that Windows issues 5 physical
IOs, but the data is in the NetApp cache, so no real physical disk
access takes place.
That assumption is correct. However, it does not mean that SET
STATISTICS IO ON is irrelevant. And even a standard SATA disk has
some amount of cache.
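
A quick way to see this from the SQL Server side (a minimal sketch;
dbo.Orders is just a placeholder table name):

   -- show logical and physical reads as SQL Server counts them
   SET STATISTICS IO ON;

   DBCC DROPCLEANBUFFERS;            -- empty SQL Server's own buffer pool
   SELECT COUNT(*) FROM dbo.Orders;  -- run 1: physical reads > 0
   SELECT COUNT(*) FROM dbo.Orders;  -- run 2: physical reads = 0, same logical reads

The physical-read counter only says that SQL Server asked the OS for
the page; whether that request ever reached a spindle or was answered
from the NetApp cache is invisible at this level. The logical-read
counter, by contrast, is stable between runs, which is why the
setting remains useful for tuning.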
Post by xperre van wilrijk
When I read https://communities.netapp.com/thread/8226 I start to
wonder whether SQL Server index rebuilds might no longer be best
practice, since they will have the same effect on snapshots as a
Windows defrag. May I conclude that our metroclustered NetApp offers
HA, DR and fast restore, but that we should review our best practices
regarding IO optimisation?
Or maybe NetApp does not play well with SQL Server? After all, there
are more things than just index rebuilds that cause lots of IO on
SQL Server. ETL loads, for instance.

No, neither am I a storage expert.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
xperre van wilrijk
2012-04-23 15:36:21 UTC
Permalink
Post by Erland Sommarskog
Or maybe NetApp does not play well with SQL Server? After all, there
are more things than just index rebuilds that cause lots of IO on
SQL Server. ETL loads, for instance.
Thanks Erland.
I have the impression we need to bring in back-end experts (virtual
machines, network architecture, storage, ...) to tackle "SQL
performance issues".
We do ETL and we do index rebuilds during the night. That is not an
issue, since we do not have a 24h business. However, I wonder whether
index rebuilds are still best practice on virtualized disks. I think
I should just find the time to build a test case.
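
For that test case, something like this would at least limit the
rebuilds to indexes that actually need them (a sketch, run inside the
database in question; the 30% threshold is just the commonly cited
rule of thumb, and the index/table names in the comment are made up):

   -- list only the indexes above a fragmentation threshold
   SELECT OBJECT_NAME(ips.object_id)        AS table_name,
          i.name                            AS index_name,
          ips.avg_fragmentation_in_percent
   FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
   JOIN sys.indexes AS i
     ON i.object_id = ips.object_id AND i.index_id = ips.index_id
   WHERE ips.avg_fragmentation_in_percent > 30;

   -- then rebuild (or reorganize) only those, e.g.:
   -- ALTER INDEX IX_MyIndex ON dbo.MyTable REORGANIZE;

Since REORGANIZE only moves the pages that are out of order, it
rewrites fewer pages than a full REBUILD, which should matter on a
redirect-on-write system where every rewritten page lands in a fresh
block.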

I just read the following articles.
https://communities.netapp.com/community/netapp-blogs/the-it-corner/blog/2012/03/14/are-all-snapshots-created-equal
http://www.sql-server-performance.com/forum/threads/disk-speed-local-vs-netapp.29099/

It clearly explains the difference between CoW and RoW snapshotting.
We use RoW. That is great, since I can restore a 1 TB database in 5
minutes, but I am afraid RoW is bad news for fragmentation, whereas
that is less of a problem with CoW.

Thanks
Peter
George Neuner
2012-04-23 19:01:40 UTC
Permalink
Post by xperre van wilrijk
Anyway I wonder what defrag means in RAID setup.
Defrag means the same on an array as on a single disk ... it makes all
the bytes of the files contiguous. On an array, however, "contiguous"
files may be striped across the disks.
Post by xperre van wilrijk
Isn't it so that 1 byte might be spread over 8 physical disks ... or
that 8 bytes might be spread over 8 physical disk?
I've never encountered byte striping. RAID levels 2 and 3 perform bit
striping, but neither has ever been very popular, and systems that
use them are quite rare. The only bit-striping system I have ever
seen was the Connection Machine (CM2 series) Data Vault, which used a
non-standard array ... and I don't believe many (or maybe any) of
them are still in service.

All arrays you are likely to encounter operate at the block level.
Level 0 stripes file blocks across the array with no checking or
redundancy. Level 1 mirrors each block of the primary disk on the
corresponding block of the secondary disk.

RAID levels 4 and above provide checking and redundancy using an
additional disk (or disks). Corresponding blocks from each of the
data disks are checked and backed up by "parity" blocks stored on the
additional disks. Parity blocks don't really store parity bits ...
the name is historical ... but rather store the bitwise XOR of their
corresponding data blocks. Level 4 stores parity blocks on a
dedicated disk. Level 5 rotates parity blocks amongst the disks to
try to minimize bottlenecks accessing the parity blocks. Level 6
extends level 5 to use multiple parity blocks (and thus additional
disks).
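
A tiny worked example of the parity arithmetic (T-SQL only to stay in
this group's language; the byte values are made up):

   -- three "data disk" bytes and their XOR parity
   DECLARE @d1 int = 0xA5, @d2 int = 0x3C, @d3 int = 0x0F;
   DECLARE @parity int = @d1 ^ @d2 ^ @d3;       -- 0x96

   -- if disk 2 dies, its block is rebuilt from the survivors plus parity
   SELECT @parity ^ @d1 ^ @d3 AS recovered_d2;  -- 60, i.e. 0x3C

XOR is its own inverse, so any single missing block can be recovered
by XORing the parity block with all the surviving blocks; that is all
"parity" means here.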

Occasionally you may encounter a RAID 4 system, but in practice you
are most likely to encounter level 0, 1, 5 or 6, or some simple
combination of 0 (striping) with another level.

Currently the highest RAID level is 6. Higher two- and three-digit
numbers are used as shorthand for combinations: e.g., "10" is read
"1+0", which describes a stripe of mirrors. Similarly, "51" is read
"5+1", which describes a mirror of level 5 arrays. "100" is a stripe
of striped mirrors. This shorthand can be extended ad nauseam to
describe virtually any kind of system: e.g., you could create a
"1601", which would be a mirror of stripes of level 6 arrays each
composed of mirrors, etc.

George
