When you create or rebuild an index in SQL Server, you can specify a FILLFACTOR value. It determines how full (as a percentage) SQL Server should try to make the pages.
There is also one special value. Zero says "I haven't set a value", so the instance-wide default applies, and that default is 100%. Any value from 1 to 100 is the target percentage.
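For example (just a sketch; the table and index names here are made up, not from any particular system), FILLFACTOR can be set when an index is created or rebuilt, and the instance-wide default is visible through sp_configure:

-- Hypothetical index on a hypothetical table: ask for pages about 80% full
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (FILLFACTOR = 80);

-- The same option applies when rebuilding an existing index
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);

-- The instance-wide default (0, which behaves like 100) is an advanced option
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'fill factor (%)';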
So why do people set lower values for FILLFACTOR? The argument is that they know their inserts are going to happen all over the table (i.e. not just neatly at the end of the table), and they're trying to avoid page splits. And yes, lots of page splits can be a nasty thing. The most common problem case that people describe is where they've used a GUID as a clustering key (usually also a primary key) for a table. The inserts then happen randomly throughout the table.
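To make the pattern concrete, here's a hypothetical table of the kind being described, with a random GUID as the clustering key:

-- Hypothetical table: NEWID() produces random GUIDs, so new rows land
-- at arbitrary points in the clustered index rather than at the end
CREATE TABLE dbo.Customers
(
    CustomerID uniqueidentifier NOT NULL
        CONSTRAINT DF_Customers_ID DEFAULT NEWID(),
    CustomerName nvarchar(100) NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerID)
);

-- Once the pages are full, each of these random inserts forces a page split.
INSERT dbo.Customers (CustomerName) VALUES (N'Test customer');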
Now many applications do this with all their tables. SharePoint is one application that has made database decisions that I really, really don't like. Because it uses GUID keys all over the place, it just routinely makes all the indexes have a 70% FILLFACTOR. That's really sad.
The problem with low values is that your data is spread across more pages, so you have to read more pages to get the same amount of data.
If you set a 50% FILLFACTOR, you now have to read twice as many pages to get the same amount of data.
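If you want to see that effect on one of your own indexes, a sampled look at leaf-page fullness shows it (a sketch; replace the table name with your own, and note that SAMPLED mode reads part of the index):

-- How full are the leaf pages, and how many pages is the data spread over?
SELECT ips.index_id,
       ips.page_count,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED') AS ips
WHERE ips.index_level = 0;   -- leaf level only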
And this is where the pain really occurs. Most systems spend the vast majority of their time reading, not writing. The time it takes for writes to occur might be important, but if you're going to optimize most SQL Server systems today, you need to be optimizing for reads.
And that's exactly the opposite of what you're doing with a low FILLFACTOR.
I often hear people discussing the proportion of writes on most systems. People assume that systems will spend 10 or 20% of the time writing. I had a lengthy discussion with a Microsoft product group member who told me that he thought even 40 or 50% was common. I've worked on these systems a long time, and I can assure you, that's not common.
Every time I measure the proportion of writes on systems, I typically see numbers like 1%. So you have to ask yourself if you want to optimize for the 1% or for the 99%.
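If you want to sanity-check that on your own system, one rough approach (a sketch only; this counts index operations since the last restart, not time or bytes) is to compare reads and writes in sys.dm_db_index_usage_stats:

-- Rough read vs write operation counts per database since the last restart
SELECT DB_NAME(database_id) AS database_name,
       SUM(user_seeks + user_scans + user_lookups) AS read_operations,
       SUM(user_updates) AS write_operations
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY database_name;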
I'm not underestimating the pain that a large number of page splits can cause, but you need to be really sure that page splits are your biggest issue before you start dropping FILLFACTOR values. And you certainly never want to do it as a blanket rule across all your tables and indexes.
Good Post !!
Could you please suggest which tables should have a FILLFACTOR set? Do we need to use operational stats when setting FILLFACTOR please?
In general, start with all tables at 100 (or still at the default of 0). Only lower the fillfactor a bit when you detect significant, painful fragmentation from inserts. Consider tables and indexes separately when doing this. Also, before lowering it, decide whether you're using the right clustering key in the first place. In some situations, you might be better off changing that.
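If you do want operational stats as a starting point, something like the following can point at the indexes doing the most leaf-level page allocations (a sketch; leaf_allocation_count includes ordinary end-of-index allocations as well as page splits, so treat it as an indicator rather than an exact split count):

-- Leaf-level activity per index in the current database, since the last restart
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       i.name AS index_name,
       ios.leaf_insert_count,
       ios.leaf_allocation_count   -- new leaf pages: splits plus normal growth
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id = ios.index_id
ORDER BY ios.leaf_allocation_count DESC;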
Not quite right, Greg. The tables don't stay at only 70% full. They do fill up to 100%, and can do so quite rapidly, even with GUIDs. The only time they're at only 70% full is when they're first rebuilt. Especially with GUIDs, page splits can block other Inserts, Updates, Deletes, and, yes… Selects, especially on narrow Clustered Indexes based on GUIDs. The narrower the row, the more rows get logged in the transaction log when a page split occurs, and all of that happens inside a system transaction where nothing can use the current page, the new page, or the next page (which needs to have its "Previous Page" link updated).
You CAN go, quite literally, for months on a GUID-keyed index with zero page splits (not even so-called "good" ones), not to mention almost total avoidance of "insert hot spots".
And you say it's a waste of memory but it's not unless the table is totally static (which should be rebuilt at 100% just so you know it's static by looking at the Fill Factor). The reason why I say that is that the space will, in fact, be used eventually. Just like an MDF or LDF, you're simply right-sizing it in advance to prevent unwanted "growth". Yes, you'll be reading more pages a little earlier but, as those pages fill, there will be no changes in the number of pages until you do a rebuild again or you make the mistake of allowing it to do page splits again.
This also means that the current "Best Practice" (it's not and was actually never meant to be) of REORGANIZEing between 5% and 30% and REBUILDing above 30% fragmentation is one of the worst things you could ever do to a random GUID-Keyed index. You MUST NOT USE REORGANIZE on these (or just about anywhere else but that's another much longer story) and you MUST REBUILD THEM AT 1% or you will suffer from extremely high, severely blocking, page splits. In fact, if you use REORGANIZE on random GUID-keyed indexes (and several other types, as well), you'll actually both perpetuate and increase page splits. It's like a bad drug habit… the more you do it, the more you NEED to do it.
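For what it's worth, the maintenance being described would look something like this (a sketch only; the index name is made up, and the low rebuild threshold and fill factor are this commenter's recommendations for random GUID keys, not a general rule):

-- Rebuild (never reorganize) a random-GUID keyed index as soon as any
-- fragmentation appears, re-establishing the chosen fill factor
ALTER INDEX PK_Customers ON dbo.Customers
REBUILD WITH (FILLFACTOR = 80);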
And, no… heh… I'm not talking through my hat or the seat of my pants. 😀 I have literally hundreds of hours in tests and test code to prove all this.
Can't say that I agree. My focus is rarely on page splits. As I mentioned, the vast majority of systems that I work on spend almost their entire time reading, not writing. I'm not focused on the impact on the 1% of the overall load, unless the timing of it is critical.
So even if you can go for months without getting back to full pages, you've had months of inefficient reads, and for me, that's the core issue. It's quite unlike mdf files. There, empty space isn't read, and you pay no daily penalty for it. It's a good thing. Empty space in pages that you're reading isn't a good thing.
And even if tables do go from 70% back up to 100% (particularly if it's uniform across the table), then I think it's a lousy clustering key anyway.
You actually might want to reconsider your position of rarely focusing on page splits, because very few systems in the world are as 'stable' as the one you've described. You also have to remember that, with only one exception (Sequential Silo indexes), all logical and physical fragmentation is caused by page splits.
As for thinking that uniformly distributed keys make a lousy clustering key, you need to reconsider that as well. You can actually use them to prevent fragmentation. Well, except for systems like you describe, where writes practically don't exist. Just remember, that's a very rare form of database that most people never get to see, never mind enjoy.
I routinely measure page splits and I know when they are an issue. I can assure you that reading more pages than necessary to get data ends up a much bigger overall problem at typical sites we work with.
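For context, one simple way to watch page split activity (a sketch; this counter is instance-wide, cumulative since startup, and it counts "good" end-of-index splits along with the mid-index splits that hurt) is:

-- Cumulative page split count for the whole instance since SQL Server started
SELECT cntr_value AS page_splits_since_startup
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page Splits/sec'
  AND object_name LIKE N'%Access Methods%';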
p.s. I do very much agree with your final statement of "And you certainly never want to do it as a blanket rule across all your tables and indexes." Nothing is a panacea.
Hi Greg,
We have an audit table for logging user logins, with around 3,000 inserts a minute and very few reads a day; that is, even 10,000 reads a day would be almost impossible.
The table has more than 110 million records, and it has a non-clustered index on a single datetime column. Old records are never read again.
Would "up to 75% filled pages" be a reasonable number?
thanks
Hi Hernan, if it's a heap, and the only index is in datetime order, what's the advantage of the low fillfactor? Why not fill the pages? It's only an advantage if you want to avoid page splits on inserts / updates.
I have to agree with Greg. An ever-increasing key that only suffers "end of index" inserts does not need a lower Fill Factor. It wouldn't work anyway, because all new rows will fill the pages to 100% regardless.
What you DO have is an "insert hot spot" at the end of your index and it may be slowing your inserts down quite a bit (contention) especially when a "good" (new) page split occurs.
I can only contend that we see very different databases to each other. I could count on one hand the number of sites I see with hotspot issues, unless they are super high concurrency write sites. I do work with hundreds of clients and most transactional DBs I see spend the vast majority of their time reading, not writing. While there are obviously exceptions, optimising for the writes over the reads would be a fundamental mistake.