I had almost the exact same conversation with three customers over the last few weeks, and one of them ended with the EMC vSpecialist who had invited me to the call telling the customer “Keep an eye on his blog, I’m sure he’ll have more up there soon!” With pressure like that, who needs deadlines? ;-)
The conversation was around whether the storage workload profile of an IaaS offering differs from a normal virtualized enterprise workload and, if so, how an SP could leverage FAST and FAST Cache to the best effect. EMC has some great general design guidelines for these features, and there have been some great posts on the interwebs, but to my knowledge there hasn’t been anything specifically targeted at Service Providers. Since, in my experience, the workload profile is significantly different in an IaaS environment, I thought it might be a good idea to collect some data and write down my thoughts. We’ll do FAST Cache in this post and FAST VP in a subsequent one.
To be clear, I’m not your EMC TC and none of these opinions or observations are endorsed by EMC or VCE. They are my opinions only and your mileage may vary! I’m going to make some generalizations based on my experience and what I’ve seen, but I understand that there are always going to be exceptions to any rule. If you want to discuss your particular configuration or I/O pattern, please throw something in the comments and I’ll do what I can to get the right resources engaged for you.
First, let’s do a quick recap of the two technologies that we are talking about here for those of you who aren’t familiar with them. VNX FAST VP is a policy-based auto-tiering solution. The goal of FAST VP is to utilize storage tiers efficiently: it lowers the overall cost of the storage solution by moving “slices” of colder data to high-capacity disks, and it increases performance by keeping hotter slices of data on performance drives. In an IaaS environment, FAST VP is a way for the provider to offer a blended storage offering, reducing the cost of a traditional single-type offering while allowing for a wider range of customer use cases and accommodating a larger cross-section of VMs with different performance characteristics.
FAST Cache is supported by all 300- and 700-series Vblock platforms, and is designed to extend the VNX array’s read-write cache and ensure that unpredictable I/O spikes that would normally result in cache misses can be serviced at EFD speeds. FAST Cache mitigates the effects of these I/O patterns by extending the DRAM cache for both reads AND writes, dramatically reducing the overall number of dirty pages and cache misses.
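To make the idea concrete, here’s a deliberately simplified model of how an extended cache like this behaves. This is illustrative only, not EMC’s actual implementation: the class name, the 64KB chunk granularity, and the promotion threshold are all assumptions for the sketch. The point is just that repeat accesses to a hot chunk get promoted so that subsequent hits are serviced at EFD speed instead of going to spinning disk.

```python
from collections import defaultdict

# Assumed for illustration: a chunk is promoted after a few accesses.
PROMOTE_THRESHOLD = 3

class FastCacheModel:
    """Toy model of cache promotion (not EMC's real algorithm)."""

    def __init__(self):
        self.hits = defaultdict(int)  # access counts per chunk
        self.cached = set()           # chunks promoted to EFD

    def access(self, chunk):
        """Return 'efd' if the chunk is already promoted, else 'disk'."""
        if chunk in self.cached:
            return "efd"
        self.hits[chunk] += 1
        if self.hits[chunk] >= PROMOTE_THRESHOLD:
            self.cached.add(chunk)    # promoted; future accesses hit EFD
        return "disk"
```

Run a few accesses against the same chunk and you can see the promotion behavior: the first hits go to disk, and once the (assumed) threshold is crossed, everything after is served from the cache tier.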
Now that we know the tools we are working with, let’s look at the I/O patterns that are found most often in an IaaS environment. The basic challenge is that while it is difficult (impossible?) to predict the types of workloads that are going to be running against the array when you open it up to a public audience, it’s also a business model that doesn’t easily support over-building to provide buffer space. If you contract pools of CPU, RAM and disk to customers for them to partition into VMs as their business requires, you are going to see workloads all over the map. Especially in a cloud environment (private cloud?) where you are hosting production servers, you may have write-heavy databases next to read-heavy file servers next to RAM-hungry Exchange servers next to AD servers doing almost nothing. The architecture between the array and the VMs can also complicate the issue: multiple VMs running multiple applications and workloads, on multiple VMFS datastores spread across multiple hosts, can generate a very random I/O pattern, placing stress on both the storage processors and the DRAM-based cache.
FAST Cache is essentially a no-brainer here, and the IaaS use-case is one that sees an almost immediate benefit from it. In our (non-VCE affiliated) design document for vCloud Director on Vblock we actually called for doubling the amount of FAST Cache that is normally recommended by EMC on the VNX platforms, and there was a good bit of conversation around simply recommending the maximum amount the systems would take. There are a number of reasons for this:
- As discussed earlier, the variability of the workload pattern requires a “buffer” of some sort to account for the spikes generated by customers on both the read and write side.
- Despite the impressive $ per I/O numbers that today’s generation of EFDs show, nearly every SP I talk to struggles to work the cost into the overall business model. The truth is that when given the option of SATA, SAS and EFD to support their workloads in an IaaS environment, customers will gravitate towards the bottom two tiers regardless of the cost. FAST Cache gives the SP the ability to market EFD speeds to their customers for the most frequently accessed data without requiring the customer to explicitly commit to the cost of an EFD tier.
- By providing read and write cache, SPs have the option of extending their existing IaaS infrastructure to support additional lines of business. You don’t need to have multiple arrays to handle your different offerings if your primary array can handle multiple types of workloads, and this lowers the cost of entry and initial capital outlay of those offerings.
The general rule of thumb is to start with 5% of the total capacity as FAST Cache, and make sure to plan out your disk layout to ultimately support as much as the array can handle. For the VNX 7500 that lives in the Vblock 300GX, that’s a whopping 2.1TB of cache spread across twenty 200GB EFDs, so it’s important to account for those disks during the design phase. For IaaS workloads, I’m recommending increasing that to 10%-12% of the usable capacity of the array due to the nature of the workload as well as the average I/O footprint that we see. Again, every environment can be different, so LISTEN TO YOUR TCs! These are general guidelines, and it may make sense for your ratio to go up or down based on how your customers are using the platform!
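If you want to play with the sizing math yourself, here’s a back-of-the-envelope helper. This is a hypothetical sketch, not an EMC sizing tool: the function name and defaults are mine, and it assumes FAST Cache is built from mirrored (RAID-1) EFD pairs, so usable cache is roughly one drive’s worth of capacity per pair (real usable figures depend on formatted drive capacity, which is why the actual platform max quotes 2.1TB rather than a round 2TB).

```python
import math

def fast_cache_sizing(usable_tb, ratio=0.10, efd_gb=200, max_drives=20):
    """Rough FAST Cache sizing sketch (hypothetical, not an EMC tool).

    usable_tb  -- usable array capacity in TB
    ratio      -- target cache ratio (0.10-0.12 for IaaS per the post)
    efd_gb     -- nominal capacity of each EFD
    max_drives -- platform maximum number of FAST Cache EFDs
    Returns (EFD drive count, approximate usable cache in GB).
    """
    target_gb = usable_tb * 1024 * ratio        # target cache capacity
    pairs = math.ceil(target_gb / efd_gb)       # mirrored pairs needed
    drives = min(pairs * 2, max_drives)         # respect the platform max
    usable_cache_gb = (drives // 2) * efd_gb    # one drive's worth per pair
    return drives, usable_cache_gb
```

For example, `fast_cache_sizing(10)` (10TB usable at the 10% ratio) works out to twelve 200GB EFDs for roughly 1200GB of cache, and anything large enough to blow past the drive ceiling simply pins at the platform maximum.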
In fact (and this probably won’t make the sales guys at VCE happy) I’d prioritize FAST Cache over having an EFD tier in your array for almost ALL customers, but especially for service providers. We’ll talk about tiering in the next post, but even if you wanted to include an EFD tier in your offering, the amount that needs to be installed to make it viable and useful to customers is fairly low. In my opinion, FAST Cache is the single most important thing to include in your design to extend capacity, protect the tenants and create flexibility in the platform.
Hopefully this has been helpful, and I’m interested to hear your thoughts in the comments below. As always, professional disclosure and common decency are appreciated! In my next post we’ll talk about storage tiering and how it can help differentiate a service provider’s offering and make the cost of high-performance disk more palatable to customers.