Is it NoSQL or NOSQL (Not Only SQL)? Or is it SQL on MySQL, Postgres, or Oracle? Or maybe just files? My simple answer: most likely any and all of them, once you go through the process of defining your requirements at a higher level. In many ways, I find the debate between SQL and the various flavors of non-SQL stores a little silly, since both have a role, as do files. The question has to start with what qualities you need from the persistence layer of your platform. Even that exercise will likely reveal that different pieces of the data you need to keep have different needs.
That said, I find it very exciting to finally have some serious choices. Roll the clock back a scant 5 years and your only practical choice was a SQL database (either commercial or open source), with your data most likely stored on a SAN. The simple problem of matching the importance of the data to the cost of storage was very constrained, with perhaps only a 50% cost difference separating your most expensive SAN space from your least expensive. Now we have a range of solutions that give us real choice when matching requirements to solutions. But what are those requirements?
Life Cycle - All data has a meaningful life cycle. Very little data is meaningful forever. Data you use to run your business needs to stick around for at least 7 years, but beyond that it has diminishing value. Shopping carts may only be meaningful for a few days or weeks. How long the data is meaningful is often one of the drivers of the next requirement...
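A minimal sketch of treating life cycle as a retention period per kind of data. The kind names, the `RETENTION` table, the `is_expired` helper, and the 14-day shopping cart window are hypothetical; only the 7-year figure comes from the text above.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy per kind of data. The 7-year figure comes
# from the text; the 14-day shopping cart window is an assumption.
RETENTION = {
    "order_history": timedelta(days=7 * 365),
    "shopping_cart": timedelta(days=14),
}


def is_expired(kind: str, created: datetime, now: datetime) -> bool:
    """True once a record has outlived the life cycle for its kind of data."""
    return now - created > RETENTION[kind]


now = datetime(2024, 1, 1)
print(is_expired("shopping_cart", datetime(2023, 12, 1), now))  # True: 31 days old
print(is_expired("order_history", datetime(2020, 1, 1), now))   # False: about 4 years old
```

Many stores can enforce a rule like this themselves through per-record expiry (for example, Redis key expiry or MongoDB TTL indexes), so short-lived data such as carts may not need a separate cleanup job.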
Business Importance - Ultimately, if you're putting data on disk, it has a use for the duration of its life cycle. The importance of the data is directly proportional to the business impact should it be lost. If you lose your order history, that will be very bad for business. If you lose a shopping cart, that might irritate the affected users, but the business impact will be considerably less.
Availability - Obviously, if you stored the data, you'd like to be able to get it back. Again, it's important to understand the impact on your application should the data be temporarily unavailable. Data that is always available is costly to achieve and comes with other interesting challenges (e.g., the CAP theorem applies).
Scalability - What volume of transactions will the data need to support, and what is the mix of reads to writes? The volume of data plays a role here as well, but in most scaling problems it's the transaction rate, more than the data volume, that presents the challenge.
Access Patterns - Clearly, the types of access required to find the data are important. This is the center of the SQL vs NoSQL debate. The business value of the perceived access patterns should be carefully reviewed when defining these requirements. Patterns that add only marginal business value may not really be requirements at all.
Each storage platform has strengths and weaknesses in each of these areas. What is important is to understand these as well as your application requirements. For each piece of data you want to store, match it against these requirements, and in most cases the answer of how to store it becomes clear. The question really isn't SQL vs NoSQL; as the NOSQL camp likes to say, it's about selecting the tool that best meets your requirements for each piece of data you need to persist.
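One way to make this matching concrete is to write down the five requirements for each piece of data before choosing a store. The sketch below is hypothetical: the `DataRequirements` type, the `Importance` scale, and all example values (read/write ratios, access patterns, the 14-day cart life cycle) are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from enum import IntEnum


class Importance(IntEnum):
    """Hypothetical business-importance scale: higher means costlier to lose."""
    LOW = 1
    MEDIUM = 2
    CRITICAL = 3


@dataclass
class DataRequirements:
    """The five requirements from the list above, captured for one piece of data."""
    name: str
    life_cycle: timedelta        # how long the data stays meaningful
    importance: Importance       # business impact if the data is lost
    tolerates_outage: bool       # can the app cope if it is briefly unavailable?
    reads_per_write: float       # read/write mix (illustrative values below)
    access_patterns: list[str] = field(default_factory=list)


orders = DataRequirements(
    name="order_history",
    life_cycle=timedelta(days=7 * 365),
    importance=Importance.CRITICAL,
    tolerates_outage=False,
    reads_per_write=20.0,
    access_patterns=["by order id", "by customer"],
)
cart = DataRequirements(
    name="shopping_cart",
    life_cycle=timedelta(days=14),
    importance=Importance.LOW,
    tolerates_outage=True,
    reads_per_write=2.0,
    access_patterns=["by session id"],
)

# Rank data sets so the storage budget goes to the most valuable data first.
ranked = sorted([cart, orders], key=lambda r: r.importance, reverse=True)
for req in ranked:
    print(req.name, req.importance.name, req.life_cycle.days, "days")
```

Keeping these profiles side by side makes it easier to see when two kinds of data that look similar actually belong on different platforms.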
One requirement I would like to discuss a bit further is business importance, as I feel it's only with the introduction of serious alternatives to SQL on SAN that we have an opportunity to better match the cost of storage to its value to the business. SANs allow the raw storage they control to be configured as RAID 0 through 6, although some RAID controllers support only a subset of these options. Most organizations rely on either RAID 1 or RAID 5 to protect themselves against the loss of a disk drive within the SAN. This protection carries an overhead: the protected (usable) capacity is smaller than the raw storage in the SAN.
RAID 1 is accomplished via mirroring, and while 2 drives provide protection, 3 drives are required to ensure you are never running without protected storage (for example, while a failed mirror is being rebuilt). This means you need to purchase 3X the raw storage for the protected capacity you want. RAID 5 improves the situation considerably, with protection overhead as little as 25% (raw storage of 125% of protected capacity), although many organizations use double parity protection, which is RAID 6 rather than RAID 5 and results in an overhead of around 40%. Unfortunately, SANs bring their own cost overhead with them due to the controllers and other support hardware. SAN storage can range from $1,000 to $2,000 per TB of raw capacity. In a world of falling storage costs, this is still a very high cost per TB for raw storage, especially when direct-attached drives can be as little as $300/TB.
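The overhead arithmetic for these RAID levels can be checked with a few lines of Python. The drive counts are assumptions chosen to reproduce the figures in the text: a three-way mirror for RAID 1, four data drives plus one parity drive for the 125% RAID 5 figure, and five data drives plus two parity drives for the 40% double-parity figure. Wider stripes would lower the parity overhead.

```python
def raw_per_usable(data_drives: int, redundant_drives: int) -> float:
    """Raw capacity purchased per unit of protected (usable) capacity."""
    return (data_drives + redundant_drives) / data_drives


# Assumed layouts (data drives, redundant drives) matching the text's figures.
LAYOUTS = {
    "RAID 1, three-way mirror": (1, 2),
    "RAID 5, 4 data + 1 parity": (4, 1),
    "RAID 6, 5 data + 2 parity": (5, 2),
}
SAN_RAW_COST_PER_TB = (1_000, 2_000)  # range quoted in the text

for name, (data, redundant) in LAYOUTS.items():
    factor = raw_per_usable(data, redundant)
    low, high = (round(c * factor) for c in SAN_RAW_COST_PER_TB)
    print(f"{name}: {factor:.2f}x raw, ${low:,}-${high:,} per protected TB")
```

Under these assumptions, the cheapest protected SAN option in this set, RAID 5, works out to roughly $1,250 to $2,500 per protected TB across the quoted price range.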
NoSQL platforms (and, to a large extent, even MySQL) can take advantage of this inexpensive storage and protect data through replication, much like RAID 1. But since the storage itself is dramatically cheaper, protected storage is also cheaper. There is also more flexibility, allowing the number of copies retained to be easily matched to the value the data represents to the business. Mission critical data may have 6-8 copies spread across two geographically diverse data centers, while low value data may have only 2 copies in one data center. The opportunity to tune your costs to business value has finally become practical, and at lower price points than were previously available.
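A similar sketch for replication-based protection. It assumes the $300/TB direct-attached figure and counts raw drive cost only; servers, networking, and the second data center are not included. The tier names and the `ReplicationPlan` type are hypothetical, while the copy counts follow the text (using 8, the upper end of the 6-8 range, for mission critical data).

```python
from dataclasses import dataclass

DAS_COST_PER_TB = 300  # direct-attached figure quoted in the text


@dataclass(frozen=True)
class ReplicationPlan:
    copies: int
    data_centers: int


# Hypothetical tier names; copy counts come from the text.
PLANS = {
    "mission_critical": ReplicationPlan(copies=8, data_centers=2),
    "low_value": ReplicationPlan(copies=2, data_centers=1),
}


def cost_per_protected_tb(plan: ReplicationPlan,
                          raw_cost_per_tb: float = DAS_COST_PER_TB) -> float:
    # Every replica is a full copy, so raw capacity scales linearly with copies.
    return plan.copies * raw_cost_per_tb


for tier, plan in PLANS.items():
    print(f"{tier}: {plan.copies} copies in {plan.data_centers} data center(s), "
          f"${cost_per_protected_tb(plan):,.0f} per protected TB")
```

At these assumed prices, 2 copies cost about $600 per protected TB and 8 copies about $2,400. For comparison, RAID 5 on SAN at 1.25 times the quoted $1,000 to $2,000 per raw TB comes to $1,250 to $2,500, so even the mission critical tier stays within that range while adding a second site.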
This is currently a hot area for discussion and debate. I encourage and welcome feedback and comments on the ideas I have put forth here. Let me concede now that I am not a RAID expert, so if you would like to elaborate on or correct any mistakes I've made, please do.