I recently finished up the first phase of a complete rebuild of my company's intranet. A big part of the new application is an integrated search that covers not only the intranet pages (which are largely managed using Macromedia Contribute), but also our knowledge management tool, a CF 5 app that uses Oracle as well as Verity collections as its datasources.
Because the knowledge management (KM) app contains some sensitive information and that team was reluctant to let us hit their datasources directly, our first notion of how to integrate the two sides of the intranet was to have our CFMX app request a CFML page on the CF5 side with a query string and get a WDDX packet back as the response. That seemed to work great, but then we fired up Load Runner to do some load and performance testing, and that's when things got interesting.
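For anyone who hasn't worked with WDDX, the packet coming back is just XML with a fixed vocabulary, so it can be unpacked on any platform. As a rough illustration (the packet contents, field names, and function name below are all invented for the example), here's what consuming a WDDX recordset outside of CF might look like, sketched in Python:

```python
# Illustrative sketch only: the packet contents and field names are made up.
import xml.etree.ElementTree as ET

SAMPLE_PACKET = """\
<wddxPacket version='1.0'><header/><data>
  <recordset rowCount='2' fieldNames='TITLE,URL'>
    <field name='TITLE'>
      <string>Benefits FAQ</string>
      <string>Travel Policy</string>
    </field>
    <field name='URL'>
      <string>/km/benefits.cfm</string>
      <string>/km/travel.cfm</string>
    </field>
  </recordset>
</data></wddxPacket>"""

def wddx_recordset_to_rows(packet_xml):
    """Turn a WDDX recordset (column-oriented) into a list of row dicts."""
    root = ET.fromstring(packet_xml)
    rs = root.find("./data/recordset")
    fields = rs.get("fieldNames").split(",")
    # Each <field> holds one column's values, in row order.
    columns = {f.get("name"): [s.text for s in f.findall("string")]
               for f in rs.findall("field")}
    row_count = int(rs.get("rowCount"))
    return [{name: columns[name][i] for name in fields}
            for i in range(row_count)]

rows = wddx_recordset_to_rows(SAMPLE_PACKET)
print(rows[0]["TITLE"])  # Benefits FAQ
```

Note that WDDX serializes recordsets column by column, so the sketch has to pivot the data back into rows before it's useful.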
Load Runner is a really, really great tool. If you haven't used it before, you can easily script virtual users by hitting a "record" button and clicking around on your web app. For things like forms (such as the search form in this case), you can fill it out once and submit it, then in the script on the Load Runner side you can parameterize your search by feeding it a text file. In our case we fed it a text file containing a bunch of different search terms (70 to be exact) and told it to randomly run searches based on those terms. This can help you pretty realistically simulate real users. You can also have it record what they call "think time" as you're building your script, meaning the time you're sitting on a page just looking at it, and then tell it to randomly use values based on a percentage range, again to simulate real users more accurately. Very cool stuff.
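To make the parameterization and randomized think time concrete, here's a rough sketch of what a virtual user's loop amounts to (the term list, timing numbers, and function names below are all invented for illustration, not Load Runner's actual scripting API):

```python
# Sketch of a parameterized virtual user: pick a random term from a data
# file and pause a randomized "think time" between requests. All numbers
# and names here are invented for illustration.
import random

def run_virtual_user(terms, searches=5, think_time=4.0, variance=0.5):
    """Yield (term, pause) pairs; pause varies think_time by +/- variance."""
    for _ in range(searches):
        term = random.choice(terms)          # parameterized input
        # issue_search(term)                 # this is where the HTTP hit would go
        low = think_time * (1 - variance)    # e.g. 50%..150% of the recorded time
        high = think_time * (1 + variance)
        pause = random.uniform(low, high)
        # time.sleep(pause)                  # commented out so the sketch runs instantly
        yield term, round(pause, 2)

terms = ["benefits", "travel policy", "401k"]  # stand-in for the 70-term file
for term, pause in run_virtual_user(terms, searches=3):
    print(f"searched {term!r}, thinking for {pause}s")
```

The randomized pause is what keeps a pool of virtual users from hitting the server in lockstep, which is part of what makes the simulation realistic.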
In our case we basically had three user types: searchers, who just repeatedly searched; clickers, who just navigated around all the content pages; and mixed users, who did some clicking and then some searching. You then use the Load Runner controller to tell it which virtual users you want to use and how many of each, give it ramp-up, duration, and ramp-down times for the test, and fire it off. You can monitor practically everything you can think of during the test, including live stats on the server you're hitting. (Load Runner should run on its own separate, dedicated server, not the same server you're trying to test.)
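The controller setup boils down to a weighted mix of user types plus start times spread across the ramp-up window. A minimal sketch of that bookkeeping in Python (the weights, counts, and times below are invented, not our actual test scenario):

```python
# Rough sketch of a controller-style scenario: a weighted user mix and
# evenly spaced ramp-up start times. All numbers here are invented.
def build_scenario(mix, total_users, ramp_up_secs):
    """mix maps user type -> weight; returns (user_type, start_time) pairs."""
    total_weight = sum(mix.values())
    users = []
    for user_type, weight in mix.items():
        users += [user_type] * round(total_users * weight / total_weight)
    # Space the virtual users evenly across the ramp-up window.
    interval = ramp_up_secs / max(len(users) - 1, 1)
    return [(u, round(i * interval, 1)) for i, u in enumerate(users)]

# Three user types, per the test: searchers, clickers, and mixed users.
scenario = build_scenario({"searcher": 3, "clicker": 6, "mixed": 1},
                          total_users=30, ramp_up_secs=120)
print(scenario[:3])
```

Ramping up gradually like this matters: it separates "the server degrades as load grows" from "the server falls over the instant everyone arrives at once."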
I won't bore you with all the gory details, but the long and short of the situation is that under a load of even 30 search users things got nasty. REALLY nasty. Page response times started getting into the 20-30 second range, and that's not just the search functionality. Whatever was going on seemed to negatively impact the content pages as well, many of which are cached on the server.
So that's the bad news. Our architecture that works fine with a very small number of users apparently just isn't going to scale. The CFML pages held up extremely well even with 100 users or more really pounding away, but the search functionality was pretty ugly.
The good news is that we figured this out before we launched it to the users. I can't imagine the stress we would have been under had we launched it first and THEN figured out it wouldn't scale. Better to know all of this now than figure it out when my phone's ringing off the hook later.
The other great thing is that between the Load Runner reports, the server stats, and the JRun logs (we're running CFMX on JRun), we at least can figure out where the bottlenecks are. For further testing, we did the following:
- Used WDDX as well as more "standard" XML data locally instead of shipping it over the network
- Created a test database in SQL Server (on a separate physical machine) containing much of the same data so we could hit a database instead of using WDDX over HTTP
- Created local Verity collections containing much of the same data
From these tests and analysis of the Load Runner reports, at this point we've determined the following:
- CPU utilization is always extremely high (averaging 95%) when using WDDX or more "standard" XML, either locally or over HTTP
- Response times are always horrendously bad under load when using WDDX over HTTP
- Response times are quite good when using WDDX or XML locally, but CPU usage still seems high
- Response times and CPU utilization are both excellent when hitting the SQL Server database
- Response times and CPU utilization are both excellent when hitting local Verity collections, but this is very slightly slower than hitting SQL Server (which honestly surprised me a bit)
We're still working through some of our options, which at this point are as follows (feel free to suggest more!):
- Hit the Oracle database and Verity collections on the knowledge management side directly. This may or may not be possible depending on what the KM team will allow us to do.
- Replicate all the data in another database as well as replicate the Verity collections. Upside: distributes load, would perform extremely well. Downside: multiple points of failure, added maintenance headache.
- Ship XML data over to our server on a scheduled basis. Upside: we're hitting local data. Downside: CPU utilization when you're pounding away at XML data seems high.
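If we do go the scheduled-XML route, one mitigation for that CPU cost would be to parse the shipped document once per delivery and serve searches from the parsed, in-memory structure, rather than re-parsing XML on every request. A sketch of that idea in Python (the data, element names, and functions are all invented for illustration):

```python
# Sketch: parse scheduled XML once per delivery, then search the parsed
# in-memory structure per request. Data and element names are invented.
import xml.etree.ElementTree as ET

SHIPPED_XML = """\
<documents>
  <doc><title>Benefits FAQ</title><body>health dental vision</body></doc>
  <doc><title>Travel Policy</title><body>expenses airfare hotel</body></doc>
</documents>"""

def build_index(xml_text):
    """Run once when the scheduled XML arrives; keep plain dicts in memory."""
    root = ET.fromstring(xml_text)
    return [{"title": d.findtext("title"), "body": d.findtext("body")}
            for d in root.findall("doc")]

def search(index, term):
    """Per-request work touches only the in-memory index, never the XML."""
    term = term.lower()
    return [d["title"] for d in index
            if term in d["title"].lower() or term in d["body"].lower()]

index = build_index(SHIPPED_XML)   # on the scheduled import, not per request
print(search(index, "airfare"))    # ['Travel Policy']
```

The point of the sketch is just to move the expensive XML parsing out of the request path; whether that's enough to tame the CPU numbers is exactly what another round of load testing would tell us.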
That's where we are at this point; we're going to make a final determination on a path forward this week. I'm just sharing this because it's been an extremely educational process to go through, and it underscores the huge importance of load testing your apps.