XQUT - Unit Testing in Pure XQuery

September 13, 2011 at 04:06 PM | categories: XQuery, Mark Logic

I was working on a couple of pure XQuery projects that needed unit testing. While I could have integrated with JUnit or another existing framework, I really wanted something simple that I could run directly from cq. Hence XQUT.

XQUT will usually be invoked like this:

The cq app server should point to the code you are testing, so that your test suite can import libraries. The eval root is different: it is the location of the XQUT code, so that you only need one copy of XQUT. The external variable SUITE is an XML test suite. A simple test suite might look like this:

The XML is fairly simple. Under the root suite element we have one or more unit elements, each representing a test. The test XQuery can be defined as the lexical value of the element, or as its expr child. The result can be defined by a result attribute or element.
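As a rough illustration of that description, a small suite might be shaped like the sketch below. Only the element and attribute names mentioned above (suite, unit, expr, result) are used; whether XQUT expects a namespace on these elements is not stated here, so the absence of one is an assumption.

```xquery
xquery version "1.0";

(: Hypothetical suite, written as an XQuery element constructor.
   The lack of a namespace is an assumption, not XQUT's documented format. :)
<suite>
  <!-- test expression as the element's lexical value, result as an attribute -->
  <unit result="2">1 + 1</unit>
  <!-- test expression as an expr child, result as an element -->
  <unit>
    <expr>concat("a", "b")</expr>
    <result>ab</result>
  </unit>
</suite>
```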

For more sophisticated tests, you can add xsi:type attributes and sequences of result elements. You can also use an optional environment element to import libraries, define variables, and define namespace prefixes. If you add setup elements, these will be evaluated before any tests. The test suite for XQUT itself contains more examples.


Group By in XQuery 1.0 for MarkLogic Server

August 23, 2011 at 02:12 PM | categories: XQuery, Mark Logic

XQuery 3.0 introduces new syntax for "group by". At this time, MarkLogic Server 4.2 is the latest release, and it doesn't have support for that syntax. So how can we implement "group by" when writing XQuery for MarkLogic?

Let's start with the W3C use cases. First, let's fetch the sample data and put it into MarkLogic. We can do that using cq. I'll leave out the schemas, since we don't need those. I also won't be exhaustive about optimizing every expression in these examples: suffice it to say that there is room for even more improvement.

Sorry about the long block of code, but we need those documents. Paste that into cq, evaluate it, and you should get the empty sequence. That means your documents were inserted correctly: you can use the 'explore' link to check.
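The loading step follows a simple pattern: one xdmp:document-insert call per document. Here is a minimal sketch of that pattern. The URI and the single abbreviated record are placeholders, not the full W3C sample data, and the element names are assumptions.

```xquery
xquery version "1.0-ml";

(: Placeholder URI and abbreviated content: an assumption for illustration only. :)
xdmp:document-insert(
  "sales-records.xml",
  <sales>
    <record>
      <product-name>broiler</product-name>
      <store-number>1</store-number>
      <qty>20</qty>
    </record>
  </sales>)
```

xdmp:document-insert returns the empty sequence, which is why a successful load shows no output in cq.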

The cq explorer shows the W3C test documents.

Now we can write some queries. Here is the first use case (Q1).

And the result should look like this:

We can't use XQuery 3.0 syntax in XQuery 1.0, but we can get the same result using an extra distinct-values step.
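The distinct-values approach can be sketched as follows. To keep it self-contained, this version uses a small inline data set instead of the stored documents; the element names (sales, record, product-name, qty) and the output element names are assumptions.

```xquery
xquery version "1.0";

(: Inline stand-in for the sales records; names and values are assumptions. :)
declare variable $SALES as element(sales) :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>toaster</product-name><qty>50</qty></record>
    <record><product-name>socks</product-name><qty>500</qty></record>
  </sales>;

<sales-qty-by-product>{
  (: First pass: find the distinct grouping keys. :)
  for $name in distinct-values($SALES/record/product-name)
  order by $name
  return
    <product name="{ $name }">{
      (: Second lookup per key: select that group's records and aggregate. :)
      sum($SALES/record[product-name eq $name]/qty)
    }</product>
}</sales-qty-by-product>
```

The predicate inside the return clause is the per-key lookup: every distinct product name triggers another pass over the records.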

This code is a little awkward, though. Instead of looping through the records once, we have to perform a database lookup on each product name. Normally this would be an unavoidable cost, and perhaps a reason to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat, and use an accumulator model to get the same result more quickly. I'm talking about maps.

This produces the same output, and will scale better than the distinct-values() version would. Of course it is also less portable. But database application developers often have to implement non-portable optimizations, and the less portable code can be segregated into its own library modules.
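A minimal sketch of the accumulator model, using MarkLogic's map:map, map:put, map:get, and map:keys functions. As before, the inline data and element names are assumptions standing in for the stored documents.

```xquery
xquery version "1.0-ml";

(: Inline stand-in for the sales records; names and values are assumptions. :)
declare variable $SALES as element(sales) :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>toaster</product-name><qty>50</qty></record>
    <record><product-name>socks</product-name><qty>500</qty></record>
  </sales>;

let $totals := map:map()
(: Single pass: add each record's qty to the running total for its key. :)
let $_ :=
  for $r in $SALES/record
  let $key := string($r/product-name)
  return
    map:put($totals, $key,
      (: map:get returns the empty sequence for a new key, so sum() starts from qty :)
      sum((map:get($totals, $key), xs:decimal($r/qty))))
return
  <sales-qty-by-product>{
    for $key in map:keys($totals)
    order by $key
    return <product name="{ $key }">{ map:get($totals, $key) }</product>
  }</sales-qty-by-product>
```

Each record is visited once, and the only per-key work is a map lookup.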

Now let's look at the next example (Q2).

Expected result:

Here is a solution using maps:

Again, this solution produces the same results. This time we had two elements in the grouping key, and the map key must be a string. So we had to use an old database trick and concatenate the two values with a known delimiter. Naturally we have to be careful in our choice of delimiter.
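The delimiter trick can be sketched like this. The rows here are already joined into one flat structure, which is a simplification and an assumption; the element names and the choice of "|" as delimiter are also illustrative. Note that tokenize() takes a regular expression, so the delimiter has to be escaped when splitting the key.

```xquery
xquery version "1.0-ml";

(: The delimiter must never occur inside a key part. :)
declare variable $DELIM as xs:string := "|";

(: Pre-joined rows; structure, names, and values are assumptions. :)
declare variable $ROWS as element(rows) :=
  <rows>
    <row><state>CA</state><category>kitchen</category><qty>100</qty></row>
    <row><state>CA</state><category>kitchen</category><qty>50</qty></row>
    <row><state>CA</state><category>clothes</category><qty>500</qty></row>
    <row><state>MA</state><category>kitchen</category><qty>20</qty></row>
  </rows>;

let $totals := map:map()
let $_ :=
  for $r in $ROWS/row
  (: Build one string key from the two grouping values. :)
  let $key := string-join(($r/state, $r/category), $DELIM)
  return
    map:put($totals, $key,
      sum((map:get($totals, $key), xs:decimal($r/qty))))
return
  <result>{
    for $key in map:keys($totals)
    (: Split the key back into its parts; "\|" escapes the regex metacharacter. :)
    let $parts := tokenize($key, "\|")
    order by $parts[1], $parts[2]
    return
      <group state="{ $parts[1] }" category="{ $parts[2] }">{
        map:get($totals, $key)
      }</group>
  }</result>
```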

For the remaining queries, I'll skip the W3C example queries and their expected output XML. Here are my solutions. Again, these return the desired results, but could benefit from more optimization work.

This final use-case is kind of odd, because the sample code works if you simply comment out the "group by". In other words, the sample data only contains one group. But I reimplemented it anyway.

That's it. I hope this was worth your time.


AWS and High-CPU Instances

July 04, 2011 at 09:52 AM | categories: Performance, AWS, Linux

When AWS EC2 tells you that "requested instance type (cc1.4xlarge) is not supported in your requested Availability Zone (us-east-1b)", what they really mean is that you are using the wrong AMI. Switch to an HVM AMI. This enables support within Xen for Hardware Virtual Machine (HVM), using AMD SVM or Intel VT-x instructions.


IPv6 Day

June 07, 2011 at 12:33 PM | categories: home, Linux

Tomorrow's event prodded me into setting up IPv6 at home, where I use openwrt. The tutorial I found was helpful: I just had to change the interface names. On my system eth0.1 was eth1, and 6rdtun was called 6to4. Comcast's test page says I'm up and working. I can see the unicorn too.

Visit ipv6-test.net for more tests.


Intel SSD 510

March 02, 2011 at 07:41 AM | categories: Performance

Intel's latest SSD pricing isn't as much of a shift as I had hoped for. As I see it, they have gone from $2.75/GB to $2.50/GB for enterprise-class SSD devices, and capacities have grown to 230-GB per device. That's an improvement, and the performance looks good. The combination of SATA-3 and lower failure rates than consumer-grade SSD may also help justify the price.

While SSD is now the logical choice over 15k-rpm disks, very few deployments use those. Instead, 10k-rpm disks are the workhorse for disk-heavy enterprise applications. SSD is getting closer, but still costs at least twice as much. Brand-name 10k-rpm SATA disks are available for $0.72-$1 per GB, with capacities up to 600-GB per spindle. It will take a lot of performance-related pain to cross that gap.
