XQUT - Unit Testing in Pure XQuery
September 13, 2011 at 04:06 PM | categories: XQuery, Mark Logic
I was working on a couple of pure XQuery projects that needed unit testing. While I could have integrated with JUnit or another existing framework, I really wanted something simple that I could run directly from cq. Hence XQUT.
XQUT will usually be invoked like this:
The cq app server should point to the code you are testing,
so that your test suite can import libraries. The eval root is
different: it is the location of the XQUT code, so that you only need
one copy of XQUT. The external variable SUITE is an XML
test suite. A simple test suite might look like this:
The XML is fairly simple. Under the root suite
element we have one or more unit elements, each
representing a test. The test XQuery can be defined as the lexical
value of the element, or as its expr child. The result
can be defined by a result attribute or element.
For more sophisticated tests, you can
add xsi:type attributes and sequences
of result elements. You can also use an
optional environment element to import libraries, define
variables, and define namespace prefixes. If you
add setup elements, these will be evaluated before any
tests. The test
suite for XQUT itself contains more examples.
Group By in XQuery 1.0 for MarkLogic Server
August 23, 2011 at 02:12 PM | categories: XQuery, Mark Logic
XQuery 3.0 introduces new syntax for "group by". At this time, MarkLogic Server 4.2 is the latest release, and it doesn't have support for that syntax. So how can we implement "group by" when writing XQuery for MarkLogic?
Let's start with the W3C use cases. First, let's fetch the sample data and put it into MarkLogic. We can do that using cq. I'll leave out the schemas, since we don't need those. I also won't be exhaustive about optimizing every expression in these examples: suffice to say that there is room for even more improvement.
Sorry about the long block of code, but we need those documents. Paste that into cq, evaluate it, and you should get the empty sequence. That means your documents were inserted correctly: you can use the 'explore' link to check.
Now we can write some queries. Here is the first use case (Q1).
And the result should look like this:
We can't write XQuery 3.0 using XQuery 1.0, but we can get the same result using an extra distinct-values step.
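To make the shape of that extra step concrete, here is a rough sketch in Python rather than XQuery. It is only an illustration of the technique, not the query from this post: the records, the field names, and the helper functions are all made up for the example.

```python
# Hypothetical sales records; these are NOT the W3C use-case data.
sales = [
    {"product": "broiler", "qty": 20},
    {"product": "toaster", "qty": 100},
    {"product": "toaster", "qty": 50},
    {"product": "socks", "qty": 500},
    {"product": "broiler", "qty": 20},
]


def distinct_values(items):
    """Return the items in first-seen order with duplicates removed."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def group_by_distinct(records):
    """Group by product using an extra distinct-values step."""
    result = []
    for name in distinct_values(r["product"] for r in records):
        # One full scan of the records per distinct key: this stands in
        # for the per-product lookup.
        total = sum(r["qty"] for r in records if r["product"] == name)
        result.append((name, total))
    return result


totals = group_by_distinct(sales)
print(totals)
```

The scan inside the loop runs once for every distinct product name, so the work grows with the number of groups as well as the number of records.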
This code is a little awkward, though. Instead of looping through the records once, we have to perform a database lookup on each product name. Normally this would be an unavoidable cost, and perhaps a reason to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat, and use an accumulator model to get the same result more quickly. I'm talking about maps.
This produces the same output, and will scale better than the
distinct-values() version would. Of course it is also less
portable. But database application developers often have to implement
non-portable optimizations, and the less portable code can be
segregated into its own library modules.
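The accumulator idea can be sketched in Python, with a plain dict standing in for the MarkLogic map. This is an illustration under assumptions, not the code from this post: the sample records and the function name are invented.

```python
# Hypothetical sales records; these are NOT the W3C use-case data.
sales = [
    {"product": "broiler", "qty": 20},
    {"product": "toaster", "qty": 100},
    {"product": "toaster", "qty": 50},
    {"product": "socks", "qty": 500},
    {"product": "broiler", "qty": 20},
]


def group_by_map(records):
    """Accumulate a running total per product in a single pass."""
    totals = {}  # the dict plays the role of the map
    for r in records:
        key = r["product"]
        # Fetch the running total (0 if the key is new), add, store back.
        totals[key] = totals.get(key, 0) + r["qty"]
    return totals


totals = group_by_map(sales)
for name in sorted(totals):
    print(name, totals[name])
```

Each record is visited exactly once, and the per-key work is a single map read and write, which is why this model tends to scale better than rescanning for every distinct key.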
Now let's look at the next example (Q2).
Expected result:
Here is a solution using maps:
Again, this solution produces the same results. This time we had two elements in the grouping key, and the map key must be a string. So we had to use an old database trick and concatenate the two values with a known delimiter. Naturally we have to be careful in our choice of delimiter.
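Here is a small Python sketch of that concatenation trick, again with a dict standing in for the map. The delimiter, the field names, and the sample records are assumptions made for illustration, not taken from Q2. The check in make_key shows one way to be careful: refuse any value that already contains the delimiter, since otherwise two different key pairs could produce the same string.

```python
DELIMITER = "|"  # assumed delimiter; must never occur inside a key part


def make_key(*parts):
    """Join the key parts into one string, rejecting ambiguous values."""
    for part in parts:
        if DELIMITER in part:
            raise ValueError("key part contains the delimiter: %r" % part)
    return DELIMITER.join(parts)


def split_key(key):
    """Recover the original key parts from a composite key."""
    return key.split(DELIMITER)


# Hypothetical records with a two-part grouping key.
records = [
    {"state": "CA", "category": "clothes", "qty": 10},
    {"state": "CA", "category": "clothes", "qty": 20},
    {"state": "CA", "category": "kitchen", "qty": 5},
    {"state": "MA", "category": "clothes", "qty": 7},
]

totals = {}
for r in records:
    key = make_key(r["state"], r["category"])
    totals[key] = totals.get(key, 0) + r["qty"]

for key in sorted(totals):
    state, category = split_key(key)
    print(state, category, totals[key])
```

Without the check, the pairs ("a|b", "c") and ("a", "b|c") would both become "a|b|c" and be merged into one group.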
For the remaining queries, I'll skip the W3C examples and output XML. Here are my solutions. Again, these return the desired results, but could benefit from more optimization work.
This final use-case is kind of odd, because the sample code works if you simply comment out the "group by". In other words, the sample data only contains one group. But I reimplemented it anyway.
That's it. I hope this was worth your time.
AWS and High-CPU Instances
July 04, 2011 at 09:52 AM | categories: Performance, AWS, Linux
IPv6 Day
June 07, 2011 at 12:33 PM | categories: home, Linux
Tomorrow's event prodded me into setting up IPv6 at home, where I use openwrt. The tutorial I found was helpful: I just had to change the interface names. On my system eth0.1 was eth1, and 6rdtun was called 6to4. Comcast's test page says I'm up and working. I can see the unicorn too.
Visit ipv6-test.net for more tests.
Intel SSD 510
March 02, 2011 at 07:41 AM | categories: Performance
Intel's latest SSD pricing isn't as much of a shift as I had hoped for. As I see it, they have gone from $2.75/GB to $2.50/GB for enterprise-class SSD devices, and capacities have grown to 230-GB per device. That's an improvement, and the performance looks good. The combination of SATA-3 and lower failure rates than consumer-grade SSD may also help justify the price.
While SSD is now the logical choice over 15k-rpm disks, very few deployments use those. Instead, 10k-rpm disks are the workhorse for disk-heavy enterprise applications. SSD is getting closer, but still costs at least twice as much. Brand-name 10k-rpm SATA disks are available for $0.72-$1 per GB, with capacities up to 600-GB per spindle. It will take a lot of performance-related pain to cross that gap.
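As a quick check on "at least twice as much", here is the arithmetic in Python, using only the per-GB prices quoted above. The variable names are mine.

```python
ssd_per_gb = 2.50          # SSD price per GB, from the text
disk_per_gb_low = 0.72     # cheapest quoted 10k-rpm price per GB
disk_per_gb_high = 1.00    # most expensive quoted 10k-rpm price per GB

# How many times more the SSD costs per GB at each end of the disk range.
ratio_vs_expensive_disk = ssd_per_gb / disk_per_gb_high
ratio_vs_cheap_disk = ssd_per_gb / disk_per_gb_low

print(f"{ratio_vs_expensive_disk:.2f}x to {ratio_vs_cheap_disk:.2f}x")
# prints "2.50x to 3.47x"
```

So on these numbers the SSD premium runs from about 2.5 to about 3.5 times the per-GB cost of 10k-rpm disk.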




