Benchmarking PHP Solr response data handling

    28-02-2012 · Author: basdenooijer

    Solr supports multiple output formats. Some are for general use (xml, json) and some are even language-specific. If you’re using PHP, these are the most logical response writer formats:

    • xml
    • json
    • phps (serialized PHP)
    • php (PHP code to execute)
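    The format is chosen per request with Solr's wt (writer type) parameter. The minimal sketch below only builds the request URLs for the four writers; the base URL http://localhost:8983/solr/select and the helper buildSolrUrl are hypothetical, so adjust them to your own installation. It assumes PHP 7 or later for the scalar type hints.

```php
<?php
// Hypothetical Solr endpoint: adjust host, port and path to your installation.
const SOLR_SELECT_URL = 'http://localhost:8983/solr/select';

/**
 * Build a select URL that requests a given response writer via the wt parameter.
 * http_build_query() URL-encodes the values (e.g. "*:*" becomes "%2A%3A%2A").
 */
function buildSolrUrl(string $query, string $writer): string
{
    return SOLR_SELECT_URL . '?' . http_build_query(array('q' => $query, 'wt' => $writer));
}

// One URL per response writer discussed above.
foreach (array('xml', 'json', 'phps', 'php') as $writer) {
    echo buildSolrUrl('*:*', $writer), PHP_EOL;
}
```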

    On top of that, PHP offers multiple ways to parse XML. I’m benchmarking these options to determine the most efficient decoding method to implement in the next major version of Solarium, but the results should be useful for any PHP Solr implementation.
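    To make the XML options concrete, here is a minimal sketch that extracts the document ids from a tiny hand-written response shaped like Solr's XML output, once with SimpleXML and once with DOMDocument plus DOMXPath. PHP also ships XMLReader, a streaming pull parser, which is not shown. The sample data and helper names are illustrative, and this is not necessarily the set of parsers (or the code) used in the benchmark.

```php
<?php
// A tiny, hand-written response in the shape of Solr's wt=xml output (illustrative only).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">1</str><str name="title">First</str></doc>
    <doc><str name="id">2</str><str name="title">Second</str></doc>
  </result>
</response>
XML;

// Option 1: SimpleXML, convenient property-style access to elements and attributes.
function idsWithSimpleXml(string $xml): array
{
    $response = simplexml_load_string($xml);
    $ids = array();
    foreach ($response->result->doc as $doc) {
        foreach ($doc->str as $field) {
            if ((string) $field['name'] === 'id') {
                $ids[] = (string) $field;
            }
        }
    }
    return $ids;
}

// Option 2: DOMDocument plus an XPath query over the full DOM tree.
function idsWithDom(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $ids = array();
    foreach ($xpath->query('/response/result/doc/str[@name="id"]') as $node) {
        $ids[] = $node->textContent;
    }
    return $ids;
}

print_r(idsWithSimpleXml($xml));
print_r(idsWithDom($xml));
```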

    Before I get to the results, some info on how I tested.
    Because this is only the first of several benchmarks I want to do over the next few months, I needed a good benchmarking tool. I couldn’t find one that suited my needs, so I created one myself. It’s inspired by PHPUnit, but instead of test cases you write benchmark cases. It supports concepts like annotations and data providers. If you know PHPUnit, you will probably understand this example quickly:

    class SolrParserBenchmark extends Phperf\Benchmark
    {
    
        /**
         * @return array
         */
        public function solrJsonDataProvider()
        {
            return array(
                'small-results' => array(file_get_contents(__DIR__ . '/data/results.json')),
                'big-results' => array(file_get_contents(__DIR__ . '/data/text-results.json')),
            );
        }
    
        /**
         * Json decode
         *
         * Test data is similar to Solr output with wt=json
         *
         * @dataprovider solrJsonDataProvider
         * @repeat 50
         *
         * @param string $data
         * @return mixed
         */
        public function benchmarkJsonDecode($data)
        {
            return json_decode($data);
        }
    }
    

    The tool is still very much a prototype, but I intend to improve it for use by others as soon as I find the time. For those interested, the current code, including this Solr benchmark, can be found on GitHub.
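    For readers curious how annotation-driven benchmarks like the example above can work, here is a hypothetical minimal runner. It is not Phperf's code: it reads @repeat and @dataprovider from the method's docblock via ReflectionMethod::getDocComment(), calls the named data provider, and invokes the benchmark method once per repeat for each data set. The class ExampleBenchmark and the helpers readAnnotations and runAnnotated are made up for illustration, and timing is left out to keep the focus on the annotations.

```php
<?php
/**
 * Hypothetical helper: collect "@name value" annotations from a method docblock.
 * Later duplicates overwrite earlier ones; multi-valued tags are not handled.
 */
function readAnnotations(ReflectionMethod $method): array
{
    $annotations = array();
    $doc = $method->getDocComment();
    if ($doc === false) {
        // No docblock (or comments stripped, e.g. with opcache.save_comments=0).
        return $annotations;
    }
    if (preg_match_all('/@(\w+)[ \t]+(\S+)/', $doc, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $annotations[$match[1]] = $match[2];
        }
    }
    return $annotations;
}

/**
 * Hypothetical runner: for each data set returned by the @dataprovider method,
 * invoke the benchmark method @repeat times. Returns the calls made per data set.
 */
function runAnnotated($object, string $methodName): array
{
    $method = new ReflectionMethod($object, $methodName);
    $annotations = readAnnotations($method);
    $repeat = isset($annotations['repeat']) ? (int) $annotations['repeat'] : 1;
    // Without a data provider, run with no arguments under a single "default" set.
    $datasets = isset($annotations['dataprovider'])
        ? $object->{$annotations['dataprovider']}()
        : array('default' => array());

    $calls = array();
    foreach ($datasets as $name => $args) {
        $calls[$name] = 0;
        for ($i = 0; $i < $repeat; $i++) {
            $method->invokeArgs($object, $args);
            $calls[$name]++;
        }
    }
    return $calls;
}

class ExampleBenchmark
{
    public function exampleDataProvider()
    {
        return array('tiny' => array('{"id": "1"}'));
    }

    /**
     * @dataprovider exampleDataProvider
     * @repeat 50
     */
    public function benchmarkJsonDecode($data)
    {
        return json_decode($data);
    }
}

print_r(runAnnotated(new ExampleBenchmark(), 'benchmarkJsonDecode'));
```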

    Results

    Each decoding method is tested with a small result (approx. 10 KB) and a big result (approx. 800 KB). Each test is repeated 50 times and the reported result is the average time.
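    Repeating a test and averaging can be sketched as follows, assuming simple wall-clock timing with microtime(true); Phperf's actual measurement may differ. The helper averageTime and the sample payload are hypothetical. On PHP 7.3 and later, hrtime(true) offers a higher-resolution monotonic clock.

```php
<?php
/**
 * Hypothetical helper: call $callback $repeat times with the same arguments
 * and return the average wall-clock time per call, in seconds.
 */
function averageTime(callable $callback, array $args, int $repeat = 50): float
{
    if ($repeat < 1) {
        throw new InvalidArgumentException('repeat must be at least 1');
    }
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        call_user_func_array($callback, $args);
    }
    return (microtime(true) - $start) / $repeat;
}

// Hypothetical payload standing in for a Solr wt=json response.
$json = json_encode(array(
    'response' => array('numFound' => 1, 'start' => 0, 'docs' => array(array('id' => '1'))),
));
$avg = averageTime('json_decode', array($json));
printf("json_decode: %.6f ms per call\n", $avg * 1000);
```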

    View the results here

    Conclusions

    • In relative terms, the differences between decoding methods are quite big.
    • The size of the result data has a big impact: json_decode is the second fastest with a small dataset but the slowest with a big one.
    • There is a clear winner for both small and big datasets: unserialize.
    • While the relative differences are big, even the worst performer on the big dataset (bigger than most real-world cases) only takes about 16 thousandths of a second (16 ms).

    Based on these results, the next Solarium version will probably use unserialize (phps response format).
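    Decoding a phps response comes down to a single unserialize() call. The sketch below simulates a phps payload by serializing a small array shaped like a Solr response locally (an assumption; real output would come from Solr with wt=phps) and checks that it decodes to the same data as the JSON equivalent. The PHP manual warns against passing untrusted input to unserialize(); the allowed_classes option (PHP 7+) at least prevents object instantiation.

```php
<?php
// Simulated phps payload: a PHP array shaped like a Solr response, serialized locally.
// Real phps output would be produced by Solr itself (wt=phps).
$response = array(
    'responseHeader' => array('status' => 0, 'QTime' => 1),
    'response' => array(
        'numFound' => 2,
        'start' => 0,
        'docs' => array(array('id' => '1'), array('id' => '2')),
    ),
);
$phps = serialize($response);
$json = json_encode($response);

// Decode the phps payload; allowed_classes => false refuses to instantiate objects.
$fromPhps = unserialize($phps, array('allowed_classes' => false));

// The JSON equivalent; passing true yields associative arrays instead of stdClass.
$fromJson = json_decode($json, true);

var_dump($fromPhps === $fromJson);
```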

    The tests were done on a 2 GHz i7 MacBook Pro running PHP 5.3.8. As always with benchmarks, don’t just trust my tests; be sure to test in your own environment! Your results might vary…
