<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Raspberry</title>
	<atom:link href="http://www.raspberry.nl/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.raspberry.nl</link>
	<description>Specialist in enterprise webapplicaties</description>
	<lastBuildDate>Fri, 02 Mar 2012 20:44:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Benchmarking PHP Solr response data handling</title>
		<link>http://www.raspberry.nl/2012/02/28/benchmarking-php-solr-response-data-handling/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=benchmarking-php-solr-response-data-handling</link>
		<comments>http://www.raspberry.nl/2012/02/28/benchmarking-php-solr-response-data-handling/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 16:00:19 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Solarium]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.raspberry.nl/?p=439</guid>
		<description><![CDATA[Solr supports multiple output formats. Some are for general use (xml, json) and some are even language specific. If you&#8217;re using PHP these are the most logical response writer formats: xml json phps (serialized php) php (php code to execute) On top of that PHP offers multiple ways to parse XML. I&#8217;m benchmarking these options [...]]]></description>
			<content:encoded><![CDATA[<p>Solr supports multiple output formats. Some are for general use (xml, json) and some are even language specific. If you&#8217;re using PHP these are the most logical response writer formats:</p>
<ul>
<li>xml</li>
<li>json</li>
<li>phps (serialized php)</li>
<li>php (php code to execute)</li>
</ul>
<p>On top of that PHP offers multiple ways to parse XML. I&#8217;m benchmarking these options to determine the most efficient decoding to implement in the next major version of <a href="http://www.solarium-project.org" target="_blank">Solarium</a>, but the results should be useful for any PHP Solr implementation.<span id="more-439"></span></p>
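<p>To make the difference between the formats concrete, here is a minimal sketch of decoding the same data as json and as phps. The sample array is made up for illustration; with a real Solr instance the two strings would come from requests with wt=json and wt=phps.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Hypothetical stand-in for a small Solr response; real data would come from Solr.
$response = array(
    'responseHeader' =&gt; array('status' =&gt; 0, 'QTime' =&gt; 1),
    'response' =&gt; array(
        'numFound' =&gt; 1,
        'start' =&gt; 0,
        'docs' =&gt; array(array('id' =&gt; 'doc-1', 'name' =&gt; 'Example')),
    ),
);

// Stand-ins for what Solr would return for wt=json and wt=phps.
$json = json_encode($response);
$phps = serialize($response);

// Decode both back to plain PHP arrays.
$fromJson = json_decode($json, true);
$fromPhps = unserialize($phps);

// Compare the two decoded structures.
var_dump($fromJson === $fromPhps);
</pre>
<p>Note that unserialize should only be used on data from a trusted source, such as your own Solr server.</p>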
<p>Before I get to the results some info on how I tested.<br />
Because this is only the first of several benchmarks I want to do over the next few months, I needed a good benchmarking tool. I couldn&#8217;t find one that suited my needs, so I&#8217;ve created one myself. It&#8217;s inspired by PHPUnit, but instead of test cases you write benchmark cases. It includes concepts like annotations and data providers. If you know PHPUnit you will probably understand this example quickly:</p>
<pre class="brush: php; title: ; notranslate">
class SolrParserBenchmark extends Phperf\Benchmark
{

    /**
     * @return array
     */
    public function solrJsonDataProvider()
    {
        return array(
            'small-results' =&gt; array(file_get_contents(__DIR__ . '/data/results.json')),
            'big-results' =&gt; array(file_get_contents(__DIR__ . '/data/text-results.json')),
        );
    }

    /**
     * Json decode
     *
     * Test data is similar to Solr output with wt=json
     *
     * @dataprovider solrJsonDataProvider
     * @repeat 50
     *
     * @param string $data
     * @return mixed
     */
    public function benchmarkJsonDecode($data)
    {
        return json_decode($data);
    }
}
</pre>
<p>The tool is still very much a prototype, but I intend to improve it for use by others as soon as I find the time. The current code, including this Solr benchmark, can be found <a href="https://github.com/basdenooijer/phperf" target="_blank">on GitHub</a> for those interested.</p>
<p><strong>Results</strong></p>
<p>Each decoding method is tested for a small result (approx. 10KB) and a big result (approx. 800KB). The tests are repeated 50 times and the result is the average time.</p>
<p><a href="http://www.raspberry.nl/files/solr-decode-performance.html" target="_blank">View the results here</a></p>
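<p>The measuring approach described above (repeat each case 50 times and take the average) can be sketched in a few lines of plain PHP. This is not the phperf code; the averageTime helper and the generated test data are assumptions for illustration only.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
/**
 * Run a callable $repeat times and return the average duration in seconds.
 * Hypothetical helper for illustration; this is not the phperf API.
 */
function averageTime($callable, $repeat = 50)
{
    $total = 0.0;
    for ($i = 0; $i &lt; $repeat; $i++) {
        $start = microtime(true);
        call_user_func($callable);
        $total += microtime(true) - $start;
    }
    return $total / $repeat;
}

// Made-up test data, roughly shaped like a list of Solr documents.
$docs = array();
for ($i = 0; $i &lt; 1000; $i++) {
    $docs[] = array('id' =&gt; $i, 'name' =&gt; 'Document ' . $i);
}
$json = json_encode($docs);
$phps = serialize($docs);

$jsonAvg = averageTime(function () use ($json) { json_decode($json, true); });
$phpsAvg = averageTime(function () use ($phps) { unserialize($phps); });

printf('json_decode: %.6f s, unserialize: %.6f s' . PHP_EOL, $jsonAvg, $phpsAvg);
</pre>
<p>The absolute numbers depend heavily on the machine and the PHP version, so only compare results measured on the same setup.</p>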
<p><strong>Conclusions</strong></p>
<ul>
<li>The differences between decoding methods are quite big in percentage terms.</li>
<li>The size of the result data has a big impact: json_decode is second fastest with a small dataset but slowest with a big one.</li>
<li>There is a clear winner for both small and big datasets: unserialize.</li>
<li>While the differences in percentages are big, even the worst performer with the big dataset (bigger than most real world cases) takes only about 16 thousandths of a second (16 ms).</li>
</ul>
<p>Based on these results the next Solarium version will probably use unserialize (phps response format).</p>
<p>The tests were done on a 2GHz i7 MBP running PHP 5.3.8. As always with benchmarks, don&#8217;t just trust my tests, but be sure to test on your own environment! Your results might vary&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2012/02/28/benchmarking-php-solr-response-data-handling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solr delay component</title>
		<link>http://www.raspberry.nl/2012/01/04/solr-delay-component/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solr-delay-component</link>
		<comments>http://www.raspberry.nl/2012/01/04/solr-delay-component/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 16:55:04 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.raspberry.nl/?p=425</guid>
		<description><![CDATA[First of all, why would you want to slow Solr down? After all, its great speed is one of the main reasons it&#8217;s become so popular. Well, I need to slow Solr down for testing load balancing timeouts / failover and to test parallel execution. Both are hard to do with response times well below [...]]]></description>
			<content:encoded><![CDATA[<p>First of all, why would you want to slow Solr down? After all, its great speed is one of the main reasons it&#8217;s become so popular. Well, I need to slow Solr down for testing load balancing timeouts / failover and to test parallel execution. Both are hard to do with response times well below a second. There are some tricks possible using inefficient queries, but those don&#8217;t give very consistent delays.</p>
<p>So, I&#8217;ve made a simple component that allows you to slow Solr down by an exact amount of time using the query string. When the component is enabled you can simply add your delay (in milliseconds) to the request like this: &amp;delay=5000</p>
<p><span id="more-425"></span>Maybe this is useful to more people, so I&#8217;ve placed it on GitHub. Here are the steps to use it:</p>
<ol>
<li>Go to the GitHub project page <a href="https://github.com/basdenooijer/raspberry-solr-plugins" target="_blank">https://github.com/basdenooijer/raspberry-solr-plugins</a> and download the project.</li>
<li>The build dir of the project contains one jar file. Place this in the instanceDir of your Solr core, in a folder called &#8216;lib&#8217;. If this folder doesn&#8217;t exist yet, create it. For a Solr example install the directory is: /example/solr/lib</li>
<li>Add the two snippets below to your solrconfig.xml (I&#8217;m assuming a default solrconfig.xml, otherwise you might need to adjust the configs)</li>
<li>Restart Solr</li>
<li>Test it by adding a delay param to your request. For instance:<br />
<a href="http://localhost:8983/solr/select/?q=*%3A*&amp;version=2.2&amp;start=0&amp;rows=10&amp;indent=on&amp;debugQuery=on&amp;delay=1000" target="_blank">http://localhost:8983/solr/select/?q=*%3A*&amp;version=2.2&amp;start=0&amp;rows=10&amp;indent=on&amp;debugQuery=on&amp;delay=1000<br />
</a>This should delay the query by one second, which you should also be able to see in the debug output. The delay param is in milliseconds. No param or a value of 0 disables the delay.</li>
</ol>
<p>These are the two snippets you need to add to your solrconfig.xml:</p>
<p>Add this inside the requestHandler definition that is named &#8220;search&#8221;:</p>
<pre class="brush: xml; title: ; notranslate">&lt;arr name=&quot;last-components&quot;&gt;
&lt;str&gt;delay&lt;/str&gt;
&lt;/arr&gt;</pre>
<p>Outside the requestHandler definition add this:</p>
<pre class="brush: xml; title: ; notranslate">&lt;searchComponent name=&quot;delay&quot; class=&quot;nl.raspberry.solr.DelayComponent&quot;&gt;
&lt;/searchComponent&gt;</pre>
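<p>With the component enabled, a client-side timeout test could look roughly like the sketch below. The two helper functions are hypothetical and the base URL assumes the default Solr example install; the actual request is left commented out because it needs a running Solr instance.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
/**
 * Build a select URL with the delay parameter (in milliseconds).
 * The base URL is an assumption: the default Solr example install.
 */
function buildDelayedSelectUrl($delayMs, $baseUrl = 'http://localhost:8983/solr/select/')
{
    $params = array('q' =&gt; '*:*', 'rows' =&gt; 10, 'delay' =&gt; (int) $delayMs);
    return $baseUrl . '?' . http_build_query($params, '', '&amp;');
}

/**
 * Fetch a URL with a client-side timeout. Returns false if the request
 * fails or takes longer than $timeoutSeconds.
 */
function fetchWithTimeout($url, $timeoutSeconds)
{
    $context = stream_context_create(array('http' =&gt; array('timeout' =&gt; $timeoutSeconds)));
    return @file_get_contents($url, false, $context);
}

$url = buildDelayedSelectUrl(5000);
echo $url . PHP_EOL;

// With Solr running and the component enabled, a 2 second timeout should
// give up before the 5 second delay has passed:
// $result = fetchWithTimeout($url, 2.0);
// var_dump($result === false);
</pre>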
<p><strong>Important warning:</strong> this component is intended purely for testing purposes. It&#8217;s not advised to enable this component on a production environment. First of all it&#8217;s useless there: you should not be testing in production. It can also easily be abused to flood the server with lots of queries with a big delay.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2012/01/04/solr-delay-component/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solr select query GET vs POST request</title>
		<link>http://www.raspberry.nl/2011/12/22/solr-select-query-get-vs-post-request/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solr-select-query-get-vs-post-request</link>
		<comments>http://www.raspberry.nl/2011/12/22/solr-select-query-get-vs-post-request/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 11:03:40 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://www.raspberry.nl/?p=412</guid>
		<description><![CDATA[In most cases a GET request is used to send select queries to Solr. This is how it&#8217;s done in most examples, it&#8217;s easy to test in the browser and easy to implement. However, Solr also supports POST requests for select queries. This can for instance be useful if you use complex Solr queries with [...]]]></description>
			<content:encoded><![CDATA[<p>In most cases a GET request is used to send select queries to Solr. This is how it&#8217;s done in most examples, it&#8217;s easy to test in the browser and easy to implement. However, Solr also supports POST requests for select queries. This can for instance be useful if you use complex Solr queries with lots of  facets and/or filterqueries where the total length of your querystring can exceed the limits of servlet containers. But how does GET compare to POST in Solr? <span id="more-412"></span>First of all I want to check performance. I&#8217;m not expecting a big difference here, but I want to be sure. So I&#8217;ve made two simple test scripts, one for a short query and one for a longer one. The short query:</p>
<pre class="brush: plain; title: ; notranslate">facet=true&amp;sort=price+asc&amp;f.price.facet.range.gap=100&amp;facet.range={!key%3Dpriceranges}price&amp;hl.simple.pre=&lt;b&gt;&amp;hl.fl=name,features&amp;wt=json&amp;hl=true&amp;rows=20&amp;fl=id,name,price&amp;start=2&amp;q=*:*&amp;f.price.facet.range.end=1000&amp;hl.simple.post=&lt;/b&gt;&amp;facet.field={!key%3Dstock}inStock&amp;fq=price:[1+TO+300]&amp;f.price.facet.range.start=1</pre>
<p>And the longer one:</p>
<pre class="brush: plain; title: ; notranslate">facet=true&amp;sort=price+asc&amp;f.price.facet.range.gap=100&amp;facet.range={!key%3Dpriceranges}price&amp;hl.simple.pre=&lt;b&gt;&amp;hl.fl=name,features&amp;wt=json&amp;hl=true&amp;rows=20&amp;fl=id,name,price&amp;start=2&amp;q=*:*&amp;f.price.facet.range.end=1000&amp;hl.simple.post=&lt;/b&gt;&amp;facet.field={!key%3Dstock}inStock&amp;fq=price:[1+TO+300]&amp;fq=price:1+OR+price:2+OR+price:3+OR+price:4+OR+price:5+OR+price:6+OR+price:7+OR+price:8+OR+price:9+OR+price:10+OR+price:11+OR+price:12+OR+price:13+OR+price:14+OR+price:15+OR+price:16+OR+price:17+OR+price:18+OR+price:19+OR+price:20+OR+price:21+OR+price:22+OR+price:23+OR+price:24+OR+price:25+OR+price:26+OR+price:27+OR+price:28+OR+price:29+OR+price:30+OR+price:31+OR+price:32+OR+price:33+OR+price:34+OR+price:35+OR+price:36+OR+price:37+OR+price:38+OR+price:39+OR+price:40+OR+price:41+OR+price:42+OR+price:43+OR+price:44+OR+price:45+OR+price:46+OR+price:47+OR+price:48+OR+price:49+OR+price:50+OR+price:51+OR+price:52+OR+price:53+OR+price:54+OR+price:55+OR+price:56+OR+price:57+OR+price:58+OR+price:59+OR+price:60+OR+price:61+OR+price:62+OR+price:63+OR+price:64+OR+price:65+OR+price:66+OR+price:67+OR+price:68+OR+price:69+OR+price:70+OR+price:71+OR+price:72+OR+price:73+OR+price:74+OR+price:75+OR+price:76+OR+price:77+OR+price:78+OR+price:79+OR+price:80+OR+price:81+OR+price:82+OR+price:83+OR+price:84+OR+price:85+OR+price:86+OR+price:87+OR+price:88+OR+price:89+OR+price:90+OR+price:91+OR+price:92+OR+price:93+OR+price:94+OR+price:95+OR+price:96+OR+price:97+OR+price:98+OR+price:99+OR+price:100+OR+price:101+OR+price:102+OR+price:103+OR+price:104+OR+price:105+OR+price:106+OR+price:107+OR+price:108+OR+price:109+OR+price:110+OR+price:111+OR+price:112+OR+price:113+OR+price:114+OR+price:115+OR+price:116+OR+price:117+OR+price:118+OR+price:119+OR+price:120+OR+price:121+OR+price:122+OR+price:123+OR+price:124+OR+price:125+OR+price:126+OR+price:127+OR+price:128+OR+price:129+OR+price:130+OR+price:131+OR+price:132+OR+price:133+OR+price:134+OR+price:135+OR+price:136+OR+price:137+OR+price:138+OR+price:139+OR+price:140+OR+price:141+OR+price:142+OR+price:143+OR+price:144+OR+price:145+OR+price:146+OR+price:147+OR+price:148+OR+price:149+OR+price:150+OR+price:151+OR+price:152+OR+price:153+OR+price:154+OR+price:155+OR+price:156+OR+price:157+OR+price:158+OR+price:159+OR+price:160+OR+price:161+OR+price:162+OR+price:163+OR+price:164+OR+price:165+OR+price:166+OR+price:167+OR+price:168+OR+price:169+OR+price:170+OR+price:171+OR+price:172+OR+price:173+OR+price:174+OR+price:175+OR+price:176+OR+price:177+OR+price:178+OR+price:179+OR+price:180+OR+price:181+OR+price:182+OR+price:183+OR+price:184+OR+price:185+OR+price:186+OR+price:187+OR+price:188+OR+price:189+OR+price:190+OR+price:191+OR+price:192+OR+price:193+OR+price:194+OR+price:195+OR+price:196+OR+price:197+OR+price:198+OR+price:199+OR+price:200+OR+price:201+OR+price:202+OR+price:203+OR+price:204+OR+price:205+OR+price:206+OR+price:207+OR+price:208+OR+price:209+OR+price:210+OR+price:211+OR+price:212+OR+price:213+OR+price:214+OR+price:215+OR+price:216+OR+price:217+OR+price:218+OR+price:219+OR+price:220+OR+price:221+OR+price:222+OR+price:223+OR+price:224+OR+price:225+OR+price:226+OR+price:227+OR+price:228+OR+price:229+OR+price:230+OR+price:231+OR+price:232+OR+price:233+OR+price:234+OR+price:235+OR+price:236+OR+price:237+OR+price:238+OR+price:239+OR+price:240+OR+price:241+OR+price:242+OR+price:243+OR+price:244+OR+price:245+OR+price:246+OR+price:247+OR+price:248+OR+price:249+OR+price:250&amp;f.price.facet.range.start=1</pre>
<p><em>Normally a range would be used instead of that very long filterquery, but it&#8217;s an easy way to create a long request URI.</em></p>
<p>Each query is executed 100 times and the total time is measured. This is repeated 25 times to check for consistency. Both scripts are tested this way, once with GET and once with POST.</p>
<p>The results:</p>
<ul>
<li>100 short query GET requests take an average total time of 0.16 seconds. No noticeable difference by using POST.</li>
<li>100 long query GET requests take an average total time of 0.22 seconds. No noticeable difference by using POST.</li>
<li>The timings are very consistent</li>
</ul>
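<p>For reference, a test script along the lines described above might look like the following sketch. It is not the original script: the function names are made up, the URL assumes a default Solr example install, and the actual calls are commented out because they need a running Solr instance.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Assumed endpoint: the select handler of a default Solr example install.
define('SOLR_SELECT_URL', 'http://localhost:8983/solr/select/');

/** Send the query string as part of the URL (GET). */
function solrGet($queryString)
{
    return file_get_contents(SOLR_SELECT_URL . '?' . $queryString);
}

/** Send the same query string as a form-encoded request body (POST). */
function solrPost($queryString)
{
    $context = stream_context_create(array('http' =&gt; array(
        'method' =&gt; 'POST',
        'header' =&gt; 'Content-Type: application/x-www-form-urlencoded',
        'content' =&gt; $queryString,
    )));
    return file_get_contents(SOLR_SELECT_URL, false, $context);
}

/** Total time in seconds for $count calls of $function with $queryString. */
function timeRequests($function, $queryString, $count = 100)
{
    $start = microtime(true);
    for ($i = 0; $i &lt; $count; $i++) {
        call_user_func($function, $queryString);
    }
    return microtime(true) - $start;
}

// Requires a running Solr instance, so the calls are left commented out:
// $query = 'q=*:*&amp;rows=20&amp;wt=json';
// printf('GET: %.2f s, POST: %.2f s' . PHP_EOL,
//     timeRequests('solrGet', $query), timeRequests('solrPost', $query));
</pre>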
<p>So, performance is not really a factor in this. How about logging? This depends on your setup, but with a default Solr setup the logging for a query received by GET or POST is the same.</p>
<p><strong>Conclusion</strong></p>
<p>Using GET or POST doesn&#8217;t really make a big difference in a standard Solr setup.<br />
However a POST request will usually have a higher limit, depending on your servlet container. In Jetty the default ‘headerBufferSize’ is 4kB. Tomcat has a similar setting ‘maxHttpHeaderSize’, also 4kB by default. This limit applies to all the combined headers of a request, so it’s not just the querystring. This might seem more than enough, but I have already seen two cases where a query with many filterqueries and facets reached this limit.<br />
In comparison, the default for POST data in Tomcat (‘maxPostSize’) is 2MB. Jetty uses a ‘maxFormContentSize’ setting with a lower default value of 200kB, which is still far higher than the header limit and more than enough for even the biggest queries.</p>
<p>So, if you plan on constructing very big queries, POST is probably the safer choice. In other cases a GET request will be just fine.<br />
And there is always the option to simply raise your servlet container limits instead of switching to POST.</p>
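<p>If you want to make this choice automatically, a simple length check is enough. The sketch below is only an illustration: the chooseMethod function is hypothetical, the 4096 byte default mirrors the header limits mentioned above, and the 1024 byte margin reserved for the other request headers is an arbitrary guess that you should tune for your own setup.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
/**
 * Pick an HTTP method based on the length of the encoded query string.
 * Hypothetical helper: $headerLimit and $margin are assumptions, not
 * values read from the servlet container.
 */
function chooseMethod(array $params, $headerLimit = 4096, $margin = 1024)
{
    $queryString = http_build_query($params, '', '&amp;');
    return strlen($queryString) &gt; ($headerLimit - $margin) ? 'POST' : 'GET';
}

$short = array('q' =&gt; '*:*', 'rows' =&gt; 20);
$long = array('q' =&gt; '*:*', 'fq' =&gt; 'price:' . implode(' OR price:', range(1, 250)));

echo chooseMethod($short) . PHP_EOL; // short query stays below the threshold
echo chooseMethod($long) . PHP_EOL;  // the long filterquery exceeds it
</pre>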
<p>While writing this post I&#8217;ve started to wonder if moving query parameters to the Solr config by using a custom request handler would make a difference. I&#8217;ll test that in a new post soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/12/22/solr-select-query-get-vs-post-request/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>New Solarium website</title>
		<link>http://www.raspberry.nl/2011/06/22/new-solarium-website/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=new-solarium-website</link>
		<comments>http://www.raspberry.nl/2011/06/22/new-solarium-website/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 12:10:04 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solarium]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[solarium]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=287</guid>
		<description><![CDATA[I&#8217;ve made lots of progress since my last post about Solarium 2.0: the first release candidate is out (actually for several weeks already!) At that point I decided that Solarium needs a user friendly website, and not just a bunch of documentation pages. It&#8217;s been in the works for a few weeks now, and I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve made lots of progress since my last post about Solarium 2.0: the first release candidate is out (actually for several weeks already!) At that point I decided that Solarium needs a user friendly website, and not just a bunch of documentation pages. It&#8217;s been in the works for a few weeks now, and I&#8217;ve just placed  it online: <a href="http://www.solarium-project.org" target="_blank">www.solarium-project.org</a></p>
<p><span id="more-287"></span>If you take into account that I&#8217;m not a frontend developer (I strongly prefer backend work) I think the result is quite ok. I&#8217;ve even made a logo! The site will be further improved in several areas over the coming months, but it&#8217;s already a big step up from the previous wiki based site. The wiki is still in use for the manuals, as it works perfectly for writing documentation.</p>
<p>The new website also includes a blog. From now on I will post all Solarium related blog entries on the Solarium website. Next step for Solarium is completing the docs for 2.0. I&#8217;ve already made a start on it, but still need to write lots of docs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/06/22/new-solarium-website/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solarium 2.0</title>
		<link>http://www.raspberry.nl/2011/05/06/solarium-2-0/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solarium-2-0</link>
		<comments>http://www.raspberry.nl/2011/05/06/solarium-2-0/#comments</comments>
		<pubDate>Fri, 06 May 2011 07:27:09 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Solarium]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[solarium]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=270</guid>
		<description><![CDATA[Several weeks ago Solarium 1.0 was released. Since then lots of development has been going on. Many features were added: MoreLikeThis support, range facet, multiQuery facet, DisMax support, geospatial search support and highlighting. The target for these features was originally Solarium 1.1, however I&#8217;ve changed the plans. In this post I&#8217;ll explain why, and what [...]]]></description>
			<content:encoded><![CDATA[<p>Several weeks ago Solarium 1.0 was released. Since then lots of development has been going on. Many features were added: MoreLikeThis support, range facet, multiQuery facet, DisMax support, geospatial search support and highlighting. The target for these features was originally Solarium 1.1, however I&#8217;ve changed the plans. In this post I&#8217;ll explain why, and what the important changes in 2.0 will be.</p>
<p><span id="more-270"></span>When I started to add new features I quickly discovered that it was no option to add all new functionality to the existing select query object. It would become an unmanageable and inefficient class with 100+ methods. As a result I created a component structure. The query object only has an API for the Solr common query parameters, all other functionality is in component classes. This is comparable to how Solr works. So there is a MoreLikeThis component, a Highlighter component etcetera. Each component is only loaded when used, so Solarium is not slowed down by features you don&#8217;t use.</p>
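<p>The idea of loading components only when they are used can be illustrated with a small sketch. Please note that this is not the Solarium code: all class and method names below are made up for the example.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Illustration only: class and method names are made up and are not
// the actual Solarium API.
class HighlightingComponent
{
    public $fields = array();
}

class FacetSetComponent
{
    public $facets = array();
}

class SelectQuery
{
    /** Maps component names to the classes that implement them. */
    private $componentTypes = array(
        'highlighting' =&gt; 'HighlightingComponent',
        'facetset' =&gt; 'FacetSetComponent',
    );

    /** Component instances, created on first use. */
    private $components = array();

    public function getComponent($name)
    {
        if (!isset($this-&gt;components[$name])) {
            if (!isset($this-&gt;componentTypes[$name])) {
                throw new InvalidArgumentException('Unknown component: ' . $name);
            }
            $class = $this-&gt;componentTypes[$name];
            $this-&gt;components[$name] = new $class();
        }
        return $this-&gt;components[$name];
    }

    public function getLoadedComponentNames()
    {
        return array_keys($this-&gt;components);
    }
}

$query = new SelectQuery();
$query-&gt;getComponent('highlighting')-&gt;fields[] = 'name';

// Only the highlighting component has been instantiated at this point.
print_r($query-&gt;getLoadedComponentNames());
</pre>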
<p>There is one downside to this solution though, it breaks backwards compatibility as I moved faceting into a &#8216;FacetSet&#8217; component (because facets are not part of the common query parameters). The changes required to get existing code working with this new structure are relatively small, but it&#8217;s not compatible with 1.0. As a minor release may not break compatibility this requires a new major release, 2.0.</p>
<p>In the meantime I got several questions from people using Solarium about how to add features they needed. Solarium 1.0 does not really offer a good way of extending or modifying it. Something as simple as adding a few custom params to the request is currently not supported.<br />
This was never a design goal for 1.0, but I realise people will need this, and Solarium would also need it in the future to add optional features like debugging or caching. Not all features can be added to the main code, as this would create way too much overhead.</p>
<p>Since a new major release was already needed for the query components I&#8217;ve decided to resolve these issues at the same time, so I wouldn&#8217;t need to do a 3.0 release shortly after 2.0. These are the changes currently planned for Solarium 2.0:</p>
<ol>
<li>Adjust query flow. Currently this process is not very suitable for customization. The flow will be altered in such a way that it&#8217;s possible to customize all parts of the flow.<br />
Because the flow is currently completely internal to Solarium and not directly available to users this has no impact. The external interface will be the same as in 1.0.</li>
<li>The client object will become more of a central manager. Solr communication is already done by the adapters, now all related settings will also be moved to the adapters. This way the client object only has the role of main API access point.<br />
Impact for 1.0 users is limited, some connection settings need to be moved to the adapter.</li>
<li>Add a plugin system for all important concepts. This includes querytypes, requestbuilders, responseparsers and query components. Existing code will also be refactored into this structure.<br />
This only adds new features, no impact for 1.0 users.</li>
<li>Add an event-hook system. All important phases of the flow get a &#8216;pre&#8217; and &#8216;post&#8217; event, with the possibility to modify data. This can be used by end-users for doing things like custom params, and by Solarium for adding optional features like debugging without slowing down Solarium.<br />
This only adds new features, no impact for 1.0 users.</li>
<li>Add query components, as described above.<br />
Impact for 1.0 users limited to faceting, facet methods need to be called in a slightly different way.</li>
</ol>
<p>I will describe the changes in more detail later, but they might still change during development. The focus of the changes is on improving the structure for future development, while limiting the impact for 1.0 users to only a few settings / methods.</p>
<p>To demonstrate the benefits of the new structure here are some use cases possible in 2.0:</p>
<ul>
<li>Use only the query API of Solarium and handle the request in your own code. This can make sense if you already have a good solution in place and only want to replace a string-based query builder with the Solarium query API.</li>
<li>Add custom params or headers to the Solr request. You can still use the normal API, but easily customize the request using the preExecuteRequest event-hook. <em>(names of the event-hooks are yet to be determined)</em></li>
<li>Add a custom querytype</li>
<li>Add a custom query component</li>
<li>Cache some parts of the Solarium flow to optimize performance</li>
</ul>
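<p>To give an impression of the custom params use case, here is a rough sketch of how an event-hook could work. This is not the Solarium 2.0 API (which is still being designed): the HookedClient class and its methods are assumptions for illustration, and only the event name preExecuteRequest is taken from the list above.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Illustration only: the class, the method names and the way hooks are
// registered are assumptions, not the Solarium 2.0 API.
class HookedClient
{
    private $hooks = array();

    /** Register a callback for a named event. */
    public function registerHook($event, $callback)
    {
        $this-&gt;hooks[$event][] = $callback;
    }

    /** Pass $data through every callback registered for $event. */
    private function trigger($event, $data)
    {
        if (isset($this-&gt;hooks[$event])) {
            foreach ($this-&gt;hooks[$event] as $callback) {
                $data = call_user_func($callback, $data);
            }
        }
        return $data;
    }

    /** Build the request params, letting hooks modify them first. */
    public function buildRequestParams(array $params)
    {
        return $this-&gt;trigger('preExecuteRequest', $params);
    }
}

$client = new HookedClient();
$client-&gt;registerHook('preExecuteRequest', function (array $params) {
    $params['debugQuery'] = 'on'; // custom param added by the hook
    return $params;
});

print_r($client-&gt;buildRequestParams(array('q' =&gt; '*:*')));
</pre>
<p>Because hooks that are never registered cost next to nothing, optional features can be added this way without slowing down the normal flow.</p>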
<p>Development has already started, but will need some time. I hope to have a working prototype within a few weeks, but a lot of work will also need to go into testing and documenting. Development will be done in the branch &#8216;feature/new-structure&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/05/06/solarium-2-0/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>My experiences with writing documentation</title>
		<link>http://www.raspberry.nl/2011/04/12/documentation/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=documentation</link>
		<comments>http://www.raspberry.nl/2011/04/12/documentation/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 13:49:28 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solarium]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=255</guid>
		<description><![CDATA[Like most developers I don&#8217;t like writing documentation. But when I decided to turn Solarium into an opensource project some months ago I really needed to write quite a bit of documentation, because I feel good documentation is very important for an opensource project (actually, for any project&#8230;) But there are many ways to document [...]]]></description>
			<content:encoded><![CDATA[<p>Like most developers I don&#8217;t like writing documentation. But when I decided to turn Solarium into an opensource project some months ago I really needed to write quite a bit of documentation, because I feel good documentation is very important for an opensource project (actually, for any project&#8230;) But there are many ways to document code. I want to share my experiences in finding a solution.</p>
<p>First of all I decided to make good use of phpdoc, a no-brainer. This way I can generate API docs and Phpdoc works great with IDE features like autocompletion and inline documentation.</p>
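<p>For those not familiar with phpdoc, this is roughly what such a docblock looks like. The rangeFilter function is a made-up example and not part of Solarium.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
/**
 * Build a range filter for a numeric field.
 *
 * Hypothetical helper, shown only to illustrate the docblock format.
 *
 * @param string $field Name of the Solr field
 * @param int $from Lower bound (inclusive)
 * @param int $to Upper bound (inclusive)
 * @return string Filter query, for example price:[1 TO 300]
 */
function rangeFilter($field, $from, $to)
{
    return $field . ':[' . (int) $from . ' TO ' . (int) $to . ']';
}

echo rangeFilter('price', 1, 300) . PHP_EOL; // price:[1 TO 300]
</pre>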
<p>But API docs alone are not enough; some kind of manual was needed for background info, examples, guidelines etcetera. I&#8217;ve seen or worked with multiple solutions in the past, ranging from Word docs to wikis, custom websites and DocBook. Some were easy to rule out (like Word docs&#8230;) but this still left me with several options, each with their own pros and cons.<span id="more-255"></span></p>
<p>I started out by using the wiki option of the GitHub project. The big advantage is that you can enable it with the click of a button and only need to add some content. No hosting, no setup, no development. But after creating several pages I realised there are several drawbacks:</p>
<ul>
<li>You are very limited by what the wiki syntax offers, no custom scripts or html allowed</li>
<li>GitHub wikis don&#8217;t support Gists (strangely enough)</li>
<li>As with most wikis there is no way to create a page hierarchy</li>
<li>And several other small annoyances</li>
</ul>
<p>So I started searching for another solution. After some research I ended up with Sphinx or DocBook. Both use markup files for generating docs in multiple formats. I really liked the idea of making the documentation part of the github project files, and exporting into multiple formats.<br />
I tried DocBook. While this looked ideal and the result was ok, editing DocBook was tedious. I could not find a good DocBook WYSIWYG editor, and XML editing is very time consuming. And after editing I needed to &#8216;build&#8217; the XML into an output format to check the result.</p>
<p>After making little progress I decided this was not the way to go. Writing documentation is never fun, but this was really tedious. So, again, I decided to look for an alternative. This time with ease of use as a top priority. I want to be able to easily write documentation, and anyone else wanting to help should be able to do so without a big learning curve.</p>
<p>The best experiences I had with documenting were with wikis. They allow for easy editing, can be customized and allow for collaborative editing. The biggest drawback of wikis is lack of structure: the pages are only loosely related. In most wikis you need to find your own way by searching for keywords or following links in the content. But apart from that issue, a wiki would be a good solution.</p>
<p>So I started with a basic mediawiki install. Added a skin, and some settings to disable several unwanted options. Next, I needed to add some structure for creating a manual. I found the <a title="hierarchy extension" href="http://www.mediawiki.org/wiki/Extension:Hierarchy" target="_blank">hierarchy extension</a> that seemed ok. It did require some fiddling to get working, but nothing too complicated.<br />
After some testing I found several issues, but I knew I could get it working. With some customization I&#8217;ve created a solution that I&#8217;m very happy with. You can see the index on the <a title="wiki homepage" href="http://wiki.solarium-project.org/index.php/Main_Page" target="_blank">wiki homepage</a> and navigation on all manual pages, like <a title="this one" href="http://wiki.solarium-project.org/index.php/Query_flow" target="_blank">this one</a>.<br />
I think it&#8217;s the best of both worlds: I have a structured manual that is easy to maintain, and still have the freedom of a wiki for all other pages.</p>
<p>I did add some other small options to the wiki. The <a title="GitHub extension" href="http://www.mediawiki.org/wiki/Extension:GitHub" target="_blank">GitHub extension</a> was installed (and slightly modified to support inclusion of a specific gist file) to include Gists, which works much nicer than editing inline code examples. And I created a template for API doc integration. Finally I added the <a title="Ambox" href="http://en.wikipedia.org/wiki/Template:Ambox" target="_blank">ambox template</a> for nice alerts in the manual.</p>
<p>With the wiki in place the writing of documentation sped up. For the first time I actually enjoyed writing docs, and while writing them I also came across several issues in the code that I hadn&#8217;t noticed before, similar to how writing unit tests can point to design issues. Writing documentation forces you to describe the working of your code in detail, exposing any illogical parts.</p>
<p>While it still needs more work I think the current result is quite nice, you can see it at <a title="Solarium Project Wiki" href="http://wiki.solarium-project.org/index.php/Main_Page" target="_blank">http://wiki.solarium-project.org/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/04/12/documentation/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What features would you like to get in Solarium?</title>
		<link>http://www.raspberry.nl/2011/04/08/what-features-would-you-like-to-get-in-solarium/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-features-would-you-like-to-get-in-solarium</link>
		<comments>http://www.raspberry.nl/2011/04/08/what-features-would-you-like-to-get-in-solarium/#comments</comments>
		<pubDate>Fri, 08 Apr 2011 08:09:22 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Solarium]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=227</guid>
		<description><![CDATA[Solarium is quite a young project, and there are still a lot of features to add. The project has been gaining some interest recently and I would really like to know which features are most wanted. So, I&#8217;ve created a poll. The most requested features will be placed at the top of the roadmap. [I'm [...]]]></description>
			<content:encoded><![CDATA[<p>Solarium is quite a young project, and there are still a lot of features to add. The project has been gaining some interest recently and I would really like to know which features are most wanted.</p>
<p>So, I&#8217;ve created a poll. The most requested features will be placed at the top of the roadmap.</p>
<p><span id="more-240"></span>[I'm sorry, the poll has been closed by now...]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/04/08/what-features-would-you-like-to-get-in-solarium/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr update performance</title>
		<link>http://www.raspberry.nl/2011/04/08/solr-update-performance/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solr-update-performance</link>
		<comments>http://www.raspberry.nl/2011/04/08/solr-update-performance/#comments</comments>
		<pubDate>Fri, 08 Apr 2011 08:06:09 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solarium]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[High performance]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=234</guid>
		<description><![CDATA[When I started working with Solr I issued updates just like I was used to do with databases: a single command followed by a commit. Later I discovered this was far from optimal, and started using different update strategies. To demonstrate the differences I&#8217;ve done some simple benchmarks with three different update strategies, and as [...]]]></description>
			<content:encoded><![CDATA[<p>When I started working with Solr I issued updates just like I was used to doing with databases: a single command followed by a commit. Later I discovered this was far from optimal, and started using different update strategies.</p>
<p>To demonstrate the differences I&#8217;ve done some simple benchmarks with three different update strategies, and as you will see the performance difference can be huge. I will also give some tips on how to easily optimize the updates in your application.</p>
<p><span id="more-241"></span>The benchmark scripts were built in PHP with <a title="Solarium" href="http://www.solarium-project.org" target="_blank">Solarium</a>. I&#8217;ve left out the setup part in the tests; see the Solarium wiki for more info about using Solarium.<br />
I used two systems on a local network, one running Solr and one running the PHP client. This adds some network latency, so results for a local Solr might vary.<br />
The Solr index has just over 100K documents. Each of the test scripts will add 1000 documents to this index.</p>
<h3>First test: adding and committing each document</h3>
<p>This test commits after every single document, which is a very common scenario. Normally this would be spread out over time, but for the benchmark I do a thousand add/commits in a loop:</p>
<pre class="brush: php; title: ; notranslate">
$start = microtime(true);

for ($id = 8000000; $id &lt; 8001000; $id++) {
    $document = new Solarium_Document_ReadWrite;
    $document-&gt;id = $id;
    $document-&gt;name = 'test';

    $query = new Solarium_Query_Update;
    $query-&gt;addDocument($document);
    $query-&gt;addCommit();
    $client-&gt;update($query);
}

echo round(microtime(true)-$start, 2);
</pre>
<p>Result: 55.74 seconds (18 documents per second)</p>
<h3>Second test: adding each document, single commit</h3>
<p>This test adds the same number of documents, but only commits once. It is comparable in performance to a Solr setup with the &#8216;autoCommit&#8217; feature (in that case you should not use the commit command).</p>
<pre class="brush: php; title: ; notranslate">
$start = microtime(true);

for ($id = 8001000; $id &lt; 8002000; $id++) {
    $document = new Solarium_Document_ReadWrite;
    $document-&gt;id = $id;
    $document-&gt;name = 'test';

    $query = new Solarium_Query_Update;
    $query-&gt;addDocument($document);
    $client-&gt;update($query);
}

$query = new Solarium_Query_Update;
$query-&gt;addCommit();
$client-&gt;update($query);

echo round(microtime(true)-$start, 2);
</pre>
<p>Result: 12.04 seconds (83 documents per second)</p>
<p>The performance difference is caused by Solr having to do only a single commit instead of a thousand. This test issues the same number of Solr requests (actually +1 for the commit) so network latency should be the same.</p>
<h3>Third test: adding and committing all documents in a single request</h3>
<p>The final test combines the complete update into a single Solr request. While the commands are now issued in a single request, it is still the exact same set of commands as used in the second test.</p>
<pre class="brush: php; title: ; notranslate">
$start = microtime(true);

$query = new Solarium_Query_Update;
for ($id = 8002000; $id &lt; 8003000; $id++) {
    $document = new Solarium_Document_ReadWrite;
    $document-&gt;id = $id;
    $document-&gt;name = 'test';
    $query-&gt;addDocument($document);
}

$query-&gt;addCommit();
$client-&gt;update($query);

echo round(microtime(true)-$start, 2);
</pre>
<p>Result: 0.45 seconds (2220 documents per second)</p>
<p>The performance difference is caused by the lower number of requests. Fewer requests help at multiple layers: most importantly they reduce the impact of network latency, but they also mean less overhead in Solarium and Solr.</p>
<h3>How to optimize updates</h3>
<p>As you can see from the results, the performance difference can be huge. The best update strategy for your application will depend on multiple factors. First of all you need to determine how you are going to update:</p>
<ul>
<li>batch updates, e.g. a nightly update cron or other fixed interval</li>
<li>continuous (maybe even concurrent) updates, e.g. based on user input</li>
<li>or maybe a combination of both</li>
</ul>
<p>For batch updates you will probably get the best performance with a solution similar to the third test. For very big updates you might need to break it up into several requests, followed by a single commit. How to implement this depends on the Solr client library you use. As you can see it is quite easy using Solarium.</p>
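<p>As a rough sketch of the &#8216;several requests, single commit&#8217; approach: the helper below splits the documents into chunks, calls a send callback once per chunk and a commit callback once at the end. The function name updateInBatches, the two callbacks and the batch size of 500 are made up for this illustration and are not part of Solarium.</p>
<pre class="brush: php; title: ; notranslate">
// Hypothetical helper, not part of Solarium: sends documents in chunks
// and commits only once, after the last chunk.
function updateInBatches(array $documents, $batchSize, $send, $commit)
{
    $requests = 0;
    foreach (array_chunk($documents, $batchSize) as $batch) {
        call_user_func($send, $batch); // one Solr request per chunk
        $requests++;
    }

    call_user_func($commit); // single commit after all chunks

    return $requests + 1; // number of requests, including the commit
}

// Dummy callbacks that only print what would be sent
$send = function (array $batch) { echo 'add ' . count($batch) . " documents\n"; };
$commit = function () { echo "commit\n"; };

// 1200 placeholder documents in batches of 500: three add requests and one commit
$requests = updateInBatches(range(1, 1200), 500, $send, $commit);
echo $requests; // 4
</pre>
<p>With Solarium the send callback could create a Solarium_Query_Update, call addDocument() for each document in the chunk and pass the query to $client-&gt;update(). The commit callback would send a query that only contains addCommit(), as in the second test.</p>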
<p>If you have continuous updates you should avoid issuing a commit with each update. This can cause a high number of commits or even concurrent commits.<br />
The easiest solution for this scenario is using the Solr &#8216;autoCommit&#8217; feature. This way you can add documents without worrying about when to commit. You only need to add a single setting to your Solr config and remove the commits from your application.</p>
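<p>For reference, this is roughly what such a setting could look like in solrconfig.xml. The maxDocs and maxTime values are arbitrary examples, not recommendations. Check the example solrconfig.xml that ships with your Solr version for the exact placement and defaults.</p>
<pre class="brush: xml; title: ; notranslate">
&lt;updateHandler class=&quot;solr.DirectUpdateHandler2&quot;&gt;
  &lt;autoCommit&gt;
    &lt;!-- example value: commit when this many documents are pending --&gt;
    &lt;maxDocs&gt;10000&lt;/maxDocs&gt;
    &lt;!-- example value: or when the oldest pending document is this old (in ms) --&gt;
    &lt;maxTime&gt;60000&lt;/maxTime&gt;
  &lt;/autoCommit&gt;
&lt;/updateHandler&gt;
</pre>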
<p><strong>Disclaimer</strong><br />
The results of these benchmarks are influenced by many factors: index size, document size, index schema, update frequency, hardware, network, Solr configuration and more. The tests are a worst-case scenario: if you use single update+commits but they are spread out enough, it might not be an issue at all.<br />
You should really run your own tests in your own environment with a realistic workload to validate the results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/04/08/solr-update-performance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solarium PHP Solr client</title>
		<link>http://www.raspberry.nl/2011/03/09/solarium-php-solr-client/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solarium-php-solr-client</link>
		<comments>http://www.raspberry.nl/2011/03/09/solarium-php-solr-client/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 10:27:22 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Solarium]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[solarium]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=202</guid>
		<description><![CDATA[I&#8217;ve worked on a lot of Solr implementations in PHP applications. There are multiple solutions: manual HTTP requests, the solr-php-client library, custom implementations etcetera. However they all have one issue in common: they only handle the communication with Solr, many other important parts like query building are not covered at all. And the parts that [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve worked on a lot of Solr implementations in PHP applications. There are multiple solutions: manual HTTP requests, the solr-php-client library, custom implementations etcetera. However, they all have one issue in common: they only handle the communication with Solr. Many other important parts, like query building, are not covered at all. And the parts that are covered are usually over-simplified.</p>
<p>In my previous post <a title="Integrating Solr with PHP" href="http://www.raspberry.nl/2010/07/integrating-solr-with-php/">Integrating Solr with PHP</a> I did a comparison of several of the available options. Since then I&#8217;ve done more research and started to make notes of all issues I came across and all features I missed. Based on these notes I&#8217;ve started a project that tries to accurately model Solr and go one step beyond the existing solutions.</p>
<p><span id="more-239"></span></p>
<p>This process has taken several months, with lots of refactoring to reach what I think is the optimal model. At first I developed it as a library for my own projects, but I&#8217;ve decided to turn it into an open source project. The project is called &#8216;Solarium&#8217; and can be found on GitHub:<br />
<a href="https://github.com/basdenooijer/solarium">https://github.com/basdenooijer/solarium</a></p>
<p>Please see the GitHub project wiki for more info about Solarium, the current status and some examples.<br />
I think the Solarium library in its current state is already very useful, but lots of great features are upcoming. I will regularly post updates.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/03/09/solarium-php-solr-client/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Testing Solr update XML messages</title>
		<link>http://www.raspberry.nl/2011/01/28/complex-solr-updates/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=complex-solr-updates</link>
		<comments>http://www.raspberry.nl/2011/01/28/complex-solr-updates/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 12:45:38 +0000</pubDate>
		<dc:creator>basdenooijer</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.raspberry.nl/?p=171</guid>
		<description><![CDATA[When updating a Solr index with the DataImportHandler or one of the available Solr clients you don&#8217;t really need to bother with all the details of updates. Most clients just give a simplified &#8220;add&#8221;, &#8220;delete&#8221; and &#8220;commit&#8221; interface to Solr updates, issued as separate commands. For most clients these commands are actually translated into Solr [...]]]></description>
			<content:encoded><![CDATA[<p>When updating a Solr index with the DataImportHandler or one of the available Solr clients you don&#8217;t really need to bother with all the details of updates. Most clients just give a simplified &#8220;add&#8221;, &#8220;delete&#8221; and &#8220;commit&#8221; interface to Solr updates, issued as separate commands.</p>
<p>For most clients these commands are actually translated into Solr XML update messages. Taking a look at the documentation in the Solr wiki (<a href="http://wiki.apache.org/solr/UpdateXmlMessages" target="_blank">http://wiki.apache.org/solr/UpdateXmlMessages</a>) shows all the options available. It also shows that it is possible to combine multiple commands in a single request.<br />
So if you need to do several deletes and add some documents, this can all be done in a single request. This got me wondering: how does this actually work? Are all &#8216;commands&#8217; executed in the order of the XML message?</p>
<p><span id="more-238"></span>To test this I have created a very simple Solr index with just two fields:</p>
<ul>
<li>id (int)</li>
<li>name (string)</li>
</ul>
<p>For posting XML update messages I use Curl:</p>
<pre class="brush: plain; title: ; notranslate">curl http://localhost:8983/solr/core0/update -H &quot;Content-Type: text/xml&quot; --data-binary @data.xml</pre>
<p>This is the content of the data.xml file:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;update&gt;
  &lt;delete&gt;
    &lt;query&gt;*:*&lt;/query&gt;
  &lt;/delete&gt;
  &lt;add&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;1&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 1&lt;/field&gt;
    &lt;/doc&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;2&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 2&lt;/field&gt;
    &lt;/doc&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;3&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 3&lt;/field&gt;
    &lt;/doc&gt;
  &lt;/add&gt;
  &lt;commit/&gt;
&lt;/update&gt;
</pre>
<p>The index can easily be reset by re-posting this XML. The delete-all query ensures a complete cleanup.<br />
For testing a complex update I created this XML:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;update&gt;
  &lt;delete&gt;
    &lt;id&gt;3&lt;/id&gt;
  &lt;/delete&gt;
  &lt;commit/&gt;
  &lt;add&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;4&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 4&lt;/field&gt;
    &lt;/doc&gt;
  &lt;/add&gt;
  &lt;rollback/&gt;
  &lt;add&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;5&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 5&lt;/field&gt;
    &lt;/doc&gt;
    &lt;doc&gt;
      &lt;field name=&quot;id&quot;&gt;6&lt;/field&gt;
      &lt;field name=&quot;name&quot;&gt;item 6&lt;/field&gt;
    &lt;/doc&gt;
  &lt;/add&gt;
  &lt;delete&gt;
    &lt;id&gt;1&lt;/id&gt;
  &lt;/delete&gt;
  &lt;commit/&gt;
  &lt;optimize/&gt;
&lt;/update&gt;
</pre>
<p>The expected behaviour is:</p>
<ol>
<li>we start with [1,2,3] in the index.</li>
<li>we delete document 3 and commit, so we now have [1,2]</li>
<li>add document 4 and rollback, so no change: [1,2]</li>
<li>add documents 5 and 6, delete 1 and commit+optimize: [2, 5, 6]</li>
</ol>
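<p>The same reasoning can be traced with a toy model in PHP. This only illustrates the expected behaviour and is not how Solr works internally: it keeps a list of committed ids and a list of pending ids, where a commit makes the pending state visible and a rollback throws the pending changes away. The function name simulateUpdate and the command format are made up for this sketch, and the optimize is left out because it doesn&#8217;t change which documents are in the index.</p>
<pre class="brush: php; title: ; notranslate">
// Toy model: $committed is what searches see, $pending includes uncommitted changes.
function simulateUpdate(array $committed, array $commands)
{
    $pending = $committed;
    foreach ($commands as $command) {
        switch ($command[0]) {
            case 'add':
                $pending[] = $command[1];
                break;
            case 'delete':
                $pending = array_values(array_diff($pending, array($command[1])));
                break;
            case 'commit':
                $committed = $pending; // pending changes become visible
                break;
            case 'rollback':
                $pending = $committed; // pending changes are discarded
                break;
        }
    }

    return $committed;
}

// The commands of the complex update, in the order of the XML message
$result = simulateUpdate(array(1, 2, 3), array(
    array('delete', 3),
    array('commit'),
    array('add', 4),
    array('rollback'),
    array('add', 5),
    array('add', 6),
    array('delete', 1),
    array('commit'),
));

echo implode(', ', $result); // 2, 5, 6
</pre>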
<p>The outcome: just as expected! After posting the update the index contains documents 2, 5 and 6. So Solr does handle the commands in the update XML message in the exact given order.</p>
<p><em>An important note: this example was made to test Solr. In most cases where the order of the commands matters something is wrong. In this test the order only matters because of the add and rollback of document 4. But in that case document 4 shouldn&#8217;t be in the update anyway. And normally you should only commit once, below all add/delete commands.<br />
</em></p>
<p>The other parts of this test are more useful, like using Curl for testing XML update messages and adding multiple commands to a single update message. The most obvious example is adding multiple documents in a single update, and most clients support this. But combining multiple delete queries, or even combining delete queries with adding documents, is not supported by most clients, while it can save a lot of requests.<br />
There is a downside though: if any command fails, everything below it will not be executed.</p>
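<p>If your client doesn&#8217;t support combined commands, you could build the message yourself. Below is a minimal sketch using PHP&#8217;s DOM extension. The function name buildUpdateMessage and its parameters are made up for this example; it writes one delete command per id, a single add command with all documents and one commit at the bottom.</p>
<pre class="brush: php; title: ; notranslate">
// Hypothetical helper: builds one update message with deletes, adds and a single commit.
function buildUpdateMessage(array $deleteIds, array $documents)
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $update = $dom-&gt;appendChild($dom-&gt;createElement('update'));

    // One delete command per id, in the same format as the test message above
    foreach ($deleteIds as $id) {
        $delete = $update-&gt;appendChild($dom-&gt;createElement('delete'));
        $idNode = $delete-&gt;appendChild($dom-&gt;createElement('id'));
        $idNode-&gt;appendChild($dom-&gt;createTextNode((string) $id));
    }

    // A single add command containing all documents (skipped if there are none)
    if (count($documents) &gt; 0) {
        $add = $update-&gt;appendChild($dom-&gt;createElement('add'));
        foreach ($documents as $fields) {
            $doc = $add-&gt;appendChild($dom-&gt;createElement('doc'));
            foreach ($fields as $name =&gt; $value) {
                $field = $doc-&gt;appendChild($dom-&gt;createElement('field'));
                $field-&gt;setAttribute('name', $name);
                // text nodes are escaped when the XML is written
                $field-&gt;appendChild($dom-&gt;createTextNode((string) $value));
            }
        }
    }

    // Commit once, below all add/delete commands
    $update-&gt;appendChild($dom-&gt;createElement('commit'));

    return $dom-&gt;saveXML($update);
}

$xml = buildUpdateMessage(
    array(1, 3),
    array(
        array('id' =&gt; 5, 'name' =&gt; 'item 5'),
        array('id' =&gt; 6, 'name' =&gt; 'item 6'),
    )
);

echo $xml;
</pre>
<p>The output can be saved to a file and posted with the Curl command shown earlier. Keep the downside in mind: the more commands you combine, the more is skipped when one of them fails.</p>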
]]></content:encoded>
			<wfw:commentRss>http://www.raspberry.nl/2011/01/28/complex-solr-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

