<h1>SD: Bugzilla gets history support</h1>
<p><em>KiBi’s blog (<a href="http://mraw.org/blog/tags/sd/">posts tagged “sd”</a>). <a href="http://mraw.org/blog/2010/09/10/SD_bugzilla_gets_history_support/">Permalink</a>. Published 2010-09-09T23:55:00Z, updated 2013-12-17T02:11:28Z.</em></p>
<p><em>(Disclaimer: Possibly one of my longest posts ever, you may want to
scroll to the bottom and only look at the pictures.)</em></p>
<p>So, what’s new in SD since
<a href="http://mraw.org/blog/2010/09/01/SD_Travel_with_your_bugs/">last week</a>?</p>
<h2>Fetching everything</h2>
<p>From <a href="http://bugs.freedesktop.org/">http://bugs.freedesktop.org/</a>, it was possible to download all
the bugs I reported there (that means 2…), but trying to download
those reported by Julien led to a crash without any trace. Bisection
FTW: I isolated a bunch of bugs which would lead to the same
result, which confirmed that the number of bugs processed at once
wasn’t the issue.</p>
<p>Playing around with <code>strace</code> showed that exchanges with the remote
server looked fine, so I suspected an issue in the client-side SOAP
machinery. So let’s enable tracing:</p>
<pre><code>use SOAP::Lite +trace => 'all';
</code></pre>
<p>Tada! The issue was indeed local: malformed XML was received, leading
to a <code>die</code> call issued from the XML-RPC layer. Thankfully, one can set
up a fault handler, which I used to display the range of bug IDs which
triggered that bug, so that people can look into it and determine what
to add to the blacklist until the underlying issue’s been investigated
(presumably, bugzilla’s at fault).</p>
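<p>The bisection used to track down the offending bugs can be sketched as
follows. This is a minimal Python illustration, not SD’s Perl code:
<code>find_bad_ids</code> and the <code>fetch</code> callback are hypothetical names,
and the sketch assumes that a failure is caused by individual bug IDs rather
than by the batch size.</p>

```python
from typing import Callable, List


def find_bad_ids(ids: List[int], fetch: Callable[[List[int]], object]) -> List[int]:
    """Return the IDs whose presence makes fetch() raise.

    The batch is split in halves until each failing ID is isolated.
    Assumption: failures come from individual IDs, not from batch size.
    """
    if not ids:
        return []
    try:
        fetch(ids)
        return []  # the whole batch went through, nothing to blacklist
    except Exception:
        if len(ids) == 1:
            return list(ids)  # a single failing ID has been isolated
    mid = len(ids) // 2
    return find_bad_ids(ids[:mid], fetch) + find_bad_ids(ids[mid:], fetch)
```

<p>The returned list is what one would add to the blacklist. Each failing
ID costs roughly a logarithmic number of extra fetches, which stays cheap as
long as only a few bugs are broken.</p>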
<h2>Speeding things up</h2>
<p>With that blacklisting of buggy bugs, one can then try to move to
other queries, dealing with more bugs. Some examples follow:</p>
<pre><code>reporter=kibi@d.o
product=xorg&component=Driver/VMWare
product=xorg&component=Driver/nouveau
product=xorg&component=Server/general
product=xorg&component=Driver/intel
</code></pre>
<p>Moving from 2 bugs to 10-100 bugs (Julien’s or VMWare’s) was OK. But
then moving to several hundreds of bugs (<code>Driver/nouveau</code> = 500+ bugs) led to
noticeable performance issues. Not to mention what happens when one
reaches several thousands of bugs (<code>Driver/intel</code> = 2500+ bugs). Indeed, even
with network exchanges cached into a local file, processing data was
taking up to several tens of minutes.</p>
<p>I knew about Perl’s <code>-d:DProf</code>, which helps figure out where time is
spent, but was pointed to <code>-d:NYTProf</code> (and its accompanying tool,
<code>nytprofhtml</code>). Some hotspots got noticed:</p>
<ul>
<li>There’s a huge pile of stuff relying on UUIDs heavily, and a cache
is going to be introduced to avoid later calls once a value’s been
computed once. That’s going to benefit all replica types, not just
bugzilla.</li>
<li>I didn’t care much about date/time at the beginning, but that
turned out to be a very bad idea: since the format returned by
bugzilla wasn’t matching a “well-known” format, time was spent in
the <code>DateTime::Format::Natural</code> fallback, leading to a big
performance penalty. Fixed with a trivial regular expression.</li>
</ul>
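<p>Both fixes follow well-known patterns, sketched here in Python rather
than in SD’s Perl. Everything named below is an assumption made for
illustration: <code>uuid_for</code> stands in for whatever expensive UUID
computation is being cached, and the timestamp shape
(<code>YYYYMMDDTHH:MM:SS</code>, read as UTC) is a guess at what the server
returns, not something taken from the bugzilla replica code.</p>

```python
import re
import uuid
from datetime import datetime, timezone
from functools import lru_cache


@lru_cache(maxsize=None)
def uuid_for(name: str) -> str:
    """Stand-in for an expensive UUID computation; repeat calls hit the cache."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


# Assumed timestamp shape: YYYYMMDDTHH:MM:SS (hypothetical, interpreted as UTC).
_STAMP = re.compile(r"^(\d{4})(\d{2})(\d{2})T(\d{2}):(\d{2}):(\d{2})$")


def parse_stamp(text, fallback=None):
    """Parse the common format with a regex; defer to a slow parser otherwise."""
    match = _STAMP.match(text)
    if match:
        return datetime(*map(int, match.groups()), tzinfo=timezone.utc)
    if fallback is None:
        raise ValueError("unrecognized timestamp: %r" % text)
    return fallback(text)  # slow path, only for unexpected formats
```

<p>The point of the fast path is that the expensive general-purpose parser
is only consulted for input that does not match the expected shape.</p>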
<p>Things got better, but not good enough. There are several
<a href="http://syncwith.us/"><code>Prophet</code></a> (the engine under the hood) backends,
so one can play with:</p>
<pre><code>PROPHET_REPLICA_TYPE=sqlite # the default for SD
PROPHET_REPLICA_TYPE=prophet
</code></pre>
<p>Switching to <code>prophet</code> was a big win, but still not good
enough. Indeed, many tiny files are written, and most of the time is
spent in I/O. Although I’m nothing like a performance guru, I guessed
that running on an average laptop, with <code>ext3</code> and its default commit
interval of 5 seconds might not be helping, so I gave a quick try to
<code>-o remount,commit=60</code>, and that seemed to help.</p>
<p>Even though there are probably other tricks to find in that area
(which hopefully won’t require <code>root</code> privileges…), there’s already a
patch which landed in <code>prophet</code>’s <code>master</code> branch, replacing
<code>File::Spec->catfile</code> with an optimized version: that function alone
was eating 10% of runtime…</p>
<p>In the <code>sqlite</code> case, disabling the auto-commit feature helped
reduce the I/O load, but a proper patch is still lacking for now
(running into locked database issues, or into missing tables after
having created them isn’t fun, so I postponed debugging that).</p>
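<p>The idea behind disabling auto-commit is to group many small writes
into one transaction, so that the database syncs to disk once instead of
once per record. Here is a minimal sketch using Python’s standard
<code>sqlite3</code> module; the <code>records</code> table and the
<code>store_records</code> helper are hypothetical and unrelated to Prophet’s
actual schema.</p>

```python
import sqlite3


def store_records(conn, records):
    """Insert all (uuid, summary) pairs inside a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (uuid TEXT PRIMARY KEY, summary TEXT)"
    )
    # The connection used as a context manager commits once on success
    # and rolls back if any insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO records (uuid, summary) VALUES (?, ?)", records
        )
```

<p>If one insert fails, the whole batch is rolled back, which is also a
reasonable way to avoid half-written state while debugging.</p>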
<p>Since performance issues looked like they could be solved eventually,
I switched back to implementing missing features.</p>
<h2>Handling more than comments</h2>
<p>Currently, 3 types of data are fetched from the bugzilla
server:</p>
<ul>
<li>Bug status: plenty of properties.</li>
<li>Bug comments: comments that are linked to bugs.</li>
<li>Bug history: changes that impacted bugs.</li>
</ul>
<p>(Yes, that means that attachments are totally ignored for now.)</p>
<p>Until now, only bug comments were considered. The first comment was
used to determine a pseudo-title (using its first line), the reporter,
and the creation date. This approach was chosen to try and get a basic
sync working quickly, so as to get:</p>
<ul>
<li>A list of bugs matching the query.</li>
<li>All comments for each of these bugs.</li>
</ul>
<p>Now, the algorithm is the following: from the bug status, determine a
set of properties of interest; then walk the history backwards, and
update the properties incrementally until the (presumed) “initial
state” is reached. Then create the ticket using this “initial
state”. Adding the incremental property changes to that initial
ticket makes it possible to represent the bug’s life as a list of
<code>Prophet::ChangeSet</code> objects.</p>
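<p>The backward walk can be sketched as follows, in Python rather than
Perl. The shape of the history entries (<code>when</code>, <code>who</code>, and
a list of changes with <code>field_name</code>, <code>removed</code>,
<code>added</code>) is an assumption about what the server returns, the plain
dictionaries only stand in for <code>Prophet::ChangeSet</code> objects, and
only single-valued properties are handled.</p>

```python
def rebuild_timeline(current, history):
    """Return (initial_state, changesets) for one bug.

    current: dict of tracked properties, as given by the bug status.
    history: list of entries, oldest first; each entry looks like
             {"when": ..., "who": ..., "changes": [
                 {"field_name": ..., "removed": ..., "added": ...}]}
             (assumed shape, single-valued fields only).
    """
    state = dict(current)
    # Walk backwards: undoing a change means restoring the removed value.
    for entry in reversed(history):
        for change in reversed(entry["changes"]):
            field = change["field_name"]
            if field in state:  # fields we do not track are ignored
                state[field] = change["removed"]
    initial = dict(state)

    # Walk forwards: each history entry becomes one changeset.
    changesets = []
    for entry in history:
        props = {
            c["field_name"]: c["added"]
            for c in entry["changes"]
            if c["field_name"] in current
        }
        if props:
            changesets.append(
                {"when": entry["when"], "who": entry["who"], "props": props}
            )
    return initial, changesets
```

<p>Replaying the changesets in order on top of the initial state should
give back the current bug status, which makes for a handy sanity check.</p>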
<p>That’s where the fun begins, since properties in the bug status may
not match properties in the bug history, so one needs to establish
property correspondence. Also, some properties can be multivalued for
added fun. I believe that’s where most of the time is going to be
spent while developing a new replica type in SD: once one knows how to
get a hand on needed info on the remote server, the main question is
what to do with it. For now, I decided to ignore many fields to make
it possible to do a “big” sync like <code>Server/general</code>; property support
will be improved later on.</p>
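<p>To give an idea of what property correspondence and multivalued
fields involve, here is a small Python sketch of undoing a single history
change. The name mapping and the set of multivalued fields are purely
hypothetical examples, and the comma-separated encoding of multivalued
changes is an assumption.</p>

```python
# Hypothetical mapping from history field names to status property names.
HISTORY_TO_STATUS = {"bug_status": "status", "short_desc": "summary"}
# Hypothetical set of properties holding several values at once.
MULTIVALUED = {"cc", "keywords"}


def split_values(text):
    """Split an assumed comma-separated value list, dropping empty items."""
    return [v.strip() for v in text.split(",") if v.strip()]


def undo_change(state, change):
    """Return a copy of state as it was before the given change."""
    name = HISTORY_TO_STATUS.get(change["field_name"], change["field_name"])
    if name not in state:
        return dict(state)  # untracked property: nothing to undo
    previous = dict(state)
    if name in MULTIVALUED:
        added = set(split_values(change["added"]))
        # Drop what the change added, then put back what it removed.
        values = [v for v in state[name] if v not in added]
        values += [v for v in split_values(change["removed"]) if v not in values]
        previous[name] = values
    else:
        previous[name] = change["removed"]
    return previous
```

<p>For a single-valued property, undoing means restoring the removed
value; for a multivalued one, only the listed values are touched and the
rest of the list is left alone.</p>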
<h2>Screenshots</h2>
<p>Instead of pasting lengthy terminal excerpts, let’s use some
screenshots (sorry, I’m not sure how to present such things in
an accessible way, suggestions welcome).</p>
<p>Cloning Julien’s bugs, listing all <code>open</code> tickets, listing <em>all</em>
tickets, searching using a regular expression:</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-1.png"><img src="http://mraw.org/blog/2010/09/10/sd-1.png" width="497" height="743" alt="Cloning, listing, searching" class="img" /></a></p>
<p>Displaying bug <code>42</code> (that’s the local ID):</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-2.png"><img src="http://mraw.org/blog/2010/09/10/sd-2.png" width="497" height="743" alt="Displaying" class="img" /></a></p>
<p>Now, let’s start the embedded web server through <code>sd server --port
1234</code> and point the browser there.</p>
<p>List of <code>RESOLVED</code> bugs:</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-3.png"><img src="http://mraw.org/blog/2010/09/10/sd-3.png" width="640" height="743" alt="List of RESOLVED bugs" class="img" /></a></p>
<p>Status and comments for bug <code>42</code>:</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-4.png"><img src="http://mraw.org/blog/2010/09/10/sd-4.png" width="640" height="743" alt="Status and comments for bug 42" class="img" /></a></p>
<p>History for bug <code>42</code>:</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-5.png"><img src="http://mraw.org/blog/2010/09/10/sd-5.png" width="640" height="743" alt="History for bug 42" class="img" /></a></p>
<p>Compared to the
<a href="http://bugs.freedesktop.org/show_bug.cgi?id=9697">original bugzilla page</a>:</p>
<p><a href="http://mraw.org/blog/2010/09/10/sd-6.png"><img src="http://mraw.org/blog/2010/09/10/sd-6.png" width="640" height="743" alt="Same bug on FreeDesktop.org" class="img" /></a></p>
<h2>Next time</h2>
<p>Some items which need work:</p>
<ul>
<li>Tweak properties to address the issues raised above.</li>
<li>Start fetching attachments as well.</li>
<li>Support further syncs. Currently, a big sync is done once, and
there’s no way to tell <code>sd</code> to sync new changes since last time, if
any. This will probably lead to rewriting how fetching is currently
done, which is: discover all bugs, then fetch all comments and all
history items, for all of them. Properties like <code>last_change_time</code>
will probably be of some help here.</li>
<li>Have a look at what happens with other bugzilla instances, like
<a href="http://bugzilla.gnome.org/">Gnome’s</a>.</li>
</ul>
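<p>For the “further syncs” item, one possible approach is sketched below
in Python. It assumes that each bug record carries a comparable
<code>last_change_time</code> value and that the time of the previous sync is
stored somewhere locally; the helper names are hypothetical and this is not
how <code>sd</code> currently works.</p>

```python
def bugs_needing_sync(bugs, last_sync):
    """Return the sorted IDs of bugs changed strictly after last_sync."""
    return sorted(b["id"] for b in bugs if b["last_change_time"] > last_sync)


def next_checkpoint(bugs, last_sync):
    """Return the newest change time seen, to be stored for the next run."""
    return max([b["last_change_time"] for b in bugs] + [last_sync])
```

<p>Only the bugs returned by <code>bugs_needing_sync</code> would need their
comments and history fetched again, instead of everything matching the
query.</p>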
<h1>SD: Travel with your bugs</h1>
<p><em><a href="http://mraw.org/blog/2010/09/01/SD_Travel_with_your_bugs/">Permalink</a>. Published 2010-09-01T11:40:00Z, updated 2013-12-17T02:11:28Z.</em></p>
<p><em>(For Those Who Care About An Introduction:
<a href="http://spang.cc/">Christine Spang</a> gave a
<a href="http://penta.debconf.org/dc10_schedule/events/591.en.html">talk during DebConf10</a>
about <a href="http://syncwith.us/sd/">Simple Defects (SD)</a>, and
<a href="http://blog.spang.cc/posts/DebConf_10_postmortem_and_SD_talk_followup/">blogged about it</a>
later on.)</em></p>
<p>Folks maintaining Debian packages are already able to partially clone
<code>bugs.debian.org</code>’s bug database thanks to the
<a href="http://packages.debian.org/debbugs-local">local-debbugs</a> tool. But
what about upstream’s bug tracker? Taking a (shamelessly
self-centered) example: <code>X.Org</code> packages are hosted on
<code>FreeDesktop.org</code>’s bugzilla. Thanks to SD, it’s possible to fetch
bugs from there as well! Here’s the obligatory picture:</p>
<p><a href="http://mraw.org/blog/2010/09/01/sd-example.png"><img src="http://mraw.org/blog/2010/09/01/sd-example.png" width="500" height="304" alt="SD example" class="img" /></a></p>
<p>This means that you can browse/search them locally while being offline
(or well-connected, but without having to use that !$\§%$^ bugzilla
web interface). Many of the replica types support both reading and
writing, meaning you can also queue some changes locally, and push
them later. Currently, <code>sd help sync</code> says that read-write support is
available for RT, Hiveminder, Trac, Google Code, and GitHub. There’s
also read-only support for redmine. Debbugs is being worked on, see
Christine’s
<a href="http://blog.spang.cc/posts/DebConf_10_postmortem_and_SD_talk_followup/">blog post about her SD talk</a>
for more info.</p>
<p>Given there was no support for bugzilla, I had a quick look and
<a href="http://lists.bestpractical.com/pipermail/sd/2010-August/000024.html">reported my findings</a>. The
main point being: <code>\o/ Bugzilla’s XMLRPC \o/</code></p>
<p>A little while later (I’m not exactly fluent in Perl…), I came up with
a tentatively-mergeable
<a href="http://lists.bestpractical.com/pipermail/sd/2010-August/000040.html">branch adding preliminary read-only support for bugzilla</a>. There’s
still a lot of work, but I’m trying to work on it on a regular basis,
adding support for more properties, and fixing bugs (tests should be
written some day, too).</p>