<?xml version="1.0" encoding="ISO-8859-1" ?>
    <rss version="2.0">
    <channel>
    <title>SETI@home</title>
    <link>http://setiathome.berkeley.edu/</link>
    <description>BOINC project SETI@home: Technical News</description>
    <copyright>University of California</copyright>
    <lastBuildDate>Sat, 07 Nov 2009 18:50:10 GMT</lastBuildDate>
    <language>en-us</language>
    <image>
        <url>http://setiathome.berkeley.edu/rss_image.gif</url>
        <title>SETI@home</title>
        <link>http://setiathome.berkeley.edu/</link>
    </image>
        <item>
            <title>Technical News 5 Nov 2009 22:53:58 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#134</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#134</guid>
            <description>Eeeeoooo. Looks like this minor corruption in the science database is really snagging us, at least right now. We're talking one or two rows of the zillions in the astropulse signal table - but informix isn't being very informative about which row or two, nor what to do about it. Meanwhile, this broke the replication of astropulse - or at least we think it broke replication. This may very well have failed for some other reason.

This hasn't been a public data flow issue - we can still split/assimilate multibeam and astropulse work for the most part. Still, it's been preventing us from doing any science for a while now. So it's roll-up-our-sleeves time. We're doing a more robust table check (and hopefully repair) overnight tonight, and had to shut off astropulse splitting for now. Which means only multibeam workunits for the near term.

Meanwhile we filled up the raw data drive during all this software blanking analysis. I forgot to carry the one or something. Anyway, no big deal, some minor cleanup this morning, and we're back on track with that.

- Matt
</description>
            <pubDate>Thu, 05 Nov 2009 22:53:58 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 4 Nov 2009 23:28:41 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#133</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#133</guid>
            <description>Our internal file server ptolemy crashed again early this morning and Eric had it rebooted by the time I got in. This is getting to be more than a minor concern. We're going to start collecting kernel crash dumps so we can at least get a clue what's wrong if this happens again.

Informix tweaking continues. Some page corruption did get uncovered during the last science database backup, probably due to the RAID hiccup last week. Not a big deal, but that's just another thing on the list of &quot;maybe that's the problem&quot; when trying to get the database to do anything outside of the usual splitting/assimilating.

Meanwhile, version 2 of the raw data pipeline is getting more and more automated - you should see a few more files appear on the to-split queue throughout the evening without any intervention from me.

- Matt
</description>
            <pubDate>Wed, 04 Nov 2009 23:28:41 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 3 Nov 2009 22:13:35 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#132</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#132</guid>
            <description>Tuesday is our outage/maintenance day. This was the first database compression/backup using the solid state drives on mork for the innodb logs - there are a lot of variables at play (like the result table only being 80% the size it was last week), but at first glance it seems like that alone shaved quite a bit off the compression time. Cool. Bob also tweaked another informix parameter, bounced the science database, did some table checks, etc. - maybe this will improve our science database performance (which has been strangely prone to &quot;locking up&quot; as of late). Or maybe not (after restarting the project we still had some queries lock everything up - some work still to be done, I guess).

I also got a couple scripts in order such that I'm getting on top of the data pipeline again. Hopefully we won't run out of workunits again as badly as this past weekend.

Just got back from a meeting discussing the university's current furlough plan - yeah, due to state budget cuts we are being forced to take days off - a kind, gentle way of enacting pay cuts, but not pay cuts really in our case - since we aren't paid by state funds (it's all donations) we are only being forced to take days off for &quot;parity&quot; but SETI still gets to keep its funds. Fair enough, as I understand we're all swimming in the same bowl of soup and belts are being tightened all around. And I already take several days off a month without pay, so in my particular case it's a complete wash.

- Matt
</description>
            <pubDate>Tue, 03 Nov 2009 22:13:35 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 2 Nov 2009 22:57:50 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#131</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#131</guid>
            <description>In case you haven't noticed, we've been low on workunits. As warned in several previous tech news items (and now on the front page) we're still in the process of converting our data pipeline to use the new radar blanking suite (to vastly reduce noise/interference). This conversion process has been slowed by several factors, including these two: it takes a long time to bring up old data from our archives (approximately 4 hours per 50 GB file), and it turns out a lot of these files contain garbage that makes them impossible to process (which we can only discover after spending the time to bring the files up here). We are also low on current data because ALFA has been offline for a month due to maintenance.

In better news, ALFA is back up and we're collecting new data again. I also moved the &quot;testing phase&quot; version of the data pipeline onto the main production data file server, which should generally help as we'll at least speed up disk i/o. Also our assimilator queue finally drained to zero again. I see that people are complaining about lack of work on various threads. We don't guarantee a steady stream of work, but do understand that such a steady stream is important for maintaining public interest. We're doing what we can. I'm getting another file online as I type this - should be splittable (I hope) sometime this evening.

Our science database server (thumper) lost another disk over the weekend. No big deal, and the RAID recovered with a spare just fine - but nevertheless this is just another reminder that we really need to reconfigure the disk arrays on that system - they are unwieldy and inefficient. 

- Matt
</description>
            <pubDate>Mon, 02 Nov 2009 22:57:50 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 29 Oct 2009 21:35:05 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#130</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#130</guid>
            <description>As predicted the data well temporarily ran dry overnight, but I'm trying my best to keep up with demand today (and set it up for over the weekend).

Weird thing today - I've been noticing intermittent problems connecting to the science database to make the most trivial queries. We thought this, and the assimilator queue backing up, were probably due to Bob's recent configuration changes to the informix database engine perhaps not helping so much. But then I noticed one of the assimilators was inserting thousands and thousands of signals as fast as it possibly could from a single result file... since 7:40am yesterday!

This is not normal. Result files usually contain a handful of signals, maybe a few dozen tops. If they reach 30K in size they are automatically &quot;cut off&quot; and sent back to us. I tracked down the result file with all the signals - it was 1.6 gigabytes in size! Not sure how this happened, nor how it passed validation (though I have my theories), but it sure contained a lot of signals repeated over and over and over again. I moved that out of the way and hopefully that'll improve performance in general around here.

- Matt</description>
            <pubDate>Thu, 29 Oct 2009 21:35:05 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 28 Oct 2009 22:43:56 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#129</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#129</guid>
            <description>Jeff is back in town and back in action here at the lab. He's now working on the NTPCkr/RFI stuff (which has been languishing due to lack of effort and the science database throughput woes which I've been alluding to lately).

As predicted, I did finally get the astropulse version of the splitter to compile (just some library/linking bugs that had to be hunted down and exterminated). So astropulse workunits using the software radar blanking system are going out! Meanwhile, I hit some more management snags with the multibeam stuff - I'm trying to blank/split really old files which we recorded before we had all the kinks worked out. Long story short, some files I spent a lot of time (days) pulling up from our archives and doing the first stages of radar analysis are unsplittable. Darn. I was hoping to just get beyond the dearth of data in the nick of time, but it looks like I got to pull more files up from the archives, and we'll run a bit dry before they are splittable.

Today is a particularly windy day, which means it's fairly clear. Here's a picture taken from my iPhone looking out from the lab patio onto the Bay. That's the Lawrence Hall of Science directly below me, then downtown Berkeley, then the Bay itself, then San Francisco, the Golden Gate Bridge, and the Marin Headlands in the distance. The detail isn't so great, so you can't see that the Bay Bridge is completely devoid of cars right now (it's shut down due to technical difficulties), which is quite rare and quite odd.



- Matt
</description>
            <pubDate>Wed, 28 Oct 2009 22:43:56 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 27 Oct 2009 21:59:49 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#128</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#128</guid>
            <description>As many of you already know, Tuesday is the regular outage day where we dry clean the mysql database and pack it down tight. We're recovering from it now. Today I also did some testing of the newly employed solid state RAID 1 on mork (the master mysql server). It seemed fine, so this device now holds the mysql/innodb logical logs, thus resulting in far fewer competing writes with the data RAID 10 (where the logs used to be kept). Will this help much? I dunno. A non-zero amount at least.

I'm still assembling the new data pipeline. Got a few files in the queue now for multibeam analysis, but I can't seem to get a new astropulse splitter to compile. I need to recompile so that it reads the software radar blanking bit instead of the hardware one, but I'm hitting some library/include issues. Sigh. One of those problems you know you'll get working eventually but right now the path isn't exactly clear, and everything will be annoying until you finally get a successful &quot;make.&quot;

- Matt
</description>
            <pubDate>Tue, 27 Oct 2009 21:59:49 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 26 Oct 2009 21:51:34 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#127</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#127</guid>
            <description>Okay, so where are we... Over the weekend the raw data queue shrunk down pretty far, but don't fear. Astropulse ran out of work to do, and multibeam has maybe another day or two, tops. Meanwhile I'm working behind the scenes actually splitting a bunch of software-radar-blanked data from 2006. This is actually going out now to people, but just doesn't show up on the server status pages. I'd have to do some minor hacking to get these files to show up on that page, but that'll be moot fairly soon as all data will be software-radar-blanked and I'll just point the script to look in the new data directory (as opposed to having it look through two directories and figure out the combined status of everything).

Anyway, there's that. We might run a little dry over the next few days as I'm still scraping together disk/memory resources to get these old files pulled up from the archives, analysed, and embedded with the new blanking signal. Only then can these files be split into workunits. I'm working on it.

Meanwhile, we're still having sporadic problems with informix locking up on us. It's getting to be really frustrating, as you don't really notice anything is wrong until the workunit queue runs dry or something like that. The idea of migrating to another database engine is on the table again. Also, bruno was having some nagging mount issues so I just now rebooted it. You may have noticed the whole project disappearing for a half hour there. That was me.

Rumor has it Jeff is back in town. He was away for several weeks hiking in the Himalayas. I imagine he has jet lag and other kinds of recovery to deal with, and he'll appear maybe later this week.

- Matt
</description>
            <pubDate>Mon, 26 Oct 2009 21:51:34 GMT</pubDate>
            </item>
        <item>
            <title>Technical News 22 Oct 2009 20:52:51 UTC</title>
            <link>http://setiathome.berkeley.edu/tech_news.php#126</link>
            <guid isPermaLink="true">http://setiathome.berkeley.edu/tech_news.php#126</guid>
            <description>Eeewww. Last night ptolemy (an internal-use file server) crashed. Eric rebooted it this morning, and I still had a bunch of cleanup to do after that which took me until just about now. Other systems had to be rebooted, nfs/autofs daemons kicked, stale trigger files removed, etc. I also bounced informix as it seemed like the science database was locked, but there turned out to be two different coincident problems, one affecting the splitters and one affecting the assimilators, which made it seem like both were hanging on the science database.

The latter problem was a real nuisance. I had to reboot vader, mess around with iptables/network configs, /etc/exports, etc. all of which seemed to do nothing. The problem was that vader couldn't mount the result storage device (which is exported from bruno) while all other systems had no trouble mounting it. I never figured out the exact problem, but yum'ing in the latest nfs-utils package seemed to massage the right muscle and suddenly it was visible on vader. Fine. Everything is sort of catching up now. Bob also got the mysql replica in working order again, so that's good.

Hopefully this isn't a sign that ptolemy is on its way out... Ugh.

- Matt
</description>
            <pubDate>Thu, 22 Oct 2009 20:52:51 GMT</pubDate>
            </item>
        
    </channel>
    </rss>
