Tuesday, December 18, 2012

Stupid MySQL Help and Parser

Just stumbled upon this:

mysql> help CAST
Name: 'CAST'
Description:
Syntax:
CAST(expr AS type)

The CAST() function takes a value of one type and produce a value of
another type, similar to CONVERT(). See the description of CONVERT()
for more information.

URL: http://dev.mysql.com/doc/refman/5.1/en/cast-functions.html

mysql> select CAST ('2012-01-01 12:00:00' AS DATETIME);
ERROR 1305 (42000): FUNCTION CAST does not exist
mysql> select CAST('2012-01-01 12:00:00' AS DATETIME);
+-----------------------------------------+
| CAST('2012-01-01 12:00:00' AS DATETIME) |
+-----------------------------------------+
| 2012-01-01 12:00:00                     |
+-----------------------------------------+
1 row in set (0,00 sec)

I.e. the help claims that CAST is a function, but if you call it with a stray space after the name, it does not exist.

Saturday, October 6, 2012

Monkey-patching bash functions

Disclaimer: the whole post is a big bad hack.

Suppose that there is a big regularly-updated library of bash functions, and your script sources it or is sourced by it. One of these functions is not exactly right for your purpose (e.g. it contains a bug or misses a feature), but fixing the bug or adding the feature there is not practical. This might happen if the file containing the function library is managed by a package manager, and your bugfix will be overwritten at the next package update.

A straightforward sledgehammer-like solution is to make a copy of the library, fix the bug there, and source or be sourced by your modified copy (thus losing all future updates). This is not good.

If the offending function is called by your script directly, then, of course, you can define a differently-named function that is otherwise-identical to the original one, but has the needed fix, directly in your script, and use it. However, this approach does not work (or, rather, requires you to duplicate the whole call chain) if your script calls the offending function only indirectly.

A possibly-better solution (that may or may not work) is to redefine the offending function in your script. Indeed, out of many possibly existing definitions for a function bash uses the last one it encountered. Here is an interactive example of such overloading:

$ barf() { echo "123" ; }
$ barf
123
$ barf() { echo "456" ; }
$ barf
456

So, now you know how to completely replace a function. But, what if only a small change is required? E.g., if one command is missing at the end? This is also solvable thanks to an introspection feature of bash. I am talking about the "type" builtin. Here is what it does when applied to a function:


$ type barf
barf is a function
barf ()
{
    echo "456"
}

So, you have one line of a meaningless header and then the full slightly-reformatted source of the function on standard output. Let's grab this into a variable:

$ def=$( type barf )

You can then post-process it. E.g., let's transform this into a definition of a function that does exactly the same but also prints "789" at the end. The easiest way to do that is to remove the first line (the header) and insert echo "789" before the last line. Uhm, it is not easy to remove a line from a variable in pure bash... no problem, we'll comment it out instead!

$ def="# $( type barf )"
$ echo "$def"
# barf is a function
barf ()
{
    echo "456"
}

And now remove the last character (a closing brace) and replace it with the correct code:

$ def="${def%\}}echo \"789\" ; }"
$ echo "$def"
# barf is a function
barf ()
{
    echo "456"
echo "789" ; }

All that remains is to feed the contents of this variable back to bash as a piece of code. That's what eval does:

$ eval "$def"

Now the function looks correct and does what is intended:

$ type barf
barf is a function
barf ()
{
    echo "456";
    echo "789"
}
$ barf
456
789

Monday, August 27, 2012

SafeDNS.com is now open for registration

The company I work for decided to establish its presence in USA. So, please welcome a new player in the international market of content filtering solutions: SafeDNS. Strictly speaking, we are still in beta and need YOU to help us kill the last bugs.

This service can be useful if you want to protect yourself or your kids against accidentally opening sites with unsuitable content. Or, to prevent your employees from wasting time at work on such things as social networking and videos. Or even to evade a bad filter set up by your ISP :)

We have more than 4 mln sites sorted into more than 50 categories, and it's you who decides what to block and what to let through. All you need is an e-mail address, a public static IPv4 address on your router and the ability to change DNS settings on your computers.

Register now (it's free!), read the guide, and help us improve the service by sending feedback.

Saturday, July 28, 2012

On the negative side of duties

Yesterday Lennart Poettering presented a talk with the title "The UI and the OS". The main idea of the talk was simplification of the architecture of the whole OS. On one of the slides, there was approximately this seemingly-obvious text: "The distributions' job is to distribute your software".

This phrase (while completely appropriate for the slide in question) has reminded me one of the first lessons on philosophy that I received from G. V. Boldygin in the Urals State University while studying there. On that lesson, we discussed the role of science (including theoretical science) in the society. The obvious idea immediately proposed: theoretical science leads to technological progress by telling what to do that nobody else did before. The surprising fact given to us as students was that there were significant practical technological advances (achieved by trial-and-error) even before theorecical science was formed. The conclusion was that it is almost always incorrect to refer to only one side of something's role of duties (i.e. to something that is done, as opposed to something that is prevented, or vice versa). Indeed, we were told that science not only provides technological progress directly, but also saves us from wasting money and time on certain "scientifically impossible" projects such as trying to build a perpetual motion machine.

So, let's apply this lesson to refute Lennart's phrase: distributions exist not only to distribute your software. Indeed, as illustrated by the Windows world, one doesn't need any help for this. The other side of their duties is that they reject bad (in their opinion) software by not distributing it, thus creating an anchor of trust for software they do provide to users.

Thursday, July 26, 2012

GUADEC

I am going to GUADEC 2012. In fact, I have already arrived, and will soon listen to a keynote: "The Tor Project: Anonymity online". Somewhat scary, given that I work for a company that implements a (completely opt-in) web content filtering solution.

Ironically, we have some common not-yet implemented desktop intergation requirements: the ability to force the use of a particular DNS server across all connections, or to say "Use this VPN or refuse to send any packet to the Internet".


Sunday, April 29, 2012

Verifying the Corresponding Source

(This post is a troll, don't take it too seriously!)

GNU GPL and some other software licenses require you to make the source available if you publish the covered software in the binary form. At first, this seems to be a very simple requirement. However, theoretically, as I will show below, it can create problems for you even if you comply. What matters for this post is whether you can prove the compliance, and this turns out to be unexpectedly hard. Please consider this (theoretical so far) danger when deciding to publish your own work under GPL or similar licenses.

Let's take Arch Linux as an imaginary victim. They distribute a lot of software in the binary form, and some of the software is covered by GNU GPL. Let's take GNU Bison as a simple example. At the time of this writing, the version of Bison distributed by Arch Linux is 2.5-3. Here is the x86-64 package, here are the purported machine-readable build instructions and patches, and the source can be fetched from gnu.org. Now imagine that someone makes a (false in this case) claim that the source, build instructions and patches do not correspond to the published binary package. How can this claim be refuted?

The obvious idea would be to rebuild the binary package from sources and compare its contents to the published binary package. For simplicity, let's limit this comparison to the /usr/bin/bison binary. Unfortunately, this simple idea fails. The published binary is 377456 bytes long, while my attempt to rebuild it in an up-to-date Arch resulted in a different length:

$ makepkg
<lots of output snipped>
$ tar tvf bison-2.5-3-x86_64.pkg.tar.xz | grep usr/bin/bison
-rwxr-xr-x root/root    377488 2012-04-29 17:41 usr/bin/bison

This mismatch means that the content of the resulting binary depends not only on what is distributed as the Corresponding Source, and that something has changed.

Of course, one of the changed factors is the compiler version. Different compilers implement different optimizations, and thus generate different code. The original bison binary in the original package is dated by November 9th, 2011. So, to remove this factor, we need the same compiler as was available on that day. Fortunately, Arch has a Rollback Machine that has every version of every package back to 2008. So, the next attempt to produce an identical binary: download and install gcc from the same date. This downgrade fails to produce a working gcc, because the run-time dependencies of gcc have to be downgraded too. This means cloog, gcc-libs, isl and ppl. Surprise: the downgraded gcc still doesn't produce the bison binary identical to the official one!

The other factor is that not only object code from bison's *.o files goes into the resulting bison binary. The linker also inserts code from /usr/lib/libc_nonshared.a which belongs to glibc. So, in order to cancel this factor, one would need to downgrade glibc, which is impossible to do safely: on today's Arch Linux there is a lot of software, including Bash, that depends on glibc >= 2.15.

The solution is to install another Arch Linux system into an initially-empty chroot, using the Rollback Machine instead of a mirror. I.e. with this line in /etc/pacman.d/mirrorlist:

Server = http://arm.konnichi.com/2011/11/09/$repo/os/$arch

The system installed this way produced the bison binary identical to the official one (MD5 sums matched).

So, in the end, I was able to prove that Arch Linux indeed distributes the source that corresponds to their bison binary. However, I would not have succeeded if they didn't have their Rollback Machine — i.e. essentially a way to reconstruct the whole build environment, far beyond what GPL seemingly requires.

Tuesday, February 21, 2012

Recovering a bricked D-Link router

Today my D-Link DSL-2600U ADSL/Wi-Fi router decided to die, without any obvious reason. Only the power and Ethernet LEDs were active, and the computer directly connected to it via Ethernet could not receive an IP address. Fortunately, I had an old ADSL router still around, and thus was able to find information that, in the end, allowed me to recover the DSL-2600U.

The first thing I thought, however, was that a problem is on my local computer. So I started tcpdump, only to find out that the router sends strange packets: a gratious ARP for 192.168.1.1, followed by ARP "Who has 192.168.1.159? Tell 192.168.1.1". It didn't respond to pings even from 192.168.1.2. At this point, it became obvious that it is broken or at least needs its settings to be reset, without any access to the web interface. Pressing the Reset button in the pinhole did not change anything.

When assigning my computer the IP 192.168.1.159, the conversation went a bit further, namely, there appeared a TFTP packet with a string "308nfs6358" in it. Googling for it (via the spare router) gave mostly results about reflashing that same router model that I had, via the COM port, as a last resort. They were useless, as adding a COM port to the router is not a modification that I wanted to do.

Still, I wanted to try to revive the router, and these pages also suggested that TFTP is involved somehow. They even told the reader to put the official firmware to 192.168.1.1 via TFTP after pinging it - but obviously this could not work at this stage.

So, as the router asked for the "308nfs6358" file, and also the I had to provide it via TFTP. On Linux, there is a tftp server, atftpd, so I set it up. It is actually easy. Just create the /tftpboot directory and chmod it to 777, then run:

atftpd --no-fork --daemon

However, it was not an obvious question what to put in that directory. I had a firmware file, downloaded from the official site. When unzipped, this yields DSL-2600U_3.12_6338_12_RU_1.25_20101129.img. However, the web pages containing the reference to the "308nfs6358" file also mentioned "bcm963xx_fs_kernel", so I assumed (maybe wrongly) that I had to split the original firmware file into two.

This OpenWRT page ended up being a useful resource. It contains a program to dump the firmware structure. As the program calculates and verifies CRC of various portions of the firmware file, I guessed that I could modify it to dump these portions. The implementation is obvious: there is a compute_crc32() function that receives the offset and length of the region being checksummed, and it is trivial to dump the bytes into a separate numbered file. So, I ended up with a short 256-byte header and two files. For the reference, here are the offsets and lengths:

rootfs: start=256, length=3063808
kernel: start=3064064, length=560856

As the router was broken anyway, and because I had no better idea, I decided to determine the correct file by trial and error. To be able to understand what happens, I started Wireshark and told it to sniff all packets on the eth0 interface of my computer. I copied one of the files (the shorter one) into /tftpboot/308nfs6358 and power-cycled the router. It downloaded the file via TFTP (as evidenced by Wireshark), and then nothing happened.

With the other file, I had better luck: the router not only downloaded it, but also changed its MAC address, sent some gratious ARP packets and became pingable for some time. Still, it didn't attempt to download anything else. OK, so it means that it waits for me to upload the second file to it, I thought. This didn't work. The correct thing to do was to upload the original unzipped firmware.

So here is the complete sequence of actions that worked:

1. Assign the IP address 192.168.1.159 to the computer.
2. Unzip the firmware, split it.
3. Copy the rootfs (the 3.6 MB part of the firmware) to /tftpboot/308nfs6358
4. Run atftpd as follows: atftpd --no-fork --daemon
5. Power-cycle the router
6. When it changes the MAC address, upload the firmware to it: atftp --no-source-port-checking -p -l  DSL-2600U_3.12_6338_12_RU_1.25_20101129.img -r bcm963xx_fs_kernel 192.168.1.1
7. Try again from the beginning, as the default timeouts are rather short.

This blog post has been sent through the revived router. Still, I guess it won't last long, and I need to buy a replacement.