Category archive: English

Your diff viewer is right

I stumbled upon the most ridiculous article tonight. The author claims diff viewers are wrong for displaying deletions in red and additions in green. Why? Because, he says, that is tantamount to passing a value judgment, red being associated with evil and danger, and green with good, and all:

Our diff viewer, then, tells us that deletions are bad, dangerous, and possibly an error, while insertions are good, safe, and successful. More code good. Less code bad.

At this point we know the article is utterly flawed, because of course it is not the deletions that are colored red by diff viewers such as GitHub’s. It is the old code. The author acknowledges this objection:

Edit: multiple people have suggested a different interpretation: old code bad, new code good.

But he still tries to save his argument:

However, since that would be a similarly invalid value judgment, the argument below is still valid.

Invalid value judgment? Why, of course the old code is bad, or at least worse than whatever replaced it – hopefully! Otherwise, why would we have deleted it? Perhaps what the author is thinking about is that we may have made a mistake and don’t know whether we really improved the program:

In reality, insertion/deletion is orthogonal to good/bad. There are good insertions, good deletions, bad insertions, bad deletions. Only we humans get to judge which changes are good and which are bad, but during code review, the diff viewer is constantly subtly trying to influence our judgment.

But he got it all backwards. A human already made the decision that the old code is bad, and the diff viewer had better be doing its job and reflect that judgment! Software cannot spot the programmer’s mistakes – it should make her intent clear so she or others will hopefully notice.

As far-fetched as the author’s complaints about diff viewers trying to influence our judgment is his theory of why red came to accompany deletions and green, additions:

I believe the reason for this strange color scheme is the lack of a revision control system. Back in the dark ages of programming, we didn’t use them. We edited files on disk, and that was that. In that environment, a deletion is dangerous. (…) But we don’t live in a world without revision control. It is peculiarly ironic that the ‘deletion is dangerous’ sermon is being delivered by our version control systems. That same revision control system which tells us that ‘it’s okay to delete things, because it’s all still there in the history.’

Far from it – there are still very good reasons to associate deletions with danger, and therefore, the color that stands out the most (red). Philosophically appealing though the author’s God-like perspective on revision histories may be – there is no time, all is one, deletions and additions are just the same thing seen from two sides – in reality, software development still happens from past to future.

Firstly, this means that deleted lines, though not lost, are quickly forgotten – and hopefully for a good reason. Highlighting them in a color that warns us to check our deletions carefully helps avoid relegating important stuff to history and later not being able or bothered to retrieve it.

Secondly, the function of a diff – at least the type that the author shows us, which is not side-by-side – is not to show us an impartial view between two versions of equal status. What we are typically interested in is the new version and how it compares to the previous one. And the new version is right there: it consists of the white and green lines. The green ones are marked for being new, but other than that are not really different from the white ones. In addition, there are red lines showing us what was deleted. Mistaking a red line for part of the new version would be dangerously misleading – hence, again, the signal color.
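To make the point about a unified diff concrete, here is a minimal sketch using difflib from Python's standard library. The three sample lines are made up for illustration. It shows that the new version can be read straight off the unmarked and added lines, while the removed lines belong to the old version only.

```python
import difflib

# Made-up sample data: "beta" is replaced by "delta".
old = ["alpha", "beta", "gamma"]
new = ["alpha", "delta", "gamma"]

# n=len(old) asks for enough context to cover the whole file,
# so the output consists of exactly one hunk.
diff = list(difflib.unified_diff(old, new, lineterm="", n=len(old)))

# Skip the two file header lines ("---", "+++") and the "@@" hunk header.
body = diff[3:]

# Context lines (prefix " ") and added lines (prefix "+") together
# make up the new version; removed lines (prefix "-") do not.
reconstructed = [line[1:] for line in body if line[0] in " +"]
removed = [line[1:] for line in body if line[0] == "-"]
```

In a colored viewer, `reconstructed` corresponds to the white and green lines and `removed` to the red ones.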

Don’t ban academic whistleblowing

→ Sign the open letter ←

Academic freedom in Germany is in acute danger. As soon as tomorrow, the Deutsche Forschungsgemeinschaft, a very important funding agency, is going to vote on a proposal that would make whistleblowing on academic misconduct a sanctionable offense. Universities would be forced to implement this rule at the risk of losing their funding. Under such a rule, to name an example, Andreas Fischer-Lescano, who in a review article in 2011 first pointed out that large parts of defense minister Karl-Theodor zu Guttenberg’s doctoral thesis were plagiarized, would have faced disciplinary measures – just for making his findings public, which is what research is all about. The proposed rule, whose exact wording is being kept secret, seems to be an alarming attempt to silence academic whistleblowers and protect those who have something to fear from public discourse about their academic conduct. This is a direct attack on academic freedom, one of our precious safeguards against authoritarianism. I consider it of utmost importance to stop this measure, or at least make our protest heard clearly.

For more details, this blog post has an excellent list of articles (some German, some English).

→ Sign the open letter ←

Palindrome

I’m beginning to teach myself Haskell, because we all have to. I started doing the 99 Haskell problems and came across a beautifully cunning solution to problem 6, “Find out whether a list is a palindrome.” Let’s first look at the classic solution, which is maximally declarative. I use Prolog here to formulate it:

palindrome(X) :-
  reverse(X,X).

It reverses the list, then checks if the result is the same as the original (that’s the definition of a palindrome). It checks that by going through both lists and comparing elements at corresponding positions.

What’s ugly about this is that this is at least twice as many comparisons as needed. Since we know one list is the reverse of the other, it suffices to compare the first half of one to the last half of the other. (In lists of odd length, the center element does not need to be compared at all, since it is always identical to itself.)
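As a rough illustration of the difference, here is a sketch in Python rather than Prolog. The function names are my own. Note that Python lists know their length and allow indexed access, so this version gets the middle of the list for free, which a linked list does not offer.

```python
def is_palindrome_naive(items):
    """Compare the whole list with its reversal: up to len(items) comparisons."""
    return items == items[::-1]


def is_palindrome_half(items):
    """Compare only the first half with the mirrored second half.

    For odd lengths the centre element is never touched.
    """
    n = len(items)
    return all(items[i] == items[n - 1 - i] for i in range(n // 2))
```

Both functions give the same answers; the second one makes at most half as many element comparisons.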

Alternatively, we could just traverse the list to check, carrying along a reversed version of what we have traversed so far, stop in the middle and then compare the reversed first half to the remainder (i.e. to the last half). The problem is: where to stop? We don’t know the length of the list until we have traversed the whole of it, hence we also don’t know what half its length is.

Enter the intriguing solution that was given on the Haskell Wiki, humbly titled “Here’s one that does half as many compares”, and that gave me a very nice lightbulb moment when I had gotten my head around it. Here’s my Prolog translation:

palindrome(List) :-
  palindrome(List,[],List).

palindrome([First|Rest],Rev,[_,_|Rest2]) :-
  palindrome(Rest,[First|Rev],Rest2).
palindrome([_|Rev],Rev,[_]).
palindrome(Rev,Rev,[]).

The trick is to carry along a second copy of the original list, popping two elements from it every time we pop one from the main copy. This way, when we reach the end of the second copy, we know we have reached the center of the first. There are two recursion-ending clauses, one for odd-length and one for even-length lists. Ingenious!
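For readers who don't speak Prolog, the same idea can be sketched in Python with two iterators over the list. This is my own loose rendering, not the code from the Haskell Wiki. The function name and the sentinel object are made up for the sketch.

```python
def is_palindrome_two_cursors(items):
    slow = iter(items)   # main copy, advanced one element per step
    fast = iter(items)   # second copy, advanced two elements per step
    first_half = []      # elements taken from the main copy so far
    done = object()      # sentinel marking an exhausted iterator

    while True:
        if next(fast, done) is done:
            break        # even length: slow now sits at the centre
        if next(fast, done) is done:
            next(slow)   # odd length: skip the centre element
            break
        first_half.append(next(slow))

    # Compare the rest of the main copy with the first half read backwards.
    return all(a == b for a, b in zip(slow, reversed(first_half)))
```

The two `break` branches play the role of the two recursion-ending clauses, and neither the length of the list nor an index is ever needed.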

Unconditionally Make Implicit Prerequisites

I’m pretty new to make so maybe the following is trivial and/or horribly bad practice, but here goes: I have this bunch of output directories, each containing a file called en.tok from which I want to make a corrected version, en.tok.corr. Apart from en.tok, en.tok.corr also depends on the script that applies the corrections, and on a MySQL database that contains the corrections. Since make doesn’t know about databases, I chose to represent the database by an empty file en.tok.db and use touch in a second rule to set its timestamp to that of the latest relevant correction so make knows whether to rerun the first rule:

$(OUT)%/en.tok.corr : $(OUT)%/en.tok $(OUT)%/en.tok.db ${PYTHON}/correct_tokenization.py
	${PYTHON}/correct_tokenization.py $< > $@

$(OUT)%/en.tok.db :
	touch -t $$(${PYTHON}/latest_correction.py $@) $@

But how can I force make to apply that second rule every time? We need to know if there are new corrections in the database, after all. My first idea was to declare the target $(OUT)%/en.tok.db phony by making it a prerequisite of the special target .PHONY, but that doesn’t work since the % wildcard is apparently only interpreted in rules whose target contains it. Thanks to this post by James T. Kim, I found a solution: instead of declaring $(OUT)%/en.tok.db phony itself, just make it depend on an explicit phony dummy target:

$(OUT)%/en.tok.db : dummy
	touch -t $$(${PYTHON}/latest_correction.py $@) $@

.PHONY : dummy
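The touch -t call in the rule expects a timestamp of the form [[CC]YY]MMDDhhmm[.ss], so whatever latest_correction.py prints has to follow that format. The post doesn't show the script itself. The following is only a sketch of the formatting step, with a hypothetical function name and a made-up example date, and it leaves the database lookup out entirely.

```python
from datetime import datetime


def touch_timestamp(moment):
    """Format a datetime the way `touch -t` expects: CCYYMMDDhhmm.ss"""
    return moment.strftime("%Y%m%d%H%M.%S")


# Made-up example value; a real script would take the time of the
# latest relevant correction from the database instead.
example = touch_timestamp(datetime(2011, 7, 19, 20, 0, 30))
```

Here `example` is the string `201107192000.30`, which touch -t would accept as 19 July 2011, 20:00:30.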

And Then There Were None

Ten guests are invited to a large house on a small island. When they arrive, their host is nowhere to be found. Soon, they hear a mysterious voice that accuses them of being guilty of murder – then, suddenly, one of the guests drops dead – poisoned! One down, nine to go! The excitement never lets up in this classic and brilliant murder mystery presented by the Anglo-Irish Theatre Group.

In this English-language production of Agatha Christie’s And Then There Were None, which through its take on the afterlife of the murdered guests adds a whole new edge to the classic murder mystery, I myself will be playing the role of General MacKenzie. Texttheater readers with physical access to Tübingen should not miss this opportunity to see us, the Anglo-Irish Theatre Group, on the most important theatrical stage of our small university town, the Großer Saal of Landestheater Tübingen (LTT).

We perform on Tuesday, July 19th and Wednesday, July 20th at 8 PM. Tickets should be available online (if the booking system works; it didn’t for me), through booking offices and from the box office located, like the theatre, at Eberhardstr. 6.

Looking forward to seeing you there!

My Debian Initiation

Having switched from Ubuntu to Debian Squeeze and pondering ways to combine the security of a largely stable operating system with the additional functionality afforded by individual newer software packages, I recently wondered: Apt pinning seems complicated, so why not just add testing sources to sources.list and use apt-get -t testing to get whatever newer packages I need? I can now answer this question for myself. You may be under the impression that upgrade tools like apt-get and Synaptic are aware of the “current distribution” and will never upgrade beyond it unless explicitly told to. That impression is wrong, even if apt-get’s occasional reports of packages being “kept back” and the name of the command to override this (dist-upgrade) may suggest it. You will thus inadvertently upgrade your whole system to a non-stable branch. And when you finally notice it, you will then, more out of a desire for purity than out of actual concern for your system’s security, use Apt pinning to try and perform a downgrade. The downgrade will fail halfway through because the pre-remove script for something as obscure as openoffice.org-filter-binfilter has an obscure problem, leaving you with a crippled system and without even Internet access to try and get information on how to resolve the issue. By this point, reinstalling from scratch seems more fun than any other option. And so I did.

Another lesson learned: do use the first DVD to install Debian. It contains a whole lot of very useful things, such as network-manager-gnome or synaptic, that are not included on the CD and that are a hassle to install one by one. And there’s also a new unanswered question: why did the i386 DVD install an amd64 kernel?