End Of Year Home Directory Clean Out

Sat, 22 Dec 2007

I tend to accumulate lots of junk in my home directory. Today seems like a good day to get rid of it. Occasionally I find a quote funny enough to save in case I need it later, other times I need to run a quick program (or see if something compiles), basically there is lots of garbage, but hopefully some of it will be amusing enough. So presenting the slightly more interesting things which I've just deleted:

Quotes

"As old hands will be well aware, it's not a new C standard without an entirely new meaning for the static keyword." - Peter Seebach

"Too many errors on one line (make fewer)" - Apple MPW C compiler

"Open letter = Can't find your email address" (I'm not sure who wrote this)

Media

Wallpaper thumbnail

An old wallpaper I used to use.

Some drum patterns (these were done in rosegarden a while ago using a different drum patch, so they sound nothing like what they sounded like originally)

Various attempts at ripping just the audio track of a dvd. Why can't someone make this easy (bonus points if each chapter gets put into its own file).

Image from log file from an AIBO

This was taken from a log file I had of a Sony AIBO running around when I was at uni.

Other

Web site and executable served by a storm worm infested computer. Just out of curiosity.

Instructions for the automatic timer I use to turn my air conditioner off at night.

Rant about whitespace dependance in programming languages and why I don't like it.

One of the files I included in my vim configuration (which I shouldn't have deleted)

Programs

A program for finding square triangular numbers (like 36 = 6 * 6 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8)

This snippet (and testing code to make sure it works in the range I needed):

int v = i;
v--;
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
v++;

(probably from Bit Twiddling Hacks)

And finally:

cat /*dev/null; echo "Happy New Year"\!
cat <<c*/ /*dev/null | cat > /dev/null
c */ () {} /*
c */ main() { cat(); printf("Happy New Year!\n"); } /*
17      format('  Happy New Year!')
         write (6,17)
         stop
         end
c This program runs in four languages with the same effect.
c The languages are C, Fortran, C Shell and Bourne Shell.
c Written by Vadim Antonov, avg@hq.demos.su
c*/

Fixing Things

Sat, 27 Oct 2007

One of the features in the latest version of ubuntu is a really simple idea for installing software. You just make a link, like this one and when the user clicks on it, they get a prompt offering to install the package, if they say yes then it gets their password and installs it without any fuss.

Because it is a new feature, and it is pretty simple to understand, some people have gone a bit crazy with it - which is cool. But, a usability problem comes up when you have no idea what it is you are actually installing:

Install qtpfsgui

It is difficult to answer this question meaninfully. My thought process would be that it uses qt (which is a GUI library), has a gui (well I already suspected that from the qt part), and does something with "pfs", which according to google is "Professional Freight Services". But since all the packages in Ubuntu have a description associated with them, we can just add an extra bit to the interface:

Install qtpfsgui Install qtpfsgui with description

So we can find out what it is.

Adding such a feature is quite simple, and you'd hope that with my computer science degree and years of experience that it wouldn't be much of an issue. So, I downloaded the code for it, added the new bits and then generated a file which says what changes I made, so then someone can integrate it into the main codebase that everyone uses.

Unfortunately, I tend to get really stupid when it comes to the final step of getting things integrated. My understanding of things is that every patch has a 'bug' that goes with it. So for a patch to be accepted, it needs to be fixing something. So I went to the place that handles all the bugs and found this:

There are currently no open bugs.

Now I don't think for an instant that a newly added feature will have no open bugs, so I look around and find on the overview page that it "Doesn’t use Bugs". So obviously the bugs are tracked somewhere else. Because I get stupid here, I have no idea where to look.

This happened yesterday, so today I thought that I'd give it another shot. So I google for the bug page again, and find this:

1 - 15 of 15 results

On a page that looks otherwise very similar to the one yesterday (which is also similar to this one). There I found that someone had already had this idea, so I put my patch in with that bug, and I guess if it gets accepted then we'll see it in ubuntu later.

The reason all of this is interesting to me is that I think contributing needs to be really easy - even if I'm really dumb. I've been scared off reporting bugs for some software packages before because I'm faced with screens like this:

OpenOffice.org bug report

I don't mean to specifically bag out one group. But, I can't imagine many people being able to correctly answer the question about what subcomponent they are talking about.

But it is a bit of a balancing act. The developers want the bugs neatly categorised by the users, and the user just wants a textbox where they can rant about whatever. I don't think I've dealt with a system where you can report bugs without logging in, but I think that is important. If someone is reporting a bug, I think you need to make it as easy as possible for them, and then have slightly more knowledgable people come along and do the categorisation. The problem is that that is a sucky and boring job for most people, so it is a bit of a tough sell.

I think in the case of the form above, the improvements I would make would be:

  • Remove the requirement for having an account
  • Give every field an 'unsure' option
  • By default, hide the fields other than the summary and the description
  • Add an (optional) email address field for a non-registered user, so they can stay in the loop

I think it is a lot easier to categorise a vague bug description than to work out what a user was about to post, than to guess what the bug was that a user was going to post, but then didn't because they couldn't be bothered to figure out what hoops they need to jump through.

Prefetching

Mon, 17 Sep 2007

Out of curiousity, I turned on the setting in firefox that asks you about every single cookie that a website wants to set. My thinking was that it would be annoying for the first few weeks, and then become tolerable (as it learns my preferences for the sites that I visit). The weird thing that I did notice was that when searching for various things on google, I would get requests for cookies from the website that came up as the #1 result, without me even clicking on the link. This made me a little nervous, as it seems like a case of my browser doing something that I didn't ask for, which is typically a sign of a security issue.

The cause of all this isn't very hard to understand. The html code which triggered my browser to visit another site was a link tag with a "rel" attribute of "prefetch": <link rel="prefetch" href="http://www.example.com/">. This is a hint from the webpage that we are probably about to visit http://www.example.com/ and that we might want to download that webpage now, so when we click on the link, then the page is ready to be displayed already. If it guesses correctly, it should speed up your web browsing lots, because you typically read a page before clicking any of the links on it, so if those links are already loaded before you click on them, then that will be great.

So what types of things could a malicious user do? The thing which comes to my mind is faking clicks on an ad (to get money), faking clicking on links as the user (if the user is logged into a website, then an attacker may be able to do stuff without your permission on that website). Those are the two big things I can think of, but when you look at what an attacker has to do, I think the threat is quite small. The main problem is that once the attacker is in a position where they can get your browser to do the fetching of a link behind your back, there are heaps of other ways they could do the same thing (like an invisible iframe, or some javascript, or whatever) (actually they can do far worse things, since a prefetch doesn't mean that the html will be rendered, only downloaded). This is typically called a XSS attack, and it is a problem when someone can put whatever they want on your website. The comments form on this page has a very simple prevention of such attacks by stripping out almost everything.

An example http request sent when I search for "yahoo"

Now, although I don't think this prefetch thing is a hugely big problem, there is a danger in having google doing prefetches (note that it doesn't do prefetches for every search, I think it is only for pages with a high enough pagerank - compare gmail vs friendly robot overlord). Suppose you are the person who gets the #1 result from google when people search for "bimijinklix" (which will probably be this site in a few weeks) and your page has a high enough page-rank to get prefetched. Once you have this, anyone who searches for "bimijinklix" will automatically download your page. Now all you have to do is change what they download, so instead of getting your page, maybe they get the page that they go to when they click on one of your banner ads (which earns you $400 per click). This is a bit more difficult, because at least at this stage, firefox doesn't follow redirects in prefetches. What you can do is modify your DNS records, so instead of going to the real server for your site, they go to the ad company's server. There are a whole lot of places where this can go wrong (and you only get to choose the IP address of the server, not the actual page requested, and any ad server worth its salt will always be handing out unique links every time an ad is requested, so it would just look like one ad being clicked on lots of times), and it is probably too complicated to be worth doing (especially since a user will probably click on the first link from google anyway).

The only other thing which I'm slightly nervous about is that the website you search for can get and set cookies without you visiting their site. So if you own the #1 spot on google for a particular query and you have high enough pagerank to get prefetches, you can automatically profile how many people are searching for a particular keyword, and you can track how many times an individual does such a search.

You can turn this off if you don't like all this stuff (go to about:config and search for "prefetch"). I'm not sure, but you might need to restart firefox to get it back on. I can't seem to get it to work now I've turned it off. I will be keeping it on, since the risks are so minor, and I'm blocking nearly every cookie now, so there isn't much of a privacy issue.

Several Unrelated Things

Fri, 10 Aug 2007
  • I had an office space like moment yesterday. For those who know my working arrangements, I have it pretty good only working 4 days a week. For some reasons, the boss asked me to come in on Friday. 5 days is way too many days in a row to work, I am so tired now because of it. To anyone thinking of getting a job, ask for 4 days, you won't regret it. To anyone who has a job, next time they offer you a raise, ask for a cut in hours. Make 4 days a week the standard.

  • I still haven't fully mastered the 4x4 rubik's cube, I can solve it all the time 50% of the time, so my algorithm is basically 'solve it as best you can, if this fails (50% chance), scramble and try again'. I'd like to make my algorithm a bit more deterministic, but the final parity error is very stable - all of the methods I've used to mess up the cube have just got me back to the same position. I know that shortish solutions exist, but I'd even be happy with a longish solution (provided I can remember it).

  • I think most programmers don't like writing applications, they prefer solving problems. The part where you have to actually build an interface to the outside world is so boring compared to the actual solving of the problem. I can't decide what the solution is. It could be to turn everyone into programmers, so you don't need the normal-person-usable interface. It might be to make interface building fun (I'm not just talking about graphical interfaces here either). It might be to make some sort of automated interface builder. The programmer in me knows that this solution would be the most fun to implement, but I don't think it is very easy to do well (I'm picturing some system where you give it a description of the data you want, and it can turn that into an API, a command-line interface, a GUI, an argument-taking text based process, an ncurses thing or whatever). The problem is that it needs to actually understand the meaning of the data it is collecting, or else the interface won't make much sense.

  • Driving regularly seems to make you think about driving lots. I'm wondering what the roads would be like if everyone drove at the Nash equilibrium. I suspect that there would be larger gaps between cars, less lane changing, less braking, and faster taking off from a red light. Actually, my guess is that there is no equilibrium, and if everyone was driving in a way which minimised transport time for the whole group, then a greedier more aggressive driving style would be able to have shorter trips than the average driver.

  • I discovered the problem I was having with my stealth distortion. It was just normal boring distortion, being added in a place I didn't expect. My volume control goes up to 11, and the setting I had it at (81%) was just over unity gain (which is at 80%), so it was just clipping any samples which were very close to clipping. Quite boring, but whoever thought it would be a good idea to have a volume control that goes past 100% at this stage in the signal chain should have second thoughts.

  • I'm puzzled by why people have anti-virus software but no backups. The big problem with having a virus on your computer is that it will delete/corrupt/encrypt your data. Sending spams to people is bad, but it doesn't get in the way of doing your work. Slowing down your computer is bad, but it doesn't destroy your work. There is also the other problem of hard drive failure, or laptop theft, or whatever. It seems to me that the problem is the same here - you can get a new computer, but it won't come with all the stuff you were working on before - that is the stuff with value. Why not just have a good backup system, so the cost of a virus infection is low (you just restore your backup)? Having an antivirus and no backup does nothing to stop your hard drive from failing. As a word of warning, for me as soon as I bought an external hard drive to backup to, my main hard drive was corrupted. Then the other day at work a guy tried out some backup software, and then the next day his machine wouldn't boot. Sometimes I think that thinking about backup stuff means that your computer decides that it doesn't have to worry.

  • I recently joined a private health insurance fund, and when I was signing the forms and stuff, I crossed out all the stuff about allowing them to send me spam. Well, they are sending me spam and I can't decide what to do about it. I don't find it particularly annoying (it is all identified as coming from them, it all goes to its own address, and it is all marked as spam by someone before it gets to me, and I'm sure I could just ask them to stop sending one more time and they would), but it is the principle of the matter which makes it so difficult - and asking them to stop the normal way would just be boring.

Zero Knowledge Proofs

Fri, 27 Jul 2007

So one day at uni I was taught about a thing called a zero knowledge proof. We were told at the time - "You probably won't ever need this, but it is cool so you are going to learn it." or something along those lines. Ever since then I've been trying to find an excuse to use one, and I finally succeeded the other day at work.

Basically, I had two parts of a program, and they each wanted to make sure the other part was there, and not some evil imposter program. There are boring ways to do this (A sends B a random string, B signs the string and returns it then vice versa) but, because they are boring (and vulnerable to a man in the middle attack) I didn't use them.

Instead, I generated a large graph (as in dots joined by lines, not bar-graph) where I knew a way to start from dot, and by following the lines, I could get to every other dot exactly once and return to where I started (this is called a Hamiltonian cycle). I then put the description of this path(*) in one of my programs (which will be called the prover).

When the other program (provee) wanted to check that the prover was actually the prover, it would make sure the prover knew what the path was. But, the prover couldn't just say what the path was, because if the provee was actually some jerk program just pretending to be the provee, so then it later can pretend to be the prover then we would have given up our secret.

So, instead of directly telling what the path was, the prover woud send the provee a new graph, one which has the same structure as the original graph (so it still has a Hamiltonian cycle) but where all the labels are different, so it isn't obvious which dot corresponds to which dot in the original. The provee then asks the prover either - "how did you get from the original graph to this?" or "where is the Hamiltonian cycle in this new graph?". It is only allowed to ask one of these questions, and the prover will only answer one question. The prover can easily answer both questions, because it knows how it relabelled the graph itself, and if it just relabels the dots in the secret path, then it can answer the second question too. Then the prover sends a new graph, and they repeat the process as many times as they want (each time increases the certainty - at first it is 1/2, then 3/4, then 7/8 etc).

This relies on two things being hard:

  1. Finding such a cycle.
  2. Finding how you got from one graph to another (called finding an isomorphism).

If you can do either of them easily, then the security of this whole thing falls apart.

So the first thing is - why is it called a zero knowledge proof? This type of proof doesn't tell anyone except for the provee anything except for the fact that the prover knows the cycle. Anyone else who listens to them talk will not have a satisfactory proof (the prover and the provee might have conspired before, so the provee only asks questions the prover will know the answer to) and the provee can't find out what the cycle was, because they would have to find the relabelling in the graph (which is hard).

So next, how hard is it really? Obviously that depends on the size of the graph.

So to get some idea how hard this really is, I wrote a program which would try to find a hamiltonian path in a random graph. It is slightly smarter than just trying every possible path, but it is by no means the best (it is especially inefficient since it is written in ruby).

I was actually expecting my program to be much faster than it was. It is a randomised algorithm (it picks a random path until it reaches a node that it has reached before, then it reverses the direction of the nodes found in the loop it just found, and keeps trying. Every now and then it restarts the search.

So there is a chance that it will be lucky and find the solution straight off, or it could be very unlucky and never find the solution.

My very unscientific results:

A graph with 50 nodes and 50 edges takes 0.0 seconds (there aren't many choices)

A graph with 50 nodes and 60 edges takes 0.6 seconds

A graph with 50 nodes and 70 edges takes 3.7 seconds

A graph with 50 nodes and 80 edges takes 0.8 seconds (lots of different answers now)

A graph with 100 nodes and 120 edges takes 11.2 seconds.

A graph with 100 nodes and 150 edges takes more than a minute.

Now I'm going to bed. Don't rely on these results if you are designing a crypto system.

(*) Technically it is called a cycle (because it starts and finishes at the same spot). But path sounds more natural.

The Rubik's Cube Alphabet

Fri, 13 Jul 2007

My 4x4 Rubik's cube died the other week, so in my sadness I decided to make a new typeface by using my 5x5 cube. I put some paper on my desk to give a mostly white background, and then spelt out every upper case letter on the 5x5 cube. I took 2 high resolution photos of each (with the camera on a tripod and the cube in a mostly consistent spot) and then selected one of the images, resized it, stitched it with all the other images, edited out the desk and increased the brightness and saturation to bring you this:

Rubik's Cube Alphabet

(note that I stuffed up when I did the resizing and accidentally saved the resulting images as jpegs, so there is some dodgy artifacts there which are then made even more obvious when I increased the brightness and saturation, then to save bandwidth I did another round of jpeg compression at the end just for a laugh. There is a png file with the same name as the jpeg if you want slightly better quality, email me if you want the higher-quality source images.)

In pleasant news, since it was recently my birthday, I now have a new 4x4 cube, which has inspired me to go back to my search for a method to fix the "parity error". For the non-Rubik's cube people, when you have a normal 3x3 Rubik's cube, there are actually multiple solutions since the centre square on each face can be rotated differently. This isn't very obvious, because they are just blocks of solid colour, so they don't seem to face a particular direction. But if you get one of those stupid sudokube's then it is much more obvious - you can get all the numbers in the right spots, but have some of the 5's upside down or whatever. But even though there are 6 faces where each could be facing 4 ways there aren't 24 different solutions, because a lot of these combinations are impossible. I don't know enough about group theory to be able to be certain about this, but I'm pretty sure that rotating a single face somewhere means that you will rotate one of the other faces (this is talking about the final solved state of the cube - you can easily rotate a face without moving others, but that will mess up the colours). So I'd guess that there are 12 different solutions (even though they all appear identical).

With a 4x4 cube, you can think of it as a 3x3 cube, but where the middle face is split up into 4 and the edge pieces are split up into 2. So to solve it, you will have to get the 4 parts to each face back together, and the 2 parts to each edge together as well. However there are multiple ways you can reconstruct each edge and each face - even though they look identical, it is possible that you are reconstructing it in a way that means you have one of the edge pieces flipped. So you can think you have solved it, and do your thing, but then you get to the very end, and you have 2 edge pieces (corresponding to a single edge in the 3x3 cube) which are flipped. This is the parity issue, and to fix it, you can either find a website which gives you a sequence of moves you can memorize (boring), scramble it up and try again (50% chance of success) or try to figure out how to fix it properly. In the past I've used the scramble method, but now I feel like working out a method for fixing it with 100% chance of success.

In other Rubik's cube news, I figured out my own algorithm for solving 3x3 cubes, and that was a few years ago, and while I don't play with a rubik's cube every day, I would say that I thought I had done it enough to have encountered every situation. But last night when playing with a 3x3, I got into a situation where there was a hole in my algorithm (i.e. a situation I'd never seen before). It was easy to fix, but it still surprised me a lot.

x things I hate about python

Sat, 02 Jun 2007

So I just realised that both the applications I've demonstrated here have been in python. This makes me laugh, because these I think are the first two python applications I've built (I've modified plenty of python before, but never started a python project). So I'll be clear and say that this isn't a place where I use python for everything. So then, on to the python bashing:

  1. Built in functions - I don't care if they are named sanely, there are too many of these (len(dir(__builtins__)) == 134), and I won't remember them. These functions don't look anything like the basis set of vectors I would use for describing every program. Obviously the language designers didn't view things the same way as me, and that is cool, but I much prefer a small language.
  2. Recommending spaces for indentation - another popular holy war. But using spaces for indentation is wrong. If I'm viewing code, I want to choose how much indentation there is (in fact, on my widescreen laptop, I use 4 spaces but on this desktop I use 3 spaces - and at work I think I use 2). I disagree about mixing tabs and spaces also. Spaces should be used for alignment, and tabs for indentation. They have different purposes, so they shouldn't be mutually exclusive (for example, alignment of a multi-line condition should be done with spaces). Finally, if I'm wrong, then it is much easier to convert from tabs to spaces than from spaces to tabs (and one of my internal heuristics says that this means that tabs are right).
  3. Ugliness. After reading the start of any python tutorial, you will think that python is a nice looking language. Then you get the the section on classes, and are greeted by functions like __init__, or is that _init_ or ___init___ - fortunately everyone writes code in a monospace font. But I'm still struggling to come up with a reason why we needed these extra __ at the end. Why not just __init? (or just say what you mean: def private init(self):. Which then exposes another ugliness which I'm sure people have complained about before - the "self" parameter. Unless I'm missing something, it could be made a keyword, and then we wouldn't have to include this parameter in every method.
  4. Anonymous functions suck. Simple functions are fine inc = lambda x: x+1, not so simple functions aren't: factorial = lambda n: reduce(lambda x,y: x*y, xrange(1, n+1), 1), isprime = lambda n: itIsEasierToJustWriteANamedFunctionAndCallItHere(n). Maybe this whole restriction of having a one-liner for an anonymous function was a design decision, because any function more than 1 line long should have a name. I think that this should be the programmer's judgement call, not the language's.
  5. (Just because the documentation is in front of me): The function raw_input gives you the raw input... minus the newline.
  6. Most of my python experience was in a situation where a debug cycle took about 30 seconds. I don't care that you can make a .pyc file that has abs("string") represented in it, that doesn't help me.
  7. The fate of reduce(). Well, I suppose my previous point about anonymous functions is being taken care of... by getting rid of them. How about getting rid of anonymous integer literals too? Sure, it would prevent you from finding code with magic number sprinkled all over the place with no indication about where they came from, but I'd love to see your "using python as a calculator" tutorial after the change. Why do people hate functions?

There are other things that irk me about it, generally the typing situation annoys me the most, then the whitespace, then the anger towards functional programming. I don't think it is going away, and I'll probably keep on using it because it has lots of good libraries for it. But I'll never be happy about it, and it looks like things are only going to get worse, since these seem to be things where I have different opinions to the language designer, not just because these things were oversights.

Super Permalinks

Fri, 18 May 2007

Someone asked me if my site will have permalinks (yes - I'm not changing any links) and it reminded me of a technique which I read about quite some time ago by Thomas Phelps and Robert Wilensky:

Traditional hyperlinks are very brittle, in that they are useless if the page later moves to a different URL. This project improves upon traditional hyperlinks by creating a signature of the target page, selecting a set of very rare words that uniquely identify the page, and relying on a search engine query for those rare words to find the page in the future. For example, the Google programming contest can be found using this link. (source)

The idea is that instead of remembering the location of something, you remember a little bit of the content there, since that is what you acually care about. Then when the location changes, the content stays the same, so you can still find it. I thought it was a cool idea at the time, so whenever I was at a website, I would try to remember a few unusual words that it contained so if I ever wanted to get back there, I could just google for those words. It worked, but I've stopped doing it for no real reason.

a nocturnal, melodic, bag of phonetic sugar in a cliffhanger situation that is indistinguishable from its buddy, and has been shortchanged by the beverage machine. Take my word on it that it is metaphysical, and that it is somehow subcontracting.

So, if you want to try it. Then just remember a few of the following words: indistinguishable, subcontracting, metaphysical, cliffhanger, melodic, phonetic, shortchanged, nocturnal, sugar. At the moment a search for all of these words comes up with 48 hits but hopefully that will change in the future. At the moment google doesn't even know this page exists, so it will be a while before this is useful. I've produced a visual aid for you to remember them by - simple (though I couldn't work out how to make the sugar metaphysical, and drawing a contract above a bag of sugar is supposed to suggest "sub" contracting which is a bit loose).

As a closing note, I tried to look at the actual paper, and I got a server error. I don't think that is irony, since the link to it was just a normal link. But I still found it amusing.

 

About

I'm a nerd living in Sydney. This is a place where I can write stuff about my interests and not care that no one else is reading.

I like music, maths, programming, pretty pictures, filters and other good things.

(more info)

It should be fairly obvious that this isn't connected to my employer at all.

Email me (not a catchpa)

Email policy

Subscribe

RSS Feed RSS

Get an aggregator

Liferea (Linux)

Vienna (OSX)

Feedreader (Windows)

Google Reader (Web based)

I've only used Liferea, so I can't vouch for the other ones.

About this site

This site runs a (modified) version of blosxom.

The host is GeekISP, and they seem to do an excellent job.