Subgrep
I've had this program lying around for a while, and not really known what to do with it. I wrote it a while ago when I was trying to work out my coding style conventions in C. I thought that the best way to do this would be to build a mediumish project and think a little bit about lots of the decisions I made, and when I realised I stuffed up, I would try to go back and change everything. I don't know how successful it was (as in, "I don't know", not "it wasn't successful, but I'll say 'I don't know' to avoid admitting failure") and at least one thing has changed since then (I now name functions and variables using underscores not camel case, so code looks prettier when using GTK+ libraries).
Skip this if you know what regular expresions are: A regular expresion is a description of a set of strings. Like "the set of all strings with 3 vowels in a row" or "the set of all strings ending in "bacon". They are useful for computer nerds who have a big file which has a little bit of stuff in it they want, but they don't know exactly what the stuff is (or else they wouldn't need to find it). So you create a regular expression that matches things that look like what you are after (say "four numbers, then a space, then 4 more numbers, optionally preceeded by two numbers in brackets" for phone numbers) and then run a program which looks for things which match.
Anyway, the program is a tool which I have no idea what you would actually use it for, but still is more a tool than a toy. It takes a regular expression, and a text file, and it then treats each line in the text file as a regular expression and tells you if it is a subset of the given expresion, or the other way around, or if they are equivalent or otherwise. I call it subgrep, because it is like grep, but it does subsets. (For those not in the know, "grep" takes a regular expresion and a file and tells you which lines in the file match the regular expression).
Going through uni, I think I was taught regular expressions about 3 times in different subjects (one was practical, one was maths-purist, one was computer science-purist), but I don't think any of them really cared about subsets. Pretty much the only thing the purists cared about was the equivalence to DFAs and NDFAs (which this code makes use of, a regular expression is converted to an NDFA, then to a DFA). The practical people didn't care about much. So, this code gives a bit of a bigger spectrum, it has a tokeniser, a parser, a parser to NDFA converter, an NDFA to DFA converter, a DFA simplifier, and a few other operations on DFAs (like unions and equivalence, which are then combined to do subsets).
The challenge now is to find an actual use for this tool. I haven't thought of one, but maybe someone will find it useful: C source code (GPL3).

RSS