.chomp


It’s shocking but true. I bet you thought you were rid of that degenerate, vile, baby-eating language feature, that you had left it behind with the dark days of C. Think again, douchebag!

But first, what exactly is a pointer? When you describe pointers to the uninformed, you probably say something like “a value that points at another value”, and that is correct. As the name implies, this is their defining purpose.

In C, pointers are overloaded to serve three distinct roles: pointers, arrays and iterators. They are pointers when you create them with the address operator (&) and use them with the dereferencing operator (*). They are arrays when you apply subscripts to them as in buffer[i]. They are iterators when you perform arithmetic on them as in *d++ = *s++. Despite their arguably excessive versatility, their primary role remains as pointers. If they lacked the functionality to serve the other two roles, we would still call them pointers.

In Java, the statement Object obj; creates something called obj that has something to do with the class Object, but no actual Object has been created here. Instead, we have what Java calls a reference, or more precisely, a variable that can hold a reference. When you actually assign an Object to the variable, it will then be referring to that Object, one might even say pointing at it.

Indeed, that is exactly what it’s doing. The semantics are identical to those of C pointers. If you copy the variable to another variable, both will then point at the same Object. The only difference is syntactic: in Java, address-taking and dereferencing are implicit, whereas in C, explicit operators are used to distinguish pointer semantics from other operations that don’t exist in Java.

Java has simpler syntax, a simpler type system and a simpler memory model, and all of this typically makes it much easier to use than C. It isn’t a lack of pointers that makes it easy, because it doesn’t lack them. Ditto for nearly every other programming language.
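Ruby is a handy place to see the same thing, since it is the language most of this blog is written in. This is only a minimal illustration, and the variable names are made up for the occasion:

```ruby
a = [1, 2]
b = a          # copies the reference, not the array
b << 3         # mutate the array through b

p a            # prints [1, 2, 3]
p a.equal?(b)  # prints true: both names refer to one object
```

No address operator and no dereferencing in sight, yet a and b behave exactly like two C pointers to the same buffer.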

The point is (har), if you fundamentally can’t wrap your head around pointers in C then you are unlikely to fully grasp their thinly veiled counterparts in other languages, though you might be able to fake it for much longer. The next time somebody tells you that you don’t need to know how pointers work because you don’t use C, kick them in the balls. Or disagree with them, whatever works.

I am generally opposed to FUD on principle, but I’ll make an exception for this.

Ruby 1.8.7 introduced the Enumerator class, which is used to represent enumerations as first class objects. If you call almost any iterator method like map or times and don’t pass a block, you will get back an Enumerator object. You can call next on this object to incrementally generate successive terms of the enumeration, but that would be boring. Because Enumerator itself is Enumerable, you can instead call further iterator methods on it, chaining them to your heart’s content. I was doing exactly that when it all went horribly wrong:

('a'..'z').cycle.drop(13).take(26).join

I assumed that would return "nopqrstuvwxyzabcdefghijklm" but it actually returns EXPLODE. This is because drop(13) skips 13 terms and then, rather than returning an Enumerator for the rest of the terms, it tries to convert them to an array. Since cycle is unbounded, that array is bigger than Jesus. The implementations for cycle and drop must be very efficient because my computer flatlined in the fraction of a second it took me to lunge for Ctrl-C.

Attached is the “correct” version of drop which I’ve called skip. In theory, cycle is working properly but I’ve caught my balls in that bear trap a few times now and I have to wonder if a one-shot server killer like this should really exist in such a safe and friendly language.
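In case the attachment has gone missing, here is roughly what such a method could look like. Treat it as a sketch written after the fact, not the original code; the only thing taken from the post is the name skip.

```ruby
module Enumerable
  # Lazy counterpart to drop: returns an Enumerator over everything after
  # the first n terms instead of collecting the rest into an Array.
  def skip(n)
    Enumerator.new do |yielder|
      each_with_index do |item, index|
        yielder << item if index >= n
      end
    end
  end
end

rotated = ('a'..'z').cycle.skip(13).take(26).join
puts rotated  # prints nopqrstuvwxyzabcdefghijklm
```

Because take stops the iteration once it has its 26 terms, the infinite cycle is never materialized. On Ruby 2.0 and later, the built-in Enumerator::Lazy gets you the same effect with ('a'..'z').cycle.lazy.drop(13).first(26).join.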

Baked into Ruby 1.9 (which you are using, right?) is the scary sounding Oniguruma, a regular expression library that is better than the other one somehow. One way in which it is better is that it supports named groups, which means you can give a name to a sub-expression and reuse it. It looks like this:

(?<woot>abc+)\g<woot>

The first part defines a sub-expression named woot as ‘abc+’ and matches it. The second part matches it again by reference. As my fellow cleverites have figured out, appending {0} to the definition makes it do nothing except define the name, which allows you to define a bunch of named subs at the beginning of your regex and keep everything tidy and modular.

But the fun doesn’t stop there. It turns out that the references are dynamically bound, which means that they can be forward references and self-references, which means that you can build elaborate recursive expressions, which means that they aren’t really regular expressions at all, more like parsing expression grammars. Radical!
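A couple of small patterns make this concrete. They are my own illustrations (the name paren and the test strings are invented), but they only use the syntax described above:

```ruby
# \g<woot> re-runs the sub-expression; it is not a backreference to the text
# that the group matched the first time.
twice = /\A(?<woot>abc+)\g<woot>\z/
p twice.match?("abccabc")   # prints true
p twice.match?("abc")       # prints false

# {0} defines the name without matching anything, and a group may call itself.
parens = /(?<paren>\((?:[^()]|\g<paren>)*\)){0}\A\g<paren>\z/
p parens.match?("(a(b)c)")  # prints true
p parens.match?("(a(b")     # prints false
```

The second pattern matches balanced parentheses, which no truly regular expression can do.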

Unfortunately, they still look like regular expressions and are thus inscrutable to all but the most holy of Perl monks. We can fix this the same way we fix everything in Ruby: with a DSL.

The attached code implements a Regexp builder language. The builder provides readable methods corresponding to the Regexp operators:

/abc*/                   => /abc*/
regexp "abc*"            => /abc*/    # compile Regexp from string
"abc*"                   => /abc\*/   # quoted string
cat 'a','b','c'          => /abc/     # concatenation
alt 'a','b','c'          => /a|b|c/   # alternation
zom 'x'                  => /x*/      # zero or more
oom 'x'                  => /x+/      # one or more
zoo 'x'                  => /x?/      # zero or one
rep 3,'x'                => /x{3}/    # repetition
rep 1..3,'x'             => /x{1,3}/
rep [3,nil],'x'          => /x{3,}/
rep [nil,3],'x'          => /x{,3}/
cap /abc/                => /(abc)/   # capture group

define(:foo) { /abc/ }   # prepend "(?<foo>abc){0}" to final result

define :foo => /abc/,    # alternate syntax for short patterns
       :bar => /xyz/     # prepends "(?<foo>abc){0}(?<bar>xyz){0}"

The builder block returns the final expression, which is prepended with any named group definitions that appear. The definitions can be used to create a symbolic parser, much as you would with yacc or antlr. A sample grammar is also attached.
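For readers without the attachment, here is a stripped-down sketch of how such a builder might be put together. It is not the original code: the class name RegexpBuilder, the Fragment struct and the ref helper are all assumptions of mine, only a handful of the operators are covered, and it wraps every term in a non-capturing group, so its output is noisier than the patterns in the table above.

```ruby
# Hypothetical sketch of a Regexp builder; not the original attachment.
class RegexpBuilder
  # A piece of pattern source. Pieces stay as text until the very end because
  # a piece containing \g<name> will not compile without its definition.
  Fragment = Struct.new(:source)

  attr_reader :definitions

  def self.build(&block)
    builder = new
    body = builder.instance_eval(&block)
    Regexp.new(builder.definitions + builder.source_of(body))
  end

  def initialize
    @definitions = String.new
  end

  # Regexps and Fragments contribute their source; anything else is quoted.
  def source_of(term)
    case term
    when Regexp, Fragment then term.source
    else Regexp.escape(term.to_s)
    end
  end

  def cat(*terms)
    Fragment.new(terms.map { |t| group(t) }.join)
  end

  def alt(*terms)
    Fragment.new("(?:" + terms.map { |t| source_of(t) }.join("|") + ")")
  end

  def zom(term)
    Fragment.new(group(term) + "*")
  end

  def oom(term)
    Fragment.new(group(term) + "+")
  end

  def zoo(term)
    Fragment.new(group(term) + "?")
  end

  # Assumed helper: call a named group, i.e. \g<name>.
  def ref(name)
    Fragment.new("\\g<#{name}>")
  end

  # Record "(?<name>...){0}" so it is prepended to the final pattern.
  def define(name)
    @definitions << "(?<#{name}>#{source_of(yield)}){0}"
    nil
  end

  private

  def group(term)
    "(?:#{source_of(term)})"
  end
end

# A tiny recursive grammar: nested bracket lists of digits.
nested = RegexpBuilder.build do
  define(:list) { cat("[", zom(alt(/\d/, ref(:list))), "]") }
  cat(/\A/, ref(:list), /\z/)
end

p nested.match?("[1[23][]]")  # prints true
p nested.match?("[1[2]")      # prints false
```

Keeping the pieces as strings until the final Regexp.new is the one real design decision here: a fragment that calls a group which has not been defined yet cannot be compiled on its own.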

The critical flaw with this parser is that it can’t generate a parse tree. The MatchData object returned by Regexp matches can only have one sub-match per named group, which is utterly useless for any non-trivial grammar. You can use this parser to search for complex structures, but you can’t use it to deconstruct them, which is a total bummer.
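The limitation is easy to reproduce without any grammar at all. In this small example (the group name digit is just for illustration), the group matches three times but MatchData only remembers the last one:

```ruby
m = /(?:(?<digit>\d),?)+/.match("1,2,3")

p m[0]       # prints "1,2,3"
p m[:digit]  # prints "3": the earlier sub-matches are gone
```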

Please do share any ideas you might have about how to work around this limitation.

Programming is better than electronics.

With programming, you don’t have to worry about how strong your code is or whether it will burn out if it processes data too fast. You don’t have to collect bits of code in a bin and rummage through them looking for an else. And the else isn’t called TQN12C983-B, it’s just called else. And when you have an idea for a program, you can just open your text editor and write it. You don’t have to etch or drill or melt any lead. And if you make a mistake, you can just fix it. And your code won’t kill you if you touch it.

So then why is electronics so fucking cool???

Here’s a spooky Ruby feature I didn’t know about until just now:

flipflop = proc{|a,b| if a..b
                        true
                      else
                        false
                      end }               => #<Proc>
flipflop[nil,nil]                         => false
flipflop[1,nil]                           => true
flipflop[nil,nil]                         => true
flipflop[nil,43]                          => true
flipflop[nil,nil]                         => false
flipflop[nil,nil]                         => false
flipflop['woot',nil]                      => true
flipflop[nil,nil]                         => true

As you may have guessed from the variable name, or deduced from the behavior, this proc acts as a stateful toggle switch. Pass something truthy as the first argument to toggle it on and likewise for the second argument to toggle it off. Where is the state of this toggle switch stored? In that range used as the condition of the if statement.

…what?

That’s right, apparently if you use a range literal alone in a conditional statement, it means “between this being true and that being true”, which totally makes sense when you’re on acid. Every time the range literal is evaluated, it evaluates each of its operands in boolean context and adjusts its internal state accordingly.

But wait, that range literal is re-created every time the condition is evaluated, so how can it remember a state? How is that possible? What the fucking fuck!?

Well, it isn’t possible. Sadly, this all comes down to filthy, filthy magic. The parser syntactically detects that the range is in a boolean context and grants special powers to its AST node. Because the flip-flop state is associated with the code itself, all distinct evaluations of that code will share the same state. One conditional range literal == one toggle state, no matter how many closures you wrap it in or how many times you #clone them. If this strikes you as a potential reentrancy problem then you’re starting to get the hang of this.

Ruby is chock-full of these zany features, though this is the weirdest one I’ve found so far. If you wonder why all of the Ruby VM projects are taking so long, this should give you a subtle hint.

If you’re actually considering using this, one of the IronRuby developers reverse engineered it in excruciating detail. Godspeed and may luck be with you.
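For what it’s worth, the one job this feature seems suited to is picking out a run of items between two markers. A minimal sketch, with made-up data:

```ruby
lines = %w[a BEGIN b c END d]

# The flip-flop switches on when the left condition is true and switches
# off after the right condition is true.
picked = lines.select do |line|
  true if (line == "BEGIN")..(line == "END")
end

p picked  # prints ["BEGIN", "b", "c", "END"]
```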

Cannot find an attribute directive with a name attribute with a value “name”, the value of this name-from-attribute attribute

— actual Java error message

This is a screensaver of sorts for the StackOverflow DevDays conference. I have no hope of actually attending this thing, but at least my code will be there, assuming I “win” the contest.

It gave me a chance to dive into the oft overlooked Windows Console API. It is remarkable how much engineering Microsoft puts into even the most obscure APIs. I bet they have an API generating script in Ruby to which you can give a few lines of declarative DSL and it will generate a hundred functions worth of blub.

The downside of keeping every little project on my hard drive in a git repo is that I often forget to commit/push when I make changes, leading to mega-commits with descriptions like “everything since last march, whatever that is”. Also, I tend to forget about topic branches, accidentally committing maintenance fixes to some radical experimental branch that’s been broken forever.

So, I rigged my bash prompt to remind me about all this:

Purple is the current branch, red files are changed but not added, green are changed and added. If the line gets too long, it starts condensing things to (31 files) or the like.

The magic behind this is the Ruby script below. To make it work, point the PROMPT_COMMAND environment variable at the script and set PS1 to whatever you want on the second line (:: in that screenshot).
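If the embedded script doesn’t show up for you, the core of such a script could look something like the sketch below. This is a reconstruction under my own assumptions, not the original: the method names, the 40-character cutoff and the choice to ignore untracked files are all invented for illustration.

```ruby
# Hypothetical sketch, not the original script.

# Split `git status --porcelain` output into [staged, unstaged] path lists.
# Each line looks like "XY path": X is the index status, Y the work tree status.
def parse_status(porcelain)
  staged = []
  unstaged = []
  porcelain.each_line do |line|
    next if line.strip.empty?
    index, worktree = line[0], line[1]
    path = line[3..-1].chomp
    next if index == "?"                     # skip untracked files
    staged << path unless index == " "       # changed and added
    unstaged << path unless worktree == " "  # changed but not added
  end
  [staged, unstaged]
end

# Join file names, or condense to a count when the list gets too long.
def summarize(files, max_width)
  text = files.join(" ")
  text.length > max_width ? "(#{files.size} files)" : text
end

# ANSI colors: 35 is magenta (purple), 31 is red, 32 is green.
def prompt_line(branch, staged, unstaged, max_width = 40)
  parts = ["\e[35m#{branch}\e[0m"]
  parts << "\e[31m#{summarize(unstaged, max_width)}\e[0m" unless unstaged.empty?
  parts << "\e[32m#{summarize(staged, max_width)}\e[0m" unless staged.empty?
  parts.join(" ")
end
```

To wire it up, feed parse_status the output of git status --porcelain, get the branch name from git rev-parse --abbrev-ref HEAD, and print the result of prompt_line.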

Update: This has already been done in pure bash, with more features: http://volnitsky.com/project/git-prompt/

Guess #{"which #{"programming #{"language #{"supports"} unlimited"} nested"} interpolated"} strings?

These embedded gists are nicely marked up for styling, enough that you can do nifty things like limit the height and add a scrollbar:

.gist-data {
  max-height: 400px;
  overflow: auto;
}

You can also mess with the syntax highlighting. I used this test file and the Firefox DOM inspector to build this template: