Monday, December 31, 2007

Defensio: smarter than the average troll.

So, I've been using the Defensio API lately. There's a "Defensio on Rails" plugin that I've had success implementing (with a few hacks specifically for the site using it, thank God for open source).

I've always been meaning to integrate Defensio into the anonymous blogging site I run, since I figured that sooner or later someone was going to target it for spam.

And I was right.

But the spam that eventually came wasn't your everyday automated "buy viagra now" spam. What the site got hit with was a curious hybridization of trolling and spam; it was spam, no doubt, but it was clearly posted by actual people attempting to troll the site. Let's call the phenomenon "Troll Spam" for now.

After swatting away the first few posts by hand, I quickly looked into Akismet and Defensio's APIs. I was on a "short schedule" since I hadn't planned on spending my entire day implementing an anti-spam filter or doing the filtering by hand.

I eventually chose Defensio, and had it up and running in less than 2 hours. Why?

* Good API documentation: even though I was using the Defensio on Rails plugin (see above), I still needed to know the mechanics. Going through the Defensio API page was a breeze.

* RSS feed for "innocent" and "spam" pages. This was incredibly convenient for checking Defensio's results on the production site. I have my own little admin section that looks like it came out of a Cracker Jack box, but Defensio's RSS feeds make it much easier to monitor, in real time, what's ham and what's spam.

* Statistics. Tells me how the system is working. Not useful now, but fun to watch!

But by far the most important feature was that Defensio learns. The Troll Spam all had a similar theme, but different content. If you've ever seen a "raid" by 4chan or Something Awful goons, you'll know what I mean: certain words, phrases, etc., repeated together. After I trained Defensio to HULK-SMASH the next few bursts of Troll Spam, it caught on to my intentions and started filtering out the bull-shit while still letting in the good-shit.

Meanwhile, the site continues to operate normally, the regulars not even realizing there's a Secret War happening!

Now, I know I could have spent 20 minutes creating my own filter that would have whacked posts that had these phrases in them, but this is so much better: Defensio has even started marking other posts by these trolls as spam as well. One of them thought they were going to be clever and posted an entire 3-page "ABC REPORTS" article. Never hit the front-page.
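For comparison, here's roughly what that 20-minute filter might have looked like. This is only a sketch: the phrase list and the troll_spam? method name are made up for illustration, and it has nothing to do with how Defensio works internally.

```ruby
# Hypothetical phrase list; a real one would be built from the
# Troll Spam already swatted away by hand.
BLOCKED_PHRASES = ["example raid slogan", "another repeated phrase"].freeze

# Returns true when the post body contains any blocked phrase,
# ignoring case. Anything not on the list sails straight through.
def troll_spam?(body, phrases = BLOCKED_PHRASES)
  text = body.downcase
  phrases.any? { |phrase| text.include?(phrase.downcase) }
end

puts troll_spam?("EXAMPLE RAID SLOGAN lol")      # contains a listed phrase
puts troll_spam?("A perfectly normal post")      # contains no listed phrase
puts troll_spam?("Three pages of pasted news")   # new tactic, not on the list
```

The last call is the weak spot: a static list only catches phrases I already thought to add, which is exactly where a filter that learns pulls ahead.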

At the time of this writing they've basically just given up and started posting a long stream of obscenities (which is easier for them I guess), but none of it is getting through...

Statistically, Defensio is at about 50% accuracy, but it's been running for less than 24 hours and it's learning very, very quickly.

Smells like victory to me.

My only regret is that I didn't get Defensio implemented sooner -- I missed out on a lot of good learning data.

HTML5's Canvas tag: are we using it?

Dear Lazy Web (all 6 of you that read my blog in other words),

Are we still using the Canvas tag? I've been researching it, looking for a "cheap" way to do some simple graphics manipulation for Firefox-based browsers, and as I did I came across Apple pulling the patent card for Canvas. I remember seeing some radically awesome stuff done using Canvas but if there's a chance that it'll be pulled from the browser in the future because of the patent I'll be looking to use something else.

So, we using it or not?

PS: Blogger's fucked up Rich HTML editor doesn't escape HTML tags when you type them into the editor, so guess what happened when I posted this the first time?

Friday, December 14, 2007

The tools! Use the oDesk tools!

Wow, I totally cannot stress this enough: monitor your providers. If the job is really important, hire a project manager.

I was browsing through this thread on the oDesk community forums. It was posted a few months ago, but it recently resurfaced when someone posted a little ditty on "Providers that lie" further down the thread.

One problem both buyers encountered was the assumption that they didn't need to monitor their providers at all. It is vital to communicate constantly and ask for tangible updates from your provider when starting a new relationship. You don't know him, he doesn't know you. He might think he's doing "OK work" and in your eyes it's garbage. He might be trying to cheat you and sweet-talk his way out of it. You might not have explained your project clearly and now the provider is wandering astray.

oDesk has plenty of tools -- the work diary, the time analyzer, etc -- to keep you and your provider on the same page, but you have to use them to be an effective buyer. This is less of an issue once you and your provider have established a bond of trust, but before then you're just tenuous associates.

Another problem both buyers faced was not knowing the difference between the hourly model and the fixed-price model.

The hourly model is based on labor: they work, you pay them. This is similar to hiring a contractor to come paint your house. You pay him by the hour, he paints. If you're going over-budget, you can stop paying him, but that means he'll stop working -- even if he's only painted half the house.

The fixed-price model is based on the tangible end-result. You're not paying for the labor involved in creating the product.

With both the hourly model and the fixed-price model, you have the same goal: completing the product. However, in the hourly model you pay for the labor to complete the product, whereas in the fixed-price model you're paying for the product itself.

By now, I'm sure you're wondering what this has to do with the thread I mentioned earlier. Here's the answer.

You can't refund labor.

You can't turn back the clock and give the provider his hours back. If he's been working on your project for 6 hours straight and suddenly you decide you don't like how it's turning out, you can't just straight gank his cash. You can ask for a refund, and he may feel obligated to give you one out of a sense of professional courtesy (this happened to me with one provider), but not necessarily.

Ah, but what about fraud, right? What happens to cheaters?

First we have to distinguish between fraud and disputed hours. Fraud means the provider was "cheating" -- they said they were working, but they weren't. No labor, no pay. However, disputed hours can stem from a few issues: maybe the provider was working on your project, but he wasn't working diligently because he was too busy chatting with his girlfriend by IM in the background. Maybe he was working, but doing it abnormally slowly, trying to bill extra hours.

oDesk has policies in place for dealing with disputed hours and fraud. I've never dealt with disputed hours myself, but oDesk publishes their dispute policy online.

You're probably asking yourself right about now why you should bother with the hourly model at all. After all, isn't it safer to work with a fixed price project?


For complex projects, an hourly model is the way to go. When you use fixed-price, the more time a provider spends working on your project, the less they make per hour. So it's in the provider's best interest to take short-cuts and be messy. For a smaller project, such as a personal website, fixed-price is much more suitable: the provider has plenty of time to finish the assignment. No need to rush or be sloppy.
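To put rough numbers on that, here's a quick Ruby sketch of what a fixed price works out to per hour. The $1,000 price and the hour counts are made up for illustration; they aren't real oDesk figures.

```ruby
# Effective hourly rate on a fixed-price job: the price never changes,
# so every extra hour the provider spends lowers the rate.
def effective_hourly_rate(fixed_price, hours_worked)
  fixed_price.to_f / hours_worked
end

# Hypothetical $1,000 project finished in 20, 40, or 80 hours.
[20, 40, 80].each do |hours|
  rate = effective_hourly_rate(1000, hours)
  puts format("%d hours -> $%.2f/hour", hours, rate)
end
```

Doubling the hours halves the rate, so a provider who takes the time to fix a nasty surprise properly is paying for it out of his own pocket.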

An hourly model assures the provider that he has time to work on polishing your project and making sure everything is going as smoothly as possible. If a serious issue comes up without warning, he has the option of fixing it the correct way instead of the fast way.

To sum it up, let's say you're going to a restaurant:

The restaurant employs the hourly model to pay the chefs to cook food for the customers.
The restaurant is paying for the labor.

The customers employ the fixed-price model to pay the restaurant for their delicious meal.
The customer is paying for the product.

I hope that analogy made more sense to you than it did to me.

Anyway, use the tools. Keep an eye on your provider. Ask for tangible goods -- screenshots aren't good enough. If the provider says he isn't ready to set up the project for you yet, ask the provider to use oDesk Share. oDesk Share allows you to view a provider's desktop in real time.

oDesk will connect you with qualified (and sometimes not) providers: it's up to you to manage them. If that sounds hard, try to hire a project manager who will micromanage everything for you.

Man, I love me some bold text. Bold, yet smooth.

Wednesday, December 12, 2007

Logins! Logins! Logins!

A few days ago I received the IT equivalent of a battlefield promotion and suddenly found myself tasked with commanding and conquering a team of developers that I had to first assemble. I was given end-to-end hiring responsibilities, and since we decided to use oDesk to search for talented developers, it meant that I now had three accounts: provider, buyer, and now the enigmatic "Company User."

Also, there's this reality-bending feat:

Yeah, that's two of me. As if one weren't bad enough.

Frankly, what I need right now is some kind of keychain. I've got 3 logins for oDesk alone, and God knows how many more for various sites on the Internet. Hotmail, Yahoo!, Google... I know there's software out there to "coalesce" the experience, but that software doesn't integrate with the browser at all, isn't stored in the cloud so I can move seamlessly from one computer to another, doesn't have an easy-to-use everyday interface...

Hey, I'm a PROGRAMMER! Maybe I'll make a website and Firefox add-on that does just that and become a millionaire. Man can dream, can't he?

Hm, I'll probably post more about the oDesk hiring experience later. Seems like good blog fodder, don't you think?

Saturday, December 08, 2007

oDesk Team Software vs Privacy Concerns.

You know, when I first started using oDesk Team (that's the software that monitors your keyboard / mouse activity and periodically takes screenshots), I used to be super paranoid about what kind of pictures it was taking.

I was always staring at my Work Diary screenshots, mumbling to myself when I accidentally tabbed to the wrong browser window and oDesk Team happened to take a screen-shot right at that very moment.

Hasn't been like that for months, to be honest. Months? More like a year. Christ, how long have I been on oDesk? Feels like forever.

One thing I learned about the Work Diary's screenshots: Nobody cares. Nobody. Not unless you're cheating the buyer, or the buyer thinks you're cheating him, or the buyer's new to the system and is staring in wonderment at the screenshots page.

The real appeal must be that "safety-net" feeling for both buyers and providers: the buyer can't cheat you because you've got proof (work diary) that you worked those 8 hours, and the provider can't cheat the buyer because the buyer has proof the provider wasn't working at all (again, work diary to the rescue).

Don't get me wrong, providers try every once in a while: you'll see an angry buyer posting on the community forums because some guy in Urbekestianiza was running a simple mouse macro and stole about $1k from this guy, followed shortly by oDesk personnel announcing they're "handling the issue." I suspect they've got some guy that just monitors the forums all day, waiting for those kinds of posts to crop up so he can alert the rest of the A-Team.

Honestly, I'm surprised at the number of "cheats" that still try to make some quick illegitimate cash. There's, like, a 2 week lag time before the money is actually available for withdrawal in your account, and then there's another 3-5 days for the withdrawal to be posted and end up on your Payoneer MasterCard or ACH-enabled bank account.

There are way easier, way more successful ways to scam someone out of money than trying to fool them on oDesk. The company is just way too alert and quick when handling these kinds of issues.

Back to the issue of privacy: eh? Seriously, what are you doing while you're working that makes you feel as if your privacy is being invaded? Typically when I'm working that's all I'm doing -- working. Not chatting, maybe listening to music, definitely not video-camming nubile young women.

Of course, maybe it is because I have another, dedicated work computer -- the one I'm typing on right now has a billion different windows I wouldn't want anyone to see. My work computer is completely clean: some dev tools, that's it.

This post went on way too long.

From the past, a blast.

Here's a post someone made about 6 years back: "Why Trillian Sucks"

Not all things open source go the way of, say, Ruby on Rails.

Right now when I see Jabber instant messaging mentioned I'm looking at an "ecosystem" of poorly written clients that, aside from the Google Talk system, haven't gained much traction. I suspect that Google Talk hits the sweet spot because it integrates email contacts with live IM, voice calling (which, apparently, no other Jabber client has), reliable file transfer, and the fact that it was integrated into an existing product, which means at launch it had a few million users.

As I see it, the instant messaging world has stopped moving forward. No major, innovative improvements have happened for a few years. So, why is Jabber so far behind? They apparently still don't have a video / audio chat standard, or if they do there's no push to get it implemented in popular clients.

I guess what Jabber needs is an innovator: someone to blast into the Jabber instant messaging space, and drop all those features people are waiting for (audio, video, emoticons -- features regular consumers are waiting for) and leave the rest of the Jabber community in shambles.

Things on the 'net seem to work best that way. Firefox's destructive (marketplace-wise) rampage across the Internet is a testament to that. For the first time in a long time Microsoft started ramping up Internet Explorer development when they realized Firefox was here to stay. Why not the same with Jabber?

For now, though, I'm comfortable using three separate instant messaging clients. I got the resources to spare, and I gave up on Trillian a long time ago.

Friday, December 07, 2007

Having problems updating to Rails 2.0 under Windows XP?

Seeing this error while trying to install Rails 2.0?
ERROR: While executing gem … (Zlib::BufError) buffer error
Your RubyGems is out of date. Update it first:
gem update --system

Then run
gem install rails -y --source
and you're all set!

Yeah, I didn't know my RubyGems was that far out of date, either.