There's more to being an Internet superstar than endless hookers and blow, and it may shock some of you to know that I have a real job that funds my lavish lifestyle. For the last few months my web developer colleagues and I have been building a website that has just moved into beta stage, and we need some testers from the general public (hint) to put it through its paces. But first, a long-winded introduction.
Freedom of speech has moved beyond a feel-good ideal and into the driving seat of society's progression. In fact, there is a penalty in not writing more. Information sources, from the loftiest BBC News to the trashiest entertainment blogger, are often judged on the volume of their output as much as the quality. This is good news for people loyal to the one source, say the barfly who reads The Daily Telegraph every afternoon down at the TAB, but it's a death by a thousand cuts for those who actively seek out information.
There was once a time when knowledge was a commodity: those who had it could influence those who didn't by selectively releasing crumbs to keep the plebs in line. I guess that's still true today, but the constraint no longer lies with devious social engineers; it lies with the sheer volume of data we're assaulted with. One solution is to offer teasers to information: those who are interested will bite and those who aren't are granted a quick exit. In Internet land, these teasers are called RSS (Really Simple Syndication) feeds, and these days you'd be hard pressed to find a website without several of them. RSS feeds can be received and stored in things called aggregators. I call them "things" because an aggregator can exist in a number of forms, such as a program, a website, or indeed most modern browsers.
RSS feeds are old news for some of you, but for those of you who aren't sure what I'm talking about, try clicking on one of those orange and white icons that are all over the Internet. There are two on this page alone. Boredomistan has a feed for new articles and each article has one for new comments. Read that sentence again. Every speck of content has an associated RSS feed, and this level of access is not exclusive to Boredomistan. These teasers can be so numerous that they can now be seen as content themselves, taking us back to our original headache.

Enter FeedZero.com. FeedZero.com is an RSS feed aggregator that uses Bayesian filtering to sort out what you like from what you don't. It's free, hopefully simple and, honestly, rather innovative. Enough of my grandstanding: if I tell you all about it then you won't use the site like a joe user, which is what I want you to do.
If you've never heard of Bayesian filtering before, then good, that means that it's doing its job. Bayesian filtering is a method of sorting data by probabilities learned from examples rather than by a fixed set of rules, so it grows more accurate the more the user interacts with it, and it's very good at its job. Bayesian filtering sits behind the scenes and does a lot of the grunt work that we take for granted. Search engine spiders use it to make better sense of webpages. eBay uses it to detect fraudulent users. Anti-spam products run your emails through Bayesian filters to determine the difference between an email full of LOLCats and one that's selling cheap C1Al1$. The modern Internet would be unrecognisable without it.
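For the curious, here is a rough idea of what a Bayesian filter does under the hood. This is only a minimal sketch of a naive Bayes text classifier, not FeedZero's actual code: the class name, the "like"/"dislike" labels, the tokenizer and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

LABELS = ("like", "dislike")  # hypothetical labels, chosen for this sketch


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes classifier trained on manual like/dislike ratings."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def rate(self, text, label):
        """Record one manual rating from the user."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def probability_liked(self, text):
        """Return P(like | text) under the naive Bayes assumption."""
        vocab = set(self.word_counts["like"]) | set(self.word_counts["dislike"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in LABELS:
            # Smoothed prior, so a label with no ratings yet is never impossible.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing for words never seen under this label.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the log scores back into a normalised probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["like"] / sum(weights.values())
```

With no ratings at all the filter sits on the fence at 0.5. Each call to `rate` nudges the word counts, so articles sharing words with things you liked drift towards 1 and the rest drift towards 0.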
Enough about the background, give the site a try. If you're devious, try and break it. We need feedback on bugs, user interface experiences and suggested features. Feedback can be emailed to me or a comment can be left here. There's also a forum thread with some early impressions. Stop reading. Go!

Sorry, due to my incredibly hectic, time-poor modern lifestyle, i don't have time to beta test for you free of charge ;) I'm too busy reading every single post on sites/blogs that simply don't have any content i'm not interested in, hence why i read them (e.g. Derek Lowe's Blog, Pharyngula, EurekAlert, Bad Science, Physorg, Dan's Data, plus journals, etc)
Some things i would ask off the top of my head before even using it:
Should we be huddled under a Markov Blanket while using it? (Hooray for maths jokes!)
Once you've fed it a few criteria clicks and it starts to do its thing "transparently", how are you supposed to know if you've missed anything you actually would've preferred to read, but which has been filtered out? If you have to review a list of filtered content to make sure you're not missing anything, that would defeat the purpose of the algorithm. I guess maybe there's an initial period where you check until you're more-or-less confident with the choices it makes?
If it works on certain strings, does it include synonyms? For example, if i mark a story about new advances in X-ray Diffraction, will it know to show me news on Crystallography, since it's the same shit? Does it work on headers/titles of stories or do RSS items come with their own universal standard meta data tags, so your software can start showing me anything evolutionary biology-related and then figure out i don't care about things that are marked "Intelligent Design" (unless it's being ridiculed - in which case, are there tags denoting sarcasm?)
I can't see how this kind of thing won't run up against the same problem the aforementioned porn-filter/censorship software does, ie. that some things will slip through the cracks that shouldn't and others will be filtered when they shouldn't. It's honestly not a difficult thing to quickly scan a page of headings and figure out what you want to read and not read, and doing it manually like that ensures you never miss anything you would want, albeit it takes more time. Plus, i'm not sure many people can actually digest so much information that they have call to use something like this.
One last thing: What's with the wanky name/logo? "Zero" what? Unwanted stories i'm guessing? What's with the little superscript zero at the end? Is that Zero to the power of zero, in which case you might think you'll always only ever get one result out of it? (more maths jokes!) Or is it Zero degrees, indicating the amount of latitude your awesome algorithm gives to evil unwanted stories? Or, as i actually suspect, is it completely superfluous and a good indication that that corporate image package you forked out for was a complete waste of money?





...how are you supposed to know if you've missed anything you actually would've preferred to read, but which has been filtered out?
Trust in the system: after about a dozen manual ratings it does its job spookily well. If you find that something is misclassified, you can correct it and the system will become smarter (in fact, you can literally see it grow smarter with the changing probabilities that a manual liking/disliking triggers). Of course nothing is foolproof, but I can give a guarantee that you will miss more interesting articles through human fatigue/ignorance than through the system failing to pick up on them.
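To picture those changing probabilities, here is a small sketch of how a single word's score could shift with each manual rating. It is an illustration under stated assumptions, not the site's real bookkeeping: the function names and the smoothed estimate (likes + 1) / (likes + dislikes + 2) are made up for the example.

```python
from collections import Counter

# Hypothetical per-word tallies: how many liked/disliked articles contained each word.
liked = Counter()
disliked = Counter()


def rate(words, is_liked):
    """Record one manual rating; each distinct word counts once per article."""
    target = liked if is_liked else disliked
    target.update(set(words))


def word_probability(word):
    """Smoothed estimate of P(liked | article contains word)."""
    return (liked[word] + 1) / (liked[word] + disliked[word] + 2)
```

An unseen word starts at 0.5. Liking two articles that contain "crystallography" moves it to 0.75, and a single correction in the other direction pulls it back to 0.6, which is the kind of visible shift a manual rating triggers.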
"Remember, a Jedi can feel the Force flowing through him."
"You mean it controls your actions?"
"Partially. But it also obeys your commands."
Does it work on...
It works on a large set of feed data, but puts the most weight on article bodies, titles and authors.
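One plausible way to weight some fields more than others is to tag each word with the field it came from and count it more than once. The sketch below is a guess at the general idea only: the field names, the weights and the `weighted_tokens` helper are all hypothetical and not taken from FeedZero.

```python
import re
from collections import Counter

# Hypothetical weights: how many times each field's tokens are counted.
FIELD_WEIGHTS = {"body": 3, "title": 2, "author": 2, "category": 1}


def weighted_tokens(item):
    """Turn a feed item (a dict of field -> text) into weighted, field-tagged tokens."""
    tokens = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        # Missing fields are treated as empty text.
        for word in re.findall(r"[a-z0-9']+", item.get(field, "").lower()):
            # "title:evolution" and "body:evolution" are kept as separate features.
            tokens[f"{field}:{word}"] += weight
    return tokens
```

Tagging tokens this way means an author's name in the byline is learned separately from the same name turning up in an article body.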
...does it include synonyms?
No. Shhh. Words important enough to have synonyms (thankfully they're mostly nouns) are usually used in articles with the same gist, so are treated similarly. That's a good way of thinking about the system, really: it understands the "gist" of the article.
I can't see how this kind of thing won't run up against the same problem the aforementioned porn-filter/censorship software does...
Porn and censorship filters take the easy (and stupid) way out and work off black/whitelists. FeedZero works more like an email spam filter, though with one massive difference. Spam email is deliberately crafted to avoid detection, whereas syndicated articles are not. I know that "IT'S GREAT SO SHUT UP" isn't much of an answer, so the best measure of its effectiveness will be how well it works for you.
What's with the wanky name/logo?
Yeah I'm not a fan of the name either, but good luck in finding an available .com (or even .com.au) that's even half sensical. It's no coincidence that a lot of newish sites end in a hard "r", e.g. Flickr, Frappr. Even Digg had to be spelt imaginatively.
There should have been a logocomp in the channel!
Photo(logo)comps fill up my cache with far too much homosexual porn. For the uninitiated, don't click and don't click.


