~ Loki on Yahoo's filter - a short pointer ~
By Loki (slightly edited by fravia+)

Published @ http://www.searchlores.org in late February 2006 | Version 0.01

Lately all eyes are on the diva Google, and its every sidestep is noticed, blogged, commented on, flamed and so on. For good reason or not, I don't really care. But people tend to forget the other big players in this industry, for good and for bad.
Interesting new features coming out of Yahoo's labs are ignored and useful MSN sliders are underused, yet nobody misses the latest crappy packaged solution promoted by Google and its partners. The same goes for all the bad stuff.

Recently, Google launched its Chinese version of web search, generating a lot of discussion about the censoring of results for Chinese users at the request of the Chinese government.
By now everybody knows the visual proof of this censorship, obtained by running the following (infamous) queries:

http://images.google.cn/images?q=Tiananmen
http://images.google.com/images?q=Tiananmen

Shortly after, some guys published a way to bypass this filter with capitalised queries, and managed to get uncensored results
(i.e. [Tienanmen] instead of [tienanmen]).
See here: http://www.crypticide.com/dropsafe/articles/security/post20060129233439.html
But this was quickly corrected, and the trick no longer works.

But what about Yahoo (or MSN)? Are they filtering results too?
Compare the same query on Yahoo (TLD .com) and Yahoo China:

http://images.search.yahoo.com/search/images?p=tiananmen
http://image.yahoo.com.cn/search?p=tiananmen

It's not even filtered: you get no results AT ALL. Do you really think that is a better solution?
The same goes for web search, except that instead of getting no results you are redirected to the news results, where the sources are obviously filtered
and subject to censorship.

http://www.yahoo.com.cn/search?p=test

No problem.

http://www.yahoo.com.cn/search?p=tiananmen

Response: HTTP/1.x 302 Found
Location: http://xinwen.yahoo.com.cn/search.html?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult

Bounced to Yahoo News. Also note the source parameter: ysearch_www_filter_noresult
The usual value, when the query is not filtered, is 'ysearch_www_result_topsearch'.
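
To make the check above repeatable, here is a minimal Python sketch that pulls the source parameter out of a redirect's Location header. The function names are made up for illustration, and the idea that a 'filter' token in the value marks a filtered query is only an assumption based on the two values quoted above. No request is sent; it only parses the string.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def redirect_source(location: str) -> Optional[str]:
    """Return the 'source' query parameter of a Location URL, or None."""
    values = parse_qs(urlsplit(location).query).get("source")
    return values[0] if values else None


def looks_filtered(location: str) -> bool:
    """Guess whether the redirect came from the filter.

    Assumption: a 'filter' token in the source value marks a filtered
    query, as in 'ysearch_www_filter_noresult'.
    """
    source = redirect_source(location)
    return source is not None and "filter" in source.split("_")


# Location header quoted in the text
location = (
    "http://xinwen.yahoo.com.cn/search.html"
    "?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult"
)
print(redirect_source(location))  # ysearch_www_filter_noresult
print(looks_filtered(location))   # True
```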

So, is there anything we can do, as some did for Google, to bypass this filter? And how long will it take Yahoo's teams to spot and
correct it? Yahoo and Google's other competitors don't have the same hype around them, and if you publish something about them,
it won't spread the way Google-related news does.

I tried to bypass the filter with similar "poke around" techniques. I tried different approaches: mixing caps, adding useless keywords (-dsfasdfds, for example),
multiple quotes and so on. Nothing. But finally I tried to 'overflow' it by feeding the query parameter a large number of characters... and it worked!
Apparently, if you put enough '+' characters before your query, the filter is bypassed and you get censor-free output.

[tiananmen] : http://xinwen.yahoo.com.cn/search.html?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult
['+'(338 times) tiananmen] : http://xinwen.yahoo.com.cn/search.html?p=%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2Btiananmen&ei=utf-8&source=ysearch_www_filter_noresult
['+'(339 times) tiananmen] : http://www.yahoo.com.cn/search?ei=UTF-8&fr=fp-tab-web-ycn&p=%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2Btiananmen&meta=vl%3Dlang_zh-CN%26vl%3Dlang_zh-TW&pid=ysearch&source1=ysearch_www_hp_button
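
Typing hundreds of '+' by hand is error-prone, so a padded URL like the ones above can be generated instead. This is only a sketch: the helper name is hypothetical, the host and the p parameter are the ones shown in the URLs above, and the extra parameters (ei, fr, meta...) are left out on the assumption that they do not matter. It builds a string and sends nothing.

```python
from urllib.parse import urlencode

# Endpoint and parameter name as they appear in the URLs above
SEARCH_URL = "http://www.yahoo.com.cn/search"


def padded_search_url(term: str, plus_count: int) -> str:
    """Build a search URL whose query is `term` preceded by '+' characters."""
    query = "+" * plus_count + term
    # urlencode percent-encodes every literal '+' in the value
    return SEARCH_URL + "?" + urlencode({"p": query})


url = padded_search_url("tiananmen", 339)
print(url[:60] + "...")
print(url.count("%2B"))  # 339
```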

'+' is encoded in URLs as '%2B' (3 chars)

338 times '+' -> 338*3 = 1014
339 times '+' -> 339*3 = 1017

tiananmen -> 9 chars

338 '+' and tiananmen -> 1023 chars
339 '+' and tiananmen -> 1026 chars

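With 338 '+' the encoded value is 1023 chars long, just under 1024, and the query is still filtered. With 339 it is 1026 chars long: we've crossed the 1024-byte limit on the value used by the filter. So this query does bypass it :)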
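
The arithmetic above can be double-checked with a few lines of Python. The 1024 limit is the one inferred from these tests, not a documented value, and the function names are made up for illustration.

```python
from urllib.parse import quote

LIMIT = 1024  # limit inferred from the tests above, not a documented value


def encoded_length(term: str, plus_count: int) -> int:
    """Length of the percent-encoded query: `plus_count` '+' then `term`."""
    return len(quote("+" * plus_count + term, safe=""))


def min_padding(term: str, limit: int = LIMIT) -> int:
    """Smallest number of '+' whose encoded query is longer than `limit`."""
    per_plus = len(quote("+", safe=""))   # '%2B' is 3 chars
    term_len = len(quote(term, safe=""))  # 9 for 'tiananmen'
    return max(0, (limit - term_len) // per_plus + 1)


print(encoded_length("tiananmen", 338))  # 1023
print(encoded_length("tiananmen", 339))  # 1026
print(min_padding("tiananmen"))          # 339
```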

But this is changing quickly: between the time I ran those tests and now, they seem to have added more limits, and the query field now seems to be restricted to 1024 chars. But if you feed the parameter directly into the URL, it will still work (as of late February 2006).

Also, I did not manage to make it work on Yahoo Images.


(c) Loki 2006    nem0@nowhere.org   ahem!  'linux'+'mail'

(c) III Millennium: [fravia+], all rights reserved, reversed, reviled and revealed