
Google Rolls Out a Browser-Based Content Farm Blocker, Helping Users Sort the Wheat from the Chaff

It seems like everyone in the twitterverse, the blogosphere, and tumblrdom is getting fed up with so-called content farms--those mostly-useless text generators that turn out articles based on the terms people most commonly search for. Now the Googleplex is getting involved, creating an extension that allows Chrome users to tag and block certain sites that come up in their Google searches.

The extension, called Personal Blocklist, lets users bar sites they deem worthless or untrustworthy from future search results, using an extra button embedded in each search result. Anytime Personal Blocklist scrubs results from a page, it notes their removal at the bottom of the page and gives you the option to unblock them.

That doesn’t just empower users to customize their search results; it also provides Google with a strong indicator of which sites its users would like to see pushed down, helping Google refine its own search parameters. The extension won’t kill content farms, but with a little help from users (like you!) it should help push them down so more relevant cream can rise to the top.

[NY Times]

Google Instant Search Displays Full, Real-Time Results As You Type

Google’s newest search tool, unveiled today, starts giving you results the instant you start typing. With Google Instant, the Internet overlords are taking search-engine prediction to a new level.

Sign in with your Google account and go to the main search page, and you’ll see it still looks the same. But start typing something — “Popular Science,” for instance — and Google brings up a page of instant results, which change as you keep typing. You don’t even have to type a full search term to find answers, because Google Suggest completes your thoughts for you.

Refining as you go allows for more effective searching — you can change topics mid-sentence if you don't see what you're looking for.

According to Google’s blog, developers realized that people read much faster than they type: people take about 300 milliseconds between keystrokes, but only about 30 milliseconds to glance at another part of the page.

Google Instant capitalizes on this, allowing you to scan a page full of search results even as you type.

There’s more coming from Google’s press conference today at the San Francisco Museum of Modern Art. You can get the real-time updates from CNET’s insta-blog.

Microsoft’s Engkoo Scans the Web to Teach Itself How to Teach You Languages

It sounds a bit Google-ey, what with all the data mining across the Web and all that, but it’s Microsoft researchers in Beijing that are crafting an online Chinese-to-English dictionary that could become a model for language learning tools bridging any two tongues. Engkoo.com pulls its database from the Web itself, cross-referencing sites that exist in both English and Chinese, searching existing online dictionaries, and mining other sources to create a rich resource for both learning and translation.

By drawing on the ever-evolving organism that is the Internet, Engkoo (loosely meaning “English vault” in Chinese) should be able to stay abreast of changes in colloquialisms and idioms in both the source language and the one it is translating to. In theory, it should also be able to catch errors or mistranslations more easily, since an error is unlikely to be prevalent across the entire Web.

When a user searches for a word or sentence in either language – Microsoft plans to adapt the system for other languages but this initial phase is focused on Chinese-to-English translation – the software driving Engkoo searches through the database for the relevant data and draws upon statistics to translate as accurately as possible. Where possible it links to the sources from which it drew the initial data, and it can often provide example sentences for a word or phrase.

Engkoo is also a multimedia experience. Computer generated audio translations exist for many English words and sentences to help Chinese speakers with their pronunciation, and researchers are cultivating a video dictation library so users can see the way native speakers’ lips move as they enunciate.

Next up? Ultrasound images that show the movement of the tongue inside the mouth, a critical step in learning pronunciation but one that is often hidden from plain view. Researchers are already gathering ultrasound data for the library, but those of you who find that kind of imagery less-than-savory, worry not; the black-and-white ultrasounds will be converted into cartoon animation to make them a bit more – how do you say? – palatable.

There’s also a mobile app in the works that will run on Windows phones – other mobile OS types are being considered – that allows for translation on the go. Which means perhaps we’re seeing the first real baby steps toward the universal translator you can keep in your pocket for real-time translation of any language into your own.

[WSJ]

Algorithms for Searching Among Chinese Characters Could Provide Effective Genome Search Engine

As scientists decode more and more genomes, the tree of life gets pretty complicated. It makes tough work for geneticists or other researchers who want to understand which organisms share which genes -- there are just so many comparisons. So there's a growing need for a better, easily searchable bioinformatics database.

A Chinese computer scientist has a suggestion: mimic the way search engines index Chinese characters.

Technology Review's blog helpfully describes why search engines like Google are so fast and why current bioinformatics search systems are not. Most search engines use an inverted index: rather than listing every single Web page along with all the words on it, they list, for every single word, the places where it appears.
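
To make the idea concrete, here is a minimal Python sketch of an inverted index. It is only an illustration, not how Google or any real search engine is implemented; the page names, the sample text, and the function names build_inverted_index and lookup are all made up for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the set of page ids on which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Spaces mark the word boundaries, so splitting is enough here.
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def lookup(index, word):
    """Return the sorted page ids containing the word (empty list if none)."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical toy "Web" of three pages.
pages = {
    "page1": "genome search engines are fast",
    "page2": "search engines index every word",
    "page3": "the fruit fly genome",
}

index = build_inverted_index(pages)
search_hits = lookup(index, "search")
genome_hits = lookup(index, "Genome")
```

A query now costs one dictionary lookup per word, no matter how many pages were indexed, which is why this layout is so much faster than scanning every page.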

Bioinformatics searches, by contrast, use a couple of algorithms that basically compare the data from one genome to the data from another. This is relatively fast when there are only a few genomes, but as the number of sequenced genomes grows exponentially, the searches take much longer.

A simple solution would be to switch to the Google approach: for every base-pair "word," make a list of the genes where it appears. But words in ordinary text are easy to spot, because they have spaces between them. Base-pair sequences do not.

As it happens, Chinese characters don't, either, but search engines have gotten around this. Wang Liang, a computer scientist at SOSO.com, one of the big three search engines in China, says the trick is to segment the words into "n-grams," words that are n letters long.

Tech Review explains: There are 1-grams for one-letter words, 2-grams for two-letter words and so on. A search for a 3-letter word, like ABC, can be done by searching for AB and BC. Some Chinese search engines work this way, by indexing all the 2-gram combinations.
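
Here is a rough Python sketch of that 2-gram trick for text without spaces. It is an assumption-laden toy rather than what SOSO.com or any real engine runs: the document names and the helper names ngrams, build_bigram_index and search are invented for the example, and the position check (AB must be immediately followed by BC) is one simple way to rule out documents that merely contain both pieces somewhere.

```python
from collections import defaultdict


def ngrams(text, n):
    """All overlapping substrings of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_bigram_index(docs):
    """Map each 2-gram to {doc_id: set of positions where it starts}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, gram in enumerate(ngrams(text, 2)):
            index[gram][doc_id].add(pos)
    return index


def search(index, query):
    """Return doc ids where the query's 2-grams occur back to back."""
    grams = ngrams(query, 2)
    if not grams:
        return []
    hits = []
    for doc_id, starts in index.get(grams[0], {}).items():
        for start in starts:
            # The k-th 2-gram of the query must begin k characters later.
            if all(start + k in index.get(g, {}).get(doc_id, ())
                   for k, g in enumerate(grams)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical unsegmented "documents".
docs = {"d1": "XABCY", "d2": "ABXBC", "d3": "BCA"}
bigram_index = build_bigram_index(docs)
abc_hits = search(bigram_index, "ABC")
```

Note that d2 contains both AB and BC but not next to each other, so it is not returned for ABC.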

OK, then, how long is a genetic word? The nucleotides A, T, G and C are only 1-grams, which makes them pretty useless as search terms. So some fuzzy math is required. Liang says DNA sequences follow Zipf's law, which says a word's frequency is roughly inversely proportional to its frequency rank; one consequence is that in any long document, roughly half the distinct words appear only once. This theory can be used to find an average length for DNA "words."

Liang studied the genomes of arabidopsis, aspergillus, the fruit fly and the mouse, and found that a good average word length is 12 letters. Therefore, the best way to index genome data is to use 12-grams -- that is, 12-letter combinations of A, T, G and C.

With that vocabulary, a Google-like inverted index becomes possible.
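
As a hedged illustration of what such an index might look like, the Python sketch below indexes every overlapping 12-gram of a few DNA strings and uses the first 12 letters of a query to jump straight to candidate locations. The gene names, the sequences, and the functions build_kmer_index and find are hypothetical, and this is not Liang's actual system. With four letters there are 4^12 = 16,777,216 possible 12-grams, so the vocabulary stays a manageable size.

```python
from collections import defaultdict

K = 12  # word length suggested in the article


def build_kmer_index(sequences, k=K):
    """Map each k-letter window to the (name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find(index, sequences, query, k=K):
    """Locate a query of at least k letters via its first k-gram."""
    query = query.upper()
    if len(query) < k:
        raise ValueError("query must be at least k letters long")
    hits = []
    for name, pos in index.get(query[:k], []):
        # Confirm the rest of the query also matches at this position.
        if sequences[name].upper().startswith(query, pos):
            hits.append((name, pos))
    return sorted(hits)


# Hypothetical toy sequences, far shorter than real genes.
sequences = {
    "geneA": "ATGCGTACGTTAGCCGATAGGCTA",
    "geneB": "TTTTATGCGTACGTTAGCAAAA",
}

kmer_index = build_kmer_index(sequences)
short_hits = find(kmer_index, sequences, "ATGCGTACGTTAGC")
long_hits = find(kmer_index, sequences, "ATGCGTACGTTAGCCG")
```

The lookup touches only the places where the first 12-gram occurs, rather than comparing the query against every sequence from end to end.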

[Technology Review]

