Oh No! NOT Paris Hilton

Fed up of reading about B-list celebs when you were interested in seeing the Eiffel Tower? Sam Tazzyman, a student at UCL, may be able to help.

One person standing out from the crowd

You think you're pretty Internet-savvy, right? Think you know the best web sites, the best places to find out news, games, information, see videos and hear new music? What about search engines - think you know the best of those? Chances are you do - you're probably great at using the Internet to find out information for yourself pretty quickly. But do you know how those search engines work? Do you know about the logic behind search engines? Novices use simple searches. If you have one of the computational thinking skills - thinking in logic - then you can be a real whiz at searches, meaning you can find what you want even more quickly, and stay out front!

The basics of how search engines work are pretty simple. They trawl through the vast amount of information that is the Web and store information about the pages they find on gigantic hard drives. They pull out the words on each page and store them in a way that makes them easy to search. When you search the Internet, the engine actually searches this big collection of words. How does it know what to look for? It does this using something called Boolean logic.
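To make the idea of "storing the words in a way that makes them easy to search" concrete, here is a minimal sketch in Python. It is not how any real search engine is built. The three tiny pages and the names `pages` and `build_index` are made up for illustration. The sketch simply records, for each word, the set of pages that contain it.

```python
from collections import defaultdict

# A made-up mini "web": page name -> page text (invented for illustration).
pages = {
    "page1": "Paris Hilton spotted at a party",
    "page2": "Visit the Eiffel Tower in Paris",
    "page3": "Hilton hotels open a new branch",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

index = build_index(pages)

# A one-word search is now just a lookup in the index.
print(sorted(index["paris"]))   # ['page1', 'page2']
```

Because each word points straight at its set of pages, a search never has to re-read the pages themselves.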

When you type in a one-word search, the search engine simply brings up every page that contains that word. But what about when you type two words? Then you probably want to find all the web pages that contain both words. This is where Boolean logic comes in. You need the idea of a "logical AND".

Suppose you love celebrity gossip, so you type in "Paris Hilton". Of course, you don't want pages that are only about the city. That's fine, though, because the engine will search for all the web sites that contain BOTH the word "Paris" AND the word "Hilton".

A Venn diagram of Paris Hilton

It turns out that Boolean logic is closely linked to the idea of "sets" that you might have come across in school maths. A kind of picture used by mathematicians, called a Venn diagram, shows the link.

The rectangle represents all the pages on the search engine's hard drive. The pink circle on the left represents all the web pages out there containing the word "Paris", and the blue one on the right represents all the pages containing the word "Hilton". Notice that these are both completely contained within the rectangle representing all of the pages on the hard drive, since any page in either one of these groups must also be on the hard drive. The area we are searching for is the area shaded in purple - this is the cross-over between the pink "Paris" set and the blue "Hilton" set. In Boolean logic terms it is the logical AND operation (things that are both about Paris AND about Hilton). In terms of sets exactly the same thing is called the intersection between the two sets.
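The purple overlap can be tried out directly, since Python has sets built in. This is only a sketch: the page names and the two sets `paris` and `hilton` are invented for illustration.

```python
# Made-up sets of page names, invented for illustration.
paris = {"page1", "page2", "page4"}   # pages containing "Paris"
hilton = {"page1", "page3"}           # pages containing "Hilton"

# Logical AND is set intersection: the pages that are in both sets.
paris_and_hilton = paris & hilton

print(sorted(paris_and_hilton))   # ['page1']
```

Only `page1` sits in both circles, so it is the only page the AND search returns.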

Oh No! Not her again!

Now, what if we weren't actually interested in Paris Hilton but in the city? It would be really irritating to search for just "Paris" and be inundated with gossip pages on a D-list celeb who just happens to have been in the news recently.

We want to be able to search for all the web sites that contain the word "Paris" but do not contain "Hilton". Just as with AND, it can be shown in a Venn diagram. First, let's look at finding all the pages that don't contain the word Hilton.

A Venn diagram of Paris Hilton: NOT Hilton!

The rectangle still represents all of the web sites on the engine's hard drive (mathematicians call this set the universe). The white circle on the left represents all the sites containing the word Paris, and the blue one on the right represents all the sites containing the word Hilton. This time the whole white area is what we want: all the sites except those that contain the word Hilton. As it includes the outer rectangle area too, that is most of the web pages on the Internet! (Too much!) It is known as the complement of the set of web sites containing the word Hilton. It is all the pages that are NOT Hilton pages, so it is called the NOT operation in Boolean logic.

To get the pages about Paris that don't mention Hilton, we just combine what we've learnt so far. We want Paris AND (NOT Hilton). Again, thinking in sets, it can be seen in the Venn diagrams. It is the same as finding the overlap - the intersection - between the pages that do mention Paris and the pages that don't mention Hilton (the areas in the Paris circle that are also white in the second diagram). It is the pink area of the first Venn diagram: the set of web sites that mention Paris AND don't mention Hilton.
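Here is the same combination as a small Python sketch. The set `universe` stands in for the rectangle. It and the page names are made up for illustration. NOT needs the universe, because "everything except Hilton" only makes sense once you know what "everything" is.

```python
# Made-up sets of page names, invented for illustration.
universe = {"page1", "page2", "page3", "page4", "page5"}  # every stored page
paris = {"page1", "page2", "page4"}                       # pages containing "Paris"
hilton = {"page1", "page3"}                               # pages containing "Hilton"

# NOT Hilton is the complement: everything in the universe outside "Hilton".
not_hilton = universe - hilton

# Paris AND (NOT Hilton) is the intersection of "Paris" with that complement.
paris_not_hilton = paris & not_hilton

print(sorted(not_hilton))         # ['page2', 'page4', 'page5']
print(sorted(paris_not_hilton))   # ['page2', 'page4']
```

In this sketch the result is the same as taking the Hilton pages straight out of the Paris set (`paris - hilton`), which is the pink area of the first diagram.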

Posh or Sporty Venn Diagram

Finally, suppose you were a feature writer and had been given the job of finding out what the Spice Girls had been up to in their solo careers. (What, you mean they aren't famous this month either!?) You might then want to find web sites that contain either the word Posh or the word Sporty. The engine will need to search all of its web pages (the universe) and return any that contain either one of the two search words (see the Venn diagram).

As you've probably worked out, this time we want the green shaded area, which is the combination of the set of web sites containing the word Posh with the set of web sites containing Sporty. This is called the union of the two sets, and in Boolean logic, the logical OR operation.
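A short Python sketch shows that the set view (union) and the logic view (OR) pick out exactly the same pages. The page names and sets are invented for illustration.

```python
# Made-up sets of page names, invented for illustration.
universe = {"page1", "page2", "page3", "page4"}  # every stored page
posh = {"page1", "page2"}                        # pages containing "Posh"
sporty = {"page2", "page3"}                      # pages containing "Sporty"

# Set view: the union of the two sets.
posh_or_sporty = posh | sporty

# Logic view: check every page with a logical OR.
checked_one_by_one = {p for p in universe if (p in posh) or (p in sporty)}

print(sorted(posh_or_sporty))                 # ['page1', 'page2', 'page3']
print(posh_or_sporty == checked_one_by_one)   # True
```

Notice that `page2`, which mentions both, is still included: a logical OR means "one or the other or both".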

Set and Logic Notation Equivalence

To work with these ideas quickly and easily, mathematicians and computer scientists have invented a notation - a code - for them. Using the code we can write down our searches in a much shorter, handier way, using symbols for the operations. Mathematicians tend to use the set symbols and computer scientists their logical equivalents.
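As a sketch of what that code looks like, here are the searches from this article written both ways, using the standard symbols. The letters are our own choice for illustration: P and H stand for the sets of pages containing "Paris" and "Hilton", and p and h stand for the statements "this page contains Paris" and "this page contains Hilton". Some books write the complement differently, for example with a bar over the letter.

```latex
\begin{align*}
\text{Paris AND Hilton:}       &\quad P \cap H     &&\leftrightarrow\quad p \land h \\
\text{Paris OR Hilton:}        &\quad P \cup H     &&\leftrightarrow\quad p \lor h \\
\text{NOT Hilton:}             &\quad H^{c}        &&\leftrightarrow\quad \lnot h \\
\text{Paris AND (NOT Hilton):} &\quad P \cap H^{c} &&\leftrightarrow\quad p \land \lnot h
\end{align*}
```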

Now you know the basics of how a search engine looks for things, how can you use this knowledge to help? Well, different search engines want you to type different things to represent "AND", "OR" and "NOT" - they each have their own code, just like mathematicians and computer scientists. The following is an example of the way you would do it on one search engine:

To search for: Paris AND Hilton, type: Paris Hilton
To search for: Paris AND NOT Hilton, type: Paris -Hilton
To search for: Paris OR Hilton, type: Paris OR Hilton

So to put in a logical "AND" you just leave a space, to put in a logical "NOT" you put a minus sign "-" before the word you don't want, and to put in a logical "OR" you write "OR" in capitals. You have to be careful though. Each search engine has its own notation - some use a space for AND and some use a space for OR, which, as the Venn diagrams show, means they will give completely different results.
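To see how an engine might turn that notation back into set operations, here is a rough Python sketch. It is not any real search engine's code. The function `search`, the tiny `index` and the `universe` of pages are all made up for illustration. It also assumes that OR joins only the two words on either side of it, and it ignores brackets and quoted phrases entirely.

```python
# A made-up index: word -> set of pages containing it (invented for illustration).
index = {
    "paris": {"page1", "page2"},
    "hilton": {"page1", "page3"},
}
universe = {"page1", "page2", "page3", "page4"}  # every stored page

def search(query, index, universe):
    """Evaluate a query where a space means AND, '-' means NOT and 'OR' means OR."""
    result = set(universe)   # start with everything, then narrow down with AND
    group = None             # the words currently being joined together by OR
    pending_or = False
    for token in query.split():
        if token == "OR":
            pending_or = True
            continue
        if token.startswith("-"):
            # NOT: the complement of the word's set within the universe.
            matches = universe - index.get(token[1:].lower(), set())
        else:
            matches = index.get(token.lower(), set())
        if pending_or and group is not None:
            group = group | matches      # OR: union with the previous word
        else:
            if group is not None:
                result &= group          # AND: intersect the finished group
            group = matches
        pending_or = False
    if group is not None:
        result &= group
    return result

print(sorted(search("Paris Hilton", index, universe)))     # ['page1']
print(sorted(search("Paris -Hilton", index, universe)))    # ['page2']
print(sorted(search("Paris OR Hilton", index, universe)))  # ['page1', 'page2', 'page3']
```

The three searches give three different answers from the same four pages, which is exactly why it matters whether a space means AND or OR on the engine you are using.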

Now that you understand a little bit of the logic that search engines use, with a bit of thought you can write much cleverer searches and find what you want more quickly! You can stay ahead of the game.