WTF Is … The SQL WHERE Clause
After the SELECT statement, the SQL WHERE clause might just be the second most important piece of code you learn in your data analysis career. I know that sounds like hyperbole but I really believe it.
SELECT allows us to choose which fields we want to retrieve from our database table.
WHERE lets us laser focus EXACTLY which records within that we need. And that will make all the difference.
Let’s explore the make-up of the WHERE clause, which operators we use for each purpose and how to tie more than one of them together.
SQL WHERE CLAUSE STRUCTURE
We already know that WHERE is the vital keyword but “where” exactly does it go in our overall SQL query structure?
We start with our SELECT statement to specify the field names.
Then our FROM statement to dictate which table we are querying from.
Then we get to the WHERE clause:
SELECT * FROM myTable WHERE ........
What can we put into the WHERE clause?
Good question. Basically anything that will allow us to filter down our results to the exact slice of data we want to see in our results. To do that we use a collection of different operators to specify how we want the data to be filtered.
SQL WHERE Clause Comparison Operators.
We’ll look now at a list of the different comparison operators we can use and see an example of each in action. You should familiar with these from most school maths classes but it never hurts to have a refresher.
Most common comparison operators.
|!=||NOT Equal to|
|<>||NOT Equal to|
|<=||Less than OR Equal to|
|>=||Greater than OR Equal to|
Let’s have a look at some examples of these in action.
If we only want staff members with the first name “Alan”, we can use the WHERE clause and the EQUAL TO operator (i.e. the equals sign =) e.g.
WHERE firstname = 'Alan'
NOT Equal To.
The other side of the coin. If we want all colours of Ferrari OTHER THAN red we can filter for that in two different ways e.g.
WHERE CarColour != 'RED'
WHERE CarColour <> 'RED'
If we wanted to see all of our e-commerce products where sales were less than 100 units, we use the LESS THAN symbol (<) e.g.
WHERE ProductSales < 100
Less Than OR Equal To.
If we want to filter on all customers who are aged 18 years OR below, we use the LESS THAN OR EQUAL TO symbols (<=) e.g.
WHERE CustomerAge <= 18
And so on and so forth.
We can also step things up a gear when it comes to comparison operators by using the LIKE operator. Whereas the previous operators allowed us to filter on specific strings or values, LIKE allows us to use patterns.
By specifying a string fragment plus a wildcard character, we can do basic pattern matching within our data fields to really boost the power of our WHERE clause.
For example, let’s say we have a table called BANDS with a field name BANDNAME.
The table contains a list of musical bands with colour themed names.
|The Blue Notes|
If we only want to return those with the string ”Blue” in their names and had to use the basic comparison operators above only, we would need to know exactly which band names were in there and be VERY specific on each one e.g.
WHERE BANDNAME = 'The BlueTones'
In the real world.
In reality, the table could contain millions of rows that could conceivably contain our ‘Blue’ string and it’s not practical to hardcode each one.
So we use the LIKE operator and the percentage sign wildcard character (%), like this:
WHERE BANDNAME like '%Blue%'
|The Blue Notes|
Putting a wildcard either side lets us pattern match for our string anywhere in the data. It means ANY string of characters can both come before and come after our pattern string (Blue).
If we only wanted to return strings that had ‘Blue’ at the start of the string we would do this instead:
WHERE BANDNAME like 'Blue%'
Which would get us zero results from our table above.
Another wildcard option.
If we want to be very specific on the number of characters our wildcard allows for, we can’t use the % sign.
Instead we use the underscore character (_) instead. It lets us dictate exactly how many wildcard characters we want to search for e.g. if we only wanted band names of 10 characters we could put ten underscores in our WHERE clause with LIKE:
WHERE BANDNAME like '__________'
Combining more than one condition together in the WHERE clause.
We can power up our WHERE clause even further by using more than one condition at a time. We have two main options at this point:
- joining conditions that BOTH have to be met (the AND operator)
- joining conditions where EITHER of them can be met (the OR operator).
I’m on a travel comparison website looking for a holiday for my family. I know which country I want to visit and what type of holiday we are looking for. Laser focused. Very specific.
As a punter I’ll be clicking on drop down boxes on the website which build up the following WHERE clause in the background e.g.
WHERE Country = 'SPAIN' AND HolidayType = 'BEACH'
I want to see all results which are BOTH in Spain and are Beach holidays. Nothing else. No city breaks. No camping trips. Not in Italy, France or Greece. Just beach holidays in Spain.
Maybe I’m not quite so worried about the particular destination and am open to available suggestions on the type of holiday I’m looking for.
I’d quite like to visit a European country that I know to have a hot climate during the summer months. But I’m not currently decided on exactly which I’ll go for e.g.
WHERE Country = 'SPAIN' OR Country = 'PORTUGAL' OR Country = 'GREECE'
I would get back a list of holiday destinations in Spain, Portugal and Greece. They could include both beach holidays and city breaks as I haven’t specified which I would prefer. I could then look through them and decide which caught my eye the best.
Tidying up our WHERE clause with the IN statement.
You can see from the OR operator example above how your WHERE clause could get a little unwieldy after a while when using multiple different conditions. If we were also specifying date ranges, upper and lower price limits, types of holiday, number of passengers etc. as well it could become a little untidy to read and maintain.
We can do a little housekeeping to protect against that by using brackets and liberal use of tabs. To tidy up the multiple OR conditions above we could also use the IN statement.
IN allows us to group multiple OR conditions together in one statement which makes it easier to maintain and to read. For example, instead of the three OR conditions above we could write:
WHERE Country IN ('SPAIN', 'PORTUGAL', 'GREECE')
Altogether much tidier, less space consuming, easier to maintain and less chance of inadvertantly setting the clause to process in the wrong order. Safety first.
What could go wrong?
It’s SQL. There is ALWAYS the chance that something catastrophic could go wrong.
If we are using our WHERE clause with a SELECT statement then filtering on the wrong conditions could bring back the wrong data. If we are a little too laissez-faire with our SELECT statement and use SELECT * with a poorly constrained WHERE clause, it could put unbearable pressure on our database server and infrastructure.
Not paying attention to grouping conditions is also a particular worry if we have a complicated collection of ANDs and ORs. Brackets are your friend.
As is taking time to properly layout your query. I’ve taken to pen and paper multiple times to sketch out what I want and how to get it. It never hurts to be careful.
The power is in your hands.
Mastering the SQL WHERE clause will be vital to your ultimate success as a SQL developer. And with SQL being vital to your overall success in most aspects of the Data Science world, you’ll want to get right on top of it.
Trial and error are your friends especially when testing queries out with the SELECT statement. What you’ll have learnt in this tutorial will stand you in good stead for getting even deeper into the rest of SQL. Use it wisely.
Check out the next lesson from my Learn SQL For Data Analysis series.