# How to select random rows in MySQL

The easiest way to generate random rows in MySQL is to use the ORDER BY RAND() clause.

SELECT col1 FROM tbl ORDER BY RAND() LIMIT 10; |

This can work fine for small tables. However, for big table, it will have a serious performance problem as in order to generate the list of random rows, MySQL need to assign random number to each row and then sort them.

Even if you want only 10 random rows from a set of 100k rows, MySQL need to sort all the 100k rows and then, extract only 10 of them.

My solution for this problem, is to use RAND in the WHERE clause and not in the ORDER BY clause. First, you need to calculate the fragment of your desired result set rows number from the total rows in your table. Second, use this fragment in the WHERE clause and ask only for RAND numbers that smallest (or equal) from this fragment.

For example, suppose you have a table with 200K rows and you need only 100 random rows from the table. The fragment of the result set from the total rows is: 100 / 200k = 0.0005.

The query will look like:

SELECT col1 FROM tbl WHERE RAND()<=0.0005; |

In order to get exactly 100 row in the result set, we can increase the fragment number a bit and limit the query:

For example:

SELECT col1 FROM tbl WHERE RAND()<=0.0006 limit 100; |

Feel free to leave a comment.

Thank you very much, buddy! You made my day!

Your reasoning is sound and I appreciate you sharing this great idea with us!

You just Helped me to Show 5 random pictures from a web gallery.

Thanks

How can RAND() guarantee to provide an evenly distributed set?

Cause this is the condition to use your above statement.

I found no detailed documentation how RAND() works.

The only thing I’ve found: “Returns a random floating-point value v in the range 0 <= v < 1.0" from http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand

So it could happen no rand-Value is below 0.0005.

Hi Markus,

There is never guaranties about random function. However, we can assume that for big numbers of generations the distribution is somewhat even.

Thanks a lot!

PS:a tiny problem, “get exactly 10 row in the result set” maybe “get exactly 100 row in the result set”

Hi,

I’m not sure this method is really reliable…

I can see 2 issues :

– you’re not sure to have X results. You can have less, which can be a problem… (unless you select “X + a safety margin” and you shuffle the results and select X)

– I think there’s also a distribution problem with the function. The rows at the end of the table have certainly a lower frequency than the first (not really time to digg it in the morning, but i’m quite sure)

Hi Ben,

1. to overcome the possibility of getting less results you can increase the fragment number a bit and limit the query.

2. It is true that the (only) few rows at the end of the table have a lower frequency than the first (the LIMIT clause forces the query to stop as it gets to the limit number). If these few rows frequency is very important to you, you can remove the limit, then wrap the query with another query that randomly sorts the results and retrieve only your limit number of results.

For example:

SELECT col1 FROM (SELECT col1 FROM tbl WHERE RAND()<=0.0006) t ORDER BY RAND() LIMIT 100; However, this can work only for queries that needs small number of results.

Cool tip!

Thanx.

Sorry to rain on your parade but this is broken for a number of reasons.

1. No guarantee that you will always get to your limit – yes its likely but its not certain.

2. Uneven distribution. The earlier a number is in the table the more likely it is to be seen.

For a simple example imagine you are selecting 1 value from 3: You actually get:

1/3 = 9/27th chance of first

2/3*1/3 = 6/27th of 2nd

2/3*2/3*1/3 = 4/27th of 3rd.

2/3*2/3*2/3 = 8/27th of no number at all.

In other words you are twice as likely to select nothing as you are to select the 3rd value.

If you increase the probability each time from 1/3 in order to improve this then that just moves the curve, not changing its shape. It will also shrink that final “No results” chance, but it will never take it to 0.

The “correct” algorithm for this would subtract 1 from the change each time – i.e.

1/3 chance of first

then 1/2 chance of second

then always third.

That’s tricky to do in your single query though (might be possible using variables). i.e. where rand()*numrowsstillneeded/numrowsremaining > 1

To get a single random row (from an auto increment PK table) this is probably the smartest solution I’ve seen. SELECT * FROM my_table

WHERE pk_column >=

(SELECT FLOOR( MAX(pk_column) * RAND()) FROM my_table)

ORDER BY pk_column

LIMIT 1;

I’m not sure what the best solution to the multiple rows problem is yet but so far I’ve not seen a solution I was impressed by.

Yes, i’m doing that way, just a note, using round instead of floor will increase randomness.

After seeing your method i developed an Order By Rand() Alternative method that calculates the fragment number automatically.

You can read about it in my article here:

http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/

Thank you, sir.

Thanks, your code worked.

$aPickFoto = mysql_fetch_array(mysql_query(“SELECT path FROM project_foto WHERE pID='”.$aProject[‘pID’].”‘ ORDER BY RAND() LIMIT 1;”));

echo($aPickFoto[‘path’]);

I applied your idea to my query.

I basically needed half of the records, so I changed your rand line to this: AND RAND() < .05, but the query was taking about 2+ seconds to execute. Way too slow, right?

Interestingly, and for the information of the group, by changing the as so, AND RAND() < .05, the query returned in half a second.

i love this solution because it is so simple.

in my solution php makes a loop untill it gets 1 result, limit 1,

and a margin on the random (i need only 1 random row)

i order the results on an other column, where i have set a priority.

(well its more intricate, like week_number)

so the unequal distribution i actually use to my advantage.

when this is done, and the selected row is processed in my script,

i update the row with the proirity, set the week_number + 1 so it will fall out of the next selection of 1 row

thanks for sharing ,dj

If I put RAND() in the WHERE clause but omit the ORDER BY clause, I get the same record every time I repeat the query.

With RAND() in both clauses, you cut the number of rows ORDER BY sorts, yet you still get a different result if you repeat the query.

Hi Jan,

Can you post your queries and give more details?

Thanks

Ilan.

Thanks for this Ilan! Was pulling my hair trying to figure it out before.