You have to insert 1 Million entries into a database. You know there is a structure to it (I’ll supply one as a sample) so the easiest is to write a PHP script to do this task for you. Said and done, the script is finished and it does the job. But slow as hell. Not only does it trash your hard disk like mad, it also times out on a browser since you don’t want to do the shell rumba. You have only 20k entries in your database, wasted 30 seconds (while the database still tries to catch up with the sent queries) and now you realize you need a few more rows anyways.
To make life easy, I’ll use Zend Framework for this. You could use anything else that sends queries, but since I already used it for this project, there was little sense in doing something else. So, with all useless stuff stripped out, here the code in it’s first, rather infancy way.
$db = Zend_Registry::get('db');
$x = 0;
$y = 0;
$z = 0;
while($z < 100)
{
while($y < 100)
{
while($x < 100)
{
$query = 'INSERT INTO location (locx, locy, locz) VALUES ';
$query .= '('.$x.', '.$y.', '.$z.')';
$db->query($query);
$x++;
}
$y++;
$x = 0;
}
$z++;
$y = 0;
}
This sample creates rows for a location in a 3D space in case you wonder. So these lines represent single points in a cube with the side length of 100 whatever units you want to imagine here. It starts at 0,0,0 and ends at 99,99,99. So this gives us our nice 1 million entries.
The $db variable up there is a Zend_Db_Adapter instance, so I can now run queries directly. Yes I know you could write the query differently, but I left it like that to make life easier for the next steps. So instead of firing off 1 Million queries, how about grouping 100 queries together, into one large one? I left the top part out, so I will only post the while loop. The result looks like this.
while($z < 100)
{
while($y < 100)
{
$query = 'INSERT INTO location (locx, locy, locz) VALUES ';
$first = true;
while($x < 100)
{
if ($first == TRUE)
{
$first = false;
}
else
{
$query .= ', ';
}
$query .= '('.$x.', '.$y.', '.$z.')';
$x++;
}
$db->query($query);
$y++;
$x = 0;
}
$z++;
$y = 0;
}
It’s faster already, but it still does a lot of trashing around on the DB. While we now have “only” 10′000 queries instead of 1 Million, it still is far from “nice”. I could extended this and create larger queries, so I end up with 100 queries, each of them having 10′000 entries, but that might by too large queries and it does not solve much of my problem if I have even more data. Since I’m using MySQL for this, I have transactions at my disposal as well. So instead of writing it all with every query, I start a transaction, run 100 queries then commit that batch. But wait isn’t that the same thing than just writing 100 queries? In theory, yes, practically though this changes a few things. Since those queries are small, the system can keep them in RAM and write them in one big write, combined together, instead of 100 small changes, it makes one big one, lightning fast. Here how this looks like.
while($z < 100)
{
$db->beginTransaction();
while($y < 100)
{
$query = 'INSERT INTO location (locx, locy, locz) VALUES ';
$first = true;
while($x < 100)
{
if ($first == TRUE)
{
$first = false;
}
else
{
$query .= ', ';
}
$query .= '('.$x.', '.$y.', '.$z.')';
$x++;
}
$db->query($query);
$y++;
$x = 0;
}
$db->commit();
$z++;
$y = 0;
}
Now PHP will write all 1 Million rows in under 30 seconds even on a relative small system.
You could of course optimize this, maybe even wrap the whole process in a transaction instead of single steps, but that’s up to you and your ingenuity. For my needs this did the job so I had no need to go even further with it. Either way you should be able to take it and run with it fast and far for your problem of inserting large amounts of data.
Again, I used Zend Framework, so in your case the beginTransaction and commit function might not exist or named differently.
0 Responses
Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.
You must be logged in to post a comment.