Symfony Sonata Media Admin Boilerplate

Within the last 7 days I had to kick off three Symfony 2 projects. All with an admin interface, all with user support and all with media (image, video and RTE) support. I stumbled upon the Symfony Sonata Admin boilerplate by Geoffrey Monté. After my second kickoff I created a new repository based on Geoffrey Monté’s, with all the bundles I need preconfigured.

So if you have to kickstart a project matching these requirements:

  • Symfony 2.8
  • Sonata Admin 2.3
  • Sonata User Bundle
  • Sonata Media Bundle
  • RTE with Media Bundle connection

Feel free to use my boilerplate from:

https://github.com/mountain-code/symfony-sonata-media-admin

Bulk insert delay on a clustered Crate database

I currently work on a backend project providing measurement data to an Angular 2 frontend via a REST API. One of the main jobs is to import huge amounts of measurement data from cryptic text files into a database that can handle them. To handle means to run GROUP BY, HAVING, ORDER BY and all that kind of queries on millions of records and get an answer in minimal time. I picked Crate as the storage for this data. You should definitely try this SQL database at home! Or in production!
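
To make that a bit more concrete, here is a hypothetical example of the kind of analytic query I mean. The query is invented for illustration only; merely the table and column names match the snippets further down.

<?php

/*
 * Hypothetical analytic query: hourly aggregates over millions of
 * measurement records. Not taken from the real project.
 */
$select = "SELECT date_trunc('hour', time_attribute) AS hour,
                  avg(value_attribute)               AS avg_value,
                  count(*)                           AS samples
           FROM data
           GROUP BY date_trunc('hour', time_attribute)
           HAVING count(*) > 100
           ORDER BY hour DESC";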

I’m running a small Crate cluster with a handful of nodes on my local machine, synchronizing all the data I put in. In this particular case the data is inserted via bulk insert, 50 records in one call. Something like this:

INSERT INTO data (time_attribute, value_attribute) VALUES ('2016-01-25T12:45:18+00:00', 12.93), (...), (...) ON DUPLICATE KEY UPDATE value_attribute = VALUES(value_attribute);
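
For what it’s worth, such a statement can also be assembled with parameter placeholders instead of plain string building. The following is only a minimal sketch, assuming the crate-pdo adapter as the connection layer; the host, the port and the helper code are my assumptions, not my actual importer:

<?php

use Crate\PDO\PDO;

// Assumed connection via crate-pdo (host and port are made up).
$pdo = new PDO('crate:localhost:4200', null, null, []);

// One batch of records; the real importer pushes 50 per call.
$batch = [
    ['2016-01-25T12:45:18+00:00', 12.93],
    ['2016-01-25T12:45:19+00:00', 13.07],
];

// Build one "(?, ?)" tuple per record in the batch.
$tuples = implode(', ', array_fill(0, count($batch), '(?, ?)'));

$sql = "INSERT INTO data (time_attribute, value_attribute)"
     . " VALUES $tuples"
     . " ON DUPLICATE KEY UPDATE value_attribute = VALUES(value_attribute)";

// Flatten the batch into a single positional parameter list and execute.
$params = [];
foreach ($batch as $record) {
    $params = array_merge($params, $record);
}

$pdo->prepare($sql)->execute($params);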

My importer looks a little bit like this code, give or take about 2,000 lines:

<?php

protected function importController() {

  /*
   * Import stuff.
   */
  $this->bulkInsert();

  /*
   * Query stuff.
   */
  $this->analyticQuery();
}

protected function bulkInsert() {
  
  /* 
   * The above bulk insert.
   */
  $insert = 'INSERT INTO data (time_attribute, value_attribute) VALUES ... ';
  
  /*
   * Push it!
   */
  $this->crateConnection->query($insert);
}

protected function analyticQuery() {
  
  /*
   * Complex analytic query on the just imported data.
   */
  $select = 'SELECT * FROM data WHERE ... ';
  
  $records = $this->crateConnection->query($select)->fetchAll();
  
  echo "Number of records in result set: " . count($records) . "\n";
}

?>

Précis: I import some data, about 50,000 records, split into batches of 50 records via bulk insert. The data is stored on more than 2 nodes. There’s no time to lose: I run an analytic query on the freshly imported records.

By accident I had to clear the database and run the importer again, and my analytic query result was different from the one before. Astonishing. Same data and same query should supply the same result. Running the table truncate and the import several times in a row, I got results like this:

timo@laptop project-dir (master)
$
Number of records in result set: 122

timo@laptop project-dir (master)
$
Number of records in result set: 130

timo@laptop project-dir (master)
$
Number of records in result set: 137

timo@laptop project-dir (master)
$
Number of records in result set: 119

Every time I import the same data into an empty table, the analytic query supplies a different result.


Solving the problem

The analytic query seems to be executed and processed by the database before the newly inserted data has been synchronized across all nodes of my Crate cluster. That fits Crate’s eventually consistent design: freshly written records only become visible to queries after the next periodic table refresh, so a query fired right after the insert can hit an incomplete data set.

protected function importController() {

  /*
   * Import stuff.
   */
  $this->bulkInsert();

  /*
   * Wait for all nodes to be synchronized
   * (400,000 microseconds = 400 ms).
   */
  usleep(400000);

  /*
   * Query stuff.
   */
  $this->analyticQuery();
}

Waiting for less than half a second after the import before querying the table solved the problem for the short term.
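
A fixed sleep is of course a gamble. A possibly more robust alternative, which I haven’t battle-tested in this project, is Crate’s REFRESH TABLE statement: it forces a refresh of the table so that freshly written records become visible to subsequent queries. As a sketch, with the same assumed crate-pdo connection as above:

<?php

use Crate\PDO\PDO;

// Assumed crate-pdo connection (host and port are made up).
$pdo = new PDO('crate:localhost:4200', null, null, []);

// Force a table refresh so freshly written records become visible
// to the following analytic query, instead of sleeping a fixed 400 ms.
$pdo->query('REFRESH TABLE data');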


Why write about it

I’m not that inexperienced in working with databases, but that synchronization behavior trapped me. I only found this “bug” in my code by accident while testing with a really small amount of data. Two weeks later, with millions of records in the table, it would have been much harder to find the reason for the randomly inconsistent data sets.