The Algoliasearch module from the Algolia team provides very useful features for building a front-end search interface for high-traffic sites. Its functionality rests on two parts: the capabilities of its browser-side scripts and the use of Algolia's servers for storing data.
With the Instant Search feature turned on, the category page reacts very quickly to any search query, instantly rebuilding the filters and displaying only the products that match.
At the same time, data is constantly being loaded from Algolia. The product data itself is stored on Algolia's servers in the form of "indices" – lists of entities filtered under defined conditions. These indices are periodically uploaded from the online store's server to Algolia. This also happens as part of rebuilding the Magento indexes.
In order not to overload the servers with data transfers, Algolia designed and implemented index updates through a request queue. The indexing queue is stored in the algoliasearch_queue table. Every record in this table is a data-transfer command: it contains the data needed to fetch a chunk of the product list (or other entities) from Magento and transfer it to the Algolia server.
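As a rough sketch of what such a record carries (the column and payload names below — job_id, class, method, data, product_ids — are assumptions modeled on typical versions of the module, not its exact schema):

```python
import json

# A hypothetical snapshot of algoliasearch_queue rows; the column names
# (job_id, class, method, data) are assumptions based on typical versions
# of the module, not an exact schema.
queue = [
    {
        "job_id": 1,
        "class": "algoliasearch/observer",   # handler class (assumed)
        "method": "rebuildProductIndex",     # handler method (assumed)
        # 'data' tells the handler which products to fetch from Magento
        # and push to Algolia on this run.
        "data": json.dumps({"store_id": 1, "product_ids": [42]}),
    },
    {
        "job_id": 2,
        "class": "algoliasearch/observer",
        "method": "rebuildProductIndex",
        "data": json.dumps({"store_id": 1, "product_ids": [42, 57, 99]}),
    },
]

# Each record is a self-contained transfer command: decode 'data' to see
# which chunk of the catalog it will send.
for row in queue:
    payload = json.loads(row["data"])
    print(row["job_id"], payload["product_ids"])
```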
There are several types of rows in the table. Some are commands to update a single item; they are added after a product changes in Magento. Others are commands to update a whole group of products, usually added after a Magento reindex or during a periodic product update.
Group operations usually do not fit into a single record: they occupy several dozen rows at a time, each carrying many products for upload. A mass update also ends with a special separate command that activates the new data in Algolia. This is done for the sake of data integrity: Algolia keeps serving the old data until all the new data has reached its servers. On Algolia's side, the new data is written to a temporary index, and the final command simply removes the current working index and activates the temporary one in its place.
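The distinction between these command types can be sketched like this (the use_tmp_index, product_ids, and page fields and the moveProductsTmpIndex method follow the description in this article; the exact payload layout is an assumption):

```python
import json

# Hypothetical queue rows illustrating the three kinds of commands:
# a single-product update, pages of a mass update into the temporary
# index, and the closing command that activates the temporary index.
rows = [
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [15], "use_tmp_index": ""})},
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [], "page": 1, "use_tmp_index": "x"})},
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [], "page": 2, "use_tmp_index": "x"})},
    # The closing command: swap the temporary index in place of the working one.
    {"method": "moveProductsTmpIndex", "data": json.dumps({})},
]

def kind(row):
    """Classify a queue row by the rules described in the text."""
    if row["method"] == "moveProductsTmpIndex":
        return "activate-tmp-index"
    d = json.loads(row["data"])
    if d.get("use_tmp_index") and not d.get("product_ids"):
        return "mass-update-page"
    return "single-update"

kinds = [kind(r) for r in rows]
```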
Data transfer from the command queue happens during cron runs. Each cron task handles a fixed number of commands from the queue, and this sometimes causes problems: the same fixed number of commands (for example, 10) can cover as few as 10 products (10 commands transferring one product each) or as many as 1000 (10 commands transferring 100 products each).
As a result, when there are many individual product edits, data transfer lags behind. If you increase the number of commands processed in one cron task, the inverse problem occurs: the data volume becomes too large to finish within a single cron run.
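A back-of-the-envelope illustration of this imbalance, using only the numbers from the text:

```python
# The same per-run budget of 10 commands moves wildly different amounts
# of data depending on what the queue happens to contain.
per_run = 10

single_edits = [1] * 10    # 10 commands, one product each
mass_pages = [100] * 10    # 10 commands, 100 products each

products_from_singles = sum(single_edits[:per_run])  # 10 products this run
products_from_pages = sum(mass_pages[:per_run])      # 1000 products this run
```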
To mitigate this problem, it was decided to optimize the command queue.
Optimization of the AlgoliaSearch command queue
The optimization itself is a fairly trivial operation: it collects many short commands (rows), each transferring a single product, into one grouped command. The queue becomes denser and more uniform, which makes it possible to establish an optimal cron schedule.
Optimization also gives an opportunity to remove duplicate commands from the queue. Such commands appear when the same product was edited several times before any of those edits were processed.
The algorithm is simple enough.
Queue optimization is also done in cron. This may not be as elegant as hooking the events that add commands to the queue, but it is more stable and independent of the Algoliasearch module's implementation.
- At the first stage, all duplicate commands are deleted. Of course, this is done only for commands that update the working index in Algolia, not for those belonging to a mass update of the temporary index.
You can distinguish these commands by their data field: a mass-update command carries a non-empty use_tmp_index value, an empty product_ids array, and a non-empty page value. In principle, there is no point in keeping any single-product records once the first mass update is in the queue, because all products become up to date after it runs.
However, since part of this operation happens outside the code we control (namely, on Algolia's server), it is safer to keep unique single-product commands even after mass commands.
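The first stage might be sketched as follows. The payload fields are the assumed ones described above, and a real implementation would work in SQL against algoliasearch_queue rather than in memory:

```python
import json

def is_mass_update(row):
    """Mass-update pages: non-empty use_tmp_index, empty product_ids."""
    d = json.loads(row["data"])
    return bool(d.get("use_tmp_index")) and not d.get("product_ids")

def dedupe(queue):
    """Drop duplicate single-update commands, keeping the first occurrence;
    mass-update pages of the temporary index are never touched."""
    seen = set()
    kept = []
    for row in queue:
        if is_mass_update(row):
            kept.append(row)
            continue
        key = (row["method"], row["data"])
        if key in seen:
            continue  # same product edited again before processing: duplicate
        seen.add(key)
        kept.append(row)
    return kept

queue = [
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [7], "use_tmp_index": ""})},
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [7], "use_tmp_index": ""})},
    {"method": "rebuildProductIndex",
     "data": json.dumps({"product_ids": [], "page": 1, "use_tmp_index": "x"})},
]
compact = dedupe(queue)
```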
- At the second stage, all single-product updates are selected and grouped into more massive commands.
After that, instead of, say, 100 commands updating one product each, we get a single command updating 100 products. To avoid disturbing the execution order of the queue, the new commands are not appended to the end; they replace some of the existing ones. As a result, we get both compression of the queue and earlier transfer of single product updates. Naturally, the mass index update happens a bit later because of this.
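A sketch of this grouping step (in memory and with the assumed fields; a real implementation would rewrite algoliasearch_queue rows in place):

```python
import json

def group_singles(queue, chunk=100):
    """Collapse single-product commands into grouped ones, placing the
    grouped commands where the first single command stood so queue
    order is preserved."""
    ids, first_pos, rest = [], None, []
    for row in queue:
        payload = json.loads(row["data"])
        single = (row["method"] == "rebuildProductIndex"
                  and len(payload.get("product_ids", [])) == 1)
        if single:
            if first_pos is None:
                first_pos = len(rest)  # remember where the singles began
            ids.extend(payload["product_ids"])
        else:
            rest.append(row)
    if first_pos is None:
        return queue  # nothing to group
    grouped = [
        {"method": "rebuildProductIndex",
         "data": json.dumps({"product_ids": ids[i:i + chunk]})}
        for i in range(0, len(ids), chunk)
    ]
    return rest[:first_pos] + grouped + rest[first_pos:]

queue = [
    {"method": "rebuildProductIndex", "data": json.dumps({"product_ids": [n]})}
    for n in range(5)
] + [{"method": "moveProductsTmpIndex", "data": "{}"}]

compact = group_singles(queue, chunk=100)
```

Five one-product commands collapse into a single grouped command that keeps its early position, ahead of the final moveProductsTmpIndex command.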
As a rule, these two optimization stages reduce the queue size by a factor of several dozen.
- The third stage is optional: you can delete all commands from the queue that follow the moveProductsTmpIndex command.
As mentioned above, this command completes a mass index update; by the time it executes, all current data has been transferred to the working index. However, this cannot always be done safely, because a full update takes time, and during that time changes may be made to products that have already been written to the temporary index.
For the same reason, the optimization of individual updates must also take into account the beginning and the end of a mass update.
If such an update has not yet begun, then merging the single-update commands is completely safe.
But if a mass update has already started and has not yet completed, the single-update transfers should not be merged. In practice this situation rarely occurs, since the start of a mass update usually means that all previous commands have already finished.
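The safety check might look like this. It assumes a pid-style column marking rows already claimed by a running worker — a convention used by typical versions of the queue table, but treat it here as an assumption:

```python
import json

def is_tmp_page(row):
    """A page of a mass update into the temporary index (assumed fields)."""
    d = json.loads(row["data"]) if row["data"] else {}
    return bool(d.get("use_tmp_index")) and not d.get("product_ids")

def mass_update_in_progress(queue):
    """A mass update counts as mid-flight when some of its temp-index
    pages are already claimed by a worker (non-empty pid) while its
    closing moveProductsTmpIndex command is still waiting in the queue."""
    claimed_pages = any(is_tmp_page(r) and r.get("pid") for r in queue)
    move_pending = any(r["method"] == "moveProductsTmpIndex" for r in queue)
    return claimed_pages and move_pending

queue = [
    {"method": "rebuildProductIndex", "pid": 123,
     "data": json.dumps({"product_ids": [], "page": 1, "use_tmp_index": "x"})},
    {"method": "moveProductsTmpIndex", "pid": None, "data": "{}"},
]
# With a claimed page and a pending move command, compression must wait.
busy = mass_update_in_progress(queue)
```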
In addition to products, the module updates categories on the same principles. Since categories appear in the queue far less often, optimizing them brings a smaller gain, although it can be done in the same way.
Finally, when optimizing, one should not forget that the indices are kept separately for each store, and this must be taken into account.