7/25/2023

Datagrip sqlite data types

I have to implement data synchronization between two big databases which have completely different structures. Basically, I need to grab some data about products from different tables in the first database and rearrange it for other tables in the second database.

Creating my products the first time is not very complicated. But I'm looking for a way to update some specific data - not all data - about each product.

Obviously, there are a few issues that make this tricky.

- I'm not allowed to do anything on the source database apart from select queries.
- On the target database, I can do the usual queries (select, update, insert, create) but I can't modify the existing structure/tables.
- Target and source databases have completely different structures and the tables are not the same at all, so the data really has to be rearranged; comparing tables won't work.
- The target database uses a MySQL server; the source may be DB2.
- There are no "updated time" fields anywhere.

So the whole process needs to be done in a single Python (ideally) script.

I am thinking about creating a hash for each product, based on the fields to update in the target database: md5(code + description + supplier + around 10 other fields). A new hash based on the same data will be created on a daily basis from the source database. I will store all hashes in a single table (item code, current_hash, old_hash) for performance reasons. Then I compare and update the product if the new hash is different from the old one. There are around 500,000 products, so I'm a bit worried about performance.

This is pretty much what I have been doing for a living for the past few years, and my gut instinct is that reading 500,000 items from the source database and syncing them into the destination will not take as much time as one might think. Reading the "key" fields, computing the MD5 hash, and cross-checking against your hash table to avoid syncing items that haven't changed won't end up saving much time and may even run longer. If that results in a runtime that is too long, then I'd compress the runtime by making the ETL multi-threaded, with each thread operating on only one segment of the table while the threads work in parallel.
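To make the hash comparison and the segmented, multi-threaded idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the field list `HASH_FIELDS`, the helper names `product_hash`, `find_changed` and `find_changed_parallel`, and the use of plain dicts in place of real DB2/MySQL queries are all hypothetical and not taken from the original discussion. One deliberate difference from plain `code + description + supplier` concatenation: the fields are joined with a separator, because bare concatenation lets ("ab", "c") and ("a", "bc") produce the same hash.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical field list: the post names code, description and supplier
# plus "around 10 other fields" that are not specified.
HASH_FIELDS = ("code", "description", "supplier")


def product_hash(product):
    """MD5 over the tracked fields, joined with a unit-separator character
    so that shifting text between adjacent fields changes the hash."""
    parts = ("" if product.get(f) is None else str(product[f]) for f in HASH_FIELDS)
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


def find_changed(products, stored_hashes):
    """Compare today's products with the stored hashes.

    Returns (changed, new_hashes): the products that are new or whose hash
    differs, and a dict of code -> hash to write back to the hash table."""
    changed = []
    new_hashes = {}
    for product in products:
        digest = product_hash(product)
        new_hashes[product["code"]] = digest
        if stored_hashes.get(product["code"]) != digest:
            changed.append(product)
    return changed, new_hashes


def find_changed_parallel(products, stored_hashes, workers=4):
    """Same comparison, with the product list split into segments that are
    handled by a thread pool. In a real ETL each worker would read its own
    segment from the source database; threads mainly help while waiting on
    database I/O, not with the hashing itself."""
    size = max(1, -(-len(products) // workers))  # ceiling division
    segments = [products[i:i + size] for i in range(0, len(products), size)]
    changed = []
    new_hashes = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in segment order, so output order is stable.
        for seg_changed, seg_hashes in pool.map(
            lambda seg: find_changed(seg, stored_hashes), segments
        ):
            changed.extend(seg_changed)
            new_hashes.update(seg_hashes)
    return changed, new_hashes
```

In use, `stored_hashes` would be loaded from the hash table on the target side, `products` would come from the select queries on the source, and only the returned `changed` items would be pushed through the update statements. Whether this beats simply re-syncing every row is worth measuring on the real data.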