
Archiving


Archive strategies for OLTP servers, Part 3 at Xaprb. In the first two articles in this series, I discussed archiving basics, relationships and dependencies, and specific archiving techniques for online transaction processing (OLTP) database servers. This article covers how to move the data from the OLTP source to the archive destination, what the archive destination might look like, and how to un-archive data. If you can un-archive easily and reliably, a whole new world of possibilities opens up. For your reference, here are links to part 2 and part 1, and the original article on efficient SQL queries for archiving, which is the basis for this whole series.

How to move the data

At some point you have to actually take the data from the source and put it into the archive. This is a three-step process:

1. Find archivable rows
2. Insert them into the archive
3. Delete them from the source

I wrote an article on how to find archivable rows efficiently, so I won’t go into it more here.
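The three-step loop above can be sketched in a few lines. This is a minimal illustration using SQLite via Python; the `orders` and `orders_archive` tables, the `created` cutoff, and the batch size are all assumptions for the example, not anything from the article. The key point it demonstrates is doing the insert and delete for each small batch inside one transaction, so a row is never deleted without having landed in the archive.

```python
import sqlite3

# Hypothetical schema: an `orders` source table and an `orders_archive`
# destination with identical columns. Rows older than a cutoff are
# considered archivable (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created INTEGER, total REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created INTEGER, total REAL);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i, i * 1.5) for i in range(1, 101)])
conn.commit()

CUTOFF = 50  # rows with created < CUTOFF are archivable (assumption)
BATCH = 10   # nibble size: keeps each transaction small and short

while True:
    # Step 1: find a small batch of archivable rows.
    rows = conn.execute(
        "SELECT id, created, total FROM orders WHERE created < ? "
        "ORDER BY id LIMIT ?", (CUTOFF, BATCH)).fetchall()
    if not rows:
        break
    # Steps 2 and 3 in one transaction: copy the batch into the
    # archive, then delete exactly the rows that were copied.
    with conn:
        conn.executemany("INSERT INTO orders_archive VALUES (?, ?, ?)", rows)
        conn.executemany("DELETE FROM orders WHERE id = ?",
                         [(r[0],) for r in rows])

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # 51
print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # 49
```

Selecting by primary key and deleting only the ids just copied is what makes the loop safe to interrupt and restart at any point.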

Transactions

Your archive might also be non-transactional.

Duplicated data

Archive strategies for OLTP servers, Part 2 at Xaprb. In the first article in this series on archiving strategies for online transaction processing (OLTP) database servers, I covered some basics: why to archive, and what to consider when gathering requirements for the archived data itself. This article is more technical. I want to help you understand how to choose which rows are archivable, and how to deal with complex data relationships and dependencies. In that context, I’ll also discuss a few concrete archiving strategies, their strengths and shortcomings, and how they can satisfy your requirements, especially requirements for data consistency, which, as you will see, is one of the most difficult problems in archiving.

Remember, I’m basing these articles on the nibbling principle I explained in my very first article on archiving strategies. The goal is not to move entire tables away or carve gigantic chunks out of them manually. It’s a different matter if you’re archiving or purging from an OLAP system such as a data warehouse, of course.

Archive strategies for OLTP servers, Part 1 at Xaprb. In May 2005, I wrote a widely referenced article about how to efficiently archive and/or purge data from online transaction processing (OLTP) database servers. That article focused on how to write efficient archiving SQL. In this article I’ll discuss archiving strategy, not tactics. OLTP servers tend to have complex schemas, which makes it important, and sometimes difficult, to design a good archiving strategy.

The 50,000-foot view

Archiving is actually a very large topic! My goal is to at least mention many of the things to consider, and go into some of them in detail.

Here’s what I’ll cover in this and upcoming articles:

- Goals of archiving
- Where to store the archived data
- How to choose which rows are archivable
- How to deal with complex data relationships and dependencies
- How to actually archive the data
- Un-archiving strategy

Archiving: why do it?

Archiving is about scaling.

How do you know if you can do this?

When people are a roadblock

Gather requirements for the archive

Conclusion

How to write efficient archiving and purging jobs in SQL at Xaprb. Sometimes it’s a terrible idea to do set-based operations in SQL. There are real-world constraints, especially in heavily used OLTP or large databases, that require doing things a tiny bit at a time.

In this article I’ll show you how to write a job that can purge data from a huge table without impacting critical processes, filling up the transaction log, or causing deadlocks. I have released a tool that does a fantastic job of archiving and purging MySQL tables, as part of MySQL Toolkit. If you’re using MySQL, you should take a look at it.

Motivation

Mission-critical database servers can’t be taken offline for maintenance tasks such as purging or archiving historical data, yet high-volume OLTP databases need to be small to stay responsive, which creates the need for purging or archiving.
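A minimal sketch of such a nibbling purge job, using SQLite via Python purely for illustration (the article's tool targets MySQL). The `sessions` table, the `last_seen` cutoff, and the batch size are hypothetical. The idea it shows: delete in small, committed batches with a bounded row count per statement, pausing between batches so other transactions can acquire locks, rather than issuing one enormous set-based DELETE.

```python
import sqlite3
import time

# Hypothetical table: sessions with an id and a last_seen timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, last_seen INTEGER)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)",
                 [(i, i) for i in range(1, 1001)])
conn.commit()

CUTOFF = 800  # purge rows with last_seen < CUTOFF (assumption)
BATCH = 100   # small batches keep each transaction short-lived

purged = 0
while True:
    # Each iteration deletes at most BATCH rows and commits immediately,
    # so no single transaction holds locks or log space for long.
    with conn:
        cur = conn.execute(
            "DELETE FROM sessions WHERE id IN "
            "(SELECT id FROM sessions WHERE last_seen < ? LIMIT ?)",
            (CUTOFF, BATCH))
    if cur.rowcount == 0:
        break
    purged += cur.rowcount
    time.sleep(0.01)  # brief pause so concurrent work can get the locks

print(purged)  # 799
```

On MySQL you would express the bounded delete as `DELETE ... WHERE ... LIMIT n` directly; the `IN (SELECT ... LIMIT ...)` form is just how to bound a DELETE in stock SQLite.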

At my current and previous employers, we’ve used similar tactics to purge and archive old data without detrimentally impacting the database server.

First try: failure

In my previous life…

Keep it small.