Salesforce Archiving

March 11, 2021

1. Problem Statement:

After years of running legacy systems with a poor data archival/purging policy, your environment often ends up in a situation where most of your objects are overflowing with data that is too old to be useful.

This becomes an even bigger problem on a platform such as Salesforce, where numerous platform limitations and governor limits are waiting for you. While this mostly affects your query and transaction times, in a worst-case scenario you may exhaust the amount of data storage you are entitled to. Increasing your data limits usually takes a huge toll on your overall implementation cost!

In this article, we will focus on this problem specific to the Salesforce platform.

2. Different options available for backup/archival:

Salesforce explains its suggested best practices for backing up Salesforce data in a dedicated knowledge article.

There are numerous providers facilitating archival/backup services. Most of these players store your data on their own cloud servers and give you an option to query and retrieve it from within Salesforce.

Some providers let you back up your data to your own personal cloud, to AWS, or to another cloud platform. A few more alternatives help you back up data to your local database servers.

The biggest challenge with all these providers comes when you plan to switch from one provider to another, since the migration involves a lot of complications.

With the launch of Big Objects, it is now possible to archive your data within the Salesforce platform itself, and that is what we are going to explore further in this blog.

3. Why Big Objects?

While there are various reasons why Big Objects are the best choice among all the above options, the most important one is that your data remains within your own Salesforce environment. This brings the following benefits:

  1. Ease of accessing the archived/backed-up data.
  2. Ease of maintenance.
  3. Ease of implementing Data Security policies.
  4. Minimal cost overhead.
  5. Ability to leverage benefits of Salesforce platform features using Big Objects.

4. Archival Strategy:

While this is a vast and subjective topic, we will discuss some of the key elements here; many of the details vary depending on your business needs, local data security and privacy policies, and legal requirements.

a) Identifying Data to be Archived:

It is crucial to identify which data you would like to archive, and the criteria may vary from object to object. Do not forget to note how the objects are related to each other, since deleting a parent record may delete all of its child records as well. If you miss archiving the records from child objects before you archive the records from a parent object, you may end up losing all the data stored in those child objects.
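
The child-before-parent ordering can be derived mechanically from the object relationship map. As a rough sketch in Python (not Apex, which a real Salesforce job would use), with a hypothetical relationship map, a topological sort guarantees every child object is archived before its parent:

```python
from graphlib import TopologicalSorter

# Hypothetical relationship map: each parent object and its child objects.
# Children must be archived before parents to avoid cascade deletes.
CHILD_OBJECTS = {
    "Account": ["Contact", "Opportunity"],
    "Opportunity": ["OpportunityLineItem"],
    "Contact": [],
    "OpportunityLineItem": [],
}

def archival_order(child_map):
    """Return object names ordered so every child precedes its parents."""
    # TopologicalSorter treats the mapped values as predecessors, so
    # listing children under each parent makes children come out first.
    return list(TopologicalSorter(child_map).static_order())
```

Running `archival_order(CHILD_OBJECTS)` yields an order in which, for example, OpportunityLineItem always appears before Opportunity, which in turn appears before Account.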

b) Archiving the data:

Salesforce is a relational database, so you may want to archive your data in such a way that the relationships are preserved whenever you restore it. How you maintain relationships within the archived data is of utmost importance, since most of the time when you need your data back, you need all the related data restored as well. The order in which you archive your data is equally important, and the frequency of the archival jobs is another key consideration.
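
One common way to preserve relationships is to keep the original record Id and all lookup Ids on each archived row. The following is a minimal Python sketch, not a real Big Object API; the field names (`Id`, `AccountId`, `LastModifiedDate`) are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

def select_archivable(records, age_field, cutoff_days, now=None):
    """Pick records whose age_field timestamp is older than the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=cutoff_days)
    return [r for r in records if r[age_field] < cutoff]

def to_archive_rows(records, lookup_fields):
    """Flatten records into archive rows that preserve relationship Ids."""
    rows = []
    for r in records:
        row = dict(r)                     # copy all fields as-is
        row["SourceId"] = row.pop("Id")   # keep the original Salesforce Id
        for f in lookup_fields:           # keep parent lookups for restore
            row.setdefault(f, None)
        rows.append(row)
    return rows
```

Because every archived row carries its `SourceId` and its parent lookup Ids, the relationships can be rebuilt at restore time even though the live records are gone.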

c) Restoring the data:

Restoring your archived data is the most complicated step of all. The very first step is letting the user discover that the data they need is already there in the archived records. Users should then be able to restore exactly the data they want (with the unspoken requirement that all the related data is restored as well!). Related data brings a few extra challenges of its own, since self-referencing or circular relationships may be involved. In some cases, by the time you restore the data, the picklist values may have changed or the users referenced in the archived records may have been deactivated. Some system validations and triggers/automation processes may need to be bypassed, while others may still need to run. If you have integrations and external IDs, they add even more complexity to the recovery process.
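
A core piece of any restore is remapping lookups: the restored parent gets a brand-new Id, so child rows must be rewritten before insertion. This hedged Python sketch assumes rows are fed parents-first and that `insert_fn` stands in for the platform insert; both names are illustrative, not a Salesforce API:

```python
def restore_rows(archive_rows, lookup_fields, insert_fn):
    """Restore archived rows, remapping lookups from old Ids to new Ids.

    archive_rows must be supplied parents-first so that a child's lookup
    can be remapped to the Id its parent received on re-insertion.
    insert_fn(record) performs the insert and returns the new record Id.
    """
    id_map = {}  # old SourceId -> newly assigned Id
    for row in archive_rows:
        record = {k: v for k, v in row.items() if k != "SourceId"}
        for f in lookup_fields:
            if record.get(f) in id_map:      # parent restored earlier?
                record[f] = id_map[record[f]]
            # else: leave the stale Id for manual review
        id_map[row["SourceId"]] = insert_fn(record)
    return id_map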

d) Handling the duplicates:

This is one of the most important and most ignored factors. You must ensure that there are no duplicates across your archived data and your live data combined. This includes constantly monitoring all newly created records against the archived data, so that whenever you come across a duplicate, you can restore the archived record and merge the two together.
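
Matching new records against the archive usually relies on a normalized match key, similar in spirit to Salesforce matching rules. Here is a hedged Python sketch; the key fields (`Email`, `LastName`) are assumed for illustration:

```python
def match_key(record, key_fields):
    """Build a normalized duplicate-matching key from the given fields."""
    return tuple((record.get(f) or "").strip().lower() for f in key_fields)

def find_duplicates(new_records, archived_records, key_fields):
    """Return (new, archived) pairs that share the same match key."""
    archived_by_key = {}
    for r in archived_records:
        archived_by_key.setdefault(match_key(r, key_fields), []).append(r)
    pairs = []
    for r in new_records:
        for a in archived_by_key.get(match_key(r, key_fields), []):
            pairs.append((r, a))
    return pairs
```

Each returned pair is a candidate for the restore-and-merge step described above; in practice a human or a scored matching rule would confirm the match before merging.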

5. Best practices:

Below are some of the best practices that you should implement in order to ensure data quality in Salesforce:

a) Duplicate rules/jobs: Leverage the benefit of duplicate rules across standard as well as custom objects.

b) Data purging policy: Define and implement a data purging policy so that data quality can be maintained.

c) Closely monitoring your org data limits: Hold a monthly or quarterly review meeting where you monitor and analyze your consumption of data storage limits as well as other Salesforce governor limits.
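
Such a review can be partly automated by checking usage against a threshold. The sketch below is a hedged Python illustration that operates on a dictionary shaped like the Salesforce REST API `/limits` response (each limit has `Max` and `Remaining`); fetching that response is left out:

```python
def limit_alerts(limits, threshold=0.80):
    """Flag limits whose usage fraction meets or exceeds the threshold.

    `limits` is shaped like the Salesforce REST /limits response, e.g.
    {"DataStorageMB": {"Max": 1024, "Remaining": 200}, ...}
    """
    alerts = {}
    for name, v in limits.items():
        if v["Max"]:  # skip limits with no defined maximum
            used = (v["Max"] - v["Remaining"]) / v["Max"]
            if used >= threshold:
                alerts[name] = round(used, 2)
    return alerts
```

Feeding the current limits into `limit_alerts` before each review meeting gives an at-a-glance list of limits that are close to exhaustion, such as data storage creeping past 80% usage.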

6. Take Away:

Data archival/backup is one of the most ignored topics when implementing Salesforce, yet it turns out to be one of the biggest risks over time as your implementation grows in terms of data. It is very important to plan your archival strategy beforehand to avoid the after-effects.