I recently helped our PR team answer a press query about how users can bulk upload data into the cloud. It was for an article on "Cloud ingestion" by @ssharwood, published on the global technology news site The Register:
"Salesforce.com, for example, advised us that bulk uploads are made possible by a Bulk API which happily puts SOAP and REST to work to suck up batches of 10,000 records at a time. 'Even while data is still being sent to the server, the Force.com platform submits the batches for processing,' the company said."
The query also prompted me to share here, in full, how I would advise users to approach bulk uploads into our cloud.
Generally, salesforce.com customers are not looking to create cloud silos or simply change the postcode of their data centers. They want to take advantage of the cloud's capabilities and combine them with their existing investments. Because of this, salesforce.com has invested heavily in making its service available through industry-standard APIs that work with the tools businesses already use to move data. The API now accounts for around half of the 800M+ transactions that hit the salesforce.com service every day.
Salesforce provides several ways to access and upload data to the service, including SOAP- and REST-based APIs. A bulk version of the API supports large-scale replication of business data.
The Bulk API was developed specifically to simplify the process of uploading large amounts of data. It is optimized for inserting, updating, upserting, and deleting large numbers of records asynchronously by submitting them in batches to Force.com, to be processed in the background.
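To make the four operations concrete, here is a minimal Python sketch that builds the XML "job" description a client sends before uploading any batches. It is based on our reading of the jobInfo examples in the Force.com Bulk API Developer's Guide (the namespace, element names and element order); treat those details as assumptions and check them against the guide. The function name `build_job_info` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# XML namespace used by Bulk API job requests (assumed from the guide's examples).
ASYNC_NS = "http://www.force.com/2009/06/asyncapi/dataload"

# The four bulk operations described above.
OPERATIONS = ("insert", "update", "upsert", "delete")


def build_job_info(operation: str, sobject: str,
                   external_id_field: Optional[str] = None,
                   concurrency_mode: Optional[str] = None,
                   content_type: str = "CSV") -> str:
    """Return a hypothetical jobInfo XML body describing a new bulk job."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    if operation == "upsert" and not external_id_field:
        # Upsert matches existing records on an external ID field.
        raise ValueError("upsert requires an external ID field")
    if concurrency_mode not in (None, "Parallel", "Serial"):
        raise ValueError(f"unknown concurrency mode: {concurrency_mode!r}")

    # Setting xmlns as a plain attribute keeps the child tags unprefixed.
    job = ET.Element("jobInfo", xmlns=ASYNC_NS)
    ET.SubElement(job, "operation").text = operation
    ET.SubElement(job, "object").text = sobject
    if external_id_field:
        ET.SubElement(job, "externalIdFieldName").text = external_id_field
    if concurrency_mode:
        # When omitted, the server applies its default mode.
        ET.SubElement(job, "concurrencyMode").text = concurrency_mode
    ET.SubElement(job, "contentType").text = content_type
    body = ET.tostring(job, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_job_info("insert", "Account"))
```

As we understand the guide, a body like this is POSTed to the `/services/async/<version>/job` resource with the session ID in an `X-SFDC-Session` header. Verify the exact version and headers in the guide before relying on them.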
Uploaded records are streamed to Force.com to create a new job. As the data for the job rolls in, it is held in temporary storage and then sliced into user-defined batches (a maximum of 10,000 records each). Even while your data is still being sent to the server, the Force.com platform submits the batches for processing.
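To illustrate what slicing records into user-defined batches of at most 10,000 means, here is a minimal client-side Python sketch. It assumes records arrive as dictionaries and are sent as CSV. The helper names `batch_records` and `to_csv` are hypothetical. It shows the batching rule only and is not how the platform implements it internally.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Per-batch record limit quoted above.
MAX_RECORDS_PER_BATCH = 10_000

Record = Dict[str, str]


def batch_records(records: Iterable[Record],
                  batch_size: int = MAX_RECORDS_PER_BATCH) -> Iterator[List[Record]]:
    """Yield consecutive batches of at most batch_size records."""
    if not 1 <= batch_size <= MAX_RECORDS_PER_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_RECORDS_PER_BATCH}")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The final batch may be smaller than batch_size.
        yield batch


def to_csv(batch: List[Record], fieldnames: List[str]) -> str:
    """Render one batch as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(batch)
    return buf.getvalue()
```

Because `batch_records` is a generator, it never holds more than one batch in memory. This mirrors the idea that batches can be handed off for processing while the rest of the data is still being read.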
Batches can be processed in parallel or serially, depending on your needs. The Bulk API moves the processing work from your client application to the server. It logs the status of each job and automatically tries to reprocess failed records for you.
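Because the server logs each batch's status, a client only needs to poll and summarize. The minimal Python sketch below does that for a batch-status response. It assumes the response is a `batchInfoList` XML document whose `batchInfo` entries carry a `state` element. It also assumes the state values Queued, InProgress, Completed, Failed and Not Processed. These names come from our reading of the Bulk API Developer's Guide, so treat them as assumptions. The functions `batch_states` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List

# Namespace of Bulk API XML responses (assumed from the guide's examples).
ASYNC_NS = "http://www.force.com/2009/06/asyncapi/dataload"

PENDING = {"Queued", "InProgress"}                     # still waiting or running
FINISHED = {"Completed", "Failed", "Not Processed"}    # no further work expected


def batch_states(batch_info_list_xml: str) -> List[str]:
    """Extract each batch's <state> text from a batchInfoList response."""
    root = ET.fromstring(batch_info_list_xml)
    return [el.text or "" for el in root.iter(f"{{{ASYNC_NS}}}state")]


def summarize(states: List[str]) -> Dict[str, object]:
    """Count batches per state and report whether the job has finished."""
    counts = Counter(states)
    unknown = set(counts) - PENDING - FINISHED
    if unknown:
        raise ValueError(f"unexpected batch state(s): {sorted(unknown)}")
    return {
        "counts": dict(counts),
        # The job is done once no batch is queued or in progress.
        "done": all(counts[s] == 0 for s in PENDING),
        # Failed batches are the ones worth inspecting afterwards.
        "failed": counts["Failed"],
    }
```

A client would call these in a loop with a pause between polls and stop once `done` is true.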
For more detail, there is a great article by Jeff Douglas on Developer Force, "Loading Large Data Sets with the Force.com Bulk API", and the full documentation is in the "Force.com Bulk API Developer's Guide".
If you have any questions, drop them in the comments below and I'll endeavor to answer them as soon as I can, or tweet me at @derektweets. Happy migrating! :)