Mastering Bulk API 2.0: High-Velocity Data Ingestion and Defeating the 100-Million Daily Limit

Unlock the full power of Salesforce Bulk API 2.0. Learn how to design high-performance data ingestion pipelines, avoid concurrency bottlenecks, and strategically manage the rolling 100-million daily record limit without disrupting production workloads.

André Rödel

5/28/2026 · 4 min read

Mastering Bulk API 2.0: High-Velocity Data Ingestion at Scale

Moving data into Salesforce is easy when you are dealing with a few thousand records. But when you are architecting integrations for enterprise environments—where a nightly sync involves transferring data from a massive AWS Data Lake or an on-premises ERP—traditional REST or SOAP APIs will quickly hit an absolute wall.

If you try to push millions of rows through standard synchronous endpoints, you will exhaust your concurrent request limits, trigger devastating lock contention errors, and saturate your daily API allocations within minutes.

For massive datasets, Bulk API 2.0 is the purpose-built option. Built on top of Salesforce’s modern REST framework, it is an asynchronous interface that changes how large-scale data loading is executed by handing batching and processing over to the platform itself.

However, even the most robust asynchronous framework has boundaries. If you don't design your ingestion strategies with precision, you will eventually collide with the hard platform ceiling: the rolling 100-million daily record limit. Let’s explore how Bulk API 2.0 works under the hood and how to architect your way around this limit.

1. Bulk API 2.0: The Architectural Leap Forward

If you worked with the legacy Bulk API 1.0, you probably remember the tedious orchestration it required: manually splitting your data into batches, tracking multiple batch IDs, and managing the state of each chunk individually.

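Bulk API 2.0 simplifies this workflow considerably. You create a job, upload a single CSV payload, and mark the upload as complete. The payload may be up to 150 MB after base64 encoding, which is why Salesforce recommends keeping the raw file to roughly 100 MB. Salesforce handles the rest.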

Automatic Chunking: The platform automatically splits your uploaded file into internal batches behind the scenes.
Simplified State Tracking: You track one single Job ID to monitor progress, rather than managing a spiderweb of batch statuses.
Optimized Daily Limit Tracking: Unlike Bulk API 1.0, where you budgeted around the number of batches you submitted, the Bulk API 2.0 ingest limit is expressed as the total number of records processed.
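The create, upload, and close sequence described above can be sketched as three HTTP calls. This is a minimal sketch rather than a full client: HttpCall and ingest_job_calls are hypothetical helpers, authentication headers are left out, the job ID is passed in as a placeholder (in practice it comes from the response to the create call), and vXX.X stands in for whatever API version your org uses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpCall:
    """Plain description of one HTTP request (hypothetical helper type)."""
    method: str
    path: str
    headers: dict
    body: bytes


def ingest_job_calls(api_version: str, sobject: str, operation: str,
                     job_id: str, csv_payload: bytes) -> list:
    """Return the three calls of a Bulk API 2.0 ingest job, in order."""
    base = f"/services/data/{api_version}/jobs/ingest"
    json_headers = {"Content-Type": "application/json"}

    # 1. Create the job and declare what the CSV will contain.
    create = HttpCall("POST", base, json_headers, json.dumps({
        "object": sobject,
        "operation": operation,
        "contentType": "CSV",
        "lineEnding": "LF",
    }).encode("utf-8"))

    # 2. Upload the whole CSV in a single request.
    upload = HttpCall("PUT", f"{base}/{job_id}/batches",
                      {"Content-Type": "text/csv"}, csv_payload)

    # 3. Tell Salesforce the upload is finished so processing can start.
    close = HttpCall("PATCH", f"{base}/{job_id}", json_headers,
                     json.dumps({"state": "UploadComplete"}).encode("utf-8"))

    return [create, upload, close]


# Placeholders only: substitute a real API version and the job ID from step 1.
calls = ingest_job_calls("vXX.X", "Account", "insert", "<jobId>", b"Name\nAcme\n")
for call in calls:
    print(call.method, call.path)
```

Keeping the request descriptions separate from the code that sends them makes the sequence easy to unit test and lets you plug in whichever HTTP library and auth flow your pipeline already uses.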

2. Strategic Blueprint: Navigating the 100-Million Daily Limit

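Salesforce enforces a hard limit on the number of records that Bulk API 2.0 ingest jobs can process per rolling 24-hour window; this article works with a figure of 100,000,000. Confirm the exact number for your release and edition in the current Bulk API 2.0 limits documentation, because published allocations change between releases. Once your pipeline exceeds the threshold, the platform rejects further bulk ingestion until capacity frees up in the window, which risks leaving Salesforce out of sync with your source systems.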

To protect your org from hitting this limit during massive migrations or daily sync cycles, implement these architectural strategies:

A. Adopt a Strict Delta Loading (CDC) Mindset

The absolute best way to manage a limit is to avoid burning through it. Never execute "full drops" where you overwrite unchanged records.

Implement Change Data Capture (CDC) or timestamp-based delta tracking on your source system (e.g., AWS Kinesis or your data warehouse).
Only extract and stream records that have been created, modified, or deleted since the last successful sync execution.
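Timestamp-based delta tracking can be sketched in a few lines. This is an illustration under stated assumptions: the modified_at field, the sample rows, and both helper functions are hypothetical, and a real pipeline would persist the watermark only after the Salesforce job has finished successfully.

```python
from datetime import datetime, timezone


def select_delta(rows, last_sync):
    """Keep only rows changed after the last successful sync."""
    return [row for row in rows if row["modified_at"] > last_sync]


def next_watermark(rows, last_sync):
    """Advance the watermark to the newest change seen, or keep the old one."""
    return max((row["modified_at"] for row in rows), default=last_sync)


# Illustrative source rows; real data would come from your source system.
source_rows = [
    {"id": "A-1", "modified_at": datetime(2026, 5, 26, 8, 0, tzinfo=timezone.utc)},
    {"id": "A-2", "modified_at": datetime(2026, 5, 27, 23, 30, tzinfo=timezone.utc)},
    {"id": "A-3", "modified_at": datetime(2026, 5, 28, 1, 15, tzinfo=timezone.utc)},
]
last_sync = datetime(2026, 5, 27, 0, 0, tzinfo=timezone.utc)

delta = select_delta(source_rows, last_sync)
print([row["id"] for row in delta])
print(next_watermark(delta, last_sync).isoformat())
```

Only the rows in delta are written to the CSV, so unchanged records never count against the daily allowance.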

B. Offload Data Enrichment to the Source System

If you are uploading millions of records just to trigger a series of heavy Apex triggers, flows, or cross-object updates that calculate text fields, you are wasting valuable computing power and increasing database lock times.

Perform complex data transformations, aggregations, and formatting upstream within your ETL tool or data layer before the payload ever reaches Salesforce.

C. Monitor Limits Dynamically via API

Don't wait for a job to fail to discover that your org is out of capacity. Your integration pipeline should query the Salesforce Limits REST endpoint (/services/data/vXX.X/limits) prior to kicking off any massive Bulk 2.0 job.
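A pre-flight check against the limits response might look like the sketch below. Several things here are assumptions to verify in your own org: that DailyBulkApiBatches is the entry that reflects your bulk consumption, that the platform cuts one internal batch per 10,000 records, and the sample numbers, which are invented for illustration. The helper names are hypothetical, and the HTTP call that fetches the JSON is left out.

```python
import json
import math

RECORDS_PER_BATCH = 10_000  # assumption: records per internal batch


def batches_needed(record_count):
    """Estimate how many internal batches a job of this size would consume."""
    return math.ceil(record_count / RECORDS_PER_BATCH)


def has_capacity(limits, record_count, key="DailyBulkApiBatches", reserve=0):
    """Return True if the remaining allowance covers the job.

    `limits` is the parsed JSON body of the limits endpoint. `reserve` holds
    back part of the allowance for other integrations sharing the org.
    """
    remaining = limits[key]["Remaining"]
    return remaining - reserve >= batches_needed(record_count)


# Invented sample shaped like a limits response; real values come from the API.
limits = json.loads('{"DailyBulkApiBatches": {"Max": 15000, "Remaining": 400}}')

print(has_capacity(limits, 3_000_000))               # needs 300 batches
print(has_capacity(limits, 5_000_000))               # needs 500 batches
print(has_capacity(limits, 3_000_000, reserve=150))  # only 250 available
```

If the check fails, the pipeline can defer the job or shrink the extract instead of discovering the problem halfway through a load.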

3. Ingestion Performance: Bypassing Database Bottlenecks

Even if you stay well below the 100-million record limit, your Bulk API jobs can still stall out or fail due to Lock Contention or CPU Timeout limits within Salesforce. When thousands of records are written simultaneously, the database must secure locks on parent accounts and related objects.

Optimize the Loading Order

To minimize lock contention, sort your input CSV file by parent ID (such as AccountId or a custom lookup field) before initiating the upload. When records sharing the same parent sit together in the same internal chunk, the lock on that parent is taken within one chunk rather than fought over by several chunks running in parallel, which greatly reduces the chance of chunks blocking each other.
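Sorting by parent ID is a small pre-processing step. The sketch below uses only the standard library and works in memory; sort_csv_by_parent and the sample data are hypothetical, and a file with millions of rows would more likely be sorted in the source query or with an external sort.

```python
import csv
import io


def sort_csv_by_parent(csv_text, parent_column):
    """Return the CSV with rows grouped by the given parent ID column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # sorted() is stable, so rows under the same parent keep their order.
    rows = sorted(reader, key=lambda row: row[parent_column])

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Illustrative contact rows with interleaved parents.
unsorted_csv = (
    "LastName,AccountId\n"
    "Lee,001B\n"
    "Kim,001A\n"
    "Diaz,001B\n"
    "Ortiz,001A\n"
)
sorted_csv = sort_csv_by_parent(unsorted_csv, "AccountId")
print(sorted_csv)
```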

The Automation Bypass (The "Kill Switch" Pattern)

Running validation rules, record-triggered flows, and complex Apex triggers during a multi-million-record bulk load will kill your ingestion performance.

Architect’s Recommendation: Implement an integration bypass switch. Use a Custom Metadata Type or a Hierarchy Custom Setting to create a toggle that disables resource-heavy validation rules, flows, and triggers specifically for your Integration User profile during the bulk window.

4. Handling Partial Successes and Error Tracking

Bulk API 2.0 processes records asynchronously in independent batches, so it does not roll back an entire job when a few records encounter errors. It operates under a partial success paradigm.

Once a job reaches the JobComplete state, your data pipeline must immediately retrieve the results files using the native endpoints:

Successful Records: /services/data/vXX.X/jobs/ingest/{jobId}/successfulResults
Failed Records: /services/data/vXX.X/jobs/ingest/{jobId}/failedResults

The failed results file returns, for every failed row, the original row data, the Salesforce record ID where one exists, and an error message (e.g., REQUIRED_FIELD_MISSING or FIELD_CUSTOM_VALIDATION_EXCEPTION). Your ETL tool should be configured to capture this stream, route the failures to a dead-letter queue (DLQ) for remediation, and allow the main pipeline to continue uninterrupted.
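Routing failures can start with grouping the failed rows by error code. The sketch below assumes the failed results CSV carries sf__Id and sf__Error columns ahead of the original fields and that the error text begins with the status code followed by a colon; check both against a real results file from your org. group_failures and the sample rows are hypothetical, and the returned dictionary stands in for whatever DLQ your pipeline uses.

```python
import csv
import io
from collections import defaultdict


def group_failures(failed_csv_text):
    """Group failed rows by error code so each group can be routed separately."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(failed_csv_text)):
        error = row.pop("sf__Error", "")
        row.pop("sf__Id", None)  # empty for rows that were never created
        # Assumption: the status code is the text before the first colon.
        code = error.split(":", 1)[0].strip() or "UNKNOWN"
        groups[code].append({"error": error, "record": row})
    return dict(groups)


# Invented sample shaped like a failed results download.
failed_csv = '''"sf__Id","sf__Error","Name","Industry"
"","REQUIRED_FIELD_MISSING:Required fields are missing: [Name]:Name --","","Energy"
"","FIELD_CUSTOM_VALIDATION_EXCEPTION:Industry is not allowed:Industry --","Acme","Unknown"
'''

dead_letters = group_failures(failed_csv)
for code, items in sorted(dead_letters.items()):
    print(code, len(items))
```

Because each entry keeps the original record alongside the full error text, a remediation job can fix the data and resubmit only those rows in a later bulk job.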

Final Verdict

Bulk API 2.0 is an exceptionally powerful tool, but it requires a shift from a real-time request-response mindset to an asynchronous, chunk-based engineering strategy. By grouping your data to avoid lock contention, monitoring the 24-hour rolling limits programmatically, and building automated bypass switches for your org's internal logic, you can seamlessly scale your enterprise data pipeline to handle millions of records every single day.
