Multi File Uploader for Salesforce (oh yeah, and progress bars too)

02/16/2012

Wouldn’t it be glorious if you could upload multiple files to salesforce.com at one time? Wouldn’t it be magically delicious if there were status bars for each file indicating the progress of these uploads? Yes, this would be a glorious and magically delicious feature that should exist, so today I give you the first release of the Multi File Uploader Tool for salesforce.com and force.com! It still needs some improvements, but it’s off to a good start so far.

If you wish to jump to the end of this magical and captivating journey that is the story of how this tool came to be, you can check it out on github here. Yet if you want to be captivated and intrigued by the story of how this tool came to be, stay tuned and hang on, because this is a fun one!

It is clear to everyone that the current upload functionality in salesforce.com leaves something to be desired. The user flow today involves navigating to a new page and uploading files one at a time. In Visualforce there is a file upload component, but it also lacks pizzazz and polish. Like the native upload functionality, this component only allows uploading one file at a time. One other issue with this component is that you cannot perform a rerender on it, as the page will crash hard. The problem is clear and the need for something better is obvious.

Why Now
Aside from the obvious need for something like this, what really piqued my interest in a multi file upload tool was a Seattle Force.com Meetup in which @greenstork asked if anyone had built a multi file upload tool for salesforce.com. No one had. After the meeting I took a stab at this and made progress quickly. I put out a teaser video, but then I went silent. As I started to make progress on this app I knew it should go on github, but I knew nothing about git. After taking some time to get up to speed on git, I think I’m now ready to get this app out in the wild.

Another reason for delaying the release of this tool is the larger heap size limit that comes with the Spring ’12 release. The heap size has gone from 3MB to 6MB, and this will help greatly with the size of files that can be uploaded.

How
So how do we do this? If you google “file upload status bar” you will see many options. Some of these solutions use AJAX and JavaScript, others use Flash, and some even use a combination of both. The problem with every single one of these solutions is that you need full access to the server so you can install a piece of code that monitors the status of the upload. The client can then ask the server for the status of the upload and update the status bar on the page. In salesforce.com magical unicorn land this is not an option, as you have no control over the server. Server control is all salesforce.com. I’m okay with this as it’s part of the whole value proposition they deliver, no servers to manage, but for the sake of a multi file uploader tool with progress bars this presents a major problem.

The only thing we can execute in a custom manner on salesforce.com servers is Apex code. So somehow we need Apex to facilitate the processing of files; this we know, as there is no other option. How we get this data to the Apex code on the server is a more open-ended question.

Chunking with HTML5 File API
In researching how I might accomplish this I came across a spectacular find that might just make a multi file upload widget possible. While researching some other neat HTML5 features I found the site html5rocks.com and a page on the File API. (Side note, this site has some pretty fan-freaking-tastic examples of what you can do with HTML5.) Using the File API you are able to process files on the client side. You can read files, get file metadata, and splice files in the browser! At this point the light bulb above my head started to glow. Maybe I can cut the file up into small chunks client side, send it to salesforce.com piece by piece, and then reconstruct the entire file server side using Apex code!
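To make the idea concrete, here is a minimal sketch of client-side chunking, assuming a file grabbed from an input element with type="file" and the multiple attribute. The function name and chunk size are my own illustration, not the project code (and note that browsers of this era used the prefixed webkitSlice/mozSlice instead of slice):

```javascript
// Illustrative sketch: slice a File into fixed-size chunks and read each
// one with a FileReader. CHUNK_SIZE is a multiple of 3 for reasons that
// will become clear shortly.
var CHUNK_SIZE = 99999; // bytes per chunk

function readChunks(file, onChunk) {
  var offset = 0;

  function readNext() {
    if (offset >= file.size) { return; }
    var blob = file.slice(offset, offset + CHUNK_SIZE);
    var reader = new FileReader();
    reader.onload = function (e) {
      onChunk(e.target.result, offset); // the raw bytes of this slice
      offset += CHUNK_SIZE;
      readNext(); // only read the next chunk after this one finishes
    };
    reader.readAsBinaryString(blob);
  }

  readNext();
}
```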

This does in fact work, but here lies caveat #1. The File API is not supported by all browsers. It is no surprise that the big blue E doesn’t support the File API, but even the current version of Safari doesn’t support the ability to read files. Chrome and Firefox are, of course, good to go.

Using the File API we are able to split up the files and send them to salesforce.com, where the entire file can be reconstructed, but what is the best way to send the file chunks to the server? Maybe an Apex exposed web service with the SOAP or REST API? Maybe utilizing the existing file upload component or JavaScript remoting? All are potential options, but it didn’t take long to narrow down the list. This is going to be used in a Visualforce page… all of the core code and logic is JavaScript… and it needs to be fast. Hmm… JavaScript Remoting it is!
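As a rough sketch, a remoting call for one chunk might look like the following. The controller and method names here (MultiFileUploadController.uploadChunk) are hypothetical placeholders; the real names are in the GitHub project:

```javascript
// Hypothetical sketch: push one Base64 encoded chunk to an Apex
// @RemoteAction through JavaScript Remoting's generated stub.
function sendChunk(attachmentId, fileName, base64Chunk, onDone) {
  MultiFileUploadController.uploadChunk(
    attachmentId, fileName, base64Chunk,
    function (result, event) {
      if (event.status) {
        onDone(result); // e.g. the Attachment Id to send with the next chunk
      } else {
        alert('Chunk upload failed: ' + event.message);
      }
    },
    { escape: false }
  );
}
```

Since each chunk makes its own round trip, the success callback is also the natural place to advance that file’s progress bar.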

Binary Data and the Woes of Base64
One of the first issues I uncovered with JavaScript Remoting, and even the SOAP/REST APIs, is that the data must be sent to salesforce.com as a Base64 encoded string. There is currently no way for the platform to receive binary data (what files are made of). Even if we could receive raw binary data, Apex code doesn’t really have any way to manipulate it other than converting it to Base64. So what is wrong with Base64 encoding the data before you send it? Well, when you Base64 encode something it actually bloats the data by 33%. If you want to upload a 1MB file you will actually send 1.33MB of data. Here we introduce caveat #2! Base64 encoding makes the uploads 33% slower than they would normally be. Yet the time saved by not having to upload files individually is probably worth the 33% bloat when transmitting the data.
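You can see the bloat for yourself in any browser console with btoa; this is just an illustration of the math, not project code:

```javascript
// Every 3 input bytes become 4 output characters, so Base64 output is
// 4/3 the size of the input, roughly 33% bigger.
var raw = 'aaa';         // 3 bytes in...
var encoded = btoa(raw); // ..."YWFh", 4 bytes out
console.log(raw.length, encoded.length); // 3 4

// Scaled up: a 1MB file sends about 1.33MB over the wire.
console.log(Math.round((4 / 3) * 1024 * 1024)); // ~1398101 bytes
```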

Chunking and the Woes of Base64
Oh, you thought we were done with the “woes” didn’t you? Nope! At this point I actually had an almost working example. I could splice the file in the browser, send individual blobs to salesforce.com, and then rebuild the file on the server, except there was one major problem. Every file uploaded became corrupt. The first chunk of a file would be good, but everything after that first piece was a complete mess. Why?! I spent more time trying to figure out this corruption issue than anything else while developing this tool. Then, lying in bed one night, it hit me. Stupid Base64!!! Or maybe stupid Jason for not seeing this sooner? To understand why improperly encoding something in Base64 will cause data corruption, we first need a basic understanding of how Base64 works. Let’s pretend we have a simple text file with the word ‘shark’ in it. We will split this file into 2-byte chunks, send them to salesforce.com, and then rebuild the file by combining the Base64 chunks. Easy!

The data chunks sent to salesforce.com Base64 encoded would look like this:

sh -> c2g=
ar -> YXI=
k -> aw==

On the server we should be able to combine these Base64 values, decode it, and get the word shark….right? Wrong!

c2g=YXI=aw== decoded is sh\†°

Hmm, that does not look like the word shark. I’m not going to get deep into the details of Base64 encoding, but the main problem is that for every 3 input bytes encoded you get 4 output bytes. In our example we are only passing in 2 bytes at a time to be Base64 encoded, so our output doesn’t actually use the full 4 bytes. If there aren’t 4 output bytes after encoding is complete, padding characters are added, the “=” symbols. This ensures the output is 4 bytes for every 3 inputted. To prevent these padding characters in the middle of our file we need to make sure every chunk we process is divisible by 3. That way each encoded chunk of 3 bytes produces exactly 4 output bytes with no padding character in the middle to mess everything up. Let’s try again using a chunk size of 3 bytes instead of 2.

sha -> c2hh
rk -> cms=

You can see the last part of the file still contains the “=” padding characters, but this is okay because it is at the end of the file, where there really isn’t any data left to represent. If these padding characters end up in the middle of an encoded file, that is where we get the data corruption. So combining the encoded values and decoding the result will look like this.

c2hhcms= decoded is shark!

This may not make a lot of sense at first, and it took me some time to fully wrap my brain around this problem, but in the end we must make sure that when we splice a file into chunks, each chunk’s byte size is divisible by 3. If not, our data will get corrupted when the file is rebuilt.
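Both experiments are easy to reproduce in a browser console with btoa and atob; again, this is just an illustration of the rule, not code from the project:

```javascript
// 2-byte chunks: "=" padding lands in the middle of the stream and
// corrupts the decode (browsers either throw or return garbage here).
var bad = btoa('sh') + btoa('ar') + btoa('k'); // "c2g=YXI=aw=="

// 3-byte chunks: padding can only ever appear at the very end.
var good = btoa('sha') + btoa('rk'); // "c2hhcms="
console.log(atob(good)); // "shark"
```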

Heap Schmeap!
At this point we have a theoretical multi file uploader. I say theoretical as this post still hasn’t shown any code! We will get to the code, I promise, but in this particular case I think it is better to understand the concepts first. The final concept we need to understand is how the Apex heap size limit affects the upload process. Sharks are cool, so let’s continue with our shark example. Below is a walkthrough of the upload process showing how heap size is affected along the way.

Upload Chunk #1)
Receive Base64 encoded ‘sha’ from the remoting call, ‘c2hh’. 4 bytes.
Decode to a binary blob, set as the Attachment body, and insert. 3 bytes.

Total heap for this operation: about 7 bytes.

Upload Chunk #2)
Receive Base64 encoded ‘rk’ from the remoting call, ‘cms=’. 4 bytes.
Query the body from the Attachment inserted in chunk #1, ‘sha’. 3 bytes.
Convert the Attachment body to Base64, ‘c2hh’. 4 bytes.
Combine the two Base64 strings, ‘c2hhcms=’. 8 bytes.
Decode the Base64 combination and set it as the Attachment body, ‘shark’. 5 bytes.

Total heap for this operation: 24 bytes! Eeek, especially considering all we are really doing is appending 2 bytes (rk) to 3 bytes (sha). That is a lot of overhead!

This brings us to caveat #3. Even though we have a 6MB heap to work with, most of it is used up by the encoding and decoding of Base64 data. In the end the largest file we can upload is approximately 1.27MB. This could be a deal breaker for some use cases, but unfortunately there is no way around it. If anyone has some super crafty ways to lower the heap usage, please fork this project!
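To see where that ~1.27MB ceiling comes from, here is a back-of-the-envelope model (my own illustration, based on the walkthrough above) of the heap consumed while appending the final chunk:

```javascript
// Rough heap model for appending the last chunk, per the shark walkthrough:
// incoming Base64 chunk + queried body + its Base64 form + the combined
// Base64 string + the decoded result.
function b64Len(bytes) {
  return 4 * Math.ceil(bytes / 3); // Base64 length, including padding
}

function estimatePeakHeap(fileSize, chunkSize) {
  var existing = fileSize - chunkSize; // bytes already stored server side
  return b64Len(chunkSize) // incoming chunk, Base64 encoded
       + existing          // queried Attachment body
       + b64Len(existing)  // that body re-encoded to Base64
       + b64Len(fileSize)  // the combined Base64 string
       + fileSize;         // the final decoded blob
}

console.log(estimatePeakHeap(5, 2)); // 24 bytes, matching the shark example
```

Peak heap works out to roughly 4.7 times the file size, and 6MB divided by 4.7 lands right around that 1.27MB ceiling.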

GIVE ME CODE!!!!
Enough talk, give code now! There are still several improvements I want to make to this tool, so I don’t want to post the code on this page as it will probably be different one week from now. What I am going to do is put it up on github, and you can find it here. I would highly encourage you to check it out, fork it, improve it, and make it awesome. This is the first time I have put something on github and it’s my first real foray into git, so be prepared for me to totally mess something up. While I won’t post the code here, I will post a code walkthrough that should help with a basic understanding of how everything works. It was late and I was tired when I made this, so apologies in advance for the spurts of incoherent babbling…


Still Pretty Cool
Let’s review the important caveats.
- Doesn’t work with Internet Explorer or Safari.
- Uploading a 1MB file sends 1.33MB of data.
- Heap limits prevent files larger than 1.27MB from being uploaded.

In the end, even with all these limitations, I think this tool is pretty freaking rad. If you need to upload lots of smaller files this is a great tool, but if your files are larger it may not work for you. If I had to give this tool a version number in its current state I would say it’s around 0.6, as there are several features I want to add in the very near future. Some of these are:

- Detect browser support and provide graceful degradation.
- If an upload fails because the file is too big, provide a modal window to upload a single large file.
- Appify the tool.

These can be found on the issues tab in github as well.

That’s all folks, hope you like it!