Upload folder to S3 recursively

03 Dec 2013

Recently, I needed to upload a folder to S3 with all of its files.

The use case is compiling assets on the CI server and then uploading them to S3 for the CDN to consume.

While searching for a gem that does this, I came across s3_uploader, but I didn’t like that it depends on Fog.

Generally, I don’t like gems that pull in other gems for no apparent reason. There’s no need to include Fog in my project just to upload files recursively.

I did, however, like that it uses multiple threads to do the upload, so I do the same in my solution.
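Stripped of the S3 details, the multi-threaded approach is a fixed pool of worker threads draining a shared list of work. Here is a minimal sketch of that pattern. It uses Ruby's thread-safe Queue instead of a Mutex-guarded array, which is a swapped-in technique rather than the code in this post. upload_in_parallel is a hypothetical helper, and the block you pass stands in for the actual per-file upload.

```ruby
# Hypothetical helper: run `work` on every item using `thread_count` worker threads.
# Queue is thread-safe, so workers can pop from it without an explicit Mutex.
def upload_in_parallel(items, thread_count = 5, &work)
  queue = Queue.new
  items.each { |item| queue << item }

  results = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      loop do
        begin
          item = queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break # nothing left to do, so this worker exits
        end
        results << work.call(item)
      end
    end
  end
  workers.each(&:join) # wait for every worker to drain the queue

  # Results arrive in completion order, not input order
  Array.new(results.size) { results.pop }
end
```

For example, `upload_in_parallel(paths, 10) { |path| upload(path) }` would run a hypothetical `upload` method on 10 threads. Because workers pull items until the queue is empty, a slow file only holds up its own thread.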

I wrote a solution using the AWS SDK for Ruby (the aws-sdk gem, whose v1 API provides AWS::S3), which was already in my project anyway.

Here’s the code:

  require 'aws-sdk' # aws-sdk v1, which provides AWS::S3

  class S3FolderUpload
    attr_reader :folder_path, :total_files, :s3_bucket
    attr_accessor :files

    # Initialize the upload class
    #
    # folder_path - path to the folder that you want to upload
    # bucket - The bucket you want to upload to
    # aws_key - Your key generated by AWS, defaults to the environment setting AWS_KEY_ID
    # aws_secret - The secret generated by AWS, defaults to the environment setting AWS_SECRET
    #
    # Examples
    #   => uploader = S3FolderUpload.new("some_route/test_folder", 'your_bucket_name')
    #
    def initialize(folder_path, bucket, aws_key = ENV['AWS_KEY_ID'], aws_secret = ENV['AWS_SECRET'])
      @folder_path       = folder_path
      # "**/*" also matches directories; keep only regular files so the count is accurate
      @files             = Dir.glob("#{folder_path}/**/*").select { |f| File.file?(f) }
      @total_files       = files.length
      @connection        = AWS::S3.new(access_key_id: aws_key, secret_access_key: aws_secret)
      @s3_bucket         = @connection.buckets[bucket]
    end

    # public: Upload files from the folder to S3
    #
    # thread_count - How many threads you want to use (defaults to 5)
    #
    # Examples
    #   => uploader.upload!(20)
    #     true
    #   => uploader.upload!
    #     true
    #
    # Returns true once every file has been uploaded
    def upload!(thread_count = 5)
      file_number = 0
      mutex       = Mutex.new

      threads = Array.new(thread_count) do
        Thread.new do
          loop do
            file   = nil
            number = nil
            # Take the next file and its position together, so two threads
            # never grab the same file or print the same counter value
            mutex.synchronize do
              file   = files.pop
              number = (file_number += 1) if file
            end
            break unless file

            # I had some more manipulation here figuring out the git sha
            # For the sake of the example, we'll leave it simple
            #
            path = file

            puts "[#{number}/#{total_files}] uploading #{path}..."

            # Open in binary mode; the block closes the file handle after the upload
            File.open(file, 'rb') do |data|
              obj = s3_bucket.objects[path]
              obj.write(data, acl: :public_read)
            end
          end
        end
      end
      threads.each(&:join)
      true
    end
  end
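The code above uses each file's full local path as its S3 key. If you would rather have keys relative to the uploaded folder (so public/assets/app.js becomes app.js), the mapping can be computed up front. This is a minimal sketch under that assumption; s3_keys_for is a hypothetical helper, not part of the class above or of the AWS SDK.

```ruby
require 'pathname'

# Hypothetical helper: map every regular file under folder_path to an S3 key
# relative to that folder. Directories are skipped because S3 has no real folders.
def s3_keys_for(folder_path)
  root = Pathname.new(folder_path)
  Dir.glob(File.join(folder_path, '**', '*'))
     .select { |path| File.file?(path) }
     .map { |path| [path, Pathname.new(path).relative_path_from(root).to_s] }
     .to_h
end
```

Each pair gives the local file to open and the key to pass to s3_bucket.objects[...]. Like the glob in the class, "**/*" does not match dotfiles unless File::FNM_DOTMATCH is passed.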

Usage is simple:

	uploader = S3FolderUpload.new('folder_name', 'your_bucket', aws_key, aws_secret)
	uploader.upload!

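Because each upload spends most of its time waiting on the network, running several threads in parallel makes the whole upload noticeably faster than sending one file at a time, both from local machines and from servers.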
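Threads help here even under MRI's global interpreter lock, because the lock is released while a thread waits on I/O. This sketch simulates network latency with sleep, an assumption standing in for real S3 requests, and compares sequential and threaded wall-clock time.

```ruby
# Stand-in for a network-bound upload: sleep releases the interpreter lock,
# just as waiting on a socket does.
def simulated_upload(_file)
  sleep 0.1 # pretend each request waits 100 ms on the network
end

# Measure elapsed wall-clock time for the given block
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

files = (1..10).map { |i| "file_#{i}" }

sequential = time_it { files.each { |f| simulated_upload(f) } }

threaded = time_it do
  # 5 threads, 2 files each, all waiting on "the network" at the same time
  files.each_slice(2).map do |batch|
    Thread.new { batch.each { |f| simulated_upload(f) } }
  end.each(&:join)
end

puts format('sequential: %.2fs, 5 threads: %.2fs', sequential, threaded)
```

Expect roughly 1 second for the sequential run and about a fifth of that with 5 threads. CPU-bound work would not speed up this way in MRI.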

Have fun coding!

Avi Zurel
