Using Brotli to Compress Large Files

This article discusses how you can use Brotli to compress large files so that you're able to upload them to Function Compute.

By Yi Xian

For background, Function Compute limits the zipped code package that you upload to a maximum of 50 MB. Some code packages exceed this limit, for example an uncropped serverless-chrome build, LibreOffice, or a trained machine learning model.

Currently there are three methods that can be used to solve this problem:

  1. Use a compression algorithm with a high compression ratio, such as the Brotli algorithm described in this article.
  2. Download the files from OSS at runtime.
  3. Use NAS for file sharing.

The following table looks at the advantages and disadvantages of these three methods.

[Table: advantages and disadvantages of the three methods]

Normally, the startup speed is relatively fast if the code package is less than 50 MB. Putting data and code together can simplify the engineering work: No additional scripts are required to update OSS or NAS.

Brotli: The Compression Algorithm

Brotli is an open-source compression algorithm developed by Google engineers. Recent versions of popular browsers support it as a compression algorithm for HTTP transfer. The following figures, obtained from the Internet, show benchmark results for Brotli and other common compression algorithms.

[Figures: benchmarks of Brotli and other common compression algorithms]

As the three figures show, compared with gzip, xz, and bz2, Brotli has the highest compression ratio and the slowest compression speed, while its decompression speed is close to that of gzip.

However, the scenario discussed in this article is not sensitive to compression speed: the files only need to be compressed once, during the development preparation phase.

Create a Compressed File

First, let's see how to create a compressed file.

Install Brotli

If you're using macOS, you can install Brotli with Homebrew:

brew install brotli

And if you're using Windows, you can download Brotli from this page: https://github.com/google/brotli/releases

Package and Compress Files

The two files before packaging are 7.5 MB and 97 MB, respectively.

╭─ ~/D/test1[◷ 18:15:21]
╰─  ll
total 213840
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

After the files are packaged and compressed with gzip, the size is 44 MB.

╭─ ~/D/test1[◷ 18:15:33]
╰─  tar -czvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:16:41]
╰─  ll
total 306216
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff    44M  3  6 18:16 chromedriver.tar
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

Package the files again with the z option removed from tar. The file size is 104 MB.

╭─ ~/D/test1[◷ 18:16:42]
╰─  tar -cvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:17:06]
╰─  ll
total 443232
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff   104M  3  6 18:17 chromedriver.tar
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

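After compression with Brotli, the file is 33 MB, much smaller than the 44 MB file produced with gzip. However, the compression takes 6 minutes and 18 seconds in total, whereas gzip takes only 5 seconds. In the command below, -q 11 selects the highest quality level, -j removes the source file after compression, and -f overwrites any existing output file.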

╭─ ~/D/test1[◷ 18:17:08]
╰─  time brotli -q 11 -j -f chromedriver.tar
brotli -q 11 -j -f chromedriver.tar  375.39s user 1.66s system 99% cpu 6:18.21 total
╭─ ~/D/test1[◷ 18:24:23]
╰─  ll
total 281552
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff    33M  3  6 18:17 chromedriver.tar.br
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium
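
To put the sizes from the listings above side by side, the following is a minimal sketch that computes each archive's size as a percentage of the uncompressed tar file and checks it against the 50 MB limit. The class and method names (CompressionStats, percentOfOriginal) are hypothetical and only serve this illustration; the sizes are the rounded values reported by ll.

```java
import java.util.Locale;

class CompressionStats {
    // Size of the compressed file as a percentage of the original size.
    static double percentOfOriginal(double compressedMb, double originalMb) {
        return 100.0 * compressedMb / originalMb;
    }

    // Prints one summary line for an archive of the given size.
    static void report(String name, double sizeMb, double originalMb, double limitMb) {
        System.out.println(String.format(Locale.ROOT,
                "%s: %.0f MB, %.1f%% of the tar file, under the limit: %b",
                name, sizeMb, percentOfOriginal(sizeMb, originalMb), sizeMb < limitMb));
    }

    public static void main(String[] args) {
        double tarMb = 104;   // uncompressed tar file
        double gzipMb = 44;   // tar -czvf output
        double brotliMb = 33; // brotli -q 11 output
        double limitMb = 50;  // Function Compute limit for the zipped code package

        report("gzip", gzipMb, tarMb, limitMb);
        report("brotli", brotliMb, tarMb, limitMb);
    }
}
```

With these figures, the Brotli archive is about 32% of the tar file size, compared with about 42% for gzip, which leaves more room under the limit for the rest of the code package.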

Decompress at Runtime

Take a Java Maven project as an example.

Add the Decompression Dependencies

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>

<dependency>
    <groupId>org.brotli</groupId>
    <artifactId>dec</artifactId>
    <version>0.1.2</version>
</dependency>

commons-compress is a compression library from Apache that provides consistent abstract interfaces for various compression algorithms. For Brotli, it supports only decompression, which is all this scenario requires. The org.brotli:dec package is the underlying implementation of the Brotli decompression algorithm provided by Google.

Implement the Initialize Method

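import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.FunctionInitializer;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.time.Duration;
import java.time.Instant;

public class ChromeDemo implements FunctionInitializer {

    public void initialize(Context context) throws IOException {

        Instant start = Instant.now();

        // Stream chain: file -> buffer -> Brotli decoder -> tar reader
        try (TarArchiveInputStream in =
                     new TarArchiveInputStream(
                             new BrotliCompressorInputStream(
                                     new BufferedInputStream(
                                             new FileInputStream("chromedriver.tar.br"))))) {

            TarArchiveEntry entry;
            while ((entry = in.getNextTarEntry()) != null) {
                // Directories are created on demand below, so skip their entries
                if (entry.isDirectory()) {
                    continue;
                }
                File file = new File("/tmp/bin", entry.getName());
                File parent = file.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }

                System.out.println("extract file to " + file.getAbsolutePath());

                // Copy the content of the current tar entry to the target file
                try (FileOutputStream out = new FileOutputStream(file)) {
                    IOUtils.copy(in, out);
                }

                // Restore the permissions recorded in the tar entry.
                // getPosixFilePermission is a helper that is not shown in this article.
                Files.setPosixFilePermissions(file.getCanonicalFile().toPath(),
                        getPosixFilePermission(entry.getMode()));
            }
        }

        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis();

        System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
    }
}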

Implement the initialize method of the FunctionInitializer interface. The decompression starts with four nested streams:

  1. FileInputStream: reads the file
  2. BufferedInputStream: provides buffering, which reduces the context switches caused by system calls and improves the read speed
  3. BrotliCompressorInputStream: decodes the Brotli byte stream
  4. TarArchiveInputStream: extracts the files from the tar package

Files.setPosixFilePermissions restores the permissions of the files in the tar package. The code of the getPosixFilePermission helper is too long and is not provided here.
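
Because the article does not show getPosixFilePermission, the following is only one possible sketch of such a helper, not the author's implementation. It assumes that the mode returned by entry.getMode() carries the standard Unix permission bits in its lowest nine bits; the enclosing class name PosixModes is hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

class PosixModes {
    // Permissions ordered from the highest bit (0400, owner read)
    // to the lowest bit (0001, others execute).
    private static final PosixFilePermission[] BITS = {
            PosixFilePermission.OWNER_READ,
            PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.OWNER_EXECUTE,
            PosixFilePermission.GROUP_READ,
            PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.GROUP_EXECUTE,
            PosixFilePermission.OTHERS_READ,
            PosixFilePermission.OTHERS_WRITE,
            PosixFilePermission.OTHERS_EXECUTE
    };

    // Assumption: only the lowest nine bits of the mode are relevant;
    // any higher bits (file type, setuid, and so on) are ignored.
    static Set<PosixFilePermission> getPosixFilePermission(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            int mask = 1 << (BITS.length - 1 - i); // 0400 for i = 0, 0001 for i = 8
            if ((mode & mask) != 0) {
                perms.add(BITS[i]);
            }
        }
        return perms;
    }
}
```

For example, a mode of 0755 maps to the permission set rwxr-xr-x, which matches the permissions of chromedriver and headless-chromium in the listings earlier in this article.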

Instant start = Instant.now();
...

Instant finish = Instant.now();
long timeElapsed = Duration.between(start, finish).toMillis();

System.out.println("Extract binary elapsed: " + timeElapsed + "ms");

The preceding code segment prints the decompression time (about 3.7 seconds).

Do not forget to configure Initializer and InitializationTimeout in template.yml.
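
As a rough illustration, a template.yml fragment with these two properties might look like the following. The service name, function name, package name, handler, and code path are hypothetical placeholders, and the timeout value is only an assumption chosen to leave headroom above the decompression time measured above.

```yaml
ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  chrome-service:                 # hypothetical service name
    Type: 'Aliyun::Serverless::Service'
    chrome-demo:                  # hypothetical function name
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Runtime: java8
        CodeUri: ./target/chrome-demo.jar           # hypothetical path
        Handler: example.ChromeDemo::handleRequest  # hypothetical handler
        Initializer: example.ChromeDemo::initialize # assumes the class is in a package named example
        InitializationTimeout: 30                   # seconds, assumed value
```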
