By Yi Xian
This article discusses how you can use Brotli to compress large files so that you're able to upload them to Function Compute.
For background, Function Compute limits the zipped code package that you upload to a maximum of 50 MB. However, in some scenarios the code package exceeds this limit, for example, an uncropped serverless-chrome. Other examples include LibreOffice and trained machine learning models.
Currently there are three methods that can be used to solve this problem:
The following table looks at the advantages and disadvantages of these three methods.
Normally, the startup speed is relatively fast if the code package is less than 50 MB. Putting data and code together can simplify the engineering work: No additional scripts are required to update OSS or NAS.
Brotli is an open-source compression algorithm developed by Google engineers. Recent versions of popular browsers support Brotli as a compression algorithm for HTTP transfer. The following figures obtained on the Internet show the benchmark testing for Brotli and other common compression algorithms.
As shown in the three figures, compared with gzip, xz, and bz2, Brotli has the highest compression ratio and the slowest compression speed, while its decompression speed is close to that of gzip.
However, the example scenario discussed in this article is not sensitive to compression speed. The files are compressed only once, during the development preparation phase.
First, let's see how to create a compressed file.
If you're using macOS, you can install Brotli with Homebrew:
brew install brotli
If you're using Windows, you can download Brotli from this page: https://github.com/google/brotli/releases
The two files before packaging are 7.5 MB and 97 MB, respectively.
╭─ ~/D/test1[◷ 18:15:21]
╰─ ll
total 213840
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
After the files are packaged and compressed with gzip, the size is 44 MB.
╭─ ~/D/test1[◷ 18:15:33]
╰─ tar -czvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:16:41]
╰─ ll
total 306216
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 44M 3 6 18:16 chromedriver.tar
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
Package the files again with the z option removed from tar. The file size is 104 MB.
╭─ ~/D/test1[◷ 18:16:42]
╰─ tar -cvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:17:06]
╰─ ll
total 443232
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 104M 3 6 18:17 chromedriver.tar
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
After compression with Brotli, the file is 33 MB, much smaller than the file compressed with gzip (44 MB). However, the compression takes 6 minutes and 18 seconds in total, whereas gzip takes only 5 seconds.
╭─ ~/D/test1[◷ 18:17:08]
╰─ time brotli -q 11 -j -f chromedriver.tar
brotli -q 11 -j -f chromedriver.tar 375.39s user 1.66s system 99% cpu 6:18.21 total
╭─ ~/D/test1[◷ 18:24:23]
╰─ ll
total 281552
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 33M 3 6 18:17 chromedriver.tar.br
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
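To put the listings above in proportion, the following sketch recomputes the ratios from the figures reported in this article (104 MB tar, 44 MB with gzip, 33 MB with Brotli, 6:18.21 versus about 5 seconds). The class and method names are illustrative only, and the sizes are the rounded values printed by ll, so the results are approximate.

```java
import java.util.Locale;

// Illustrative helper; the inputs are the rounded figures reported in this article.
class CompressionNumbers {

    // Size of the compressed file as a percentage of the original size.
    static double percentOf(double compressedMb, double originalMb) {
        return compressedMb / originalMb * 100.0;
    }

    public static void main(String[] args) {
        double tarMb = 104, gzipMb = 44, brotliMb = 33;
        double brotliSeconds = 6 * 60 + 18.21; // "6:18.21 total" from the time output
        double gzipSeconds = 5;                // approximate value stated in the text

        System.out.printf(Locale.ROOT, "gzip:   %.1f%% of the tar size%n",
                percentOf(gzipMb, tarMb));
        System.out.printf(Locale.ROOT, "brotli: %.1f%% of the tar size%n",
                percentOf(brotliMb, tarMb));
        System.out.printf(Locale.ROOT, "brotli output is %.1f%% smaller than gzip output%n",
                100.0 - percentOf(brotliMb, gzipMb));
        System.out.printf(Locale.ROOT, "brotli -q 11 took about %.0f times as long as gzip%n",
                brotliSeconds / gzipSeconds);
    }
}
```

With these inputs, gzip leaves roughly 42% of the tar size and Brotli roughly 32%, so the Brotli archive is about a quarter smaller than the gzip one, at the cost of a much longer one-time compression step.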
Take a Java Maven project as an example. Add the following dependencies:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>
<dependency>
    <groupId>org.brotli</groupId>
    <artifactId>dec</artifactId>
    <version>0.1.2</version>
</dependency>
commons-compress is a compression library by Apache that provides consistent abstract interfaces for various compression algorithms. For Brotli, only decompression APIs are supported, but this still meets the requirement in this scenario. The org.brotli:dec package is the underlying implementation of the Brotli decompression algorithm provided by Google.
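import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.FunctionInitializer;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.time.Duration;
import java.time.Instant;

public class ChromeDemo implements FunctionInitializer {

    public void initialize(Context context) throws IOException {
        Instant start = Instant.now();

        // Nested streams: file -> buffer -> Brotli decoder -> tar reader
        try (TarArchiveInputStream in =
                 new TarArchiveInputStream(
                     new BrotliCompressorInputStream(
                         new BufferedInputStream(
                             new FileInputStream("chromedriver.tar.br"))))) {

            TarArchiveEntry entry;
            while ((entry = in.getNextTarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }

                // Extract every regular file under /tmp/bin
                File file = new File("/tmp/bin", entry.getName());
                File parent = file.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }

                System.out.println("extract file to " + file.getAbsolutePath());

                try (FileOutputStream out = new FileOutputStream(file)) {
                    IOUtils.copy(in, out);
                }

                // getPosixFilePermission(int) converts the tar mode bits to a
                // permission set; its body is omitted in this article
                Files.setPosixFilePermissions(file.getCanonicalFile().toPath(),
                    getPosixFilePermission(entry.getMode()));
            }
        }

        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis();
        System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
    }
}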
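The extraction loop builds each output path directly from entry.getName(). That is fine for an archive you packaged yourself, as in this article. If the archive could ever come from elsewhere, one possible hardening step is to reject entry names that resolve outside the target directory. The following is only a sketch under that assumption; SafeExtractPath and resolveEntry are hypothetical names that are not part of the original code, and only JDK classes are used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original ChromeDemo class.
class SafeExtractPath {

    // Resolves entryName under targetDir and fails if the normalized
    // result would end up outside targetDir (for example "../x").
    static Path resolveEntry(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IOException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("/tmp/bin");
        System.out.println(resolveEntry(target, "chromedriver"));
        try {
            resolveEntry(target, "../outside");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the loop, the File for an entry could then be obtained with resolveEntry(Paths.get("/tmp/bin"), entry.getName()).toFile() instead of new File("/tmp/bin", entry.getName()).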
Implement the initialize method of the FunctionInitializer interface. The decompression begins with four nested streams:
FileInputStream: reads the file
BufferedInputStream: provides buffering, which reduces the context switches caused by system calls and improves the read speed
BrotliCompressorInputStream: decodes the Brotli byte stream
TarArchiveInputStream: extracts the files from the tar package
Files.setPosixFilePermissions restores the permissions of the files in the tar package. The code of the getPosixFilePermission helper is too long and is not provided here.
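Since the getPosixFilePermission helper is not shown, here is one way it might look. This is a minimal sketch and not the author's implementation. It assumes that the integer returned by entry.getMode() carries the usual Unix permission bits in its lowest nine bits and ignores any higher bits such as the file type, setuid, setgid, or sticky bits. The class name TarModes is hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the helper that the article omits.
class TarModes {

    // Maps the low nine mode bits (rwxrwxrwx) to PosixFilePermission values.
    static Set<PosixFilePermission> getPosixFilePermission(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        if ((mode & 0400) != 0) perms.add(PosixFilePermission.OWNER_READ);
        if ((mode & 0200) != 0) perms.add(PosixFilePermission.OWNER_WRITE);
        if ((mode & 0100) != 0) perms.add(PosixFilePermission.OWNER_EXECUTE);
        if ((mode & 0040) != 0) perms.add(PosixFilePermission.GROUP_READ);
        if ((mode & 0020) != 0) perms.add(PosixFilePermission.GROUP_WRITE);
        if ((mode & 0010) != 0) perms.add(PosixFilePermission.GROUP_EXECUTE);
        if ((mode & 0004) != 0) perms.add(PosixFilePermission.OTHERS_READ);
        if ((mode & 0002) != 0) perms.add(PosixFilePermission.OTHERS_WRITE);
        if ((mode & 0001) != 0) perms.add(PosixFilePermission.OTHERS_EXECUTE);
        return perms;
    }

    public static void main(String[] args) {
        // 0755 is the mode of the two binaries in the listings (-rwxr-xr-x).
        System.out.println(PosixFilePermissions.toString(getPosixFilePermission(0755)));
    }
}
```

The literals are written in octal (leading zero) so that they line up with the familiar chmod notation. Restoring the execute bits matters here because chromedriver and headless-chromium must be executable after extraction.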
Instant start = Instant.now();
...
Instant finish = Instant.now();
long timeElapsed = Duration.between(start, finish).toMillis();
System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
The preceding code segment will print the decompression time (about 3.7s).
Do not forget to configure Initializer and InitializationTimeout in template.yml.