By Du Wan
LibreOffice [1] is a free and open source office suite developed by The Document Foundation. The suite includes a word processor, a spreadsheet program, a presentation program, a vector graphics editor, a charting tool, a database management program, and an editor for creating and editing mathematical formulas. With LibreOffice's CLI, Microsoft Office files can easily be converted to PDF files, for example:
$ soffice --convert-to pdf --outdir /tmp /tmp/test.doc
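The same CLI call can be issued from Node.js, which is how the conversion is eventually triggered inside a function. The following is a minimal sketch, not the fc-libreoffice implementation: `buildConvertArgs` and `convertToPdf` are hypothetical helper names, and the sketch assumes that a `soffice` binary is available on the PATH.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: builds the argument list for the soffice call shown above.
function buildConvertArgs(inputPath, outDir) {
  return ['--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// Hypothetical helper: runs the conversion synchronously.
// Assumption: `sofficeBin` points to a working soffice binary (default: looked up on PATH).
function convertToPdf(inputPath, outDir, sofficeBin = 'soffice') {
  // execFileSync does not go through a shell, so paths containing spaces need no quoting.
  return execFileSync(sofficeBin, buildConvertArgs(inputPath, outDir), { encoding: 'utf8' });
}

// Usage (requires LibreOffice to be installed):
// convertToPdf('/tmp/test.doc', '/tmp');
```

Passing the arguments as an array rather than a single command string avoids shell quoting issues when file names come from user input.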
The size of a full LibreOffice program is 2 GB. In Function Compute, however, the size of the /tmp cache directory is limited to 512 MB and that of the zip package is limited to 50 MB. Fortunately, the aws-lambda-libreoffice project [2] from the community has successfully migrated LibreOffice to the AWS Lambda platform. Based on the existing methods and experiences, I created the fc-libreoffice project, which enables LibreOffice to run on Alibaba Cloud's Function Compute platform. fc-libreoffice resolves the following problems based on aws-lambda-libreoffice:
This document focuses on the entire migration process. It also records the key steps as a reference for migrating similar conversion tools to Function Compute in the future. If you are interested in how to quickly build a cheap and scalable Word-to-PDF cloud service, see Launching a Word-to-PDF Cloud Service on Function Compute.
We recommend that you prepare a Debian/Ubuntu machine with high specifications because compiling LibreOffice consumes considerable computing resources. Install and configure the following tools on the machine:
On macOS, install fun with Homebrew:
brew tap vangie/formula
brew install fun
On other platforms, install fun by using npm:
npm install @alicloud/fun -g
The command line tool for OSS is ossutil. Download the tool and store it in a directory that is listed in $PATH.
We use the aliyunfc/runtime-nodejs8:build Docker image provided by fc-docker to compile LibreOffice. fc-docker provides a range of Docker images whose runtime environments closely match the actual Function Compute environments. Because we will run LibreOffice in the nodejs8 environment, aliyunfc/runtime-nodejs8:build is used in this case. Images with the build tag contain more basic packages than the other images.
Run the following command to start a container for building LibreOffice.
docker run --name libre-builder --rm -v $(pwd):/code -d -t --cap-add=SYS_PTRACE --security-opt seccomp=unconfined aliyunfc/runtime-nodejs8:build bash
This command starts a container named libre-builder and mounts the current directory to the /code directory of the container's file system. The additional parameters --cap-add=SYS_PTRACE --security-opt seccomp=unconfined are required for compiling C++ programs. If they are missing, warnings are displayed. Here, -d runs the container in detached (background) mode and -t allocates a pseudo-TTY. The bash command keeps the container from exiting, and --rm deletes the container automatically once it stops.
Now, enter the container and install the compilation tools.
apt-get install -y ccache
apt-get build-dep -y libreoffice
ccache is a compiler cache that speeds up repeated compilation of the same program with GCC. Although the initial compilation still takes a relatively long time, ccache can significantly accelerate subsequent compilations.
The build-dep subcommand of apt-get sets up the build environment for the program to be compiled. Specifically, it installs all the tools and packages required to build it.
git clone --depth=1 git://anongit.freedesktop.org/libreoffice/core libreoffice
cd libreoffice
Add the --depth=1 parameter because a full clone of a project as large as LibreOffice is time-consuming and the Git commit history is not needed for compilation.
# When the program is compiled multiple times, a larger cache speeds up subsequent compilations.
ccache --max-size 16G && ccache -s
Remove unwanted modules with the --disable parameters to reduce the size of the build output.
# The most important part. Run ./autogen.sh --help to see what each option means
./autogen.sh --disable-report-builder --disable-lpsolve --disable-coinmp \
--enable-mergelibs --disable-odk --disable-gtk --disable-cairo-canvas \
--disable-dbus --disable-sdremote --disable-sdremote-bluetooth --disable-gio --disable-randr \
--disable-gstreamer-1-0 --disable-cve-tests --disable-cups --disable-extension-update \
--disable-postgresql-sdbc --disable-lotuswordpro --disable-firebird-sdbc --disable-scripting-beanshell \
--disable-scripting-javascript --disable-largefile --without-helppack-integration \
--without-system-dicts --without-java --disable-gtk3 --disable-dconf --disable-gstreamer-0-10 \
--disable-firebird-sdbc --without-fonts --without-junit --with-theme="no" --disable-evolution2 \
--disable-avahi --without-myspell-dicts --with-galleries="no" \
--disable-kde4 --with-system-expat --with-system-libxml --with-system-nss \
--disable-introspection --without-krb5 --disable-python --disable-pch \
--with-system-openssl --with-system-curl --disable-ooenv --disable-dependency-tracking
Start the compilation:
make
The compilation result is stored in the ./instdir/ directory.
Run the strip command to remove symbols and debugging information from the binary files.
# this will remove ~100 MB of symbols from shared objects
strip ./instdir/**/*
Delete the unnecessary files.
# remove unneeded stuff for headless mode
rm -rf ./instdir/share/gallery \
./instdir/share/config/images_*.zip \
./instdir/readmes \
./instdir/CREDITS.fodt \
./instdir/LICENSE* \
./instdir/NOTICE
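After stripping and cleaning up, it is useful to check how large the remaining tree is, because the unpacked files must fit into the 512 MB /tmp directory mentioned earlier. The following is a minimal sketch for that check. `dirSizeBytes` and `fitsInTmp` are hypothetical helper names, and the sketch assumes that the 512 MB limit means 512 × 1024 × 1024 bytes.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: the 512 MB /tmp limit is interpreted as 512 * 1024 * 1024 bytes.
const TMP_LIMIT_BYTES = 512 * 1024 * 1024;

// Recursively sums the sizes of regular files under `dir`.
// lstat is used so that symbolic links are not followed or counted.
function dirSizeBytes(dir) {
  let total = 0;
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.lstatSync(full);
    if (stat.isDirectory()) {
      total += dirSizeBytes(full);
    } else if (stat.isFile()) {
      total += stat.size;
    }
  }
  return total;
}

// Hypothetical helper: true if the tree under `dir` does not exceed the limit.
function fitsInTmp(dir, limitBytes = TMP_LIMIT_BYTES) {
  return dirSizeBytes(dir) <= limitBytes;
}

// Usage, run from the libreoffice source directory:
// console.log((dirSizeBytes('./instdir') / 1024 / 1024).toFixed(1), 'MiB');
```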
Run the following command to test whether the compiled soffice can properly convert a .txt file to a .pdf file.
echo "hello world" > a.txt
./instdir/program/soffice --headless --invisible --nodefault --nofirststartwizard \
--nolockcheck --nologo --norestore --convert-to pdf --outdir $(pwd) a.txt
# archive
tar -zcvf lo.tar.gz instdir
Run the following command to copy the lo.tar.gz file in the container file system to the host file system.
docker cp libre-builder:/code/libreoffice/lo.tar.gz ./lo.tar.gz
Gzip, Zopfli, and Brotli are three open source compression algorithms. When you use these algorithms to compress a chromium file of 130 MB, their compression results are as follows:
File | Algorithm | Size (MiB) | Compression Ratio | Decompression Duration
chromium | - | 130.62 | - | -
chromium.gz | Gzip | 44.13 | 66.22% | 0.968s
chromium.gz | Zopfli | 43.00 | 67.08% | 0.935s
chromium.br | Brotli | 33.21 | 74.58% | 0.712s
From the preceding table, we can see that Brotli achieves both the highest compression ratio and the shortest decompression time.
Because aliyunfc/runtime-nodejs8:build is based on Debian Jessie, where Brotli is difficult to install, we use an Ubuntu container to convert the tar.gz file to a tar.br file.
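The compression ratio in the table is the share of the original size that is saved, that is, (1 − compressed size / original size) × 100. The following sketch recomputes the Brotli row and then compares Gzip and Brotli on an arbitrary sample buffer by using Node.js's built-in zlib module. The sample text is an illustration only and says nothing about real LibreOffice binaries. `zlib.brotliCompressSync` requires Node.js 10.16 or 11.7 and later, so this is meant to run on a development machine rather than in the nodejs8 runtime.

```javascript
const zlib = require('zlib');

// Compression ratio as used in the table: percentage of the original size saved.
function compressionRatio(originalSize, compressedSize) {
  return (1 - compressedSize / originalSize) * 100;
}

// Values taken from the Brotli row of the table (MiB).
console.log(compressionRatio(130.62, 33.21).toFixed(2) + '%'); // 74.58%

// Illustration only: an arbitrary, highly repetitive sample buffer.
const sample = Buffer.from('LibreOffice on Function Compute\n'.repeat(2000));
const gz = zlib.gzipSync(sample);
const br = zlib.brotliCompressSync(sample);

console.log('gzip  :', compressionRatio(sample.length, gz.length).toFixed(2) + '%');
console.log('brotli:', compressionRatio(sample.length, br.length).toFixed(2) + '%');
```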
docker run --name brotli-util --rm -v $(pwd):/root -w /root -d -t ubuntu:18.04 bash
docker exec -t brotli-util apt-get update
docker exec -t brotli-util apt-get install -y brotli
docker exec -t brotli-util gzip -d lo.tar.gz
docker exec -t brotli-util brotli -q 11 -j -f lo.tar
In the current directory, a lo.tar.br file is generated.
To run soffice in the nodejs8 environment of Function Compute, you must use npm to install @shelf/aws-lambda-brotli-unpacker, the dependency package that decompresses tar.br files, and use apt-get to install the libnss3 dependency. Start a nodejs8 container so that the environment in which the dependencies are installed is consistent with the runtime environment.
docker run --rm --name libreoffice-builder -t -d -v $(pwd):/code --entrypoint /bin/sh aliyunfc/runtime-nodejs8
Note: @shelf/aws-lambda-brotli-unpacker has a native binding, so a package that is built by running npm install on macOS and then uploaded does not work.
docker exec -t libreoffice-builder npm install
Because deb packages cannot be installed system-wide in the Function Compute runtime, download the deb package and the deb packages it depends on, and extract them to the current working directory rather than to the system directories. The extracted files can then be packaged and uploaded along with the code.
docker exec -t libreoffice-builder apt-get install -y -d -o=dir::cache=/code libnss3
docker exec -t libreoffice-builder bash -c 'for f in $(ls /code/archives/*.deb); do dpkg -x $f $(pwd) ; done;'
libnss3 contains many .so dynamic link library files. On Linux, such libraries are found at runtime only if they are located in the system library directories or in a directory listed in the LD_LIBRARY_PATH environment variable. On Function Compute, the /code/lib directory is added to LD_LIBRARY_PATH by default. Therefore, the following script links all .so files into the /code/lib directory.
docker exec -t libreoffice-builder bash -c "rm -rf /code/archives/; mkdir -p /code/lib;cd /code/lib; find ../usr/lib -type f \( -name '*.so' -o -name '*.chk' \) -exec ln -sf {} . \;"
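For readers who prefer to script this step in Node.js, the following sketch mirrors what the find and ln -sf combination above does: it collects regular files ending in .so or .chk and creates relative symbolic links to them in a lib directory. `findFiles` and `linkLibraries` are hypothetical helper names that are not part of fc-libreoffice. The sketch uses the recursive option of `fs.mkdirSync`, so it assumes Node.js 10.12 or later on the build machine.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collects regular files under `dir` whose names end in one of `exts`.
function findFiles(dir, exts) {
  let found = [];
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.lstatSync(full); // lstat: symlinks are not followed, like `find -type f`
    if (stat.isDirectory()) {
      found = found.concat(findFiles(full, exts));
    } else if (stat.isFile() && exts.some(ext => name.endsWith(ext))) {
      found.push(full);
    }
  }
  return found;
}

// Hypothetical helper: links every matching file under `srcDir` into `libDir`
// with relative symlinks, replacing existing links (the effect of `ln -sf`).
function linkLibraries(srcDir, libDir, exts = ['.so', '.chk']) {
  fs.mkdirSync(libDir, { recursive: true });
  return findFiles(srcDir, exts).map(file => {
    const link = path.join(libDir, path.basename(file));
    try {
      fs.unlinkSync(link);
    } catch (err) {
      if (err.code !== 'ENOENT') throw err; // only "link does not exist yet" is expected
    }
    // The link target is resolved relative to the directory that contains the link.
    fs.symlinkSync(path.relative(libDir, file), link);
    return link;
  });
}

// Usage, from the directory that holds the extracted usr/ tree:
// linkLibraries('./usr/lib', './lib');
```

Relative links are used so that the links remain valid after the directory is packaged and unpacked under /code.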
To use the lo.tar.br file, upload it to OSS first.
ossutil cp $SCRIPT_DIR/../node_modules/fc-libreoffice/bin/lo.tar.br oss://${OSS_BUCKET}/lo.tar.br \
-i ${ALIBABA_CLOUD_ACCESS_KEY_ID} -k ${ALIBABA_CLOUD_ACCESS_KEY_SECRET} -e oss-${ALIBABA_CLOUD_DEFAULT_REGION}.aliyuncs.com -f
Download the tar.br package with the initializer method.
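const fs = require('fs');
const co = require('co');
const OSS = require('ali-oss');

// Assumption: local path of the downloaded archive. Adjust it to your own layout.
const binPath = '/tmp/lo.tar.br';
let store;

module.exports.initializer = (context, callback) => {
  // Create the OSS client with the temporary credentials provided by Function Compute.
  store = new OSS({
    region: `oss-${process.env.ALIBABA_CLOUD_DEFAULT_REGION}`,
    bucket: process.env.OSS_BUCKET,
    accessKeyId: context.credentials.accessKeyId,
    accessKeySecret: context.credentials.accessKeySecret,
    stsToken: context.credentials.securityToken,
    internal: process.env.OSS_INTERNAL === 'true'
  });
  // Skip the download if the archive is already present.
  if (fs.existsSync(binPath)) {
    callback(null, "already downloaded.");
    return;
  }
  co(store.get('lo.tar.br', binPath)).then(function (val) {
    callback(null, val);
  }).catch(function (err) {
    callback(err);
  });
};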
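The core idea of the initializer, downloading the archive only when it is not yet on disk, can be separated from the OSS client so that it is easy to test locally. The following is a sketch under that assumption. `ensureDownloaded`, `makeInitializer`, and `fetchFromOss` are hypothetical names, and `fetchFromOss` stands for any function that writes the file to the given path and returns a Promise.

```javascript
const fs = require('fs');

// Hypothetical helper: runs `download` only if `targetPath` does not exist yet.
// `download` must return a Promise that resolves once the file has been written.
function ensureDownloaded(targetPath, download) {
  if (fs.existsSync(targetPath)) {
    return Promise.resolve('already downloaded.');
  }
  return download(targetPath).then(() => 'downloaded.');
}

// Builds an initializer in the (context, callback) style used above.
// `fetchFromOss` is a placeholder for the real OSS download.
function makeInitializer(targetPath, fetchFromOss) {
  return (context, callback) => {
    ensureDownloaded(targetPath, fetchFromOss).then(
      result => callback(null, result),
      err => callback(err)
    );
  };
}
```

The two-argument form of `then` is used so that an exception thrown inside the success callback does not trigger a second call to `callback`.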
Use the @shelf/aws-lambda-brotli-unpacker npm package to decompress the lo.tar.br package.
const path = require('path');
const {unpack} = require('@shelf/aws-lambda-brotli-unpacker');
const {execSync} = require('child_process');

// Location of the packed archive and of the soffice binary after unpacking.
const inputPath = path.join(__dirname, '..', 'bin', 'lo.tar.br');
const outputPath = '/tmp/instdir/program/soffice';

module.exports.handler = async event => {
  // Unpack lo.tar.br, then convert the document to PDF.
  await unpack({inputPath, outputPath});
  execSync(`${outputPath} --convert-to pdf --outdir /tmp /tmp/example.docx`);
};
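After the conversion, the handler usually needs the path of the generated PDF, for example to upload it. With --convert-to pdf --outdir, soffice writes a file that has the same base name as the input and a .pdf extension into the output directory. The following sketch computes that path and adds a defensive existence check. `pdfPathFor` and `requireConverted` are hypothetical helper names, not part of fc-libreoffice.

```javascript
const fs = require('fs');
const path = require('path');

// Computes <outDir>/<input base name>.pdf, the file that soffice is expected to write.
function pdfPathFor(inputPath, outDir) {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(outDir, base + '.pdf');
}

// Hypothetical helper: returns the PDF path, or throws if no file was produced.
function requireConverted(inputPath, outDir) {
  const pdfPath = pdfPathFor(inputPath, outDir);
  if (!fs.existsSync(pdfPath)) {
    throw new Error(`conversion produced no file at ${pdfPath}`);
  }
  return pdfPath;
}

// Usage inside the handler, after execSync has returned:
// const pdfPath = requireConverted('/tmp/example.docx', '/tmp');
```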
Compose a template.yml file and write all Function Compute settings to the file. Then run the fun deploy command to deploy the function.
ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  libre-svc: # service name
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'fc test'
      Policies:
        - AliyunOSSFullAccess
    libre-fun: # function name
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        Initializer: index.initializer
        Runtime: nodejs8
        CodeUri: './'
        Timeout: 60
        MemorySize: 640
        EnvironmentVariables:
          ALIBABA_CLOUD_DEFAULT_REGION: ${ALIBABA_CLOUD_DEFAULT_REGION}
          OSS_BUCKET: ${OSS_BUCKET}
          OSS_INTERNAL: 'true'
In actual scenarios, it is inappropriate to hard-code keys and environment-specific variables in the template.yml file. To separate the code from the settings, the variable placeholders ${ALIBABA_CLOUD_DEFAULT_REGION} and ${OSS_BUCKET} are used in the example.
Replace the placeholders with envsubst.
SCRIPT_DIR=`dirname -- "$0"`
source $SCRIPT_DIR/../.env
export ALIBABA_CLOUD_DEFAULT_REGION OSS_BUCKET
envsubst < $SCRIPT_DIR/../template.yml.tpl > $SCRIPT_DIR/../template.yml
cd $SCRIPT_DIR/../
All the preceding settings are written to the .env file. dotenv is a common community solution, which is supported by various tools.
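To make the mechanics concrete, the following sketch shows in Node.js what the combination of a .env file and envsubst does: it reads KEY=VALUE pairs and replaces ${NAME} placeholders in a template. It is a simplified illustration, not a replacement for dotenv or envsubst. `parseEnv` and `substitute` are hypothetical helper names, only the ${NAME} form is handled, and the bucket and region values in the usage example are made up.

```javascript
// Parses KEY=VALUE lines, skipping blank lines and # comments.
// Simplified: quoting, escaping, and `export` prefixes are not handled.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq <= 0) continue; // no key before '=' or no '=' at all
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Replaces ${NAME} placeholders. Names that are not defined become an empty
// string, which is also how envsubst treats unset variables.
function substitute(template, vars) {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : '');
}

// Usage with made-up values:
const exampleVars = parseEnv('OSS_BUCKET=my-bucket\nALIBABA_CLOUD_DEFAULT_REGION=cn-hangzhou\n');
console.log(substitute('OSS_BUCKET: ${OSS_BUCKET}', exampleVars)); // OSS_BUCKET: my-bucket
```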
This document explains how to compile LibreOffice, which is the most challenging part of the migration. Running LibreOffice also requires installing an npm package with a native binding and installing apt-get packages to a local directory, so this example also serves as a reference for handling such dependencies on Function Compute. The steps in this document rely heavily on the fc-docker images for both compilation and dependency installation. These images resolve the problem of environmental differences and greatly reduce the difficulty of migration. Loading large files at runtime is another common problem on Function Compute. In the conversion tool scenario, the binary program is usually a large file. In the machine learning scenario, the trained model file is normally large. In both scenarios, you can download the packages from OSS and decompress them at runtime. Given that Function Compute now supports NAS, mounting shared network storage through NAS is also an option.
For the full source code used in this document, refer to the fc-libreoffice project.
1. https://en.wikipedia.org/wiki/LibreOffice
2. How to Run LibreOffice in AWS Lambda for Dirty-Cheap PDFs at Scale
3. https://github.com/alixaxel/chrome-aws-lambda
4. https://github.com/shelfio/aws-lambda-brotli-unpacker