By Johnson Chiang, Solutions Architect
Alibaba Cloud Function Compute (FC) is a serverless FaaS platform with an event-driven programming model. This tutorial demonstrates how to develop a PDF-to-text conversion function with Function Compute, and shows how FC's simple yet powerful paradigm suits this kind of helper service.
This tutorial is organized into the following sections. Each section represents a specific task when developing a Function Compute service:
Preparing OSS: In <YOUR_BUCKET>, use the directories /in and /out, where you upload source PDF files to the former and converted output text files are placed in the latter.
Currently, FC supports runtimes including Java, Python, PHP, and Node.js. We will write the function in Node.js and use the npm pdfreader module to read text from PDF files.
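Before the full handler, here is a minimal sketch of how pdfreader's callback-style parseBuffer delivers text: the callback fires once per item, and a falsy item marks the end of the document. The helper name collectText and the fakeReader stand-in are hypothetical and exist only for illustration; in real use you would pass an instance of PdfReader from the pdfreader module.

```javascript
// Sketch: gather all text items from a pdfreader-style reader.
// `reader` is any object with parseBuffer(buffer, callback),
// for example: new (require("pdfreader").PdfReader)()
function collectText(reader, pdfBuf, done) {
  let text = "";
  reader.parseBuffer(pdfBuf, function (err, item) {
    if (err) {
      done(err);            // parsing failed
    } else if (!item) {
      done(null, text);     // falsy item: end of document
    } else if (item.text) {
      text += item.text;    // text items carry a `text` property
    }
    // any other item (such as page metadata) is ignored
  });
}

// Hypothetical stand-in reader so the sketch runs without a real PDF.
const fakeReader = {
  parseBuffer: function (buf, cb) {
    cb(null, { page: 1 });
    cb(null, { text: "Hello" });
    cb(null, { text: " world" });
    cb(null, null);
  }
};

collectText(fakeReader, Buffer.from(""), function (err, text) {
  console.log(err ? "failed" : text);
});
```

Like the handler in this tutorial, the sketch concatenates items without separators, so line and page breaks in the PDF are not preserved.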
// required modules
var OSS = require('ali-oss').Wrapper; // FC built-in module
var PdfReader = require("pdfreader").PdfReader; // packaged 3rd-party PDF parser module

console.log('Loading function');

module.exports.handler = function (eventBuf, ctx, callback) {
  console.log('Received event:', eventBuf.toString());
  let eventObj = JSON.parse(eventBuf);
  let ossEvent = eventObj.events[0];
  let ossRegion = "oss-" + ossEvent.region;

  // Init oss client instance where credentials can be retrieved from context.
  let ossClient = new OSS({
    region: ossRegion,
    accessKeyId: ctx.credentials.accessKeyId,
    accessKeySecret: ctx.credentials.accessKeySecret,
    stsToken: ctx.credentials.securityToken
  });
  ossClient.useBucket(ossEvent.oss.bucket.name); // Bucket name is from OSS event

  // Source PDF from "in/<filename>.pdf", processed to "out/<filename>.txt"
  let newKey = ossEvent.oss.object.key.replace("in/", "out/").replace(".pdf", ".txt");

  // Parse PDF to text
  console.log("Getting object: " + ossEvent.oss.object.key);
  ossClient.get(ossEvent.oss.object.key).then(function (val) {
    let pdfBuf = val.content;
    let convertedTxt = "";
    console.log("Start parsing PDF buffer.");
    new PdfReader().parseBuffer(pdfBuf, function (err, item) {
      if (err) {
        console.error("Failed to read PDF binary");
        callback(err);
        return;
      }
      if (!item) {
        // A falsy item means the whole PDF has been parsed
        console.log("Done parsing text.");
        const outBuf = Buffer.from(convertedTxt, "utf8");
        // Upload converted text as buffer to "out" directory
        ossClient.put(newKey, outBuf).then(function (val) {
          console.log("Put object: ", val);
          callback(null, val);
          return;
        }).catch(function (err) {
          console.error("Failed to put object: %j", err);
          callback(err);
          return;
        });
        return;
      }
      if (item.text) {
        console.log("Continue parsed text: " + item.text);
        convertedTxt += item.text;
      }
    });
  }).catch(function (err) {
    console.error("Failed to get object: %j", err);
    callback(err);
    return;
  });
};
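To make the event handling above easier to follow, the sketch below rebuilds just the parsing step on a sample event. The field names are the ones the handler reads; the values (including the object key in/pdf-sample.pdf) are made-up samples, and a real OSS event carries additional fields that are omitted here. The helper name describeEvent is hypothetical.

```javascript
// Sample OSS event trimmed to the fields the handler reads (assumed, not a full event).
const sampleEvent = {
  events: [{
    region: "ap-southeast-1",
    oss: {
      bucket: { name: "my-cool-demo" },
      object: { key: "in/pdf-sample.pdf" }
    }
  }]
};

// Extract the same values the handler derives from the raw event buffer.
function describeEvent(eventBuf) {
  const ossEvent = JSON.parse(eventBuf.toString()).events[0];
  const sourceKey = ossEvent.oss.object.key;
  return {
    region: "oss-" + ossEvent.region,      // region name expected by the OSS client
    bucket: ossEvent.oss.bucket.name,
    sourceKey: sourceKey,
    // "in/<filename>.pdf" becomes "out/<filename>.txt"
    targetKey: sourceKey.replace("in/", "out/").replace(".pdf", ".txt")
  };
}

// FC passes the event to the handler as a Buffer, so the sample is serialized first.
const info = describeEvent(Buffer.from(JSON.stringify(sampleEvent)));
console.log(info);
```

Note that String.prototype.replace with a string pattern only replaces the first occurrence, which is what the handler relies on for keys that start with in/.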
$ ls -l; du -hs .
total 8
-rw-r--r--@ 1 owner staff 2600 Jan 21 20:00 index.js
drwxr-xr-x 5 owner staff 170 Jan 21 20:00 node_modules
180M .
$ du -h -d3 | sort -nr | head -n8
660K ./node_modules/pdf2json/node_modules
180M ./node_modules
180M .
178M ./node_modules/pdf2json
176M ./node_modules/pdf2json/test
108K ./node_modules/pdf2json/lib
88K ./node_modules/pdf2json/.idea
28K ./node_modules/pdfreader/lib
$ zip pdf-to-text.zip index.js node_modules/
adding: index.js (deflated 63%)
adding: node_modules/ (stored 0%)
$ ls -lh pdf-to-text.zip
-rw-r--r-- 1 owner staff 1.3K Jan 21 20:10 pdf-to-text.z
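The du listing above shows that most of the 180M sits in one dependency's test directory. If you prefer to check this from Node.js, the following is a rough sketch of a similar report. It assumes a recent local Node.js (the withFileTypes option is not available on older versions), sums apparent file sizes rather than disk blocks, so the numbers can differ slightly from du, and the function names dirSize and largestSubdirs are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively sum the sizes (in bytes) of all regular files under `dir`.
function dirSize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
    // symlinks and other special entries are skipped
  }
  return total;
}

// List the immediate subdirectories of `dir`, largest first.
function largestSubdirs(dir) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter(function (entry) { return entry.isDirectory(); })
    .map(function (entry) {
      return { name: entry.name, bytes: dirSize(path.join(dir, entry.name)) };
    })
    .sort(function (a, b) { return b.bytes - a.bytes; });
}
```

For example, calling largestSubdirs("node_modules") from the project root would list each dependency with its total size, which helps decide what can be left out of the code package.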
Region: ap-southeast-1.
Service name: FileConvertService.
Service role policies: AliyunOSSFullAccess, AliyunLogFullAccess, and AliyunFCFullAccess.
Runtime: nodejs6 (or nodejs8).
Trigger type: OSS.
Trigger name: for example, newPDFTrigger.
Bucket: for example, my-cool-demo.
Events: oss:ObjectCreated:PostObject and oss:ObjectCreated:PutObject. When an object is uploaded to the specified bucket directory and matches the trigger rule, OSS publishes a trigger event to invoke the function code.
Trigger rule: Prefix in/ with Suffix .pdf.
Upload a PDF file to <YOUR_BUCKET>/in to invoke the FC function.
Then check <YOUR_BUCKET>/out to see the pdf-sample.txt file that was created, and view the text recognized from the PDF file. That's it.
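The trigger rule can be pictured as a plain string check on the object key. The sketch below is only an illustration of the prefix and suffix idea; the actual matching is performed by OSS, and the function name matchesTriggerRule and the sample keys are hypothetical.

```javascript
// Illustration only: treat the trigger rule as "key starts with prefix and ends with suffix".
function matchesTriggerRule(objectKey, prefix, suffix) {
  return objectKey.startsWith(prefix) && objectKey.endsWith(suffix);
}

const prefix = "in/";
const suffix = ".pdf";

// An uploaded source PDF matches the rule and would invoke the function.
console.log(matchesTriggerRule("in/pdf-sample.pdf", prefix, suffix));

// The converted output does not match, so writing it does not invoke the function again.
console.log(matchesTriggerRule("out/pdf-sample.txt", prefix, suffix));
```

Keeping the output under a different prefix and suffix than the input is what prevents the function from triggering itself in a loop.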
For more information, see the Log Service documentation.
In this tutorial, you have built a quick and powerful file conversion service using FC with an OSS trigger. Here are some suggestions for what to explore next: