2018

There are many definitions and interpretations of the term "Serverless", but if I had to put it in a few words, it would be: a software architecture that allows the (Dev)Ops team not to care about the backend infrastructure (there are still servers, the team just doesn't have to care about them). Depending on the use case, different components make up a Serverless architecture:

  • Cloud data stores
  • API gateways
  • Functions as a Service

In this blog post we examine in more detail how Functions as a Service (FaaS) can be implemented by leveraging the vCloud Director (vCD) platform.

FaaS Requirements

First, let's define some basic requirements for a FaaS solution.

  • As a FaaS developer I would like to be able to create a function with the following properties
    • Name - the name of the function
    • Code - the function executable code, complying with the FaaS API
    • Trigger - the criterion that, when met, tells the platform to run the function code. In the vCD world we can define two kinds of trigger:
      • An external API call to an endpoint defined by the trigger
      • A notification event triggered as a result of an operation, such as the creation of a VM.
  • As a FaaS developer I would like my functions to not be limited in terms of scale, or how many events they can handle.
  • As a FaaS developer I would like my function to run in a sandbox, i.e. other tenants should not have access to my functions.
  • As a Service Provider I would like individual function calls to be limited in the amount of resources they can use.
  • As a FaaS developer I would like to be able to update a function.
  • As a FaaS developer I would like to be able to remove a function.

Architecture Alternatives

There are probably many architectures that would satisfy these requirements, but I will touch on two in this blog post and discuss their pros and cons. The first part of both solution architectures is the same: when an event or an external API call is triggered, a message is sent to a queue. This is the out-of-the-box (OOTB) vCD extensibility mechanism.

Gateway Based Alternative

This alternative relies on a FaaS gateway to handle the request for a function call by:

  1. Starting a previously created container
  2. Running the function with the request payload
  3. Stopping the container

faas_arch_gateway-2.png

This architecture has an obvious drawback: the time it takes to start the container is added to the time needed to handle each request. On the other hand, it is:

  • Relatively simple
  • Scalable by nature

Queues Based Alternative

The second alternative replaces the FaaS gateway with a very simple router, which routes the messages to function-specific queues. Modern message queue systems can handle message routing themselves. The router is shown in the architecture to make it clear that message routing is needed, because the first queue is not tenant aware.

The function containers would need to be "fatter", as the code running inside would need to consume messages from a message queue and translate them into requests to the function.
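To make the routing step concrete, here is a minimal in-memory sketch in plain JavaScript. It is not part of the PoC: the message shape (`org`, `functionName`) and the queue naming scheme are assumptions made for illustration, and a real deployment would rely on the routing features of the message broker instead.

```javascript
// Hypothetical queue naming: one queue per tenant and function,
// e.g. "faas.pscoe.hello". The scheme is an assumption.
function queueNameFor(message) {
  return `faas.${message.org}.${message.functionName}`;
}

// In-memory stand-in for the router and the function-specific queues.
function createRouter() {
  const queues = new Map(); // queue name -> array of pending messages
  return {
    // Put the message on the queue that belongs to its tenant and function.
    route(message) {
      const name = queueNameFor(message);
      if (!queues.has(name)) queues.set(name, []);
      queues.get(name).push(message);
      return name;
    },
    // Messages waiting on a given queue (empty if the queue does not exist).
    pending(name) {
      return queues.get(name) || [];
    },
  };
}
```

Because the tenant is part of the queue name, two tenants that both define a function called "hello" never share a queue.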

faas_arch_queue.png

This approach would deliver much faster response times, but it would require a mechanism for monitoring and scaling the containers, which container orchestration solutions like Kubernetes provide.

Solution Implementation (PoC)

For the current PoC, I've decided to go with the simpler architecture and use vRealize Orchestrator (vRO) as the FaaS gateway. We will cover only the external API endpoint type of trigger.

Function Definition

To create a function, we need to provide the programming language, the endpoint URI, and the function code itself.

Screen Shot 2018-07-18 at 18.26.00.png

When we hit Create, the solution will:

  1. Store the function in a persistent store.
  2. Create a container
  3. Register the endpoint using the vCD API extensibility

Screen Shot 2018-07-18 at 18.26.10.png

There are a few things to notice here:

  1. The status is initializing, because creating the container takes a minute or so. This is why we've made the process async.
  2. The route is tenant-specific. In the request form we provided "hello", but the solution generated "/api/org/pscoe/hello".
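The two observations above can be sketched as follows. This is an illustration only, not the PoC code: the record fields and the helper names (`tenantRoute`, `newFunctionRecord`, `markReady`) are hypothetical, and only the "hello" to "/api/org/pscoe/hello" mapping and the initializing status come from the screenshots.

```javascript
// Build the tenant-specific route from the organization name and the
// endpoint typed into the form ("hello" -> "/api/org/pscoe/hello").
function tenantRoute(orgName, endpoint) {
  const clean = endpoint.replace(/^\/+/, ''); // tolerate a leading slash
  return `/api/org/${orgName}/${clean}`;
}

// The record stored when the user hits Create. Container creation runs
// asynchronously, so the function starts in the "initializing" status.
function newFunctionRecord(orgName, { language, endpoint, code }) {
  return {
    language,
    code,
    route: tenantRoute(orgName, endpoint),
    status: 'initializing',
  };
}

// Called once the container has been created (assumed status name: "ready").
function markReady(record) {
  return { ...record, status: 'ready' };
}
```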

Container

Our container is relatively simple. It has:

  • Dockerfile, which describes the container

Screen Shot 2018-07-19 at 12.05.11.png

  • handler.js which contains the function code

Screen Shot 2018-07-19 at 12.06.05.png

  • package.json, which is used by the Node.js package manager (npm)
  • server.js, which is a very simple express web server, used to forward requests to the function code

Screen Shot 2018-07-19 at 12.03.05.png
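For readers who cannot see the screenshots, here is a rough sketch of how handler.js and server.js could fit together. It is not the code from the PoC: to keep the example free of dependencies it uses Node's built-in `http` module instead of express, and the handler body, the JSON payload format, and the function names are assumptions.

```javascript
const http = require('http');

// handler.js equivalent: the user-supplied function code (hypothetical).
function handler(payload) {
  return { message: `Hello, ${payload.name || 'world'}!` };
}

// Turn a raw request body into a response body by calling the handler.
function invoke(rawBody) {
  const payload = rawBody ? JSON.parse(rawBody) : {};
  return JSON.stringify(handler(payload));
}

// server.js equivalent: forward every request body to the handler.
function createServer() {
  return http.createServer((req, res) => {
    let raw = '';
    req.on('data', (chunk) => { raw += chunk; });
    req.on('end', () => {
      try {
        const body = invoke(raw);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(body);
      } catch (err) {
        // Malformed JSON or a failing handler ends up here.
        res.writeHead(400, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ error: err.message }));
      }
    });
  });
}
```

Inside the container, the server would be started with something like `createServer().listen(8080)`, where the port is again an assumption.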

Finally, we use Docker to build our image and create a container from it.

 

Screen Shot 2018-07-18 at 18.29.24.png

Function Calls

Once the function is in the ready state, we can test it using the "Test Function" button, which makes a simple HTTP GET request using the function's route.

Screen Shot 2018-07-19 at 13.02.52.png

The FaaS gateway:

  1. Starts the container.
  2. Makes an HTTP POST request to the container port with the body of the original request.
  3. Stops the container.
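The three gateway steps can be sketched like this. In the PoC the gateway is a vRO workflow, so the code below is only an illustration in plain JavaScript: `runtime` is a hypothetical adapter around the container engine, and its `start`, `post`, and `stop` methods are assumed names.

```javascript
// Handle one function call: start the container, forward the request
// body with an HTTP POST, and stop the container again.
// `runtime` is a hypothetical adapter; its method names are assumptions.
async function callFunction(runtime, containerId, requestBody) {
  await runtime.start(containerId);
  try {
    return await runtime.post(containerId, requestBody);
  } finally {
    // Stop the container even when the function call fails,
    // so that failed calls do not leave containers running.
    await runtime.stop(containerId);
  }
}
```

The `finally` block is a design choice of this sketch, not something shown in the original workflow.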

Screen Shot 2018-07-19 at 13.08.32.png

Result

Screen Shot 2018-07-18 at 18.28.05.png

When using a pay-as-you-go type of service, you would like to see how much you have spent so far and how much you are going to pay at the end of the month. You might also want insight into the cost of services over the last several months.

The native UI extensibility of vCloud Director (vCD), introduced in version 9.0, allows us to build any type of custom UI and incorporate it into the product to offer a seamless user experience. This includes dashboards and charts - the ultimate visual aid for statistical data.

Solution Architecture

showback.png

Our solution consists of 3 main components:

  1. UI extension, containing a dashboard with different charts presenting billing information.
  2. Storage, exposed through a vCD API extension, for retrieving billing data.
  3. Data Collectors, small scheduled processes that pull data from billing solutions (like VRBC).

You may wonder "Why are we not pulling the data directly from VRBC?". There are a couple of reasons:

  • To optimize for performance: the data is stored in a form that is ready to be consumed by the UI.
  • To support different billing data sources. Often, service providers charge not only for the infrastructure, but also for additional custom services they offer, e.g. API calls to a messaging queue.
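As a rough illustration of both points, the sketch below merges records from several collectors into per-month totals that a chart can consume directly. The record shape (`org`, `month`, `service`, `cost`) and the function name are assumptions, not the format used by the PoC or by any billing product.

```javascript
// Merge billing records from any number of collectors into UI-ready
// monthly totals for one organization.
// Assumed record shape: { org, month: 'YYYY-MM', service, cost }.
function monthlyTotals(records, org) {
  const totals = {};
  for (const record of records) {
    if (record.org !== org) continue; // keep tenants separated
    totals[record.month] = (totals[record.month] || 0) + record.cost;
  }
  // 'YYYY-MM' strings sort chronologically when sorted alphabetically.
  return Object.keys(totals)
    .sort()
    .map((month) => ({ month, total: totals[month] }));
}
```

Doing this aggregation in the collectors, rather than in the browser, is what keeps the dashboard fast regardless of how many sources feed it.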

Solution Implementation (PoC)

Let's try to build a simplified version of this solution as a proof of concept.

Data

For our PoC, we will prepare the data manually in JSON format.

Screen Shot 2018-07-14 at 9.17.29.png

API Extension

To serve the data we need an API extension.

Screen Shot 2018-07-14 at 9.22.00.png

Dashboard

Once we have the data served, we can display it using a charting library in the vCD user interface and present the showback information to the tenant administrator.

Screen Shot 2018-07-14 at 8.15.23.png

Software systems nowadays need to handle constantly changing load levels while being cost-effective. Peak load has always been a challenge, requiring a lot of spare capacity, and in recent years elasticity has proven itself as the best option.

Dynamically adjusting resources up or down to fit the current demand on the system is the de facto gold standard, and it benefits both the business and the service provider. Scaling can be either vertical (adding more resources to a single node, e.g. more CPU and memory) or horizontal (adding more nodes to the system). Vertical scaling has some obvious limitations, and scaling down is usually a problem.

In this blog post, we will present a horizontal autoscaling solution using:

  • vCloud Director (vCD)'s native UI extensibility as the management frontend
  • vRealize Orchestrator (vRO) as the backend
  • vRealize Operations (vROps) as the monitoring system

Solution Architecture

First, we have to decide what our solution architecture should be. As I've already mentioned, vCD will serve as a management interface, vRO will be the orchestration engine, and vROps the monitoring component.

Here is a high-level system context diagram showing what the components are and how they interact with each other.

autoscale-blog-solution-diagram.png

There are two main flows:

  1. When the user manages (creates/deletes/updates) autoscaling rules:
    1. The user uses the vCD UI and the custom extension to perform the required tasks.
    2. The vCD UI extension calls an API extension endpoint defined by vRO. The endpoint works through RabbitMQ notifications, which makes it asynchronous by nature. This means we can scale the vRO instances very easily.
    3. The vRO code that handles the notifications either persists the changes requested by the user or manipulates vROps domain objects to enable monitoring of resources.
  2. When a monitoring event occurs:
    1. vROps, previously configured to monitor the group of VMs to be autoscaled, triggers an alert.
    2. The alert is sent to vRO. In our case this is an SNMP trap, which vROps supports out of the box, but a better-suited approach would be to use AMQP and the RabbitMQ cluster to handle the notification.
    3. The vRO code handles the alert by scaling out or scaling in, based on the pre-configured definition.

The Solution

Let's see what this looks like in reality.

The vApp to Scale

To illustrate the autoscaling solution, we will use a simple web application within a vApp called website.

Screen Shot 2018-07-03 at 14.21.41.png

The List of Rules

Our custom UI extension lists all created rules.

Screen Shot 2018-07-03 at 14.19.21.png

Creating a Rule

Creating a rule is as simple as filling out the following form. The following information needs to be provided:

  • Rule name - the name of the rule
  • Template - the vApp template with a single VM inside
  • Target vApp - the vApp in which the new VMs will be provisioned
  • Edge gateway - which holds the load balancer
  • Pool name - the pool name of the load balancer
  • Thresholds - the % of CPU and memory average usage, which would trigger either scale out or scale in
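To show how these fields relate to a scaling decision, here is a small sketch. In the solution itself the thresholds are evaluated by vROps alerts, not by code like this. The rule property names, the example values, and the exact comparison logic (scale out if either metric is high, scale in only if both are low) are assumptions made for illustration.

```javascript
// Hypothetical rule object mirroring the form fields; all values are examples.
const exampleRule = {
  name: 'website-autoscale',
  template: 'web-vm-template',
  targetVApp: 'website',
  edgeGateway: 'edge-01',
  poolName: 'web-pool',
  thresholds: { scaleOutAbove: 80, scaleInBelow: 20 }, // % average usage
};

// Decide what to do for the current average CPU and memory usage (in %).
function decide(rule, usage) {
  const { scaleOutAbove, scaleInBelow } = rule.thresholds;
  // Either resource running hot is enough to add a VM.
  if (usage.cpu > scaleOutAbove || usage.memory > scaleOutAbove) return 'scale-out';
  // Remove a VM only when both resources are clearly underused.
  if (usage.cpu < scaleInBelow && usage.memory < scaleInBelow) return 'scale-in';
  return 'none';
}
```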

Screen Shot 2018-07-03 at 14.23.09.png

The Custom APIs

To define the custom APIs, we use an abstraction library to hide the complexity of AMQP message handling. We provide the method, the API URI, and the callback function to handle the request. For example, when the UI hits https://<vcd.hostname>/api/org/:orgId/autoscale with the HTTP request method GET, the library invokes the callback function, which returns all rules.
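The library itself is only visible in the screenshot, so the following is a minimal stand-in that shows the idea of registering a method, a URI pattern, and a callback. The names `createApi`, `register`, and `dispatch` are hypothetical, and the AMQP transport is left out entirely: `dispatch` is called directly where the real library would react to a message.

```javascript
// Minimal stand-in for the abstraction library (hypothetical API).
function createApi() {
  const routes = [];
  return {
    // Register a callback for an HTTP method and a URI pattern such as
    // "/api/org/:orgId/autoscale".
    register(method, pattern, callback) {
      // Turn each ":name" segment into a named capture group.
      const source = pattern.replace(/:([A-Za-z]+)/g, (_, name) => `(?<${name}>[^/]+)`);
      routes.push({ method, regex: new RegExp(`^${source}$`), callback });
    },
    // Find the first matching route and invoke its callback.
    dispatch(method, path, body) {
      for (const route of routes) {
        const match = route.method === method && route.regex.exec(path);
        if (match) return route.callback({ params: { ...match.groups }, body });
      }
      return { status: 404 };
    },
  };
}
```

A handler for listing rules would then be registered with `api.register('GET', '/api/org/:orgId/autoscale', callback)` and receive the organization ID in `request.params.orgId`.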

Screen Shot 2018-07-02 at 8.44.44.png

vROps Symptoms and Alert Definitions

We need to have several things predefined before we can create the scaling objects:

  1. Group Type - the object type of our group, on which the metrics are defined.
  2. Two super metrics that let us monitor the average CPU and memory usage for the group of VMs:
    1. SM-AvgVMMemUsage → 
      avg(${adapterkind=VMWARE, resourcekind=VirtualMachine, metric=mem|usage_average, depth=1})
      avg(Virtual Machine: Memory|Usage)
    2. SM-AvgVMCPUUsage →
      avg(${adapterkind=VMWARE, resourcekind=VirtualMachine, metric=cpu|usage_average, depth=1}) + avg(${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=cpu|readyPct, depth=1})
      avg(Virtual Machine: CPU|Usage)+avg(Virtual Machine: CPU|Ready)

Finally, we have to enable our super metrics in the default policy.
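To make the two formulas above easier to read, here is what they compute, written out in plain JavaScript. This is only an illustration of the arithmetic: the real calculation happens inside vROps, and the per-VM field names (`memUsage`, `cpuUsage`, `cpuReady`) are assumptions.

```javascript
// Arithmetic mean; an empty group yields 0 in this sketch.
function avg(values) {
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// vms: one entry per VM in the group, all values in percent.
function groupSuperMetrics(vms) {
  return {
    // SM-AvgVMMemUsage: average memory usage across the group
    avgMemUsage: avg(vms.map((vm) => vm.memUsage)),
    // SM-AvgVMCPUUsage: average CPU usage plus average CPU ready
    avgCpuUsage: avg(vms.map((vm) => vm.cpuUsage)) + avg(vms.map((vm) => vm.cpuReady)),
  };
}
```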

Symptom Definitions

Screen Shot 2018-07-02 at 8.54.27.png

Alert Definitions

Screen Shot 2018-07-02 at 8.55.46.png

The Rule Store

For the purposes of this post, the RuleStore will be backed by a plain JSON file, stored as a resource element in vRO. In reality, it might be backed by a SQL or NoSQL database, depending on what type of operations you want to perform.
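A JSON-backed store of this kind can be sketched in a few lines. This is not the RuleStore from the screenshot: the string variable stands in for the content of the vRO resource element, and the method names and the `orgId`/`name` keys are assumptions.

```javascript
// Minimal JSON-backed rule store. `content` stands in for the content
// of the vRO resource element; every write re-serializes the whole file.
function createRuleStore(initialJson = '[]') {
  let content = initialJson;
  const load = () => JSON.parse(content);
  const save = (rules) => { content = JSON.stringify(rules, null, 2); };
  return {
    // All rules that belong to one organization.
    list: (orgId) => load().filter((rule) => rule.orgId === orgId),
    add(rule) {
      save([...load(), rule]);
    },
    remove(orgId, name) {
      save(load().filter((rule) => !(rule.orgId === orgId && rule.name === name)));
    },
    // The raw JSON, as it would be written back to the resource element.
    dump: () => content,
  };
}
```

Rewriting the whole file on every change is fine for a handful of rules, which is why a database becomes attractive once the number of rules or the query patterns grow.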

Screen Shot 2018-07-02 at 8.49.41.png

When an Alert Triggers

When an alert is triggered, for example because of high CPU load, a notification will be sent to vRO, which will take the appropriate action to scale out.

Screen Shot 2018-07-02 at 8.53.37.png

The vRO handler looks simple because we use a library to abstract the SNMP notification handling. The library provides a way to filter on the trapMessage and to execute an action asynchronously.

 

Screen Shot 2018-07-02 at 9.06.49.png

The ScaleOut Operation

The action itself does three things:

  • Deploys a new VM from a template

Screen Shot 2018-07-02 at 8.42.30.png

  • Updates the preconfigured load balancer pool with the new member

Screen Shot 2018-07-02 at 9.12.20.png

  • Updates the vROps group with the new machine.

Screen Shot 2018-07-02 at 9.13.15.png
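Put together, the three steps of the scale-out action could look like the sketch below. It is written in plain JavaScript rather than as a vRO action, and everything about the `clients` object is hypothetical: the `vcd`, `edge`, and `vrops` adapters, their method names, the shape of the returned VM, and the use of the rule name as the vROps group identifier are assumptions.

```javascript
// Scale out by one VM. `clients` bundles hypothetical adapters for
// vCD, the edge gateway load balancer, and vROps.
async function scaleOut(rule, clients) {
  // 1. Deploy a new VM from the template into the target vApp.
  const vm = await clients.vcd.deployFromTemplate(rule.template, rule.targetVApp);
  // 2. Add the new VM to the preconfigured load balancer pool.
  await clients.edge.addPoolMember(rule.edgeGateway, rule.poolName, vm.ip);
  // 3. Add the new VM to the vROps group so that it is monitored too.
  await clients.vrops.addToGroup(rule.name, vm.id);
  return vm;
}
```

The order matters: the VM joins the load balancer pool only after it has been deployed, and it joins the monitored group last so that its metrics count towards the group averages once it is in service.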

 

The Result

The machine can now serve requests.

output_XDok61.gif