In the world of Cloud Foundry, logs are like the platform’s diary – verbose, detailed, and sometimes chattier than your neighborhood gossip. While this volume of information is invaluable for troubleshooting and monitoring, it can also present challenges in terms of storage, processing, and extracting actionable insights.
In Episode 28 of Cloud Foundry Weekly, we are joined again by special guest Mark Fynes, the Mad Scientist Developer himself, who shows how Spring Cloud Data Flow (SCDF) can help divert this log flood into more manageable streams of useful information.
The Log Management Conundrum
Cloud Foundry’s logging feature is a powerful tool that provides deep visibility into the operation of your platform. But with this great power, as they say, comes some responsibility; without the right operating model and tools, CF’s logging capabilities can lead to:
Storage headaches due to sheer volume
Difficulty in quickly finding relevant information
Increased costs for log management and analysis tools
Challenges in meeting specific compliance or auditing requirements
Enter Spring Cloud Data Flow: A Log Processing Superhero
Spring Cloud Data Flow (SCDF) is a Spring project for building data integration and real-time data processing pipelines. When applied to log management in Cloud Foundry, it offers several advantages:
- Customizable Log Filtering: Create precise filters to capture only the log entries you need.
- Flexible Transformation: Reshape log data to fit your specific use cases or tool requirements.
- Visual Pipeline Design: Craft complex log processing workflows without going deep into code development.
- Scalability: Easily handle growing log volumes as your Cloud Foundry deployment expands.
- Native Integration: Available as a service in the Tanzu Platform for Cloud Foundry (tPCF) marketplace for seamless deployment.
Practical Applications of SCDF in Log Management
- Compliance and Auditing: Instead of sifting through terabytes of logs to find login failures, create a dedicated stream that captures and formats this information in real time to help keep the security wonks from yelling at you (see the sketch after this list).
- Cost Optimization: Reduce the volume of logs sent to your Security Information and Event Management (SIEM) and/or log management software by pre-processing and filtering at the source. Your wallet (and your SIEM) will thank you; remember, these tools generally charge by the gigabyte, so don’t spend a dollar on something that isn’t useful.
- Application Performance Monitoring: Generate custom metrics based on log data to keep a pulse on your applications’ health.
- Multi-Foundation Insights: Aggregate and process logs from multiple Cloud Foundry foundations for a holistic view of your ecosystem.
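To give the compliance case a concrete flavor, once platform logs are flowing into SCDF (the syslog setup is covered below), a dedicated stream might look something like this sketch. The ‘:cf-syslog’ named destination is created later in this post, and the matched message text is purely illustrative:
# hypothetical stream: capture failed-login events in real time
stream create --name audit-logins --definition ":cf-syslog > filter --filter.function.expression='#jsonPath(payload,''$.syslog_MESSAGE'').contains(''Authentication failed'')' > log" --deploy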
Getting Your Feet Wet with SCDF
- Deploy SCDF: Spin up an instance of the Spring Cloud Data Flow service from your Cloud Foundry marketplace. The tPCF version of SCDF will automatically create the underlying RabbitMQ and MySQL services for you.
cf create-service p-dataflow standard my-dataflow
- Initialize your SCDF instance: The simplest way to set up your SCDF instance is by importing the application starters that leverage the auto-provisioned RabbitMQ broker. First install the SCDF plugin for the cf CLI, then import the SCDF RabbitMQ application starters.
cf install-plugin -r CF-Community "spring-cloud-dataflow-for-pcf"
cf dfsh my-dataflow
app import --uri https://dataflow.spring.io/rabbitmq-maven-latest
- Testing your SCDF instance: This simple test emits a tick log event every second, which we can observe from the deployed ‘log’ SCDF sink component.
stream create --name "test" --definition "time | log" --deploy
# wait to see the 'log' app running
# cf apps
# tail the "log" SCDF sink process
cf logs tl9XXmM-test-log-v1
# timestamps should appear every second
# cleanup
cf dfsh my-dataflow
stream destroy test
- Set Up Syslog Ingestion: There are a few steps needed to set up rsyslog ingestion into SCDF
- Platform Prerequisites
- Configure a TCP domain for the platform – A ‘shared’ TCP domain will be created by your platform operator. They would configure the platform to deploy at least one ‘tcp-router’ instance and register it behind a load balancer.
- Configure the tPCF foundation rsyslog to forward to this host and an available port (tcp.<my-domain>:<chosen-port>) – see the example rule after the platform commands below
- Note – the default quotas in a tPCF foundation do not accommodate reserved route creation, so update the relevant org quotas to permit TCP routes
# Platform Operations
cf create-shared-domain tcp.<my-domain> --router-group default-tcp
cf update-org-quota <quota-name> --reserved-route-ports 20
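For context, the rule below is a sketch of what a stock rsyslog daemon pointing at this listener would look like (‘@@’ selects TCP); on a tPCF foundation you would instead set the equivalent address, port, and TCP protocol in the tile’s System Logging configuration.
# illustrative rsyslog forwarding rule, not a tPCF config file
*.* @@tcp.<my-domain>:50000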
- Deploy the ‘syslog’ SCDF source adapter as part of a stream, configuring it to listen on the TCP port that rsyslog will forward to.
cf dfsh my-dataflow
stream create --name syslog --definition "syslog --rfc=5424 --port=50000 > :cf-syslog" --deploy
# note the reusable 'named destination' sink. We can create multiple streams.
# let's start with a test
stream create --name syslog-test --definition ":cf-syslog > log" --deploy
# tail the logs from the 'log' sink app
# cf apps
cf logs tl9XXmM-syslog-test-log-v1
- Configure the platform to expose a second route into the deployed ‘syslog’ application. All deployed SCDF ‘apps’ in the platform use the default HTTP route to expose an actuator readiness endpoint, but we can use ‘cf curl’ commands to open a second listen port in the platform-hosted container that’s running our syslog listening process. Because our example above publishes the syslog feed to a ‘named destination’, we only need to do this once; numerous streams can be attached to the ‘named destination’ later.
- The ‘addAppPort.sh’ script is available here for reference (a sketch of what it automates follows the usage example below):
https://github.com/tfynes-pivotal/scdf-pcf-monitor/blob/master/addAppPort.sh
Usage: addAppPort.sh <app-name> <app-port> <cf-space> <cf-tcp-domain>
./addAppPort.sh tl9XXmM-cf-syslog-syslog-v1 50000 ScdfDemos tcp.<my-domain>
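For reference, the script automates roughly the following steps, shown here as a sketch using the classic v2 API and cf CLI v6 syntax. The app name, route GUID, and port are placeholders, so treat this as illustrative rather than a drop-in replacement for the script:
# fetch the GUID of the deployed syslog app (name is a placeholder)
APP_GUID=$(cf app tl9XXmM-cf-syslog-syslog-v1 --guid)
# 1. add the extra listen port alongside the default 8080
cf curl /v2/apps/$APP_GUID -X PUT -d '{"ports": [8080, 50000]}'
# 2. create a TCP route on the shared domain with the chosen port (CLI v6 form)
cf create-route <cf-space> tcp.<my-domain> --port 50000
# 3. map the route to the app's port 50000 (route GUID via 'cf curl /v2/routes?q=port:50000')
cf curl /v2/route_mappings -X POST -d '{"app_guid": "'"$APP_GUID"'", "route_guid": "<route-guid>", "app_port": 50000}'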
A Taste of SCDF in Action: UAA Log Processing
Let’s look at a simple example of processing UAA (User Account and Authentication) logs:
- Filter for UAA logs
- Extract specific fields
- Store processed logs in a database
This pipeline filters UAA logs, extracts relevant fields, and stores them in a database, ready for your security team to analyze or your compliance officer to generate reports from. Mark shows us an example of how to build this as a stream, so head on over and watch him walk us through the demo.
- Configure a UAA-only sub-stream. This stream will leverage our existing “cf-syslog” named destination to create a new “uaa-syslog” sub-stream.
cf dfsh my-dataflow
stream create --name uaa-syslog --definition ":cf-syslog > filter --filter.function.expression='#jsonPath(payload,''$.syslog_APP_NAME'').equals(''uaa'')' > :uaa-syslog"
stream deploy uaa-syslog
# deploy uaa-syslog test
stream create --name uaa-syslog-test --definition ":uaa-syslog > log" --deploy
Note that SCDF on tPCF has a visual stream design and operations interface – select the ‘my-dataflow’ service in Apps Manager and click ‘Manage’.
Storing the Output Stream Content in a Postgres Database
# Create postgres database in target cf space
cf create-service postgres <postgres-plan> uaa-db
# create stream to read from the :uaa-syslog filtered stream and commit the timestamp, host, message, and structured-data fields into the database
cf dfsh my-dataflow
stream create --name uaa-jdbc --definition ":uaa-syslog > jdbc --driver-class-name=org.postgresql.Driver --columns=syslog_TIMESTAMP,syslog_HOST,syslog_MESSAGE,syslog_STRUCTURED_DATA --initialize=true"
# deploy stream, binding it to our new postgres db
stream deploy uaa-jdbc --properties "deployer.jdbc.cloudfoundry.services=uaa-db"
Note that this is a prototype implementation. Further refinement might use transformer steps in the SCDF flows to extract substrings from these log messages, allowing rapid querying for relevant events (see the sketch below).
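As a rough sketch of what such a refinement could look like – the stream name is hypothetical, the extracted field is illustrative, and the ‘spel.function.expression’ property assumes the function-based transform processor from the imported starters:
cf dfsh my-dataflow
# pull just the message body out of each filtered UAA event for downstream processing
stream create --name uaa-extract --definition ":uaa-syslog > transform --spel.function.expression='#jsonPath(payload,''$.syslog_MESSAGE'')' > log" --deploy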
Wrapping Up
Spring Cloud Data Flow offers a powerful yet accessible solution for taming the log beast in Cloud Foundry environments. By enabling targeted log filtering, transformation, and routing, SCDF helps organizations extract more value from their log data while keeping storage and processing costs in check.
As you explore SCDF for log processing, consider the various teams in your organization that might benefit from customized log streams. With its visual interface and scalable architecture, SCDF empowers both developers and operators to create sophisticated log processing pipelines that adapt to the evolving needs of your Cloud Foundry deployment.
So, the next time you find yourself drowning in a sea of logs, remember: SCDF might just be the lifeline you need to stay afloat and make sense of it all.
Want to dive deeper into the world of Cloud Foundry and stay up-to-date with the latest tips, tricks, and information? Don’t miss out on more content like this! Visit the Cloud Foundry Weekly channel and hit that subscribe button. It’s your ticket to a weekly dose of Cloud Foundry wisdom, delivered straight to your monitor.