Posts

Deduping Messages in Kafka Streams the Right Way

I’ve recently discovered that most of the examples I’ve found in the wild of how to deduplicate identical or unchanged messages do it the wrong way. By wrong way I mean they use a groupByKey & aggregate to compare previous/current values and then filter out the unchanged values. This seemed like a creative way of leveraging the DSL functionality. The problem with this method is that groupBy & aggregate weren’t intended for this (they’re for performing aggregations), so various features need to be turned off or worked around to prevent the ‘changed’ events being lost to the DSL’s optimisations. The “right way” is a bit subjective, so to quell the inevitable quibbles, let’s settle for “deduping messages in Kafka Streams: not the wrong way”. The Streams DSL optimises aggregations by caching the output before sending it downstream, and also by caching it before writing to the state store and the changelog topic it’s persisted to. Ef...

Running Jenkins on ECS

Running Jenkins with ECS tasks as worker nodes has been documented before; however, there aren’t any up-to-date examples, nor do they provide separation of the master and slaves. This post is a fairly up-to-date deployment using the newer deployment techniques offered by AWS. Even if you’re not using Jenkins, being able to create CloudFormation templates using the new EC2 launch options is helpful if you’re using many spot instances, as you will most likely be experiencing instance type availability fluctuations.
Key features of this deployment:
- Both the worker nodes and the master node run on ECS: the master as a service and the slaves as dynamically added tasks.
- The master node runs on its own dedicated cluster; its file system store is only mounted on and accessible by the master.
- A job can build and run Docker images from within an already running container.
- The worker nodes can also spawn build agents (Docker containers) using the "new" Jenkins pipeline syntax; where the ...

Use Instance Store With AWS Elastic Container Service

Many EC2 instance types come with instance-attached storage (Instance Store), which provides local storage that is faster than an EBS volume. If you’re using the Amazon ECS-optimized AMI (Amazon Linux 1), its container storage is a secondary EBS volume used for storing Docker containers and volumes. If you launch it on an EC2 instance with an instance store, the instance store is ignored and only that one EBS volume is used.
Update July 3, 2019: Added details for Amazon Linux 2.
Amazon Linux 1: Amazon ECS-optimized AMI
The Amazon Linux 1 based version of the ECS AMI uses the Device Mapper storage driver for container storage, which uses a thin-pool volume (part of LVM). Here is a simplistic cloud-init script that detects attached SSDs and NVMe SSDs and adds them to the LVM volume group. Just launch your EC2 instance with the following user data, or download the script from this gist if you’ve already got a more complex init script. Note: I have not done thorough performance testin...

Create a Private Microservice Using an Application Load Balancer

Previously, if you wanted to create a REST API powered by a Lambda, you only had one choice: API Gateway. This has a few limitations, notably: they’re always public, so you need to use IAM or similar to lock them down; and you can only use a custom domain name once globally, meaning you can’t duplicate the implementation across multiple accounts with the same host endpoint. AWS recently announced another way to create a RESTful endpoint for Lambdas: Application Load Balancers.

Using an async iterator on Node.js + S3

There isn't support for async iterators (for await...of) in Node.js v8.9, which is AWS Lambda's runtime. It's a shame, as it’s a great feature that lets you iterate over an iterable that returns its results asynchronously (e.g. retrieving another page from a database) using a compact for loop that feels synchronous but is actually asynchronous under the covers. This means that if you want to use a library written specifically for it (e.g. Amazon DynamoDB QueryPaginator), you have to use an even more verbose syntax. However, with a bit of repurposing, you can use a generator function that yields Promises; if you await each Promise it yields inside the loop, it behaves like an async iterator.
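A minimal sketch of that workaround, assuming S3-style paging: the hypothetical `listPage` parameter stands in for something like `params => s3.listObjectsV2(params).promise()` from the AWS SDK for JavaScript v2, and the hypothetical in-memory `fakeListPage` fakes two pages of results so the sketch runs on its own. The generator yields one Promise per page; because the loop awaits each Promise before asking for the next one, the generator sees the updated continuation token when it resumes.

```javascript
// Emulates `for await...of` on Node.js 8.x: a plain (synchronous) generator
// yields one Promise per page, and the caller awaits each one in turn.
function* s3Pages(listPage, bucket) {
  let token;            // continuation token returned by the previous page
  let finished = false; // set once a page reports IsTruncated === false
  while (!finished) {
    yield listPage({ Bucket: bucket, ContinuationToken: token }).then(res => {
      token = res.NextContinuationToken;
      finished = !res.IsTruncated;
      return res.Contents || [];
    });
  }
}

async function listAllKeys(listPage, bucket) {
  const keys = [];
  for (const pagePromise of s3Pages(listPage, bucket)) {
    // Awaiting here lets the .then above update token/finished
    // before the generator is resumed for the next page.
    const objects = await pagePromise;
    objects.forEach(o => keys.push(o.Key));
  }
  return keys;
}

// Hypothetical stand-in for s3.listObjectsV2(params).promise():
// two pages, linked by the continuation token 't1'.
function fakeListPage(params) {
  const pages = {
    start: { Contents: [{ Key: 'a' }, { Key: 'b' }], IsTruncated: true, NextContinuationToken: 't1' },
    t1: { Contents: [{ Key: 'c' }], IsTruncated: false },
  };
  return Promise.resolve(pages[params.ContinuationToken || 'start']);
}
```

The catch with this approach is that the loop body must await each Promise before the next iteration; if it doesn't, the generator never sees `IsTruncated` become false and keeps yielding requests forever.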

Auto partition secondary EBS on CentOS 7

If you add an additional blank EBS volume to a CentOS 7 EC2 instance, it won’t be auto-partitioned the way the primary volume is resized on launch. Additionally, if you’re baking an image, you will certainly encounter problems with it not automounting on different instance types, and even trying to mount it manually can give misleading error messages.

Using an S3 Hive Metastore with EMR

When configuring Hive to use EMRFS (i.e. s3://) for the metastore instead of the implied HDFS cluster storage (which is vital if you want a persistent metastore that survives clusters being destroyed and recreated), you might encounter this message: Access Denied (Service: Amazon S3; Status Code: 403;...).

Resizing root partition on CentOS 6 in the cloud

If you’re unfortunate enough to still be using CentOS 6 in AWS, then you’ll more than likely be stumped as to why the root partition isn’t resized like it is on most other distros you launch.

Installing the new CloudWatch Agent for ECS

If you want to gather log files on your ECS hosts running an Amazon ECS-optimized AMI, there are instructions on how to install the previous CloudWatch Logs Agent using cloud-init and some user data. The old Logs Agent is still supported, but the new unified CloudWatch Agent is recommended: it’s touted as faster but, more importantly, it allows easy collection of instance metrics. To install it, simply launch an EC2 instance using one of the Amazon ECS AMIs and put this in the user data section (expand the advanced configuration of the launch wizard). Once the EC2 instance has started, simply create an AMI from it, shut down the EC2 instance, and use this new AMI for your ECS clusters or your Batch compute environments. With a bit of modification, this script could be used to install and configure the agent on the launch of each ECS instance, but having a pre-baked AMI is a bit cleaner. This script also increases the default Docker container size to 100GB, using the ECS Age...

Jenkins build and multi-environment deploys

In our team, we can't run continuous deployment into our testing environments. Like most enterprises with large/legacy back-end systems, we only have a few upstream instances running with populated data. That results in a finite set of test environments: qa-1, qa-2, sit-1, sit-2, etc. QA would usually be stubbed out, so having one per feature branch wouldn't be a problem, but for SIT, the need for a consistent, known environment for integration testing makes this difficult. Previously, I used Bamboo, which had the concept of Releases, Environments and dedicated Deploy Jobs. These allowed builds to be made into releases (manually or automatically) and those releases to be deployed to specific environments using predefined deploy jobs, with only one release recorded as deployed to an environment at a time. The advantage was that our testers could easily see what was currently deployed, and where, without having to dig through every job. Jenkins 2 was re...

Using web component polyfills with template tags

I've been playing around with <template> tags and how well they work with the current Web Component (Custom Elements) polyfills. My main motivation for going with Web Components instead of something like React or Angular is that I'm currently developing a Chrome extension. I wanted the code base to be as small as possible so that it didn't slow down DevTools and increase the frequency of hangs. Plus, I think it's going to be the natural progression from the current React/Angular/etc. components, especially with HTTP/2's server push removing the need for tools like webpack by allowing all dependent files to be sent automatically in response to one request. I immediately hit problems using custom elements in a Chrome extension, as they're disabled by default. So, to use them, I had to forcefully polyfill the existing API; it took a bit of fiddling, but it now works with both libraries I looked at. Next, using template tags an import li...

How to chain an ES6 Promise

Node.js uses async functions extensively, as it's based around non-blocking I/O. Each function takes a callback function parameter, which can result in some messy, deeply nested callbacks if you have to call a bunch of async functions in sequence. Promises make these callbacks a lot cleaner. ES6 (or ES2015) Promises are a great way of chaining together asynchronous functions so that they read like a series of synchronous statements. There are already some great posts on Promises (2ality has a good intro to async programming, followed by a detailed look at the API), so I won't rehash them. However, once you start using them for cases more complicated than most examples, it's easy to make a few mistaken assumptions or make things difficult for yourself. So here is a more complicated example showing a pattern I like to follow. In this example, I'll use the new JavaScript Fetch API, which returns a Promise, allowing you to make async HTTP calls without having to muck a...
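As a rough illustration of a flat chain (not necessarily the exact pattern from the full post), this sketch fetches a user and then that user's repositories. `fetchFn` is passed in so it can be the real `fetch` or, here, a hypothetical in-memory `fakeFetch` with made-up URLs. Each `.then` returns either a value or a Promise, so the next step waits for it, and a single `.catch` at the end handles a failure from any step. Because `fetch` does not reject on HTTP error statuses such as 404, the status is checked explicitly.

```javascript
// Converts a fetch Response to parsed JSON, turning HTTP errors into rejections.
function jsonOrThrow(res) {
  if (!res.ok) throw new Error('HTTP ' + res.status);
  return res.json(); // returning a Promise keeps the chain flat
}

// Hypothetical example: look up a user, then list the names of their repos.
function loadRepoNames(fetchFn, username) {
  return fetchFn('/users/' + username)
    .then(jsonOrThrow)
    .then(user => fetchFn(user.reposUrl)) // next request waits for the first
    .then(jsonOrThrow)
    .then(repos => repos.map(r => r.name))
    .catch(err => {
      // One handler for a rejection from any step above
      console.error(err.message);
      return [];
    });
}

// Hypothetical in-memory stand-in for fetch, with made-up routes.
function fakeFetch(url) {
  const routes = {
    '/users/alice': { reposUrl: '/users/alice/repos' },
    '/users/alice/repos': [{ name: 'one' }, { name: 'two' }],
  };
  const body = routes[url];
  return Promise.resolve({
    ok: body !== undefined,
    status: body !== undefined ? 200 : 404,
    json: () => Promise.resolve(body),
  });
}
```

Keeping each step a single `.then` that returns its result, rather than nesting `.then` calls inside callbacks, is what lets one trailing `.catch` cover the whole sequence.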

Adding MDC headers to every Spring MVC request

Mapped Diagnostic Context (MDC) logging lets you set attributes associated with the current thread, so that SLF4J (via your logger implementation library) can log those attributes with each logging statement without them having to be specified each time. For example, you could configure logback to log the sessionId on every statement, which is really handy when using a log indexer such as Splunk: it lets you easily see all the requests made by a user for a given session. To use it with logback, you'd set the pattern to:
    %-4r [%thread] %-5level sessionId=%X{sessionId} - %msg%n
Setting these attributes in every entry point would be a pain, so one approach is to implement a ServletRequestListener, which lets you set the attributes at the start of the request and remove them again at the end. (Note: it's important to remove the attributes afterwards, as application servers reuse threads, and leftover attributes would give misleading logging statements.) If you...

Populating stored procs into a HSQL DB

I recently encountered a problem trying to load stored procedures into a HSQL DB used for testing. The problem was caused by the script runner provided by Spring, which splits the statements in a script file on semicolons. If a stored proc has statements inside it (which most do), the proc isn't executed as a single statement. This is further compounded by the fact that each statement executed must be understandable by JDBC. For example, the following stored proc causes issues:
    CREATE PROCEDURE MY_PROC(IN param1 VARCHAR(30), OUT out_param VARCHAR(100))
    READS SQL DATA
    BEGIN ATOMIC
      SELECT the_value INTO out_param FROM my_table WHERE field = param1;
    END
    .;
This problem is solved by using the script runners provided by HSQL in the "org.hsqldb:sqltool" dependency, as they can correctly parse scripts containing stored procedures. Here is a Spring Boot test using an in-memory database but HSQL's script runners: @RunWith(SpringRunner.class...

Dev Setup of a Mac

After working at my second consecutive company that gives developers Macs, and having forgotten everything I did the first time, I thought I'd better write down all the tweaks, workarounds and config changes that got me using my Mac efficiently.
Remap Fn-C to copy
Macs use Cmd-C instead of Ctrl-C to copy (and Cmd-X, Cmd-V to cut and paste). This is really annoying if you switch between a Mac and a Windows machine a lot. Fortunately, you can use Karabiner to map Fn-C to copy, as the Fn key is located where Ctrl is on a Windows keyboard. If you're on macOS Sierra, you'll need to use Karabiner-Elements for now.
Git
Just run git from the command line and it will prompt you to install the Xcode command line tools.
Homebrew
This needs to be installed first, as it's used to install most dev tools. Note: if you're behind a corporate proxy, you'll need to run export HTTPS_PROXY=http://yourproxy:port first.
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/in...

Running a meteor shell on a standalone server

For those who want to connect to Meteor's shell on a standalone/self-maintained server: the standard command you use in the development environment doesn't work. Fortunately, you can 'trick' it into allowing the `meteor shell` command.
In the app's startup script, set METEOR_SHELL_DIR before starting node:
    APPDIR=/opt/bitnami/apps/myapp
    export METEOR_SHELL_DIR="$APPDIR/.meteor/local/shell"
    # other settings
    # ...
    exec node "$APPDIR/bundle/main.js"
Then, on the server, make the app directory look like a Meteor project and run the shell:
    cd /opt/bitnami/apps/myapp
    mkdir -p .meteor/local/shell
    echo > .meteor/packages
    # echo already appends a newline, so no \n is needed
    echo 'METEOR@1.2.1' > .meteor/release
    meteor shell

JavaScript (ECMAScript 5.1) Refresher

I’ve recently been working heavily on JavaScript-based applications, both in Node.js and in a Chrome extension. After working mostly on Java, I thought I’d share the syntax, conventions and new features of ECMAScript 5.1, and parts of the still-pending version, ES6/ECMAScript 2015, that I’ve come across.
Comparison Operators
    ==    equal to
    ===   equal value and equal type
    !=    not equal
    !==   not equal value or not equal type
I saw something else I thought was a fancy notation, but it's not a special operator; it's just a double NOT, which converts a value to a boolean:
    if (!!something)
    (!!something) === (!(!something))
    // only useful if you want to set a boolean from something truthy, e.g.
    var a = "somevalue";
    var asBoolean = !!a;
Object Initializer
Something I had thought was not a fully adopted syntax is being able to specify getter/setter functions in the initializer (MDN Object initializer):
    var o = { func1: function (p1, p2) {}, ge...
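A small runnable sketch of the notations mentioned in this refresher: `!!` coercing values to booleans, loose vs strict equality, and a getter/setter pair declared directly in an object initializer. The `temperature` object and its properties are made-up examples, not taken from the original post.

```javascript
// `!!` is just two logical NOTs: the first converts to a boolean and negates,
// the second negates back, leaving the value's truthiness as true/false.
var asBoolean = !!"somevalue";   // true
var emptyAsBoolean = !!"";       // false
var zeroStringAsBoolean = !!"0"; // true: any non-empty string is truthy

// Loose vs strict equality
var loose = (0 == "0");   // true: == converts types before comparing
var strict = (0 === "0"); // false: === also requires the same type

// Getter/setter pair declared directly in an object initializer (ES5)
var temperature = {
  celsius: 0,
  get fahrenheit() {
    return this.celsius * 9 / 5 + 32;
  },
  set fahrenheit(f) {
    this.celsius = (f - 32) * 5 / 9;
  }
};

temperature.fahrenheit = 212;        // calls the setter
console.log(temperature.celsius);    // 100
console.log(temperature.fahrenheit); // 212 (calls the getter)
```

Callers read and assign `temperature.fahrenheit` like a plain property, while the stored state stays in `celsius`.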

Cygwin, access control, default groups and just getting it playing nice

Correcting current and default permissions
If you've been messing with your permissions when copying data across from another NTFS system, some of the owners/groups may be off, and even after correcting the owners and their permissions, new files don't get the right defaults. This simple script replaces the ACL entries for each file and directory with the permissions (including defaults) specified in two ACL files:
    find "$1" -type f -exec setfacl -f facl {} \;
    find "$1" -type d -exec setfacl -f dacl {} \;
dacl (directory permissions):
    user::rwx
    group::rwx
    other:r-x
    default:user::rwx
    default:group::rwx
    default:other:r-x
facl (file permissions):
    user::rw-
    group::rw-
    other:r--
    default:user::rw-
    default:group::rw-
    default:other:r--
Specifying the default groups for users
The Cygwin documentation is in-depth but doesn't simply answer the question: how do I set the default group for a user (in the out-of-the-box configuration)? The starting point is the mkpasswd utility. These are the fo...