29 Feb 2016
I’ve been quiet for 14 days straight, but the project has been going strong, even
though the “public” activity has not changed much.
A few things have changed:
- There’s another use of the-startup-stack in the wild, at an established
startup. They are using Mesos, Chronos and Chef straight out of the startup
stack with no changes, which is awesome.
- I am working on documenting what I have learned in the process into the docs
- I changed the chef bootstrapping process to only include the absolute
minimum requirements.
A few things are still a challenge:
- Working with an existing stack: if you already have your VPC on Amazon and
something already running, the terraform configuration and bootstrapping
process are really “too much”. There’s work needed here.
- I am thinking of abstracting some of the “complex” commands into a CLI tool
that will be easier to run, something like `stack new chef` and
`stack new marathon`.
That’s pretty much it for the last couple of weeks. More to come soon.
Get involved
15 Feb 2016
First use of “the-startup-stack” in the wild
Well, this is exciting. There’s at least one use of the stack in the wild.
This is a super exciting checkpoint for me, since it allows me to really test
everything and I have already started making changes to reflect the feedback I
got.
Better Chef Server documentation
One of the very first pieces of feedback I got was that the chef server
documentation was confusing.
Since this is the very first thing you are required to do, that’s a real
problem: it sets the tone for everything else.
In order to make sure the experience is better, I changed the documentation and
the cookbooks to make it much easier.
Here’s what I’ve done:
- Separated on-premise and SaaS documentation into separate pages
- Changed the terraform configuration template for cloud-init
- Changed the terraform configuration for the SSL certificate
- Added a “prerequisites” section to the docs and made sure it’s the top
section on the page, so it’s harder to miss.
Get involved
11 Feb 2016
What is the-startup-stack and why?
What
I’ve been working on this project “in secret” for a few months now.
Hard to really say “secret” about something that has been open source on github
since day one. I guess what I mean is that there were no “announcements” around
it and I haven’t written about it at all.
Why
I feel that getting into “DevOps” is too difficult for a lot of companies and
startups because there’s no “framework” that consolidates the tools that
are available out there.
Getting started should be a matter of hours and not days.
The ultimate goal is that you will be able to start your operations in the
cloud in 2-3 hours and grow from there.
Diving into what has changed
Code Changes
- Refactored the terraform configuration that drives everything. Modularized
the different parts, allowing you to start different services without
touching others.
- Easier bootstrapping for chef-server with the key and secret for SSL.
- Added MongoDB to the list of supported databases. This was done in order to
support the mean.io stack
- Initial (alpha) working version of the Rails server. (Not fully tested)
Near future plans
Near future plans are really to consolidate everything into an easily usable
stack. Making it easy to launch services, databases and more.
CI
CI is a major part of the pipeline that is yet to be decided. There are many
services out there (CircleCI and others); however, as of now, I am reluctant to
use those and am more drawn to a Jenkins-based workflow.
I feel that this will allow for more control in the future, building more
complex pipelines.
DNS
DNS remains an issue, since some of the services really need a “domain” in
order to work (at least a public one).
I wanted to go with Amazon Route53, but I am not sure how widely adopted that
service is. DNSimple is also an option, but again, I don’t want to force too
many decisions on the companies using the project.
Get involved
10 Jan 2016
One of my responsibilities as a DevOps engineer (I know it’s a weird term, if you have a better one, LMK) is to automate workflows for other engineers.
Recently, our iOS team has begun doing builds using Bitrise.
Since we are using a chat room to communicate with each other, we automate everything around and inside the chat room. The server team has been using this for years now so the iOS team wanted to do the same.
Our chat room bot is based on the awesome hubot; its name is `gbot`.
The requirement was to call `gbot build ios master prod`.
Basically, you name the branch you want to build and whether you want to build for the production (`prod`) or test environment.
In Bitrise, it’s quite easy to define workflows, so the iOS engineers set those up.
From there, it was a matter of some simple Node code:
http = require "http"

module.exports = (robot) ->
  # Bitrise credentials come from the environment the bot runs in
  token  = process.env.BITRISE_API_TOKEN
  app_id = process.env.BITRISE_APP_ID

  # Usage: gbot build ios <branch> <prod|test>
  robot.respond /build ios (.*) (.*)/i, (msg) ->
    branch = msg.match[1].trim()
    env    = msg.match[2].trim()

    if env == "prod"
      workflow = "PROD"
    else
      workflow = "TEST"

    if branch == ""
      msg.reply "You need to tell me which branch you want me to build. (eg: master)"
      return

    if env == ""
      msg.reply "You need to tell me which workflow you want me to build. (eg: prod / test)"
      return

    # Payload for Bitrise's build trigger endpoint
    params = {
      hook_info: {
        type: "bitrise",
        api_token: token,
      },
      build_params: {
        branch: branch
        workflow_id: workflow
      }
    }

    data = "payload=#{encodeURIComponent(JSON.stringify(params))}"

    msg.http("https://bitrise.io/app/#{app_id}/build/start.json")
      .header("Content-Type", "application/x-www-form-urlencoded")
      .post(data) (err, res, body) ->
        msg.reply "Building #{workflow} iOS. branch: #{branch}"
That’s it. Now the iOS team uses this many many times.
If you have questions or comments, as always, feel free to leave them here.
07 Jan 2016
Intro
Most of my days are spent writing Ruby code (Rails/Sinatra). When using Ruby, writing tests before you write code has been basically an inseparable part of my professional life for years.
Our stack at Gogobot includes a TON of javascript (require.js, Backbone, Angular, jQuery and more), so I am no stranger to JS and have been writing a lot of it as well.
Writing tests for JS has always felt awkward to me, if I’m honest. The tests run in a browser runner (mocha or whatever) and it doesn’t flow as well as writing Ruby tests.
If you read my post about my workflow, My development workflow (vim+tmux+terminal+alfred) Awesomeness, you can clearly see how everything flows through the day. I spend all of my day in the terminal; I never really even open the browser unless I need to see how something looks.
Current state of TDD in JS (outside looking in)
Yesterday I purchased the Modern React with Redux course. I have to say I was kinda shocked.
I browsed through the course and did not see a single test; there’s no lecture with the word “test” in the title.
The lecture flow is basically this:
- Write code
- Refresh browser (Well, this can obviously be hot-loaded, it’s not really the point).
- Check console for errors/typos
- Fix code
- goto 1
This is not the first time I have encountered this “flow”, and it has always felt really weird to me: it feels slow and sluggish. Switching contexts to the browser every time breaks the flow.
This led me to look deeper into the project.
When you work with Rails/Ruby you work a lot with gems, so the npm libraries you use in JS projects are comparable.
I was looking at the project at the center of the Redux lectures and started looking up some of the libraries it uses, like redux-promise. I looked it up on Github and it does have tests.
Even then, in Rails, you still have tests for your own code.
If you look at the Rails Tutorial you can see how deeply rooted the testing “culture” is in the Ruby/Rails world.
A huge part of my role is handling devops, writing chef recipes and cookbooks. With chef the test tooling is really awkward as well: you need to run Vagrant and Test Kitchen, SSH into virtual machines and more.
If you look at open source recipes/cookbooks, a lot of them don’t have tests either. I found this to be kinda similar to what you have with JS.
Is the problem that Javascript testing is not streamlined enough? Is there not enough separation between the logic and the UI to allow a real testing flow?
What do you use?
Going back to what we do at Gogobot: we use require.js in order to modularize the code, and we definitely spec out logic and rendering.
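To make that concrete, here’s a minimal sketch of the kind of logic spec I mean. The module and names are made up for illustration (they’re not from the Gogobot codebase), and it runs with mocha straight from the terminal, which is exactly the flow I’m after:

// priceFormatter.js -- a hypothetical bit of view logic, pulled into its own
// module so it can be specced without a browser.
function formatPrice(amountInCents, currencySymbol) {
  if (typeof amountInCents !== "number") {
    return ""; // let the view render a placeholder instead of blowing up
  }
  return currencySymbol + (amountInCents / 100).toFixed(2);
}
module.exports = { formatPrice: formatPrice };

// test/priceFormatter_test.js -- run with `mocha` from the terminal
var assert = require("assert");
var formatter = require("../priceFormatter");

describe("formatPrice", function () {
  it("formats cents into a currency string", function () {
    assert.equal(formatter.formatPrice(1999, "$"), "$19.99");
  });

  it("returns an empty string when the amount is missing", function () {
    assert.equal(formatter.formatPrice(undefined, "$"), "");
  });
});

The spec never touches a browser, so it slots into the same terminal-based loop an RSpec suite does.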
What do you use in your company? How do you test your Javascript code?
Discussion
The point of this post is not to slam anyone, not the author of that course nor any maintainer of the libraries. It’s just stating how I feel about testing in the JS ecosystem.
Do you feel the same? You don’t? I would love to discuss in the comments.
Edit (01-08-2016)
patrickfatrick commented on Reddit with a great looking resource: Full-Stack Redux Tutorial.
06 Jan 2016
I use Jekyll a lot. From this blog to documentation, internal wikis and more.
It’s really a great tool to generate static websites and it’s a joy to use.
I use Vim for everything too. This means that I write blog posts using Vim and I want to preview what I write on the blog.
If you don’t know browser-sync, you are missing out on an important tool for client side development.
I have found that using browser-sync with jekyll really speeds up my workflow and makes previewing what I write much easier.
Jekyll comes with its own `serve` command, which usually runs a server on port 4000. browser-sync supports proxying to that server while watching files.
So, in order to live-preview everything I write, I run `jekyll serve`:
bundle exec jekyll serve --drafts
Then, I run browser-sync proxying to that server.
browser-sync start --files "css/*.css" --proxy "localhost:4000" --files "_posts/*.md" --files "_drafts/*.md" --reloadDelay "2000"
With every change to `_posts`, `_drafts` or `css`, the browser will reload and preview the changes.
It looks like this in the terminal
05 Jan 2016
Lately, I’ve been using Terraform quite a bit, both on an open source project I am working on (shhh… soon) and for Gogobot.
Since Gogobot’s infrastructure is quite complex, it’s not that easy to create a contained terraform file that will not get out of hand.
I like infrastructure to be context bound, for example: web context, search context, monitoring context and so on.
Describing each of these “contexts” in the infrastructure is easy enough, but they all share a base context.
For example, the monitoring context has its own instances, security groups and
connections, but it needs to share the `cluster` security group and the
`external_connections` security group.
`external_connections` is a security group used to connect to services from the
outside; any server that is allowed to accept connections from the outside needs
to have this security group.
There are of course other shared pieces, like the `vpc_id`, that all servers need.
In order to explain it best, let’s look at the terraform example here:
# Specify the provider and access details
provider "aws" {
region = "${var.aws_region}"
}
# Create a VPC to launch our instances into
resource "aws_vpc" "default" {
cidr_block = "10.0.0.0/16"
}
# Create an internet gateway to give our subnet access to the outside world
resource "aws_internet_gateway" "default" {
vpc_id = "${aws_vpc.default.id}"
}
# Grant the VPC internet access on its main route table
resource "aws_route" "internet_access" {
route_table_id = "${aws_vpc.default.main_route_table_id}"
destination_cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.default.id}"
}
# Create a subnet to launch our instances into
resource "aws_subnet" "default" {
vpc_id = "${aws_vpc.default.id}"
cidr_block = "10.0.1.0/24"
map_public_ip_on_launch = true
}
# A security group for the ELB so it is accessible via the web
resource "aws_security_group" "elb" {
name = "terraform_example_elb"
description = "Used in the terraform"
vpc_id = "${aws_vpc.default.id}"
# HTTP access from anywhere
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Our default security group to access
# the instances over SSH and HTTP
resource "aws_security_group" "default" {
name = "terraform_example"
description = "Used in the terraform"
vpc_id = "${aws_vpc.default.id}"
# SSH access from anywhere
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HTTP access from the VPC
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.0.0/16"]
}
# outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_elb" "web" {
name = "terraform-example-elb"
subnets = ["${aws_subnet.default.id}"]
security_groups = ["${aws_security_group.elb.id}"]
instances = ["${aws_instance.web.id}"]
listener {
instance_port = 80
instance_protocol = "http"
lb_port = 80
lb_protocol = "http"
}
}
resource "aws_key_pair" "auth" {
key_name = "${var.key_name}"
public_key = "${file(var.public_key_path)}"
}
resource "aws_instance" "web" {
# The connection block tells our provisioner how to
# communicate with the resource (instance)
connection {
# The default username for our AMI
user = "ubuntu"
# The connection will use the local SSH agent for authentication.
}
instance_type = "m1.small"
# Lookup the correct AMI based on the region
# we specified
ami = "${lookup(var.aws_amis, var.aws_region)}"
# The name of our SSH keypair we created above.
key_name = "${aws_key_pair.auth.id}"
# Our Security group to allow HTTP and SSH access
vpc_security_group_ids = ["${aws_security_group.default.id}"]
# We're going to launch into the same subnet as our ELB. In a production
# environment it's more common to have a separate private subnet for
# backend instances.
subnet_id = "${aws_subnet.default.id}"
# We run a remote provisioner on the instance after creating it.
# In this case, we just install nginx and start it. By default,
# this should be on port 80
provisioner "remote-exec" {
inline = [
"sudo apt-get -y update",
"sudo apt-get -y install nginx",
"sudo service nginx start"
]
}
}
This piece of code will look familiar if you have ever seen a blog post about terraform or any of the examples, but if you try to use it as the basis for describing your infrastructure, you will soon find it doesn’t scale: you can’t describe part of the infrastructure without getting a full-blown `plan` or `apply`.
What if you now have a SOLR context and you want to launch or change it? The promise of terraform is that you can do this all together, but I found it much more convenient to modularize it.
So, let’s get started.
This is what the project directory will look like now (all files are still placeholders at this point).
├── base
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
└── web
├── main.tf
├── outputs.tf
└── variables.tf
We divided the configuration into two modules: `base` is the basis for our
infrastructure and includes the VPC, security groups etc., while `web` holds the
instance and the load balancer.
The basics of modules
Before diving into the code too much, let’s understand the basics of terraform modules first.
- Modules are logical parts of your infrastructure
- Modules can have outputs
- It is better to “inject” variables into the modules when you define them; variables should come from the root.
- You can only use the outputs of a module as inputs for other parts. For example, if a module has an `aws_vpc` resource you can’t use it directly; you output `vpc_id` and use that in another module/object.
The last point here is something that is not really clear at first, but we’ll clarify it more with the code.
Diving into the code
base/main.tf
# Specify the provider and access details
resource "aws_key_pair" "auth" {
key_name = "${var.key_name}"
public_key = "${file(var.public_key_path)}"
}
provider "aws" {
region = "${var.aws_region}"
}
# Create a VPC to launch our instances into
resource "aws_vpc" "default" {
cidr_block = "10.0.0.0/16"
}
# Create an internet gateway to give our subnet access to the outside world
resource "aws_internet_gateway" "default" {
vpc_id = "${aws_vpc.default.id}"
}
# Grant the VPC internet access on its main route table
resource "aws_route" "internet_access" {
route_table_id = "${aws_vpc.default.main_route_table_id}"
destination_cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.default.id}"
}
# Create a subnet to launch our instances into
resource "aws_subnet" "default" {
vpc_id = "${aws_vpc.default.id}"
cidr_block = "10.0.1.0/24"
map_public_ip_on_launch = true
}
# Our default security group to access
# the instances over SSH and HTTP
resource "aws_security_group" "default" {
name = "terraform_example"
description = "Used in the terraform"
vpc_id = "${aws_vpc.default.id}"
# SSH access from anywhere
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HTTP access from the VPC
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.0.0/16"]
}
# outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
base/outputs.tf
output "default_vpc_id" {
value = "${aws_vpc.default.id}"
}
output "default_security_group_id" {
value = "${aws_security_group.default.id}"
}
output "default_subnet_id" {
value = "${aws_subnet.default.id}"
}
output "aws_key_pair_id" {
source "${aws_key_pair.id.auth.id}"
}
web/main.tf
module "base" {
source = "../base"
public_key_path = "${var.public_key_path}"
key_name = "${var.key_name}"
aws_region = "${var.aws_region}"
}
# A security group for the ELB so it is accessible via the web
resource "aws_security_group" "elb" {
name = "terraform_example_elb"
description = "Used in the terraform"
vpc_id = "${module.base.default_vpc_id}"
# HTTP access from anywhere
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_elb" "web" {
name = "terraform-example-elb"
subnets = ["${module.base.default_subnet_id}"]
security_groups = ["${module.base.default_security_group_id}"]
instances = ["${aws_instance.web.id}"]
listener {
instance_port = 80
instance_protocol = "http"
lb_port = 80
lb_protocol = "http"
}
}
resource "aws_instance" "web" {
# The connection block tells our provisioner how to
# communicate with the resource (instance)
connection {
# The default username for our AMI
user = "ubuntu"
# The connection will use the local SSH agent for authentication.
}
instance_type = "m1.small"
# Lookup the correct AMI based on the region
# we specified
ami = "${lookup(var.aws_amis, var.aws_region)}"
# The name of our SSH keypair we created above.
key_name = "${module.base.aws_key_pair_id}"
# Our Security group to allow HTTP and SSH access
vpc_security_group_ids = ["${module.base.default_security_group_id}"]
# We're going to launch into the same subnet as our ELB. In a production
# environment it's more common to have a separate private subnet for
# backend instances.
subnet_id = "${module.base.default_subnet_id}"
# We run a remote provisioner on the instance after creating it.
# In this case, we just install nginx and start it. By default,
# this should be on port 80
provisioner "remote-exec" {
inline = [
"sudo apt-get -y update",
"sudo apt-get -y install nginx",
"sudo service nginx start"
]
}
}
As you can see, the code in the `web` context is much cleaner and really only
includes the things you need for the web. The other parts of your infrastructure
are created in the `base` context.
Now, you can “include” that context in other infrastructure contexts and expand on it.
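One thing the listings above leave out is the contents of the variables.tf files from the directory tree. As a rough sketch (the variable names match what main.tf references; the AMI IDs are placeholders you would replace with real Ubuntu AMIs for your regions), web/variables.tf could look something like this, with base/variables.tf declaring only the subset it needs:

variable "aws_region" {
  description = "AWS region to launch servers in"
  default     = "us-east-1"
}

variable "key_name" {
  description = "Name of the AWS key pair"
}

variable "public_key_path" {
  description = "Path to the SSH public key, e.g. ~/.ssh/terraform.pub"
}

# Placeholder AMI IDs -- look up current Ubuntu AMIs for your regions
variable "aws_amis" {
  default = {
    us-east-1 = "ami-00000000"
    us-west-2 = "ami-00000000"
  }
}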
I created a repository with the code Here
04 Jan 2016
TL;DR this is basically a brain dump of my thought process while working on some solutions for Gogobot.
Lately, I’ve been musing about a true configuration management solution.
I am not actually talking about Chef/Puppet as a broader server configuration solution, but only about actual configuration files and how you distribute them to servers.
Most of our configuration files these days are managed by chef.
For example, here’s how we manage Nginx configuration through chef:
template "/etc/nginx/sites-available/gogobot.conf" do
source 'gogobot-nginx.conf.erb'
owner 'root'
group 'root'
mode '755'
variables({
socket_location: node['rails']['deploy']['socket_location'],
path: node['rails']['deploy']['path'],
web_dynamic: node['rails']['deploy']['web_dynamic'],
web_static: node['rails']['deploy']['web_static'],
web_health: node['rails']['deploy']['web_health'],
web_cdn: node['rails']['deploy']['web_cdn']
})
notifies :restart, 'service[nginx]'
end
You can clearly see that the source of this is `gogobot-nginx.conf.erb`. This file is managed with chef, through version control (git), and a more fine-tuned version-bumping mechanism called Spork.
However, if you are familiar with any of this, you know that in order to change the configuration on the server you will need to run `chef-client` on the server.
If you are disciplined, you know that you need to run `chef-client` on the servers periodically to find decay in the cookbooks, package versions and more.
But chef tends to be a huge hammer when really all you need is to add some configuration or change some configuration and distribute it to all the servers.
Example Use Case
In order to keep this concrete, there are a couple of examples I encountered over the last few months.
Changing YAML configuration
We work with Ruby on Rails and Ruby applications for the most part. Those applications are best configured with YAML files.
We store almost everything in YAML configuration, but we don’t distribute the configuration through git, since that would expose security vulnerabilities, in secret management and elsewhere. Instead, we put the files on the servers with chef and manage the secrets securely.
One of those configurations is the way we connect to our search cluster.
Recently, I made a huge change to the search cluster, load balancing it through Nginx instead of calling the cluster directly from the code.
I will not go into too much detail about WHY I did this, but the simple thing I needed to do was to change a YAML file on ALL the servers, then restart unicorn and nginx to pick up the change.
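Just to make the shape of that change concrete, here is an illustrative snippet (the keys are made up, not our real config) of the kind of edit that had to land on every single server:

# config/search.yml (illustrative keys only)
# Before the change this pointed at a cluster node directly, e.g.
#   solr_url: "http://solr-01.internal:8983/solr"
# After the change every app server talks to its local Nginx load balancer:
production:
  solr_url: "http://127.0.0.1:8080/solr"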
Changing Nginx configuration
All of our web servers run Nginx; no matter if the underlying language is Ruby or Go, there’s always Nginx in front.
In order to propagate that load balancing change across all the servers I needed to add a configuration to the Nginx site configuration and restart it.
Immutable infrastructure vs tweaking changes
The “trend” right now is to go towards immutable infrastructure. This means that when you have a change like this that you want to distribute to all of the servers, you converge new “instances” with the new configuration and hot-swap the servers.
Our solution fully supports this, and we change underlying servers all the time without users noticing. However, I felt that this was overkill in this situation.
Distributing configuration
All of this led me to muse about configuration management and what I would like to see happening.
Distribute files to all servers
- Distributing a file securely to all of the servers, with proper secret management.
- Randomizing one of the parameters in the mix. For example, if I have a collection of “servers”, I want the configuration to randomize between them while writing the file.
- Appending/Prepending/Editing existing files
- Restarting services as callbacks
Why not source-control configuration with secrets distributed via ENV?
Now, this is a VALID point: you can distribute the file through git with all the important stuff hidden away in ENV variables.
For example:
database:
  production:
    username: <%= ENV['PRODUCTION_DB_USERNAME'] %>
This has a few advantages and disadvantages.
It’s simple and clean. When you need to change a configuration file you simply change and deploy.
But what about system configuration like Nginx/Unicorn/God?
And what if you are adding a new variable that doesn’t exist yet on the servers? You will again need to run a converge on the system.
Finding the balance
Basically, I am trying to find the balance between converging a big system onto new instances and simply distributing a configuration change that can be summed up in a YAML file.
I am trying to think of an elegant solution that will detect a new SOLR server on the system and change the configuration of the SOLR load balancer to add that one to the ORIGIN list.
What are you using?
I have a couple of ideas on how to solve this. I am not sure all of them are valid.
So… What are you using in order to manage configuration on your application?
Hit me up in the comments. (If you mention Docker you get a point penalty.)
03 Jan 2016
Micro-services: everyone is talking about them. 2015 was the hot year for service-oriented applications, with the rise of Docker and the other solutions around launching them.
However, many people are still too scared to make the leap into Micro-Services. I feel this is wrong and happens because of misunderstandings.
When dealing with Micro-Services there are things you need to look at like:
- Communication
- Data Storage
- Monitoring
- Deployment
- Testing
- Discovery
In my upcoming talk I will overview all of these and more.
Along with 2 more talks from highly experienced professionals, you are guaranteed to enjoy your time and educate yourself.
Come join me as I hitchhike towards Micro-Services land.
28 Dec 2015
TL;DR
Gogobot, like many other websites, uses Google Analytics and Mixpanel to track visits to webpages, along with stats like bounce rate, time on page, etc.
Firefox recently released a Do Not Track feature on Private Browsing windows.
This created a lot of problems for us and I believe it has the potential to create problems for your website as well.
Technical details
When this feature is enabled, Firefox blocks known tracking websites such as Mixpanel and Google Analytics.
However, it does not handle what happens if your code calls any of those libraries.
We use Require.js in order to manage dependencies; here’s some sample code from the Gogobot website:
if (review_props) {
  review_props['Type of review'] = 'Review';
  review_props['Item ID'] = model.get("place_id");
  review_props['Item name'] = "null";
  review_props['New or edit'] = "null";
  mixpanel.track("Write a review", review_props);
}
`mixpanel` here counts on the mixpanel library having loaded. However, if it doesn’t load (say, because Firefox blocked it), `mixpanel.track` will fail and, subsequently, all of your JS will crumble.
The solution
First, this Firefox “block” is pretty easy to bypass: since it only looks at the domain serving the tracking JS and not at the content of the file, you can just proxy it from your own domain through Nginx.
This solution is only temporary, though, since we are implementing Do Not Track support and will fall back gracefully.
When the `Do Not Track` header is set, we basically load a blank JS file for Mixpanel that implements the same interface but does nothing.
Same for Google Analytics and any other tracking you are using.
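As a rough sketch (not the exact file we serve), such a stub only needs to mirror the methods your code actually calls:

// No-op Mixpanel stub, served instead of the real library when Do Not Track
// is set. Calls like mixpanel.track(...) keep working and simply do nothing.
window.mixpanel = {
  init: function() {},
  identify: function() {},
  register: function() {},
  track: function(event, properties, callback) {
    // honour the callback contract so nothing waits forever
    if (typeof callback === "function") { callback(); }
  },
  people: {
    set: function() {},
    increment: function() {}
  }
};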
Blocking the domain is not blocking tracking
Firefox is really not communicating this correctly IMHO: blocking the domains of known trackers is not really blocking tracking. Chrome addons that claim to block tracking are doing the same thing.
Just because you block `cdn.mxpanel.com` doesn’t mean you blocked the tracking at all.
For us, respecting our users and their wishes is super important, so we are implementing it properly, but others might just leave the proxy there forever without really caring.
Check your pages
I encourage you to check your pages to make sure they work with the setting on, you might be surprised at what’s breaking.
Lesson learned
The lesson learned here is to check whether the library actually loaded from the
CDN. Loading can be blocked by many things, even the user’s network.
It looks like this:
track: function(event, properties) {
  if(typeof mixpanel !== "undefined") {
    event = event || "";
    mixpanel.track(this.trackingTag + " " + event, properties);
  }
}
Instead of calling `mixpanel.track` everywhere, we simply call `track`, which is
part of the base view javascript; this will gracefully fall back in case
mixpanel didn’t load. Same for Google Analytics.
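For completeness, the Google Analytics guard follows the same pattern. A rough sketch, assuming the analytics.js `ga` global (adapt it to whichever snippet you load):

// Same idea as track() above: only call out if the library actually loaded.
trackPageview: function(path) {
  if (typeof ga !== "undefined") {
    ga("send", "pageview", path);
  }
}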