
High Performance ELK with Kubernetes: Part 2
Expose Logstash and Kibana to the outside world
This is part two of a two-part guide to setting up Elasticsearch on Kubernetes. In part one, we walked through the process of using elasticsearch-kubed to start Elasticsearch and a few related containers in Kubernetes. If you missed that part, you should go back and read it before this one.
Basic Elasticsearch is not Secure!
You should be able to access Elasticsearch and Kibana within your Kubernetes cluster at this point. However, there is a good chance you will want to be able to collect data from sources that are outside the Kubernetes cluster, or provide access to Kibana to other members of your team. That is what we are going to learn to do in part two.
The free version of Elasticsearch we are using does not provide authentication features, so it is not safe to expose to the internet. There are solutions to the problem, such as paying for authentication features from Elastic or installing the readonlyrest plugin. I have used both solutions, but we will take a third approach here by not actually exposing Elasticsearch to the internet at all.
Instead, we will put Logstash into our Kubernetes cluster and expose only Logstash, which can authenticate clients and so prevent unauthorized sources from inserting data into Elasticsearch. We will also control access to Kibana by using oauth2_proxy to limit access to GitHub users in a particular organization or team.
DNS
You will need to be able to add DNS records for a domain you control in order for clients to securely connect to the exposed endpoints with SSL. If you have been following along with minikube, you will still be able to apply the configurations, but minikube cannot acquire public IP addresses for LoadBalancer services, so the endpoints will not actually be reachable from the outside.
The use case for exposing Logstash is different from the use case for exposing Kibana. I’ll go over both, but if you don’t have a need for Logstash, you can skip ahead to the oauth2_proxy section to read about securing access to Kibana.
Logstash with SSL verification
Create Logstash config files
Logstash can be configured to authenticate clients by requiring them to present a certificate that is signed by a specific certificate authority (CA). To do this, we are going to use Logstash’s beats input plugin and configure it with the necessary SSL options. In particular, SSL must be enabled with ssl_verify_mode set to force_peer.
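To make that concrete, here is a minimal sketch of what the beats input could look like; the port and certificate paths are my assumptions for illustration, not necessarily the exact values the generated config uses.

input {
  beats {
    port => 5044
    ssl => true
    # Trust only certificates signed by our CA, and require clients to present one
    ssl_certificate_authorities => ["/etc/logstash/tls/ca.crt"]
    ssl_verify_mode => "force_peer"
    # Logstash's own server certificate and key (the key must be in PKCS#8 format)
    ssl_certificate => "/etc/logstash/tls/logstash.crt"
    ssl_key => "/etc/logstash/tls/logstash.key"
  }
}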
In order to generate certificates for Logstash itself and the clients that connect to it, you will need to create your own CA. The config_templates.py script will help you do this, as well as create all the other keys, certs, and Kubernetes configuration files required for this setup. When I ran the script, I created a new CA called Logstash CA and used it to sign a certificate for logstash.jswidler.com.
After the configuration is completed, there is one certificate and one key for each of the CA, Logstash, and Logstash’s clients. It is important that the CN field of logstash.crt be set to the address the clients will use to connect to Logstash; otherwise the clients will reject the certificate.
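If you are curious what the script is doing under the hood, the equivalent steps with plain openssl look roughly like this (the filenames, key sizes, and validity periods are my assumptions, not the script’s exact behavior):

# Create the CA key and a self-signed CA certificate
$ openssl req -x509 -newkey rsa:4096 -nodes -days 3650 \
    -keyout ca.key -out ca.crt -subj "/CN=Logstash CA"
# Create Logstash's key and signing request; the CN must match the public address
$ openssl req -newkey rsa:4096 -nodes \
    -keyout logstash.key -out logstash.csr -subj "/CN=logstash.my-domain.com"
# Sign Logstash's certificate with the CA
$ openssl x509 -req -in logstash.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out logstash.crt
# Repeat for the client certificate, signed by the same CA
$ openssl req -newkey rsa:4096 -nodes \
    -keyout client.key -out client.csr -subj "/CN=beats-client"
$ openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out client.crt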
Start Logstash
With the Kubernetes configuration files generated in 6_logstash, we can now get Logstash running on our cluster and exposed to the outside world.
$ kubectl apply -f 6_logstash
configmap/logstash-pipelines created
secret/logstash-tls created
service/logstash created
deployment.apps/logstash created
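Before wiring up DNS, you can make sure the Pod comes up cleanly:

$ kubectl rollout status deployment/logstash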
Logstash is the first service in this guide that we are not able to configure through environment variables, and it is also the first that requires a secret in its configuration. Because of this, we have created a ConfigMap and a Secret.
Both ConfigMaps and Secrets provide data to a Pod as either an environment variable or a file. For the Logstash configuration, we are only using these concepts to mount files into the container, and not to set environment variables. You can explore the YAML files to see how this is done.
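As a rough sketch of the idea (the image tag and mount paths below are assumptions for illustration, not necessarily what the files in 6_logstash use), the Deployment mounts both resources as files like this:

spec:
  containers:
    - name: logstash
      image: docker.elastic.co/logstash/logstash-oss:6.5.4  # assumed tag
      volumeMounts:
        - name: pipelines
          mountPath: /usr/share/logstash/pipeline  # pipeline .conf files
        - name: tls
          mountPath: /etc/logstash/tls  # ca.crt, logstash.crt, logstash.key
          readOnly: true
  volumes:
    - name: pipelines
      configMap:
        name: logstash-pipelines
    - name: tls
      secret:
        secretName: logstash-tls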
Update DNS for the Logstash address
The Service we created this time has type LoadBalancer, which will expose it to the outside world. Once the service is set up, we can find the public IP for it with kubectl.
$ kubectl describe service/logstash | grep Ingress
LoadBalancer Ingress: 12.34.56.78
Now that we have an IP address, we can set up DNS for the address we chose earlier during the setup. How to do this will depend on your domain registrar. Once it is set up correctly, a query for the host address should return the LoadBalancer Ingress’s IP address.
$ dig logstash.my-domain.com

;; QUESTION SECTION:
;logstash.my-domain.com.        IN      A

;; ANSWER SECTION:
logstash.my-domain.com. 299     IN      A       12.34.56.78
Connect Beats
Now you can start any of Elastic’s Beats and have them ship data to your Elasticsearch cluster through Logstash by configuring the Beats output with something like this.
output.logstash:
  enabled: true
  hosts: ["logstash.my-domain.com"]
  ssl:
    enabled: true
    certificate_authorities: [".../ca.crt"]
    certificate: ".../client.crt"
    key: ".../client.key"
Expose Kibana through oauth2_proxy
Create the config
oauth2_proxy is a proxy which will authenticate users with OAuth2 providers such as Google, Facebook, and GitHub, and then grant certain authorized users access to the services behind it. Because I use GitHub as the OAuth2 provider at Udacity, that is currently the provider config-templates.py will offer to configure oauth2_proxy with. With small changes, you would be able to switch to a different OAuth2 provider supported by the proxy.
When you set up your OAuth2 application with the provider, you will need to provide a callback URL. The URL you should use will look something like https://<host>/oauth2/callback.
In addition to the OAuth2 configuration, you will also need a valid certificate for the DNS address you intend to use. The setup script will walk you through configuring both of these in the templates.
The application settings for oauth2_proxy are mostly contained in 7_oauth2-proxy/oauth2-config.yml. Your oauth2_proxy.cfg is stored in this file, but because it is a Kubernetes Secret, the fields are base64 encoded. Decoded, your cfg might look something like the following, depending on how you answered the setup questions.
client_id = "oauthclientid"
client_secret = "oauthclientsecret"
cookie_name = "_ghoauth"
cookie_secret = "0123456789012345678901234567890123456789"
email_domains = [
  "*",
]
github_org = "your-org"
github_team = "your-team"
provider = "github"
https_address = "0.0.0.0:4433"
tls_cert_file = "/conf/tls.crt"
tls_key_file = "/conf/tls.key"
upstreams = [
  "http://kibana.default.svc.cluster.local:5601",
]
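If you want to double-check what actually ended up in the Secret, you can decode it with kubectl. This assumes the data key inside the Secret is named oauth2_proxy.cfg; adjust it to match your YAML.

$ kubectl get secret oauth2-config \
    -o jsonpath='{.data.oauth2_proxy\.cfg}' | base64 --decode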
I realize the setup script is rather specific to the way I use oauth2_proxy, so if you do not want to use GitHub as your OAuth2 provider, or just need to change some of the other configuration, you can substitute your own configuration; just make sure you keep the https_address, tls_cert_file, tls_key_file, and upstreams fields the same.
Start the proxy
As before, you can create the resources Kubernetes needs by applying the directory with the YAML files.
$ kubectl apply -f 7_oauth2-proxy
secret/oauth2-config created
service/oauth2-proxy created
deployment.apps/oauth2-proxy created
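As with Logstash, you can confirm the rollout finished before moving on:

$ kubectl rollout status deployment/oauth2-proxy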
Update DNS for the proxy address
This is just like what we did for Logstash, but in case you skipped that section: the service/oauth2-proxy resource has type LoadBalancer, which will expose it to the outside world. Once the service is set up, we can find the public IP for it with kubectl.
$ kubectl describe service/oauth2-proxy | grep Ingress
LoadBalancer Ingress: 87.65.43.21
When it’s ready, set up the DNS record for the oauth2_proxy address. You can once again check that things are resolving correctly with dig.
$ dig kibana.my-domain.com

;; QUESTION SECTION:
;kibana.my-domain.com.          IN      A

;; ANSWER SECTION:
kibana.my-domain.com.   299     IN      A       87.65.43.21
When it’s all set up, you should be able to go to https://kibana.my-domain.com and be granted access to Kibana only after logging in with the OAuth2 provider. Keep in mind that anyone with access to Kibana will have full access to the Elasticsearch cluster, so you should only authorize trusted users.
Victory!
Hopefully this guide has provided the building blocks for you to start using Elasticsearch in Kubernetes exactly as you wanted. Please let me know in the comments if you think these posts can be improved or if there is something else you want to know!