We cannot always share details about our work with customers, but it is still nice to show some of our technical achievements and share a few of the solutions we have implemented.
In certain projects we use external Elasticsearch instances running in the cloud (using elastic.co cloud). However, having a service running in the cloud does not mean that higher availability can be assumed. It probably should, but real-life experience shows otherwise. We have had our share of problems and outages in the cloud, and the Elasticsearch cloud is no exception.
To suffer "less" from a failed Elasticsearch instance, we have set up two instances in two different regions (Elastic.co uses AWS in the background). The idea: both Elasticsearch instances contain (more or less) the same data, and we can balance between them or fail over from one instance to the other. Placing a load balancer (HAProxy) between the application and the two instances is the obvious choice.
However, when the applications tried to connect to Elasticsearch via the load balancer, the following error message showed up:
$ curl https://esloadbalancer.internal:9243
{"ok":false,"message":"Unknown deployment."}
This error message comes from the target Elasticsearch instance, which means that the connection from the application via HAProxy to the Elasticsearch instance did work. However, Elastic.co's cloud instances no longer serve requests for "unknown" host names. Only requests whose HTTP Host header contains the "real" instance name (e.g. 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io) are allowed. Elastic removed support for additional DNS names / CNAMEs a while ago. From their FAQ:
We don’t support custom SSL certificates, which means that a custom CNAME for an Elasticsearch Service endpoint such as mycluster.mycompanyname.com also is not supported.
What happens in this case is the following: the application sends an HTTP request to HAProxy using the Host header "esloadbalancer.internal". By default, HAProxy simply forwards the whole HTTP request, including the headers. The HTTP request from HAProxy to one of the Elasticsearch instances therefore looks like this (simplified with curl):
$ curl -H "Host: esloadbalancer.internal" https://12345678912345678912345678912345.eu-central-1.aws.cloud.es.io:9243
{"ok":false,"message":"Unknown deployment."}
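When debugging this, it can help to detect the error response in a script rather than by eye. The following is a minimal sketch, assuming the error body looks exactly like the one shown above; the function name is_unknown_deployment is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a response body on stdin and returns 0 if it
# is the "Unknown deployment." error shown above, 1 otherwise.
is_unknown_deployment() {
    grep -q '"message"[[:space:]]*:[[:space:]]*"Unknown deployment\."'
}
```

It could be used like this, for example, to test a given Host header against one instance:
$ curl -s -H "Host: esloadbalancer.internal" https://12345678912345678912345678912345.eu-central-1.aws.cloud.es.io:9243 | is_unknown_deployment && echo "Host header rejected"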
Of course the Host header could be rewritten to a static value using http-request set-header, for example:
backend es-https-out
    http-request set-header Host 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io
    server ES1 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io:9243 id 1 maxconn 2000 check ssl verify none
    server ES2 98765432198765432198765432198765.eu-central-1.aws.cloud.es.io:9243 id 2 maxconn 2000 check ssl verify none
However, this only works for the first backend server (ES1): the Host header does not match ES2, which would return the same error again. And because this HAProxy deployment should be as dynamic as possible, perhaps even running with more than two backend servers, this is not a solution.
Luckily there's another possibility described in the HAProxy documentation: http-send-name-header. Without reading the documentation this probably wouldn't mean much, but it actually does exactly what we need in our case:
http-send-name-header [<header>]
Add the server name to a request. Use the header string given by <header>
server <name> <address>[:port] [settings ...]
In the config example above, this would mean that either "ES1" or "ES2", depending on which backend server was contacted, is sent as the server name. That wouldn't work, of course. But if the server name is set to the actual hostname of the backend server and Host is given as the header, the HTTP Host header is correct for every backend. This results in the following configuration:
backend es-https-out
    http-send-name-header Host
    server 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io:9243 id 1 maxconn 2000 check ssl verify none
    server 98765432198765432198765432198765.eu-central-1.aws.cloud.es.io 98765432198765432198765432198765.eu-central-1.aws.cloud.es.io:9243 id 2 maxconn 2000 check ssl verify none
Sure, it is not easy on the eyes, but it is technically correct and it works!
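Because the server name and the server address must stay identical for this to work, the long server lines are easy to get wrong by hand, especially with more than two backends. One way to avoid typos is to generate the block. This is only a sketch: the function name gen_backend and its argument order are invented here, and the server options are simply copied from the configuration above.

```shell
#!/bin/sh
# Hypothetical generator: prints an HAProxy backend block in which every
# server NAME equals its hostname, so that "http-send-name-header Host"
# sends the hostname that the respective instance expects.
# Usage: gen_backend <backend-name> <port> <hostname>...
gen_backend() {
    backend_name=$1
    port=$2
    shift 2
    id=1
    printf 'backend %s\n' "$backend_name"
    printf '    http-send-name-header Host\n'
    for host in "$@"; do
        # Server name and address are both built from the same hostname.
        printf '    server %s %s:%s id %s maxconn 2000 check ssl verify none\n' \
            "$host" "$host" "$port" "$id"
        id=$((id + 1))
    done
}
```

Calling it with the backend name, the port and the two instance hostnames prints a block equivalent to the configuration above:
$ gen_backend es-https-out 9243 12345678912345678912345678912345.eu-central-1.aws.cloud.es.io 98765432198765432198765432198765.eu-central-1.aws.cloud.es.io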
The applications can now correctly access the target Elasticsearch instances via the internal load balancer:
$ curl https://esloadbalancer.internal:9243 -u user:pass
{
  "name" : "instance-0000000007",
  "cluster_name" : "12345678912345678912345678912345",
  "cluster_uuid" : "b_XXXXXXXXXXXXXXXXXXXX",
  "version" : {
    "number" : "6.8.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "b506955",
    "build_date" : "2019-07-24T15:24:41.545295Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
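The cluster_name field in this response reveals which instance answered, which makes it a handy way to check that requests through the load balancer actually reach both backends. A small sketch, assuming the response format shown above; the helper name cluster_name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads an Elasticsearch root response on stdin and
# prints the value of its "cluster_name" field (nothing if it is absent).
cluster_name() {
    sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Repeating the request a few times should then list both cluster names, provided HAProxy distributes requests across both servers:
$ for i in 1 2 3 4; do curl -s https://esloadbalancer.internal:9243 -u user:pass | cluster_name; done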