GridGain Community Edition


GridGain thick client handshake fail on Kubernetes

  • 1.  GridGain thick client handshake fail on Kubernetes

    Posted 07-30-2020 09:04 AM
    Hi,

    I am trying to set up GridGain Community Edition on Kubernetes, with an existing Spark application running on physical machines. When the Spark application tries to connect to the GridGain server on Kubernetes, it uses the thick client protocol and fails with a "Handshake failed" error. Based on the logs, I can see that the GridGain client within the Spark application is able to discover the GridGain servers running on Kubernetes.

    Any ideas on what I am missing?

    ------------------------------
    Mandar Joshi
    ------------------------------


  • 2.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 03:36 AM
    Hello!

    It is hard to say what is happening here. Thick clients usually do not work with a K8s cluster, a limitation we have recently mitigated with the "communication via discovery" mechanism. However, even when thick clients did not work, the result was not a "Handshake failed" error. Can you provide logs from the nodes?

    Regards,


    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 3.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 03:47 AM
    Please try the following setting: https://www.gridgain.com/docs/latest/developers-guide/clustering/running-client-nodes-behind-nat

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 4.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 07:23 AM
    Hi Ilya,

    Thanks for getting back. Below are the requested logs. Also, just to add: in our K8 setup, we are only allowed to connect via SSL, through Contour HTTPProxy. Can you also please elaborate on the "communication via discovery" mechanism?

    Client (Outside K8) Logs:
    org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=df929f12-0d4e-4b66-a807-3bbe865b2d41, addrs=[/1........]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3680) [ignite-core-2.8.0.jar:2.8.0]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3443) [ignite-core-2.8.0.jar:2.8.0]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3183) [ignite-core-2.8.0.jar:2.8.0]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3066) [ignite-core-2.8.0.jar:2.8.0]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2906) [ignite-core-2.8.0.jar:2.8.0]

    Server (Running in K8) Logs:

    2020-07-30 15:49:14.112  INFO 1 --- [ange-worker-#39] .c.d.d.p.GridDhtPartitionsExchangeFuture : Completed partition exchange [localNode=c32a0534-b0e1-4644-baab-5691d024b9c8, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=da11633e-a8e4-4d74-b71c-a4e5c3d2ce43, consistentId=da11633e-a8e4-4d74-b71c-a4e5c3d2ce43, addrs=ArrayList [....], discPort=0, order=17, intOrder=10, lastExchangeTime=1596124153996, loc=false, ver=2.8.0#20200226-sha1:341b01df, isClient=true], done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0]]
    ......

    2020-07-30 15:50:08.568  INFO 1 --- [vent-worker-#38] o.a.i.i.m.d.GridDiscoveryManager         : Topology snapshot [ver=19, locNode=c32a0534, servers=2, clients=1, state=ACTIVE, CPUs=10, offheap=320.0GB, heap=34.0GB]

    ..

    2020-07-30 15:51:11.043  WARN 1 --- [vent-worker-#38] o.a.i.i.m.d.GridDiscoveryManager         : Node FAILED: TcpDiscoveryNode [id=a8d650a0-96c6-4ee6-9ae5-950648952cf3, consistentId=a8d650a0-96c6-4ee6-9ae5-950648952cf3, addrs=ArrayList [......, sockAddrs=HashSet [......], discPort=0, order=19, intOrder=11, lastExchangeTime=1596124208551, loc=false, ver=2.8.0#20200226-sha1:341b01df, isClient=true]



    ------------------------------
    Mandar Joshi
    Software Developer
    J
    ------------------------------



  • 5.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 07:26 AM

    Hello!

    GridGain has its own inter-node SSL implementation:

    https://www.gridgain.com/docs/latest/administrators-guide/security/ssl-tls

    GridGain does not expect its communication to be intercepted by a man-in-the-middle (MITM) proxy, and that would likely lead to errors such as the ones cited above.
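
    For reference, enabling the built-in inter-node SSL usually looks something like the Spring XML fragment below. This is only a sketch: the keystore and truststore paths and the passwords are placeholders, not values from this thread, and every node (servers and the thick client) would need a matching setup.

```xml
<!-- Sketch only: file paths and passwords below are hypothetical placeholders. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="sslContextFactory">
        <bean class="org.apache.ignite.ssl.SslContextFactory">
            <!-- Key store holding this node's certificate and private key -->
            <property name="keyStoreFilePath" value="keystore/node.jks"/>
            <property name="keyStorePassword" value="changeit"/>
            <!-- Trust store holding the certificates this node accepts -->
            <property name="trustStoreFilePath" value="keystore/trust.jks"/>
            <property name="trustStorePassword" value="changeit"/>
        </bean>
    </property>
</bean>
```

    With this in place the nodes encrypt their own discovery and communication traffic, so the connection should reach them as plain TCP rather than being terminated by an HTTP proxy on the way.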

    Regards,



    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 6.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 07:31 AM
    Hi Ilya,

    Thanks for the clarification. Can you please share some documentation on the "communication via discovery" mechanism?

    ------------------------------
    Mandar Joshi
    Software Developer
    J
    ------------------------------



  • 7.  RE: GridGain thick client handshake fail on Kubernetes

    Posted 08-04-2020 07:33 AM
    Hello!

    Here is the link: Running Client Nodes Behind NAT
    If your client nodes are deployed behind a NAT, the server nodes won't be able to establish connection with the clients because of the limitations of the communication protocol. This includes deployment cases when client nodes are running in virtual environments (like Kubernetes) and the server nodes are deployed elsewhere.
    https://www.gridgain.com/docs/latest/developers-guide/clustering/running-client-nodes-behind-nat
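
    The setting described on that page is the forceClientToServerConnections property of TcpCommunicationSpi, set on the client node. A minimal sketch of the client configuration follows, assuming a product version that includes this property; the ignite-core 2.8.0 jar seen in the client logs earlier in the thread may predate it, so check your version first.

```xml
<!-- Sketch of a thick client configuration; assumes a version that supports
     forceClientToServerConnections on TcpCommunicationSpi. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Start this node as a thick client -->
    <property name="clientMode" value="true"/>
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- The client opens communication connections itself; when a server
                 needs to reach the client, it asks via discovery and the client
                 connects back to the server -->
            <property name="forceClientToServerConnections" value="true"/>
        </bean>
    </property>
</bean>
```

    Note that this only removes the need for servers to open connections to the client. The client still has to be able to reach the servers' discovery and communication ports directly.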

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------