Unable to perform handshake within timeout

  • 1.  Unable to perform handshake within timeout

    Posted 20 days ago
    We just moved from Apache Ignite 2.7.6 to GridGain 8.7.7
    and now receive 10,000+ warnings like
    2019-11-14 23:45:00.721 WARN [grid-timeout-worker-#23%test-imac.accorto.com%] ClientListenerNioListener.warning: Unable to perform handshake within timeout [timeout=10000, remoteAddr=/127.0.0.1:54164]

    2019-11-14 23:45:00.866 WARN [grid-timeout-worker-#23%test-imac.accorto.com%] ClientListenerNioListener.warning: Unable to perform handshake within timeout [timeout=10000, remoteAddr=/127.0.0.1:54166]

    2019-11-14 23:45:00.866 WARN [grid-timeout-worker-#23%test-imac.accorto.com%] ClientListenerNioListener.warning: Unable to perform handshake within timeout [timeout=10000, remoteAddr=/127.0.0.1:54168]
    in test environments.
    We embed Ignite with our apps within Tomcat, and the tests bootstrap the entire environment for every test.
    The application mainly uses JDBC to communicate with Ignite.

    The first attempt was to increase the timeout time:
    cfg.setFailureDetectionTimeout(IgniteConfiguration.DFLT_FAILURE_DETECTION_TIMEOUT * 6); // from 10sec to 1min
    ClientConnectorConfiguration clientConfig = cfg.getClientConnectorConfiguration();
    clientConfig.setHandshakeTimeout(ClientConnectorConfiguration.DFLT_HANDSHAKE_TIMEOUT * 6); // from 10sec to 1min
    but the settings get lost when the ClientListenerNioListener is instantiated (back to 10000).
    It seems it scans addresses from 49154 to 65535 (with an error message for each port).

    My guess is that our bootstrap code keeps the box so busy that the handshake fails.
    Surprisingly, the application works correctly (once the server is done logging the thousands of warnings).

    Questions:
    (1) How do we fix this?
    (2) Is it a bug that the Configuration settings are ignored?
    (3) Where is the port range set for scanning - from what I saw, the default port range is usually 100 - not 20000+

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.

    ------------------------------


  • 2.  RE: Unable to perform handshake within timeout

    Posted 20 days ago
    Hello!

    I don't think that this is a port scan. It is just the creation of an outbound socket, for which the operating system assigns a random port.

    Can you provide more of the log? I don't understand why nodes would connect to the Client listener (for ODBC/JDBC/Thin client) and not to the Communication port. Can you provide your Discovery, Communication and Connector configurations?

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 3.  RE: Unable to perform handshake within timeout

    Posted 18 days ago
    Thanks Ilya,

    I attached the ignite logs (only)
    - one with level Info and another with level Debug.

    We embed Ignite, so the configuration is done in code:

    IgniteConfiguration [igniteInstanceName=test-imac.accorto.com, pubPoolSize=8, svcPoolSize=null, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=8, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, igniteHome=/Users/jorg/ignite, igniteWorkDir=null, mbeanSrv=null, nodeId=null, marsh=null, marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000, netCompressionLevel=1, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=null, evtSpi=null, colSpi=null, deploySpi=null, indexingSpi=null, addrRslvr=null, encryptionSpi=null, clientMode=false, rebalanceThreadPoolSize=1, rebalanceTimeout=10000, rebalanceBatchesPrefetchCnt=2, rebalanceThrottle=0, rebalanceBatchSize=524288, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=CONTINUOUS, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=60000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=1800000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, 
idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=REPLICATED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=0, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default, maxSize=6871947673, initSize=268435456, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=false, checkpointPageBufSize=0], dataRegions=null, storagePath=null, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=true, walMode=FSYNC, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@183559f9, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, handshakeTimeout=60000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, 
useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], commFailureRslvr=null]

    So mainly No Persistence and the NoOpFailureHandler (for debugging).
    Nevertheless, this does not run in debug mode.

    Cheers,
    Jorg

    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setClientMode(false);
    String igniteHome = IgniteUtil.getIgniteHome();
    cfg.setIgniteHome(igniteHome);
    m_instanceName = Sys.get().getName();
    cfg.setIgniteInstanceName(m_instanceName);
    
    // https://apacheignite.readme.io/docs/advanced-security#section-authentication
    cfg.setAuthenticationEnabled(IgniteUtil.isSecurityEnabled());
    //
    cfg.setDeploymentMode(DeploymentMode.CONTINUOUS);
    cfg.setMetricsLogFrequency(IgniteConfiguration.DFLT_METRICS_LOG_FREQ * 30); // 30min from 1min
    
    // Logging
    configLog(cfg);
    
    System.setProperty(IgniteSystemProperties.IGNITE_JVM_PAUSE_DETECTOR_DISABLED, "true");
    System.setProperty(IgniteSystemProperties.IGNITE_JVM_PAUSE_DETECTOR_PRECISION, String.valueOf(DurationUtil.MS_1MIN));
    System.setProperty(IgniteSystemProperties.IGNITE_JVM_PAUSE_DETECTOR_THRESHOLD, String.valueOf(DurationUtil.MS_1HOUR));
    cfg.setFailureHandler(new NoOpFailureHandler());
    // no effect:
    cfg.setFailureDetectionTimeout(IgniteConfiguration.DFLT_FAILURE_DETECTION_TIMEOUT * 6); // from 10sec to 1min
    ClientConnectorConfiguration clientConfig = cfg.getClientConnectorConfiguration();
    clientConfig.setHandshakeTimeout(ClientConnectorConfiguration.DFLT_HANDSHAKE_TIMEOUT * 6); // from 10sec to 1min
    
    
    // Create Durable Memory configuration.
    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
    storageCfg.setWalMode(WALMode.FSYNC);
    
    // Update default data region.
    DataRegionConfiguration regionCfg = storageCfg.getDefaultDataRegionConfiguration();
    regionCfg.setMetricsEnabled(true);
    if (Environment.isOnProduction()) {
      regionCfg.setPersistenceEnabled(true); // persistence on disk!
    }
    storageCfg.setMetricsEnabled(true);
    cfg.setDataStorageConfiguration(storageCfg);
    
    // CacheConfigurations
    cfg.setCacheConfiguration(cacheCache());
    
    AtomicConfiguration atomicCfg = cfg.getAtomicConfiguration();
    atomicCfg.setCacheMode(CacheMode.REPLICATED);
    
    // TCP
    cfg.setDiscoverySpi(getTcpDiscovery());
    
    cfg.setIncludeEventTypes(
      EventType.EVT_CACHE_OBJECT_PUT, EventType.EVT_CACHE_OBJECT_READ, EventType.EVT_CACHE_OBJECT_REMOVED,
      EventType.EVT_TASK_STARTED, EventType.EVT_TASK_FINISHED, EventType.EVT_TASK_TIMEDOUT,
      EventType.EVT_TASK_SESSION_ATTR_SET, EventType.EVT_TASK_REDUCED
    );
    
    return cfg;


    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------

    Attachment(s)

    zip
    ignite-info.log.zip   74K 1 version


  • 4.  RE: Unable to perform handshake within timeout

    Posted 18 days ago
    ... the debug level log

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------

    Attachment(s)

    zip
    ignite-debug.log.zip   7.46MB 1 version


  • 5.  RE: Unable to perform handshake within timeout

    Posted 17 days ago
    Hello!

    It definitely seems related to client connections (JDBC). As far as I can see, connections start to build up as soon as the node opens the client port, and time out 10 seconds later.

    My hypotheses are as follows:

    Does your JDBC-using code work at all?
    Are you sure you are not forgetting to close your JDBC connections properly after you stop using them?
    Do you have thin client authentication (login/password) turned on? If so, are you sure you supply login/password when connecting clients?
    Are you sure that no stray applications connect to ports 10800 and/or 11211 and then maintain silence?
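
The last hypothesis, a peer that connects and then stays silent, can be illustrated with plain JDK sockets. This is only a sketch and not Ignite's ClientListenerNioListener: it assumes that a "handshake" simply means the first byte arriving within the timeout, and the names SilentClientDemo, awaitHandshake and run are made up for the illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SilentClientDemo {
    /**
     * Accepts one pending connection and waits up to timeoutMs for its first byte.
     * Returns true if the peer sent something, false if it stayed silent or closed.
     */
    static boolean awaitHandshake(ServerSocket listener, int timeoutMs) throws IOException {
        try (Socket peer = listener.accept()) {
            peer.setSoTimeout(timeoutMs); // read() now fails after timeoutMs of silence
            try {
                return peer.getInputStream().read() != -1;
            } catch (SocketTimeoutException e) {
                System.out.println("no handshake within " + timeoutMs
                    + " ms from " + peer.getRemoteSocketAddress());
                return false;
            }
        }
    }

    /** Connects one client to a fresh loopback listener and reports the listener's verdict. */
    static boolean run(boolean sendHandshake, int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free listener port for the demo.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {
            if (sendHandshake) {
                client.getOutputStream().write(1); // stand-in for a real handshake message
                client.getOutputStream().flush();
            }
            return awaitHandshake(listener, timeoutMs);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("talkative client accepted: " + run(true, 200));
        System.out.println("silent client accepted: " + run(false, 200));
    }
}
```

Each silent connection produces one timeout message carrying that connection's own remote port, so many silent connections yield many distinct remoteAddr values.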

    I have also passed this information to the team responsible for this code.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 6.  RE: Unable to perform handshake within timeout

    Posted 17 days ago
    Hi Ilya,

    Yes, JDBC works fine, and we automatically close all the connections via try() in Java.
    We use neither authentication nor SSL.
    The connections are via the standard Driver and we do not connect to any port directly.

    The odd thing is that the configuration settings for the timeouts are ignored. I increased them because the server is very busy checking/updating the database at startup (usually 100% CPU), which usually takes 1-2 minutes.
      cfg.setFailureDetectionTimeout(IgniteConfiguration.DFLT_FAILURE_DETECTION_TIMEOUT * 6); // from 10sec to 1min
      ClientConnectorConfiguration clientConfig = cfg.getClientConnectorConfiguration();
      clientConfig.setHandshakeTimeout(ClientConnectorConfiguration.DFLT_HANDSHAKE_TIMEOUT * 6); // from 10sec to 1min

    It seems to be some sort of regression, as Ignite 2.7.6 works fine; it occurs only in GridGain 8.7.7
    - and "it" tries to connect via 10k+ ports


    Cheers,
    Jorg

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 7.  RE: Unable to perform handshake within timeout

    Posted 17 days ago
    Hello!

    You can use netstat -apnt on localhost (inside Docker or outside it) to see what is on the other side of all those connections. It is best run as root, since it should then show the initiating process outright.

    We have only recently introduced this warning, so it's possible that the same condition was silent previously.

    You can surely tune this timeout, but you should be changing it on ClientConnectorConfiguration. Not on discovery or communication SPIs.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 8.  RE: Unable to perform handshake within timeout

    Posted 17 days ago
    Hi Ilya,
    I cannot run netstat on the CI server, but the problem reproduces on my Mac.
    I attached two netstat outputs:
    - #1 - before - baseline
    - #2 - after starting Ignite embedded in Tomcat
    Cheers,
    Jorg

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------

    Attachment(s)

    txt
    netstat-1.txt   48K 1 version
    txt
    netstat-2.txt   101K 1 version


  • 9.  RE: Unable to perform handshake within timeout

    Posted 16 days ago
    Hello!

    I can see the relevant connections, but for some reason there is no PID of the connecting process (maybe it does not work on Mac?)

    This means I can't figure out what happens here:
    tcp4 0 0 localhost.gap localhost.49510 FIN_WAIT_1
    tcp4 0 0 localhost.49510 localhost.gap ESTABLISHED

    BTW, I remember something funny. If you're on Mac, do you have iTunes running? If so, can you turn iTunes off completely and then repeat this? I remember that it was rumored to interfere with GridGain at some point.

    Otherwise, you will have to figure out who opens these connections yourself.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 10.  RE: Unable to perform handshake within timeout

    Posted 16 days ago
    Hi Ilya,

    Yes, the info is from a Mac, and no iTunes etc. is running.
    It also happens on the GitLab QA servers running Linux, but I cannot get to them.
    So this is IntelliJ running Tomcat embedded Ignite on Mac.
    I found the only way to get pid on Mac is to run lsof -Pnl -- please find the details attached.
    For port 
    netstat shows now
    tcp4 0 0 10.0.1.18.49510 88.90.154.104.bc.http CLOSE_WAIT
    and lsof
    java 1032 501 85u IPv6 0xcc5b7acad6084a27 0t0 TCP 10.0.1.18:49510->104.154.90.88:80 (CLOSE_WAIT)
    seems to be a "call home" call to https://maven.gridgain.com/
    the process is:
    501 1032 936 0 8:11AM ?? 1:24.94 /Applications/IntelliJ IDEA.app/Contents/jbr/Contents/Home/bin/java -Djava.awt.headless=true -Dmaven.defaultProjectBuilder.disableGlobalModelCache=true -Xmx768m -Didea.maven.embedder.version=3.6.1 -Dmaven.ext.class.path=/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven-event-listener.jar -Dfile.encoding=UTF-8 -classpath /Applications/IntelliJ IDEA.app/Contents/lib/resources_en.jar:/Applications/IntelliJ IDEA.app/Contents/lib/log4j.jar:/Applications/IntelliJ IDEA.app/Contents/lib/util.jar:/Applications/IntelliJ IDEA.app/Contents/lib/annotations.jar:/Applications/IntelliJ IDEA.app/Contents/lib/jdom.jar:/Applications/IntelliJ IDEA.app/Contents/lib/trove4j.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/lucene-core-2.4.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven-server-api.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-common.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-lib/nexus-indexer-artifact-1.0.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-lib/nexus-indexer-3.0.4.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-lib/archetype-catalog-2.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-lib/archetype-common-2.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-lib/maven-dependency-tree-1.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3-server-impl.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven36-server-impl.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-settings-builder-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-transport-wagon-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-compat-3.6.1.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/maven/lib/maven3/lib/aopalliance-1.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-plugin-api-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/plexus-cipher-1.7.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/plexus-interpolation-1.25.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/guice-4.2.1-no_aop.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/slf4j-api-1.7.25.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/error_prone_annotations-2.1.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/cdi-api-1.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/jcl-over-slf4j-1.7.25.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-provider-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-artifact-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-spi-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-util-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/wagon-http-3.3.2-shaded.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/plexus-sec-dispatcher-1.4.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/plexus-component-annotations-1.7.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-repository-metadata-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/plexus-utils-3.2.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/commons-cli-1.4.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/commons-io-2.5.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/maven/lib/maven3/lib/jansi-1.17.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-impl-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-embedder-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-model-builder-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/checker-compat-qual-2.0.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/j2objc-annotations-1.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/org.eclipse.sisu.inject-0.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/wagon-file-3.3.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-model-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-settings-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-api-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-resolver-connector-basic-1.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/animal-sniffer-annotations-1.14.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-builder-support-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-shared-utils-3.2.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/org.eclipse.sisu.plexus-0.3.3.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/jsr305-3.0.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/maven-core-3.6.1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/javax.inject-1.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/commons-lang3-3.8.1.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/maven/lib/maven3/lib/guava-25.1-android.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/jsr250-api-1.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/lib/wagon-provider-api-3.3.2.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/maven/lib/maven3/boot/plexus-classworlds-2.6.0.jar org.jetbrains.idea.maven.server.RemoteMavenServer36

    The logs: -1.txt is before and -2.txt is while running.

    If you like, we can set up a GoToMeeting call or so 8am to 10pm Pacific.
    I leave this open for a while:
    https://global.gotomeeting.com/join/557995877

    Cheers,
    Jorg

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------

    Attachment(s)

    txt
    lsof-1.txt   1.01MB 1 version
    txt
    lsof-2.txt   1.16MB 1 version
    txt
    netstat-2.txt   92K 1 version


  • 11.  RE: Unable to perform handshake within timeout

    Posted 16 days ago
    Hello!

    It seems that your Java process is both the initiator and the listener of those connections. You should check where you create JDBC connections; try debugging socket creation so that you can find out who did it.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 12.  RE: Unable to perform handshake within timeout

    Posted 16 days ago
    Sure - it's embedded - so I start Ignite and then connect to it via JDBC:
    String jdbcUrl = "jdbc:ignite:thin://localhost";
    IgniteJdbcThinDataSource ids = new IgniteJdbcThinDataSource();

    That works fine in Ignite 2.7.6 and GridGain 8.7.7 - I just now get the 10k+ warnings in GridGain, indicating that it tries to connect to thousands of ports.  That kills the CI (warnings and resulting log size).




    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 13.  RE: Unable to perform handshake within timeout

    Posted 15 days ago
    Hello!

    If it is only CI that you are concerned about, you can just mute this message via e.g. log4j on package level (that would be org.apache.ignite.internal.processors.odbc).

    CI does weird things sometimes, maybe there is interference between two runs or the like.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 14.  RE: Unable to perform handshake within timeout

    Posted 15 days ago
    Hi Ilya,
    In our tests, we check if there are any errors. Muting the errors is just "looking away" - not fixing the underlying issue.
    We use logging e.g. to check if Ignite was started properly and if there are any other issues.

    I can replicate the scan of 10k ports here on Mac easily. Should it be doing the scan in the first place?
    And as mentioned, it did not do that in the previous release.
    Looks to me like a regression.
    Cheers,
    Jorg

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 15.  RE: Unable to perform handshake within timeout

    Posted 14 days ago
    Hi Ilya,

    I found the issue with the ClientConnectorConfiguration and created a separate topic with the code fix suggestion.
    https://forums.gridgain.com/community-home/...

    ... but I am stuck with the ClientListenerNioListener timeout. Looking at the code I did not find where it creates the threads to connect to all the ports from about 56000 to 65000. I don't think that that is expected behavior.

    Cheers,
    Jorg

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 16.  RE: Unable to perform handshake within timeout

    Posted 14 days ago
    Hello!

    As I have already explained, a socket connection has two sides, and both have ports:
    One is held by Client Connector, and has port number 10800. The other is used by JDBC thin driver, and it is created by operating system and assigned a random number (in range 56000 to 65000 in your case).
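
The two-sided picture above can be reproduced with nothing but the JDK. The following is a minimal sketch, not GridGain code: the listener stands in for the Client Connector (it binds to port 0 rather than 10800 so the demo never collides with a running node), and EphemeralPortDemo and connectOnce are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class EphemeralPortDemo {
    /** Opens one loopback connection and returns {listenerPort, clientLocalPort}. */
    static int[] connectOnce(ServerSocket listener) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (Socket client = new Socket(loopback, listener.getLocalPort());
             Socket accepted = listener.accept()) {
            // The client never asked for a specific local port; the OS assigned one.
            return new int[] {accepted.getLocalPort(), client.getLocalPort()};
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port to listen on.
        try (ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            for (int i = 0; i < 3; i++) {
                int[] ports = connectOnce(listener);
                System.out.println("listener port=" + ports[0] + ", client port=" + ports[1]);
            }
        }
    }
}
```

The listener port stays the same on every iteration, while the client port is whatever the OS hands out; that second number is what a server-side log would report as the remote address of the connection.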

    Hope it helps,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 17.  RE: Unable to perform handshake within timeout

    Posted 14 days ago
    Thanks Ilya,

    My understanding:
    - the main thread creates Ignite in its own process
    - Ignite starts threads listening to ports e.g. one for JDBC on 10800
    - A user thread opens a JDBC connection on port 10800
    - the OS gets these processes to do the actual communication on some random port in the 56000+ range to allow parallel execution - so one port per JDBC connection (first open wins), with only a few ports involved
    - when the JDBC connection is closed, the port is released.
    So far all good.  So the same Java process has a bunch of threads communicating via some ports.
    There should be no warnings - all working well.

    Why does the ClientListenerNioListener then try to do handshakes on 10k ports?

    I tried to suppress the warnings - using Log4J2
    <Logger name="org.apache.ignite.internal.processors.odbc.ClientListenerNioListener" level="ERROR"/>

    ... but ... surprisingly, it does not work - it seems that it ignores the log setting and is logging via Java Util Logging (not sure).
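
If the node really is logging through java.util.logging, as suspected above (an unverified assumption), the equivalent of that Log4j2 line would be to raise the level on the JUL logger of the same name. A minimal sketch follows; MuteHandshakeWarnings and raiseToSevere are hypothetical names, and the snippet assumes the JUL logger name matches the class name used in the Log4j2 configuration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class MuteHandshakeWarnings {
    // Logger name copied from the Log4j2 configuration line above.
    static final String LISTENER_LOGGER =
        "org.apache.ignite.internal.processors.odbc.ClientListenerNioListener";

    // JUL only holds loggers weakly; a strong reference keeps the level from being lost to GC.
    private static Logger pinned;

    /** Raises the listener's logger to SEVERE so WARNING records are dropped. */
    static Logger raiseToSevere() {
        pinned = Logger.getLogger(LISTENER_LOGGER);
        pinned.setLevel(Level.SEVERE);
        return pinned;
    }

    public static void main(String[] args) {
        Logger log = raiseToSevere();
        log.warning("this record is filtered out");
        log.severe("this record still reaches the handlers");
    }
}
```

This would have to run before the node starts, and like the Log4j2 variant it only hides the symptom.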

    Cheers,
    Jorg



    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 18.  RE: Unable to perform handshake within timeout

    Posted 14 days ago
    ... it is more than likely that this is a regression as, in the previous version, that code did not exist.

    ------------------------------
    Jorg Janke
    CTO Accorto, Inc.
    ------------------------------



  • 19.  RE: Unable to perform handshake within timeout

    Posted 13 days ago
    Edited by Ilya Kasnacheev 13 days ago
    Hello!

    *EDIT* Sorry, previously I have posted to a different thread.

    We will check if there are other reports about this behavior, and launch an internal investigation if we see it again.

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------


