General Questions


GridGain Context using Spark - to create RDD, DataFrame

  • 1.  GridGain Context using Spark - to create RDD, DataFrame

    Posted 11-14-2019 09:40 PM

    The Ignite context over a Spark connection is not being established; I'm getting the error below.

    Details of Ignite context - SCALA REPL output :

    ==============================================
    /spark/bin # ./spark-shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/spark/jars/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    19/11/13 09:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/11/13 09:59:03 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    19/11/13 09:59:03 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
    19/11/13 09:59:03 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
    Spark context Web UI available at http://spark-master-54fbb6966c-jr68h:4043
    Spark context available as 'sc' (master = local[*], app id = local-1573639143236).
    Spark session available as 'spark'.
    Welcome to [Spark ASCII banner] version 2.4.4

    Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala> import org.apache.ignite.spark._
    import org.apache.ignite.spark._

    scala> import org.apache.ignite.configuration._
    import org.apache.ignite.configuration._

    scala> val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    19/11/13 10:00:43 WARN : Failed to resolve default logging config file: config/java.util.logging.properties
    Console logging handler is not configured.
    [10:00:43] [Ignite ASCII banner]
    [10:00:43] ver. 8.7.6#20190704-sha1:6449a674
    [10:00:43] 2019 Copyright(C) Apache Software Foundation
    [10:00:43] Ignite documentation: http://gridgain.com
    [10:00:43] Quiet mode.
    [10:00:43] ^-- Logging by 'JavaLogger [quiet=true, config=null]'
    [10:00:43] ^-- To see FULL console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
    [10:00:44] OS: Linux 3.10.0-957.27.4.el7.x86_64 amd64
    [10:00:44] VM information: OpenJDK Runtime Environment 1.8.0_212-b04 IcedTea OpenJDK 64-Bit Server VM 25.212-b04
    19/11/13 10:00:44 WARN IgniteKernal: Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
    19/11/13 10:00:44 WARN GridDiagnostic: Initial heap size is 126MB (should be no less than 512MB, use -Xms512m -Xmx512m).
    [10:00:47] Configured plugins:
    [10:00:47] ^-- GridGain 8.7.6#20190702-sha1:770681aa
    [10:00:47] ^-- 2019 Copyright(C) GridGain Systems
    [10:00:47] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
    19/11/13 10:00:47 WARN TcpCommunicationSpi: Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
    19/11/13 10:00:47 WARN NoopCheckpointSpi: Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
    19/11/13 10:00:48 WARN GridCollisionManager: Collision resolution is disabled (all jobs will be activated upon arrival).
    [10:00:48] Security status [authentication=off, tls/ssl=off]
    19/11/13 10:00:49 WARN IgniteH2Indexing: Serialization of Java objects in H2 was enabled.
    [10:00:50] REST protocols do not start on client node. To start the protocols on client node set '-DIGNITE_REST_START_ON_CLIENT=true' system property.
    19/11/13 10:00:50 WARN GridPluginProvider: Rolling updates are disabled. GridGain version update will require full cluster restart. Consider changing 'GridGainConfiguration.rollingUpdatesEnabled' configuration property.
    19/11/13 10:00:50 WARN TcpDiscoveryMulticastIpFinder: TcpDiscoveryMulticastIpFinder has no pre-configured addresses (it is recommended in production to specify at least one address in TcpDiscoveryMulticastIpFinder.getAddresses() configuration property)
    19/11/13 10:00:51 WARN TcpDiscoverySpi: IP finder returned empty addresses list. Please check IP finder configuration and make sure multicast works on your network. Will retry every 2000 ms. Change 'reconnectDelay' to configure the frequency of retries.

    =============================================



    ------------------------------
    ankit gupta
    BigData consultant

    ------------------------------


  • 2.  RE: GridGain Context using Spark - to create RDD, DataFrame

    Posted 11-15-2019 01:27 AM
    Hello!

    > WARN TcpDiscoveryMulticastIpFinder: TcpDiscoveryMulticastIpFinder has no pre-configured addresses (it is recommended in production to specify at least one address in TcpDiscoveryMulticastIpFinder.getAddresses() configuration property)

    I think this warning is pretty self-descriptive: you should specify the addresses of some server nodes. Please note that you can pass an Ignite configuration when setting up the Spark integration:
    https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd
    https://apacheignite-fs.readme.io/docs/ignite-data-frame
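    For example, a client configuration with a static IP finder can be built inside the closure passed to IgniteContext. This is a minimal sketch for a spark-shell session (so `sc` already exists); the host name and port range are placeholders for your actual server nodes:

    ```scala
    import java.util.Arrays

    import org.apache.ignite.configuration.IgniteConfiguration
    import org.apache.ignite.spark.IgniteContext
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

    // "ignite-server" is a placeholder for a real Ignite/GridGain server host.
    val ic = new IgniteContext(sc, () => {
      // Static IP finder with at least one known server address,
      // instead of the default (empty) multicast finder.
      val ipFinder = new TcpDiscoveryVmIpFinder()
      ipFinder.setAddresses(Arrays.asList("ignite-server:47500..47509"))

      new IgniteConfiguration()
        .setClientMode(true) // Spark side joins as a client node
        .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder))
    })
    ```

    The closure is serialized to every executor, so each Spark node builds the same client configuration locally.
    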

    Regards,

    ------------------------------
    Ilya Kasnacheev
    Community Support Specialist
    GridGain
    ------------------------------



  • 3.  RE: GridGain Context using Spark - to create RDD, DataFrame

    Posted 11-21-2019 03:43 AM

    Hi Ilya - I'm running the Spark and GridGain nodes in separate pods on the OpenShift platform.

    Which configuration does the Ignite context pick up by default? When I pass the configuration as a file path stored on the Spark nodes, the context fails with a file/directory-not-found error.

    I even tried to modify the default configuration file. Can you please suggest how to create an Ignite context from Spark on a Kubernetes platform?



    ------------------------------
    ankit gupta
    BigData consultant
    Fujitsu
    ------------------------------



  • 4.  RE: GridGain Context using Spark - to create RDD, DataFrame

    Posted 11-21-2019 07:08 AM

    Hi Ankit,

    It's highly likely that you don't have the client XML file on some of the Spark nodes. Please note that this file must be present at the same path on every Spark executor node, or you should use a path on distributed storage (e.g. HDFS) that is reachable from every Spark node.

    It's also possible that the location of your client.xml isn't mounted on some of the Spark nodes. Please check it.
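    As a sketch of this path-based variant (the paths below are placeholders, not from the thread), IgniteContext also accepts a Spring XML config location, which is resolved on the driver and on every executor:

    ```scala
    import org.apache.ignite.spark.IgniteContext

    // The XML file must resolve at this exact path on every Spark pod,
    // e.g. via the same volume mount on each of them...
    val ic = new IgniteContext(sc, "/opt/ignite/config/client.xml")

    // ...or live on shared storage reachable from all nodes:
    // val ic = new IgniteContext(sc, "hdfs://namenode:9000/configs/client.xml")
    ```
    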

    BR,
    Andrei



    ------------------------------
    Andrei Alexsandrov
    Developer
    GridGain
    ------------------------------



  • 5.  RE: GridGain Context using Spark - to create RDD, DataFrame

    Posted 12-02-2019 07:45 PM

    Finally I was able to resolve the issue and create the Ignite context and run Spark queries on top of DataFrames/RDDs.

    There were multiple issues:
    First, the relative path in IGNITE_HOME was not set up (the path to the sharedrdd.xml file); we had manually copied the jar files into spark/jars, which is where the Ignite context scripts try to resolve them when configured via sharedRDD.xml.
    Second, in the Kubernetes environment the Spark application could not discover the Ignite nodes (pod IPs via the Kubernetes discovery SPI); a cluster-wide role binding had to be created so the Ignite nodes could be found.
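    For reference, the role-binding part can be sketched roughly as below. All names and the namespace are placeholders, and the exact resources and verbs required should be checked against the GridGain/Ignite Kubernetes discovery documentation for your version:

    ```yaml
    # Service account the Ignite pods run under (placeholder names).
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: ignite
      namespace: ignite
    ---
    # Read access to endpoints/pods so the Kubernetes IP finder can list node IPs.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: ignite-endpoint-reader
    rules:
      - apiGroups: [""]
        resources: ["endpoints", "pods"]
        verbs: ["get", "list", "watch"]
    ---
    # Cluster-wide binding of that role to the Ignite service account.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: ignite-endpoint-reader
    subjects:
      - kind: ServiceAccount
        name: ignite
        namespace: ignite
    roleRef:
      kind: ClusterRole
      name: ignite-endpoint-reader
      apiGroup: rbac.authorization.k8s.io
    ```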



    ------------------------------
    ankit gupta
    BigData consultant
    Fujitsu
    ------------------------------


