What are the characteristics of Apache Storm?
- It is a fast, reliable real-time processing system.
- It can process huge volumes of data at very high speed.
- It is open source and part of the Apache project ecosystem.
- It helps in processing big data.
- Apache Storm is horizontally scalable and fault-tolerant.
How would one split a stream in Apache Storm?
You can declare multiple output streams if your use case requires it. Strictly speaking this is not splitting a single stream, but it gives you a lot of flexibility; for example, you can use it for content-based routing from a bolt.
Declare the streams in the bolt:
@Override
public void declareOutputFields(final OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declareStream("stream1", new Fields("field1"));
outputFieldsDeclarer.declareStream("stream2", new Fields("field1"));
}
Emitting from the bolt on the stream:
collector.emit("stream1", new Values("field1Value"));
Finally, subscribe to the desired stream when wiring the topology:
builder.setBolt("myBolt1", new MyBolt1()).shuffleGrouping("boltWithStreams", "stream1");
builder.setBolt("myBolt2", new MyBolt2()).shuffleGrouping("boltWithStreams", "stream2");
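To round out the content-based routing idea, the bolt's execute method can choose the target stream per tuple. This is a hedged sketch: the routing rule and the field name field1 are placeholders, and collector is the OutputCollector saved in prepare:

```java
@Override
public void execute(final Tuple input) {
    final String value = input.getStringByField("field1");
    // Hypothetical routing rule: pick the output stream based on the tuple's content.
    final String targetStream = value.startsWith("A") ? "stream1" : "stream2";
    // Emit on the chosen stream, anchored to the input tuple, then ack it.
    collector.emit(targetStream, input, new Values(value));
    collector.ack(input);
}
```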
Is there an effortless approach to deploy Apache storm on a local machine (say,
Ubuntu) for evaluation?
If you use the code below, the topology is submitted to a remote cluster through the active Nimbus node:
StormSubmitter.submitTopology("Topology_Name", conf, Topology_Object);
But if you use the code below, the topology is submitted locally on the same machine. In this case, a new local cluster is created with Nimbus, ZooKeeper, and the supervisors all running in the same JVM, which is ideal for evaluation.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("Topology_Name", conf, Topology_Object);
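Putting it together, a minimal local-mode runner might look like the sketch below. buildTopology() is a hypothetical placeholder for your own TopologyBuilder wiring, and the try-with-resources form assumes Storm 2.x, where LocalCluster is AutoCloseable:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;

public class LocalRunner {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setDebug(true); // log emitted tuples, handy for evaluation
        try (LocalCluster cluster = new LocalCluster()) {
            // buildTopology() is a placeholder for your own TopologyBuilder code.
            StormTopology topology = buildTopology();
            cluster.submitTopology("Topology_Name", conf, topology);
            Thread.sleep(10_000); // let the topology run for a while
            cluster.killTopology("Topology_Name");
        } // closing the cluster shuts down the in-process Nimbus, ZooKeeper, and supervisors
    }

    private static StormTopology buildTopology() {
        // ... wire spouts and bolts with a TopologyBuilder here ...
        throw new UnsupportedOperationException("placeholder");
    }
}
```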
What is a directed acyclic graph in Storm?
A Storm application is packaged as a "topology" in the form of a directed acyclic graph (DAG), with spouts and bolts serving as the graph's vertices. The edges of the graph are called streams and forward data from one node to the next. Collectively, the topology operates as a data transformation pipeline.
What do you mean by Nodes?
The two classes of nodes are the Master Node and the Worker Nodes.
The Master Node runs a daemon called Nimbus, which assigns tasks to machines and monitors their performance.
Each Worker Node runs a daemon called the Supervisor, which listens for work assigned to its machine and starts or stops worker processes as dictated by Nimbus.
What are the Elements of Storm?
Storm has three crucial elements: Topology, Stream, and Spout. A topology is a network composed of streams, spouts, and bolts.
A stream is an unbounded sequence of tuples, and a spout is the source of the data streams; it converts the incoming data into streams of tuples and forwards them to the bolts to be processed.
What are Storm Topologies?
The logic for a real-time application is packaged inside a Storm topology. A Storm topology is analogous to a MapReduce job. One fundamental distinction is that a MapReduce job eventually finishes, whereas a topology runs forever (or until you kill it, of course). A topology is a graph of spouts and bolts connected with stream groupings.
What is the TopologyBuilder class?
java.lang.Object -> org.apache.storm.topology.TopologyBuilder
public class TopologyBuilder
extends Object
TopologyBuilder exposes the Java API for defining a topology for Storm to execute. Topologies are ultimately Thrift structures, but since the Thrift API is verbose, TopologyBuilder makes creating topologies much easier.
Template for generating and submitting a topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("1", new TestWordSpout(true), 5);
builder.setSpout("2", new TestWordSpout(true), 3);
builder.setBolt("3", new TestWordCounter(), 3)
.fieldsGrouping("1", new Fields("word"))
.fieldsGrouping("2", new Fields("word"));
builder.setBolt("4", new TestGlobalCount())
.globalGrouping("1");
Map<String, Object> conf = new HashMap<>();
conf.put(Config.TOPOLOGY_WORKERS, 4);
StormSubmitter.submitTopology("mytopology", conf, builder.createTopology());
How do you Kill a topology in Storm?
storm kill topology-name [-w wait-time-secs]
Kills the topology with the given name. Storm will first deactivate the topology's spouts for the duration of the topology's message timeout, to let all tuples currently being processed finish. Storm will then shut down the workers and clean up their state. You can override the amount of time Storm waits between deactivation and shutdown with the -w flag.
What transpires when Storm kills a topology?
Storm does not kill the topology instantly. Instead, it deactivates all the spouts so they emit no more tuples, and then waits for Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to finish processing the tuples that were in flight when it was killed.
What is the suggested approach for writing integration tests for an Apache Storm topology in Java?
You can utilize LocalCluster for integration testing. Storm's own integration tests are also a good source of inspiration.
Useful tools are FeederSpout and FixedTupleSpout. A topology in which every spout implements the CompletableSpout interface can be run to completion using the utilities in the Testing class.
Storm tests can also "simulate time", which means the topology idles until you call LocalCluster.advanceClusterTime. This lets you, for example, make assertions between bolt emits.
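As a sketch of this approach: the helpers below (MockedSources, CompleteTopologyParam, Testing.completeTopology) come from Storm's org.apache.storm.testing package; the spout id "wordSpout" and the buildTopology() method are hypothetical placeholders for your own topology:

```java
import org.apache.storm.Config;
import org.apache.storm.ILocalCluster;
import org.apache.storm.Testing;
import org.apache.storm.testing.CompleteTopologyParam;
import org.apache.storm.testing.MockedSources;
import org.apache.storm.testing.TestJob;
import org.apache.storm.tuple.Values;
import java.util.Map;

public class WordCountTopologyTest {
    public void testTopology() {
        Testing.withSimulatedTimeLocalCluster(new TestJob() {
            @Override
            public void run(ILocalCluster cluster) throws Exception {
                // Feed fixed test tuples through the spout instead of a live source.
                MockedSources mocked = new MockedSources();
                mocked.addMockData("wordSpout", new Values("apple"), new Values("apple"));

                CompleteTopologyParam param = new CompleteTopologyParam();
                param.setMockedSources(mocked);
                param.setStormConf(new Config());

                // Run the topology to completion, recording every emitted tuple.
                Map result = Testing.completeTopology(cluster, buildTopology(), param);

                // Assert on what a bolt emitted, e.g. via Testing.readTuples(result, "countBolt").
            }
        });
    }
}
```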
What does the swap command do?
A planned feature is a storm swap command that replaces a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.
How do you monitor topologies?
The best place to monitor a topology is the Storm UI. The Storm UI shows errors occurring in tasks, along with fine-grained statistics on the throughput and latency of every component of each running topology.
How do you rebalance the number of executors for a bolt in a running Apache Storm topology?
You always need at least as many tasks as executors. Since the number of tasks is fixed for the lifetime of a topology, you must define a larger initial number of tasks than executors if you want to scale up the number of executors at runtime. Think of the number of tasks as the maximum number of executors:
#executors <= #tasks
You can then change the number of executors at runtime with the storm rebalance command, for example:
storm rebalance topology-name -e boltId=4
What are Streams?
A stream is the core abstraction in Storm. A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. Streams are defined by a schema that names the fields in the stream's tuples.
What can tuples hold in Storm?
By default, tuples can contain integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also define your own serializers so that custom types can be used natively within tuples.
How do we check httpd.conf for consistency and errors?
We check the configuration file by running:
httpd -S
The command prints how httpd parsed the configuration file, in particular the virtual host settings. Careful examination of the IP addresses and server names it reports can help uncover configuration errors.
What is Kryo?
Storm uses Kryo for serialization. Kryo is a flexible and fast serialization library that produces small serializations.
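To use a custom type natively in tuples, you register it with Kryo through the topology Config. In this sketch, MyCustomType and MyCustomSerializer are hypothetical placeholders for your own class and com.esotericsoftware.kryo.Serializer implementation:

```java
import org.apache.storm.Config;

Config conf = new Config();
// Register the class so Kryo serializes it with its default FieldSerializer:
conf.registerSerialization(MyCustomType.class);
// Or supply your own Kryo Serializer implementation for full control:
conf.registerSerialization(MyCustomType.class, MyCustomSerializer.class);
```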
What are Spouts?
A spout is a source of streams in a topology. Generally, spouts read tuples from an external source and emit them into the topology. Spouts can be reliable or unreliable. A reliable spout can replay a tuple if it failed to be processed by Storm, whereas an unreliable spout forgets about a tuple as soon as it is emitted.
Spouts can emit more than one stream. To do so, declare multiple streams using the declareStream method of OutputFieldsDeclarer and specify the stream to emit to when using the emit method on SpoutOutputCollector.
The main method on spouts is nextTuple. nextTuple either emits a new tuple into the topology or simply returns if there are no new tuples to emit. It is imperative that nextTuple does not block in any spout implementation, because Storm calls all the spout methods on the same thread.
The other main methods on spouts are ack and fail. These are called when Storm detects that a tuple emitted from the spout either successfully completed through the topology or failed to complete. ack and fail are only called for reliable spouts.
See the Javadoc for more information.
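A minimal reliable spout following the rules above might look like this sketch. SentenceSource.poll() is a hypothetical non-blocking read from an external source; passing a message ID to emit is what makes the spout reliable:

```java
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.Map;
import java.util.UUID;

public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String sentence = SentenceSource.poll(); // hypothetical non-blocking read
        if (sentence == null) {
            return; // never block: just return when there is nothing to emit
        }
        // Emitting with a message ID enables ack/fail tracking (reliable spout).
        collector.emit(new Values(sentence), UUID.randomUUID().toString());
    }

    @Override
    public void ack(Object msgId) { /* tuple fully processed; discard it */ }

    @Override
    public void fail(Object msgId) { /* tuple failed; replay it */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
```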
What are Bolts?
All processing in topologies is done in bolts. Bolts can do anything from filtering, aggregations, and functions to joins, talking to databases, and more.
A single bolt can perform simple stream transformations. Complex stream transformations usually require multiple steps and therefore multiple bolts.
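As a sketch of a simple transformation, the bolt below keeps a running word count; the field names "word" and "count" are placeholders. It shows the three essentials: execute, anchoring the emitted tuple to its input, and acking:

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.HashMap;
import java.util.Map;

public class WordCountBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        int count = counts.merge(word, 1, Integer::sum);
        // Anchor the output to the input tuple so failures are replayed upstream.
        collector.emit(input, new Values(word, count));
        collector.ack(input); // let the spout's tuple tree complete
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
```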
Compare Apache Storm with Kafka.
Apache Kafka is a distributed messaging and event-streaming platform: it durably stores streams of records and lets producers and consumers publish and subscribe to them. Apache Storm is a distributed real-time computation framework: it processes streams of tuples through a topology of spouts and bolts. The two are complementary and are often used together, with Kafka acting as the source (and sink) of the data a Storm topology processes, typically via a Kafka spout.
See Also
Spring Boot Interview Questions
Apache Camel Interview Questions
Drools Interview Questions
Java 8 Interview Questions
Enterprise Service Bus (ESB) Interview Questions
JBoss Fuse Interview Questions
Top ElasticSearch frequently asked interview questions