
Components
The following are components of Storm:
- Tuple: This is the basic data structure of Storm. It holds an ordered list of values, and each value can be of a different type. Storm serializes primitive types by default, but if a tuple carries instances of a custom class, you must provide a serializer for that class and register it with Storm. A tuple provides convenience methods such as getInteger, getString and getLong, so the user does not need to cast the values it contains.
- Topology: As mentioned earlier, the topology is the highest level of abstraction. It describes the processing flow, including spouts and bolts, as a graph computation: the nodes are spouts or bolts, and the edges are the streams that connect them, each subscribed to with a stream grouping. The following figure shows a simple example of a topology:

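The following minimal, stdlib-only Java sketch makes the typed accessors concrete. `SimpleTuple` and its methods are hypothetical stand-ins written for illustration. They are not Storm's `Tuple` interface. The sketch assumes only what the Tuple entry states: a tuple holds several values of different types, and typed getters spare the caller an explicit cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Storm tuple (not Storm's Tuple interface):
// an ordered list of values of possibly different types, plus the field
// names declared for the stream the tuple belongs to.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values differ in length");
        }
        this.fields = new ArrayList<>(fields);
        this.values = new ArrayList<>(values);
    }

    int size() {
        return values.size();
    }

    // Typed accessors by position: the cast lives here, not at every call site.
    Integer getInteger(int i) { return (Integer) values.get(i); }
    Long getLong(int i) { return (Long) values.get(i); }
    String getString(int i) { return (String) values.get(i); }

    // Lookup by declared field name, delegating to the positional accessor.
    String getStringByField(String name) {
        int i = fields.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return getString(i);
    }
}
```

Calling `getInteger` on a slot that holds a `String` still fails with a `ClassCastException`. Typed getters do not remove the cast. They just keep it in one place.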
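The graph view of a topology can also be sketched in plain Java. `TopologyGraph` is a hypothetical model, not Storm's `TopologyBuilder`. Nodes are spouts or bolts with a task count, and each edge records a subscription plus a grouping. Only two groupings are modeled:

- Fields grouping is assumed to route by a hash of the key modulo the number of tasks.
- Shuffle grouping is replaced by round-robin, so the sketch behaves deterministically. Storm itself distributes shuffled tuples randomly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified topology graph (not Storm's TopologyBuilder).
final class TopologyGraph {
    enum Grouping { SHUFFLE, FIELDS }

    private final Map<String, Integer> tasks = new LinkedHashMap<>();
    private final Map<String, List<String>> subscribers = new LinkedHashMap<>();
    private final Map<String, Grouping> groupings = new HashMap<>(); // key: "from->to"
    private int roundRobin = 0;

    // Adds a spout or bolt node that runs numTasks parallel tasks.
    TopologyGraph node(String name, int numTasks) {
        tasks.put(name, numTasks);
        subscribers.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    // Adds an edge: bolt `to` subscribes to the stream emitted by `from`.
    TopologyGraph edge(String from, String to, Grouping grouping) {
        if (!tasks.containsKey(from) || !tasks.containsKey(to)) {
            throw new IllegalArgumentException("unknown component");
        }
        subscribers.get(from).add(to);
        groupings.put(from + "->" + to, grouping);
        return this;
    }

    List<String> downstream(String from) {
        return subscribers.get(from);
    }

    // Picks which task of `to` receives a tuple from `from`; key is the grouping field value.
    int chooseTask(String from, String to, Object key) {
        Grouping grouping = groupings.get(from + "->" + to);
        if (grouping == null) {
            throw new IllegalArgumentException("no edge " + from + "->" + to);
        }
        int n = tasks.get(to);
        if (grouping == Grouping.FIELDS) {
            // Equal keys always hash to the same task.
            return Math.floorMod(key.hashCode(), n);
        }
        // SHUFFLE stand-in: round-robin keeps this sketch deterministic.
        return Math.floorMod(roundRobin++, n);
    }
}
```

Consider a word-count style flow. A sentence spout feeds a split bolt using shuffle grouping, and the split bolt feeds a count bolt using fields grouping on the word. Fields grouping sends every occurrence of the same word to the same count task.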
- Stream: The stream is the core abstraction of Storm. It is an unbounded sequence of tuples. A stream can be processed by different types of bolts, each of which produces a new stream. Every stream has an ID; if the user does not provide one, the stream gets the ID default. You can give a stream its own ID by declaring it through the OutputFieldsDeclarer interface.
- Spout: A spout is the source of a stream. It reads messages from sources such as Kafka or RabbitMQ and emits them into a stream as tuples. A spout can produce more than one stream by calling the declareStream method of OutputFieldsDeclarer. There are two types of spouts:
- Reliable: The spout keeps track of each tuple and replays it in case of failure.
- Unreliable: The spout forgets the tuple as soon as it has been emitted, so a tuple that fails downstream is not replayed.
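A small declaration registry can illustrate stream IDs and multiple output streams. `StreamDeclarations` is a hypothetical class that plays a role similar to Storm's `OutputFieldsDeclarer`. Its method names echo `declare` and `declareStream`, but the signatures are simplified assumptions. The sketch shows two things: a stream declared without an explicit ID falls back to `default`, and a second, named stream can be declared alongside it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of stream declaration (not Storm's OutputFieldsDeclarer):
// records which stream IDs a component emits and the field names of each.
final class StreamDeclarations {
    static final String DEFAULT_STREAM = "default"; // ID used when none is given

    private final Map<String, List<String>> streams = new LinkedHashMap<>();

    // Declares the output fields of the default stream.
    void declare(String... fields) {
        declareStream(DEFAULT_STREAM, fields);
    }

    // Declares an additional, explicitly named stream.
    void declareStream(String streamId, String... fields) {
        if (streams.containsKey(streamId)) {
            throw new IllegalArgumentException("stream already declared: " + streamId);
        }
        streams.put(streamId, Arrays.asList(fields));
    }

    List<String> fieldsOf(String streamId) {
        List<String> f = streams.get(streamId);
        if (f == null) {
            throw new IllegalArgumentException("undeclared stream: " + streamId);
        }
        return f;
    }

    // An emit is valid only on a declared stream with one value per declared field.
    void checkEmit(String streamId, Object... values) {
        if (fieldsOf(streamId).size() != values.length) {
            throw new IllegalArgumentException("value count does not match declared fields");
        }
    }
}
```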
The following are the methods of a spout:
- ack: This method is called when a tuple has been successfully processed by the topology. The implementation should mark the tuple as processed or completed.
- fail: This method is called when a tuple was not processed successfully. The user must implement it so that the tuple is sent for processing again by a later nextTuple call.
- nextTuple: This method is called to get the next tuple from the input source. The logic that reads from the input source and emits the tuple for further processing belongs in this method.
- open: This method is called only once, when the spout is initialized. Setting up the connection to the input source or output sink, or configuring an in-memory cache, in this method ensures that the work is not repeated in every nextTuple call.
IRichSpout is the interface Storm provides for implementing a custom spout. All of the previous methods need to be implemented.
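The spout lifecycle above, and reliable behavior in particular, can be sketched as follows. `ReliableQueueSpout` is a hypothetical, stdlib-only model. Its `open`, `nextTuple`, `ack` and `fail` methods follow the roles described in this section, but their signatures are simplified and differ from Storm's `ISpout`/`IRichSpout`. An in-memory deque stands in for a source such as Kafka. A map of pending message IDs provides the bookkeeping a reliable spout needs for replay.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a reliable spout; not Storm's ISpout/IRichSpout API.
final class ReliableQueueSpout {
    private Deque<String> source;                                    // stands in for Kafka, RabbitMQ, ...
    private final Map<Long, String> pending = new LinkedHashMap<>(); // emitted but not yet acked
    private long nextId = 0;

    // Called once before anything else: set up the input source here, not in nextTuple.
    void open(List<String> messages) {
        source = new ArrayDeque<>(messages);
    }

    // Called repeatedly: read one message, remember it under a message ID, and "emit" it.
    // Returns null when there is nothing to emit right now.
    Map.Entry<Long, String> nextTuple() {
        String msg = source.poll();
        if (msg == null) {
            return null;
        }
        long id = nextId++;
        pending.put(id, msg);
        return new AbstractMap.SimpleEntry<Long, String>(id, msg);
    }

    // Fully processed downstream: forget the message.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Processing failed: put the message back so the next nextTuple call replays it.
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            source.addFirst(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

An unreliable spout would skip the pending map entirely. Its `nextTuple` would emit without remembering anything, and its `ack` and `fail` would do nothing.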
- Bolt: A bolt is a processing unit of Storm. All types of processing, such as filtering, aggregations, joins, and database operations, are performed in bolts. A bolt is a transformation that takes a stream of tuples as input and generates zero or more streams as output. The types of the values in a tuple, or the number of values, may also change along the way. A bolt can emit more than one stream by declaring them with the declareStream method of OutputFieldsDeclarer. A bolt cannot subscribe to all streams of another component at once; it must subscribe to each stream one by one. The following are the methods of a bolt:
- execute: This method is called for each tuple in the streams the bolt subscribes to as input. Any processing can be defined here, whether transforming values or persisting them to a database. A bolt must call the ack method on its OutputCollector for every tuple it processes, so that Storm knows when tuples are completed.
- prepare: This method is called only once, when a bolt is initialized, so opening connections or initializing class variables belongs here.
IRichBolt and IBasicBolt are the interfaces Storm provides for implementing its processing units. The difference between them is that IBasicBolt acks each tuple automatically, which makes it convenient for basic filters and simple functions.
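The difference between the two bolt styles comes down to who acks. The sketch below is a hypothetical, stdlib-only model, not Storm's `IRichBolt`, `IBasicBolt` or `OutputCollector`. `BoltCollector`, `RichStyleBolt` and `BasicStyleBolt` are illustrative names. The two styles behave as follows:

- Rich style: `execute` acks explicitly.
- Basic style: a wrapper acks after the user's transformation returns. As a simplifying assumption, it fails the tuple if the transformation throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical collector that records what a bolt emitted, acked and failed.
final class BoltCollector {
    final List<String> emitted = new ArrayList<>();
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    void emit(String value) { emitted.add(value); }
    void ack(String tuple) { acked.add(tuple); }
    void fail(String tuple) { failed.add(tuple); }
}

// Rich style: execute itself must ack every tuple it receives.
final class RichStyleBolt {
    private BoltCollector collector;

    void prepare(BoltCollector collector) { this.collector = collector; } // one-time setup

    void execute(String tuple) {
        collector.emit(tuple.toUpperCase());
        collector.ack(tuple); // in Storm, a tuple never acked is eventually failed by the message timeout
    }
}

// Basic style: user code only transforms; the wrapper acks (or fails) on its behalf.
final class BasicStyleBolt {
    private final Function<String, String> transform;
    private BoltCollector collector;

    BasicStyleBolt(Function<String, String> transform) { this.transform = transform; }

    void prepare(BoltCollector collector) { this.collector = collector; }

    void execute(String tuple) {
        try {
            collector.emit(transform.apply(tuple));
            collector.ack(tuple); // automatic ack once the transformation succeeds
        } catch (RuntimeException e) {
            collector.fail(tuple); // simplified: any runtime failure fails the tuple
        }
    }
}
```

The basic style removes a whole class of bugs, the forgotten ack, at the cost of control. A rich-style bolt can, for example, hold tuples and ack them later as a batch.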