Bash: Read a File Line by Line
It's pretty easy to read the contents of a Linux text file line by line in a shell script, as long as you deal with some subtle gotchas. Here's how to do it the safe way.
Files, Text, and Idioms
Each programming language has a set of idioms. These are the standard, no-frills ways to accomplish a set of common tasks. They're the basic or default way to use one of the features of the language the programmer is working with. They become part of a programmer's toolkit of mental blueprints.
Actions like reading data from files, working with loops, and swapping the values of two variables are good examples. The programmer will know at least one way to achieve their ends in a generic or vanilla fashion. Perhaps that will suffice for the requirement at hand. Or maybe they'll embellish the code to make it more efficient or better suited to the specific solution they are developing. But having the building-block idiom at their fingertips is a great starting point.
Knowing and understanding idioms in one language makes it easier to pick up a new programming language, too. Knowing how things are constructed in one language and looking for the equivalent, or the closest thing, in another language is a good way to appreciate the similarities and differences between programming languages you already know and the one you're learning.
Reading Lines From a File: The One-Liner
In Bash, you can use a while loop on the command line to read each line of text from a file and do something with it. Our text file is called "data.txt." It holds a list of the months of the year.
January
February
March
...
October
November
December
Our simple one-liner is:
while read line; do echo $line; done < data.txt
The while loop reads a line from the file, and the execution flow of the little program passes to the body of the loop. The echo command writes the line of text in the terminal window. The read attempt fails when there are no more lines to be read, and the loop is done.
One neat trick is the ability to redirect a file into a loop. In other programming languages, you'd need to open the file, read from it, and close it again when you'd finished. With Bash, you can simply use file redirection and let the shell handle all of that low-level stuff for you.
Of course, this one-liner isn't terribly useful. Linux already provides the cat command, which does exactly that for us. We've created a long-winded way to replace a three-letter command. But it does visibly demonstrate the principles of reading from a file.
That works well enough, up to a point. Suppose we have another text file that contains the names of the months. In this file, the escape sequence for a newline character ("\n") has been appended to each line. We'll call it "data2.txt."
January\n
February\n
March\n
...
October\n
November\n
December\n
Let's use our one-liner on our new file.
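For comparison, here is a minimal sketch of doing the open, read, and close steps by hand, using Bash's exec builtin to attach the file to an explicit file descriptor. The descriptor number 3, the temporary sample file, and the count variable are illustrative assumptions, not part of the original example.

```shell
#!/bin/bash
# Create a small sample file so the sketch runs on its own.
datafile=$(mktemp)
printf '%s\n' January February March > "$datafile"

count=0

# Open the file for reading on file descriptor 3 (an arbitrary free number).
exec 3< "$datafile"

# Read one line per iteration from descriptor 3 until read hits end of file.
while read -r line <&3; do
    count=$((count + 1))
    echo "$count: $line"
done

# Close descriptor 3 and remove the sample file.
exec 3<&-
rm -f "$datafile"
```

The `done < file` form performs the same open and close implicitly, for the duration of the loop only.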
while read line; do echo $line; done < data2.txt
The backslash escape character "\" has been discarded. As a result, an "n" has been appended to each line. Bash is interpreting the backslash as the start of an escape sequence. Often, we don't want Bash to interpret what it is reading. It can be more convenient to read a line in its entirety, backslash escape sequences and all, and choose what to parse out or replace yourself, within your own code.
If we want to do any meaningful processing or parsing on the lines of text, we'll need to use a script.
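A quick way to see this behavior is to feed the same backslash-terminated text to read with and without its -r option. This sketch uses a Bash here-string (<<<) instead of a file, and the variable names are hypothetical.

```shell
#!/bin/bash
# A line ending in a literal backslash followed by "n", as in data2.txt.
sample='January\n'

# Without -r, read treats the backslash as an escape and drops it.
read plain <<< "$sample"

# With -r, read keeps the backslash as an ordinary character.
read -r raw <<< "$sample"

printf 'without -r: %s\n' "$plain"   # prints: without -r: Januaryn
printf 'with -r:    %s\n' "$raw"     # prints: with -r:    January\n
```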
Reading Lines From a File With a Script
Here's our script. It's called "script1.sh."
#!/bin/bash

Counter=0

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    echo "Accessing line $Counter: ${LinefromFile}"

done < "$1"
We set a variable called Counter to zero, and then we define our while loop.
The first statement on the while line is IFS=''. IFS stands for internal field separator. It holds the characters that Bash uses to identify word boundaries. By default, IFS contains whitespace characters, so the read command strips off leading and trailing whitespace. If we want to read the lines from the file exactly as they are, we need to set IFS to an empty string.
We could set this once outside of the loop, just like we're setting the value of Counter. But with more complex scripts, especially those with many user-defined functions in them, it is possible that IFS could be set to different values elsewhere in the script. Writing IFS='' as a prefix on the read command applies the empty value to that read call, and only to that call, each time the while loop iterates. That guarantees we know what its behavior will be, without changing IFS for the rest of the script.
We're going to read a line of text into a variable called LinefromFile. We're using the -r (read backslash as a normal character) option so that read does not treat backslashes as escape characters. They'll be treated just like any other character and won't receive any special handling.
There are two conditions that will satisfy the while loop and allow the text to be processed by the body of the loop:
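The effect of the IFS='' prefix can be checked directly. This minimal sketch, with hypothetical variable names and a here-string as input, reads the same padded text with and without the prefix, then confirms that IFS itself is unchanged afterward.

```shell
#!/bin/bash
sample='   indented text   '

# Default IFS: read trims the leading and trailing spaces.
read -r trimmed <<< "$sample"

# IFS='' applies only to this one read command, so the whitespace is kept.
IFS='' read -r kept <<< "$sample"

printf '[%s]\n' "$trimmed"   # prints: [indented text]
printf '[%s]\n' "$kept"      # prints: [   indented text   ]

# The prefix assignment did not change IFS for the rest of the script.
printf 'IFS is still: %q\n' "$IFS"
```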
- read -r LinefromFile: When a line of text is successfully read from the file, the read command returns success to the while, and the while loop passes execution to the body of the loop. Note that the read command needs to see a newline character at the end of the line of text in order to consider it a successful read. If the file is not a POSIX-compliant text file, the last line may not include a newline character. If read reaches the end of the file (EOF) before the line is terminated by a newline, it will not treat it as a successful read. It still stores the text it found in LinefromFile, but it returns a failure status. If read were the only condition, the last line of text would not be passed to the body of the loop and would not be processed.
- [[ -n "${LinefromFile}" ]]: We need to do some extra work to handle non-POSIX-compliant files. This test checks the text that was read from the file. If LinefromFile is not empty, the test returns success to the while loop even though read reported failure. This ensures that any trailing line fragment is processed by the body of the loop.
These two clauses are separated by the OR logical operator "||" so that if either clause returns success, the retrieved text is processed by the body of the loop, whether there is a newline character or not.
In the body of our loop, we're incrementing the Counter variable by one and using echo to send some output to the terminal window. The line number and the text of each line are displayed.
We can still use our redirection trick to redirect a file into a loop. In this case, the file we redirect is named by $1, a variable that holds the first command-line parameter passed to the script. Using this trick, we can easily pass in the name of the data file that we want the script to work on.
Copy and paste the script into an editor and save it with the filename "script1.sh." Use the chmod command to make it executable.
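To see why the second condition matters, this sketch builds a hypothetical three-line file whose last line has no trailing newline, then counts the lines with and without the [[ -n ... ]] clause. The temporary file and the counter names are assumptions made for the demonstration.

```shell
#!/bin/bash
# Sample file whose last line ("third") has no terminating newline.
datafile=$(mktemp)
printf 'first\nsecond\nthird' > "$datafile"

# Plain loop: read fails on the unterminated "third", so it is skipped.
naive=0
while IFS='' read -r line; do
    naive=$((naive + 1))
done < "$datafile"

# Robust loop: the -n test catches the fragment that read left in the variable.
robust=0
while IFS='' read -r line || [[ -n "$line" ]]; do
    robust=$((robust + 1))
done < "$datafile"

echo "plain loop: $naive lines, robust loop: $robust lines"
rm -f "$datafile"
```

This prints "plain loop: 2 lines, robust loop: 3 lines".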
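Because the script uses $1 without checking it, running it with no argument makes the redirection fail with a shell error. One possible guard, shown here as a minimal sketch, wraps the same loop in a function and checks the argument first. The function name read_file_lines and the usage message are hypothetical, not part of the original script.

```shell
#!/bin/bash
# Hypothetical wrapper: reads the file named by its first argument,
# after checking that exactly one argument was given and that it is readable.
read_file_lines() {
    if [[ $# -ne 1 || ! -r "$1" ]]; then
        echo "usage: read_file_lines FILE (FILE must be readable)" >&2
        return 1
    fi

    local Counter=0 LinefromFile
    while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do
        ((Counter++))
        echo "Accessing line $Counter: ${LinefromFile}"
    done < "$1"
}

datafile=$(mktemp)
printf '%s\n' January February > "$datafile"
read_file_lines "$datafile"
read_file_lines || echo "missing argument was rejected"
rm -f "$datafile"
```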
chmod +x script1.sh
Let's see what our script makes of the data2.txt text file and the backslashes contained within it.
./script1.sh data2.txt
Every character in the line is displayed verbatim. The backslashes are not interpreted as escape characters. They're printed as regular characters.
Passing the Line to a Function
We're still just echoing the text to the screen. In a real-world programming scenario, we'd likely be about to do something more interesting with the line of text. In most cases, it is good programming practice to handle the further processing of the line in another function.
Here's how we could do it. This is "script2.sh."
#!/bin/bash

Counter=0

function process_line() {

    echo "Processing line $Counter: $1"

}

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    process_line "$LinefromFile"

done < "$1"
We define our Counter variable as before, and then we define a function called process_line(). The definition of a function must appear before the function is first called in the script.
Our function is going to be passed the newly read line of text in each iteration of the while loop. We can access that value within the function by using the $1 variable. If there were two values passed to the function, we could access them using $1 and $2, and so on for more values.
The while loop is mostly the same. There is only one change inside the body of the loop. The echo line has been replaced by a call to the process_line() function. Note that you don't need to use the "()" brackets in the name of the function when you are calling it.
The variable holding the line of text, LinefromFile, is wrapped in quotation marks when it is passed to the function. This caters for lines that have spaces in them. Without the quotation marks, the line is split into words: the first word is treated as $1 by the function, the second word as $2, and so on. Using quotation marks ensures that the entire line of text is handled, altogether, as $1. Note that this is not the same $1 that holds the name of the data file passed to the script.
Because Counter is set in the main body of the script rather than declared local inside a function, it is a global variable and can be referenced within the process_line() function.
Copy or type the script above into an editor and save it with the filename "script2.sh." Make it executable with chmod:
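Here is a small illustration of how a function sees its arguments, using a hypothetical show_args function. Besides $1 and $2, the special parameter $# holds the number of arguments that were passed.

```shell
#!/bin/bash
# Hypothetical function that reports the arguments it was given.
show_args() {
    echo "number of arguments: $#"
    echo "first: $1"
    echo "second: $2"
}

show_args January February
# prints:
# number of arguments: 2
# first: January
# second: February
```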
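The difference quoting makes is easy to demonstrate. This sketch uses a hypothetical count_args helper that reports how many arguments it received; the unquoted call is deliberate, to show the word splitting.

```shell
#!/bin/bash
# Hypothetical helper that shows how many arguments arrived and what $1 holds.
count_args() {
    echo "$# argument(s), \$1 is: $1"
}

LinefromFile='More text at the end'

count_args "$LinefromFile"   # prints: 1 argument(s), $1 is: More text at the end
count_args $LinefromFile     # unquoted on purpose; prints: 5 argument(s), $1 is: More
```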
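A short sketch of this scoping rule, with a hypothetical bump_counter function: a variable set in the main body is global and can be read and changed from inside a function, whereas one declared with the local builtin exists only while the function runs.

```shell
#!/bin/bash
Counter=0

# Counter is global, so the function can read and change it.
bump_counter() {
    Counter=$((Counter + 1))
    local scratch="only visible inside bump_counter"   # local limits scope
    echo "inside: Counter=$Counter, scratch=$scratch"
}

bump_counter
bump_counter
echo "outside: Counter=$Counter, scratch=${scratch:-unset}"
# prints: outside: Counter=2, scratch=unset
```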
chmod +x script2.sh
Now we can run it and pass in a new data file, "data3.txt." This has a list of the months in it, and one line with many words on it.
January
February
March
...
October
November
\nMore text "at the end of the line"
December
Our control is:
./script2.sh data3.txt
The lines are read from the file and passed one by one to the process_line() function. All the lines are displayed correctly, including the odd one with the backslash, quotation marks, and multiple words in it.
Building Blocks Are Useful
There's a train of thought that says that an idiom must contain something unique to that language. That's not a belief that I subscribe to. What's important is that it makes good use of the language, is easy to remember, and provides a reliable and robust way to implement some functionality in your code.
Source: https://www.howtogeek.com/709838/how-to-process-a-file-line-by-line-in-a-linux-bash-script/