Perl split - to cut up a string into pieces
PHP has the explode function; Python, Ruby and JavaScript all have split methods.
In Perl the function is called split.
Syntax of split
split REGEX, STRING will split the STRING at every match of the REGEX.
split REGEX, STRING, LIMIT where LIMIT is a positive number. This will split the STRING at every match of the REGEX, but will stop after it has found LIMIT-1 matches. So the number of elements it returns will be LIMIT or less.
split REGEX - If STRING is not given, split will use the content of $_, the default variable of Perl, and split it at every match of the REGEX.
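split without any parameter will split the content of $_ on whitespace. It behaves like split ' ', $_ which is similar to using /\s+/ as REGEX, except that leading whitespace is skipped instead of producing an empty first field.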
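To see how the no-argument form compares with an explicit /\s+/, here is a minimal sketch. The sample string is made up for illustration; the only thing that matters is that it starts with whitespace:

```perl
use strict;
use warnings;

my $line = "  ab cd  ef";

# No arguments: split works on $_ and skips the leading whitespace.
my @default;
for ($line) {
    @default = split;
}

# Explicit /\s+/: the leading whitespace produces an empty first field.
my @regex = split /\s+/, $line;

print scalar(@default), "\n";   # 3 fields: 'ab', 'cd', 'ef'
print scalar(@regex), "\n";     # 4 fields: '', 'ab', 'cd', 'ef'
```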
Simple cases
split returns a list of strings:
    use Data::Dumper qw(Dumper);

    my $str = "ab cd ef gh ij";
    my @words = split / /, $str;
    print Dumper \@words;
The output is:
$VAR1 = [ 'ab', 'cd', 'ef', 'gh', 'ij' ];
Limit the number of parts
split can get a 3rd parameter that will limit the number of elements returned:
    use Data::Dumper qw(Dumper);

    my $str = "ab cd ef gh ij";
    my @words = split / /, $str, 2;
    print Dumper \@words;
The result:
$VAR1 = [ 'ab', 'cd ef gh ij' ];
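A limit is especially handy when the last field may itself contain the separator. The following is a small sketch with a hypothetical header line (not taken from the examples above), where the value contains colons that should not be split:

```perl
use strict;
use warnings;

# Hypothetical header line; the value itself contains the separator.
my $header = "Date: 2013-12-15 10:30:00";

# A LIMIT of 2 stops after the first match, so the time stays intact.
my ($name, $value) = split /:\s*/, $header, 2;

print "$name\n";    # Date
print "$value\n";   # 2013-12-15 10:30:00
```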
Assign to scalars
Instead of assigning the result to a single array, we can also assign it to a list of scalar variables:
    my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
    my ($username, $password, $uid, $gid, $real_name, $home, $shell) = split /:/, $str;
    print "$username\n";
    print "$real_name\n";
The output is like this:
    root
    System Administrator
Another way people often write this is the following: first they assign the results to an array, and then they copy the specific elements of the array:
    my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
    my @fields = split /:/, $str;
    my $username = $fields[0];
    my $real_name = $fields[4];
    print "$username\n";
    print "$real_name\n";
This is longer and I think less clear.
A slightly better way is to use an array slice:
    my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
    my @fields = split /:/, $str;
    my ($username, $real_name) = @fields[0, 4];
    print "$username\n";
    print "$real_name\n";
Please note, in the array slice @fields[0, 4]; we have a leading @ and not a leading $.
Slice on the fly
If we are really only interested in elements 0 and 4, then we could use an array slice on the fly:
    my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
    my ($username, $real_name) = (split /:/, $str)[0, 4];
    print "$username\n";
    print "$real_name\n";
Here we don't build an array, but as we put the whole expression in parentheses, we can put an index on them and fetch only elements 0 and 4 from the temporary (and invisible) array that was created for us: (split /:/, $str)[0, 4]
Split on more complex regex
The separator of split is a regex. So far in the examples we used the very simple regex / / matching a single space. We can use any regex. For example, suppose we have strings that look like these:
    fname = Foo
    lname = Bar
    email=foo@bar.com
We want to split at the = sign and disregard the spaces around it. We can use the following line:
    my ($key, $value) = split /\s*=\s*/, $str;
The regex matches the = sign together with any whitespace around it, so that whitespace becomes part of the separator and does not end up in the returned pieces.
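Put together as a runnable sketch, this could look like the following. The @lines array and the %config hash are names introduced here for illustration, and the amount of whitespace in the sample strings is made up:

```perl
use strict;
use warnings;

# Sample strings with varying whitespace around the = sign.
my @lines = ('fname = Foo', 'lname    =  Bar', 'email=foo@bar.com');

my %config;
for my $str (@lines) {
    # The = sign and the whitespace around it form the separator.
    my ($key, $value) = split /\s*=\s*/, $str;
    $config{$key} = $value;
}

print "$_: $config{$_}\n" for sort keys %config;
```

If a value could itself contain an = sign, passing 2 as the LIMIT would keep it in one piece.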
Split on multiple characters
For example, we might have a string built up from pairs concatenated with &. The two parts of each pair are separated by =.
    use Data::Dumper qw(Dumper);

    my $str = 'fname=Foo&lname=Bar&email=foo@bar.com';
    my @words = split /[=&]/, $str;
    print Dumper \@words;
$VAR1 = [ 'fname', 'Foo', 'lname', 'Bar', 'email', 'foo@bar.com' ];
Of course, if we know these are key-value pairs, then we might want to assign the result to a hash instead of an array:
    use Data::Dumper qw(Dumper);

    my $str = 'fname=Foo&lname=Bar&email=foo@bar.com';
    my %user = split /[=&]/, $str;
    print Dumper \%user;
And the result looks much better:
$VAR1 = { 'fname' => 'Foo', 'email' => 'foo@bar.com', 'lname' => 'Bar' };
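Assigning the flat list to a hash only works as long as every key has a non-empty value and no value contains = or &. A more defensive sketch, assuming a made-up input with an empty value and a value containing =, splits in two steps:

```perl
use strict;
use warnings;

# Hypothetical input: one empty value and one value containing '='.
my $str = 'fname=Foo&lname=&expr=a=b';

my %user;
for my $pair (split /&/, $str) {
    # A LIMIT of 2 keeps any further '=' inside the value.
    my ($key, $value) = split /=/, $pair, 2;
    # A pair without any '=' would leave $value undefined.
    $user{$key} = $value // '';
}

print "$_ => '$user{$_}'\n" for sort keys %user;
```

This is only an illustration of split; it does not decode URL-encoded characters the way a real query-string parser would.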
Split on empty string
Splitting on the empty string (or empty regex, if you wish) is basically saying "split at every place where you find an empty string". Between every two characters there is an empty string, so splitting on an empty string will return the original string cut up into individual characters:
    use Data::Dumper qw(Dumper);

    my $str = "Hello World";
    my @chars = split //, $str;
    print Dumper \@chars;
$VAR1 = [ 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd' ];
Including trailing empty fields
By default split will exclude any fields at the end of the string that are empty. However, you can pass -1 as the 3rd parameter. If the 3rd parameter is a positive number, it limits the number of fields returned. When it is -1 (or any other negative number), it instructs split to include all the fields, even the trailing empty ones.
examples/split_empty_trailing.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper qw(Dumper);

    say Dumper [split /;/, ";a;b;c"];
    say Dumper [split /;/, ";a;b;c;"];
    say Dumper [split /;/, ";a;b;c;;"];
    say Dumper [split /;/, ";a;b;c;;", -1];
    $VAR1 = [ '', 'a', 'b', 'c' ];
    $VAR1 = [ '', 'a', 'b', 'c' ];
    $VAR1 = [ '', 'a', 'b', 'c' ];
    $VAR1 = [ '', 'a', 'b', 'c', '', '' ];
Beware of regex special characters
A common pitfall with split, especially if you use a string as the separator (split STRING, STRING) as in split ';', $line; is that even if you pass the first parameter as a string, it is still treated as a regex. So for example
split '|', $line;
is the same as
split /|/, $line;
and both will split the string character by character. The right way to split on a pipe | character is to escape the special regex character:
split /\|/, $line;
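When the separator arrives in a variable, escaping it by hand is not an option. One possible approach, sketched here with a hypothetical $sep variable, is to wrap it in \Q...\E (the regex form of the quotemeta function):

```perl
use strict;
use warnings;

my $line = 'a|b|c';
my $sep  = '|';    # hypothetical separator, e.g. read from configuration

# \Q...\E escapes the regex special characters in $sep.
my @fields = split /\Q$sep\E/, $line;

print join(',', @fields), "\n";   # a,b,c
```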
Other examples
Though in the general case split is not the right tool for this job, it can be employed for reading simple CSV files. Check that article for much better ways to read a CSV or TSV file.
It is also a critical part of the example showing how to count words in a text file.
Another special case helps to retain the separator or parts of it: if the regex contains capturing parentheses, split also returns the captured text.
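As a minimal sketch of retaining the separators, assuming a made-up arithmetic string as input, the operators come back as elements of the result:

```perl
use strict;
use warnings;

my $expr = '12+7-5';

# The capturing group makes split return the separators as well.
my @tokens = split /([+-])/, $expr;

print join(' ', @tokens), "\n";   # 12 + 7 - 5
```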
Published on 2013-12-15