optimization - How can I speed up Perl's processing of fixed-width data?


We have a mature body of code that loads data from files into a database. There are many file formats; all of them consist of fixed-width fields.

Part of the code uses Perl's unpack() function to read fields from the input data into package variables. Business logic can then refer to these fields in a 'human-readable' way.

The file-reading code is generated from a format description once, before a file of that format is read.

In sketch form, the generated code looks like this:
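As a rough illustration of what generating the reader from a format description could look like, here is a minimal sketch. The layout of `@format`, the `trim` flag, and the `generate_reader_source` helper are all hypothetical; the real generator is not shown in the question and may work quite differently.

```perl
use strict;
use warnings;
no warnings 'once';    # the $FIELDS:: variables are only named once at compile time

# Hypothetical format description: field name, width, and whether
# leading whitespace should be stripped. Not taken from the real code.
my @format = (
    { name => 'transaction_date', width => 8,  trim => 0 },
    { name => 'customer_id',      width => 20, trim => 1 },
);

# Build the source text of a reader sub from the description.
sub generate_reader_source {
    my @fields   = @_;
    my $vars     = join ', ', map { '$FIELDS::' . $_->{name} } @fields;
    my $template = join ' ',  map { 'A' . $_->{width} } @fields;

    my $code = "sub {\n    ($vars) = unpack q{$template}, \$_[0];\n";
    for my $f ( grep { $_->{trim} } @fields ) {
        $code .= "    \$FIELDS::$f->{name} =~ s/^\\s+//;\n";
    }
    return $code . "}\n";
}

# Compile the generated code once, before any record is read.
my $source = generate_reader_source(@format);
my $reader = eval $source or die "generated code failed to compile: $@";

# Made-up record: 8-character date, then a padded 20-character customer id.
$reader->( '20090101' . '   CUST42' . ' ' x 11 );
print "$FIELDS::transaction_date|$FIELDS::customer_id\n";
```

Because the sub is compiled once up front, the per-record cost is only the unpack and the substitutions, which is the part the profile points at.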

    while ( <> ) {

        # Start of generated code.

        # Here we unpack 2 fields; the real code does around 200.
        ( $FIELDS::transaction_date, $FIELDS::customer_id ) = unpack q{A8 A20}, $_;

        # Some fields have leading space removed.
        # Generated code has one line like this per affected field.
        $FIELDS::customer_id =~ s/^\s+//;

        # End of generated code.

        # Then we apply business logic to the data ...
        if ( $FIELDS::transaction_date eq $today ) {
            push @fields, q{something or other};
        }

        # Write to a standard format for bulk load to the database.
        print $FH join( '|', @fields ) . "\n" or die;
    }

Profiling the code shows that about 35% of the time is spent in the unpack and the leading-space strip. The rest of the time is spent validating and transforming the data and writing the output file.

No single part of the business logic appears to take more than 1-2% of the run time.

The question is: can we somehow squeeze a bit more speed out of the unpacking and space stripping?

Edit:

In case it makes a difference:

    $ perl -v
    This is perl, v5.8.0

Yes. Extracting the fields with substr is likely to be the fastest way to do it. That is:

    $FIELDS::transaction_date = substr $_, 0, 8;
    $FIELDS::customer_id      = substr $_, 8, 20;

is likely to be faster. Now, if I were writing this code by hand, I would not give up unpack, but since you are generating the code, you might as well give it a shot and measure.
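One way to measure is the core Benchmark module. The sketch below compares the two approaches on a single made-up two-field record; the sub names and the record are assumptions for illustration, and timings on a 200-field record may look quite different. Note that unpack's `A` format strips trailing spaces while substr does not, so the substr version needs an extra trim to produce the same values.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up 28-character record matching the two-field sketch (A8 A20).
my $record = '20090101' . '   CUST42' . ' ' x 11;

sub via_unpack {
    my ( $date, $id ) = unpack 'A8 A20', $record;
    $id =~ s/^\s+//;
    return ( $date, $id );
}

sub via_substr {
    my $date = substr $record, 0, 8;
    my $id   = substr $record, 8, 20;
    $id =~ s/^\s+//;
    $id =~ s/\s+$//;    # unpack's A format strips trailing spaces; substr does not
    return ( $date, $id );
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, { unpack => \&via_unpack, substr => \&via_substr } );
```

Running the comparison against real records and the real field count is what decides the matter; this only shows the shape of the test.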

As for stripping leading whitespace, s/^\s+// is likely to be the fastest method.

Update: It is hard to say anything definite without being able to run benchmarks. However, how about:

    my $x = substr $_, 0, 8;

for fields that need no trimming, and

    my ($y) = substr($_, 8, 20) =~ /\A\s*(.+?)\s*\z/;

for those that do need trimming?
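The capture-based trimming idea can be wrapped in a small helper to see how it behaves on padded, unpadded, and blank fields. This is only a sketch: the `field_trimmed` name is hypothetical, and the pattern uses `\s*` and `.*?` as an assumption so that fields with no padding, or nothing but spaces, still match instead of yielding undef.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): pull one fixed-width
# field out of a line and trim both ends with a single capturing match.
# \s* and .*? are used so that unpadded and all-blank fields still match.
sub field_trimmed {
    my ( $line, $offset, $width ) = @_;
    my ($value) = substr( $line, $offset, $width ) =~ /\A\s*(.*?)\s*\z/s;
    return $value;
}

# Made-up record: 8-character date followed by a 20-character customer id.
my $line = '20090101' . '   CUST42' . ' ' x 11;

my $date = substr $line, 0, 8;              # no trimming needed
my $id   = field_trimmed( $line, 8, 20 );   # padding removed on both sides

print "$date|$id\n";
```

In generated code the match would more likely be written inline per field rather than paid for as a sub call on every record; the helper is just a convenient way to check the pattern.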

