1. Parsing problem

I have some finance data that my bank insists on lumping all together
into one field. A sample of the data is below:

Visa Purchase                26DECThe Carss Park Super Carss Par
Non Stg/Bsa Atm Wdl Fee
Internet Deposit             05JAN23:11itamoney
Visa Purchase                03JANWorld Vision Of Aust Burwood E
Adi Limited28686
Visa Cash Advance            07JANEur210.00 Banco Di Brescia
Visa Purchase                18JANOptus Tv/Net Autopay Chatswood
Internet Deposit             25JAN10:05shopping
Atm Withdrawal               25JAN11:09Westpaccrlngfrd 2 O/S    Carlingfor2  Au
Adi Limited28686
Atm Withdrawal               27JAN07:37St.George Telopea Hair Telopea Nsw \Au
Atm Withdrawal -Cba           27JAN09:20Cba Atm  Uts B'Way Op   Nsw 228498   Aus
Visa Cash Advance            21JANEur120.00 Banco Di Brescia
Atm Withdrawal -Cba           28JAN08:48Cba Atm  Uts B'Way Op   Nsw 228498   Aus
Visa Purchase                25JANWoolworths W1122    Carlingfo
Visa Purchase                26JANColes Express Dundas Dundas
Visa Purchase                25JANVodafone            Chatswood
Internet Deposit             29JAN09:57billls....
Eftpos Purchase              29JAN10:28N R M A Mcc        North Ryde
Atm Withdrawal - O/Bank       31JAN12:34Garden Island           Sydney       Au
O/Seas Cash Withdrawal Fee
Visa Purchase                27JANBilo Telopea 4106   Telopea
Non Stg/Bsa Atm Wdl Fee
Atm Withdrawal               01FEB19:25St.George Telopea Hair Telopea Nsw \Au
Visa Purchase                29JANIkea Homebush Bay   Rhodes
Visa Purchase                28JANWoolworths W1200    Eastwood
Visa Purchase                29JANPlatinum Communicatn Rhodes
Visa Purchase                29JANCaci Clinic         Edgecliff
Visa Purchase                31JANWorld Vision Of Aust Burwood E
Visa Purchase                28JANVodafone            Chatswood
Visa Purchase                31JANWoolworths W1200    Eastwood
Tfr Wdl Bpay Internet         03FEB19:0311174893           Integral Energy

Each line is a single field in the incoming .csv file.
Any suggestions on how to parse it? In most but not all cases, the
second 'column' starts at about the 29th element. Sometimes the date
is given, sometimes date and time, sometimes nothing.

--
MrTrick
----------
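
A minimal Python sketch of the fixed-column reading described above (the thread's own code is Euphoria). It assumes the details start at the 29th character and that a line shorter than that is a description with no details; `split_fixed` is a hypothetical name. Because the column is only approximately fixed, a description that overruns it will be cut in the wrong place.

```python
def split_fixed(line: str, column: int = 29) -> tuple:
    """Split a record at an assumed 1-based start column for the details."""
    line = line.rstrip("\r\n")
    cut = column - 1  # 1-based column -> 0-based slice index
    if len(line) <= cut:
        # e.g. "Non Stg/Bsa Atm Wdl Fee": a description with no details
        return line.strip(), ""
    return line[:cut].strip(), line[cut:].strip()


print(split_fixed("Non Stg/Bsa Atm Wdl Fee"))  # ('Non Stg/Bsa Atm Wdl Fee', '')
```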


2. Re: Parsing problem

Patrick Barnes wrote:
> 
> I have some finance data that my bank insists on lumping all together
> into one field. A sample of the data is below:
> 
> Visa Purchase                26DECThe Carss Park Super Carss Par
> <SNIP>
> Tfr Wdl Bpay Internet         03FEB19:0311174893           Integral Energy
> 
> Each line is a single field in the incoming .csv file.
> Any suggestions on how to parse it? In most but not all cases, the
> second 'column' starts at about the 29th element. Sometimes the date
> is given, sometimes date and time, sometimes nothing.
> 
> --
> MrTrick
> ----------
> 
> 

I would start off with something like:



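include std/text.e  -- for trim() (Euphoria 4.x)

function splitline(sequence line)
    -- split at the first run of two spaces: {description, details}
    integer a
    a = match("  ", line)
    return {trim(line[1..a-1]), trim(line[a..length(line)])}
end function

---first pass----------
integer fn = open("statement.csv", "r")  -- hypothetical file name
sequence good = {}
object line
while 1 do
    line = gets(fn)
    if atom(line) then  -- gets() returns -1 at end of file
        exit
    end if
    if match("  ", line) then
        good = append(good, splitline(line))
    else
        good = append(good, {trim(line)})
    end if
end while

------------second pass-------
sequence col = repeat({{}, {}}, length(good))
for x = 1 to length(good) do
    col[x][1] = good[x][1]
    if length(good[x]) = 2 then
        col[x][2] = good[x][2]
    end if
end for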

From there you could use match("JAN", col[x][2])

or

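constant SHORTMONTHS = {"JAN","FEB","MAR","APR","MAY","JUN",
                        "JUL","AUG","SEP","OCT","NOV","DEC"}
sequence month, day
for x = 1 to length(col) do
    for m = 1 to length(SHORTMONTHS) do  -- see my moredates
        -- details such as "26DEC..." start with the day, then the month
        if match(SHORTMONTHS[m], col[x][2]) = 3 then
            day = col[x][2][1..2]
            month = col[x][2][3..5]
        end if
    end for
end for

etc. etc. You could cut out any information you wanted in this fashion.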

I didn't do everything exactly properly, but you should get the idea.

Don Cole, SF
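
Roughly the same two steps in Python, as a sketch rather than a translation of the Euphoria above: split at the first run of two spaces, then treat a details field that starts with two digits and a three-letter month (like `26DEC`) as carrying a day and month. The names `split_line`, `day_and_month` and `MONTHS` are hypothetical.

```python
from typing import Optional, Tuple

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def split_line(line: str) -> list:
    """Split at the first run of two spaces and trim both halves."""
    line = line.strip()
    a = line.find("  ")
    if a == -1:
        return [line]  # no wide gap: description only
    return [line[:a].strip(), line[a:].strip()]


def day_and_month(details: str) -> Optional[Tuple[str, str]]:
    """Return (day, month) when details start like '26DEC', else None."""
    if details[:2].isdigit() and details[2:5] in MONTHS:
        return details[:2], details[2:5]
    return None
```

Checking the fixed position (day first, then month) rather than searching for the month anywhere avoids false hits on merchant names that happen to contain a month abbreviation.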


3. Re: Parsing problem

Patrick Barnes wrote:
> 
> I have some finance data that my bank insists on lumping all together
> into one field. A sample of the data is below:
> 
> Visa Purchase                26DECThe Carss Park Super Carss Par
> Non Stg/Bsa Atm Wdl Fee
> Internet Deposit             05JAN23:11itamoney
> Visa Purchase                03JANWorld Vision Of Aust Burwood E
> Adi Limited28686
> ... 
> Each line is a single field in the incoming .csv file.

That's horrible. You should sue them. :D

> Any suggestions on how to parse it?

It looks like the first field is standard width (though you said that varies).
I've had to write code to parse inconsistent files before. It's not that
difficult as long as you can get a handle on the exceptions. I'd start by
splitting at the long run of spaces between the 1st and 2nd fields. Then I'd
search for the times and dates (use the colon) and split those out. Depending
on how many lines you have, you might want to just do it manually. Ouch.

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/
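
A Python sketch of the "use the colon" idea (the thread's own code is Euphoria). It assumes the details field has already been split off and that a time, when present, is `HH:MM` immediately after a five-character date such as `05JAN`; `split_at_time` is a hypothetical helper, and only the first colon is examined.

```python
def split_at_time(details: str) -> tuple:
    """Return (date, time, rest) using the first colon as an HH:MM anchor.

    If no HH:MM is found, the whole string comes back as the first element
    and the time and rest are empty.
    """
    i = details.find(":")
    if (i >= 2 and i + 3 <= len(details)
            and details[i - 2:i].isdigit() and details[i + 1:i + 3].isdigit()):
        return details[:i - 2], details[i - 2:i + 3], details[i + 3:]
    return details, "", ""
```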


4. Re: Parsing problem

Patrick Barnes wrote:
> 
> I have some finance data that my bank insists on lumping all together
> into one field. A sample of the data is below:
> 
> Visa Purchase                26DECThe Carss Park Super Carss Par
> Non Stg/Bsa Atm Wdl Fee
> Internet Deposit             05JAN23:11itamoney
<SNIP>
> Visa Purchase                31JANWorld Vision Of Aust Burwood E
> Visa Purchase                28JANVodafone            Chatswood
> Visa Purchase                31JANWoolworths W1200    Eastwood
> Tfr Wdl Bpay Internet         03FEB19:0311174893           Integral Energy
> 
> Each line is a single field in the incoming .csv file.
> Any suggestions on how to parse it? In most but not all cases, the
> second 'column' starts at about the 29th element. Sometimes the date
> is given, sometimes date and time, sometimes nothing.
> 
> --
> MrTrick
> ----------

Hi.

It appears as if the second set of data in each line always begins with a
number, so you have a basis for separation there, as well as the tabbing or
spacing otherwise.  (Question: are the wide spaces tabs originally?  If so, that
would make parsing easy.)  Also the first part of the second set either ends with
a number, or it ends with a three-letter word for the month, so this can create a
"rule" for separation of the second set.

It would be handy if the bank would provide a copy of their rules for
generating these lines (some techy deep in the bowels of the bank would know,
but they may not let him/her out of the cage for public communication :^D )

--Quark
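
The first rule above (the second set always begins with a number) can locate the split without a fixed column. A Python sketch, assuming the gap is two or more plain spaces (the original poster later confirms they are spaces, not tabs); `split_on_first_number` and `GAP_THEN_DIGIT` are hypothetical names.

```python
import re

# Two or more spaces immediately followed by a digit mark the start of the details.
GAP_THEN_DIGIT = re.compile(r" {2,}(?=\d)")


def split_on_first_number(line: str) -> tuple:
    """Split a record where the details begin, wherever that column falls."""
    line = line.rstrip("\r\n")
    m = GAP_THEN_DIGIT.search(line)
    if m is None:
        return line.strip(), ""  # e.g. "Adi Limited28686" has no wide gap
    return line[:m.start()].strip(), line[m.end():].strip()
```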


5. Re: Parsing problem

Patrick Barnes wrote:
> 
> I have some finance data that my bank insists on lumping all together
> into one field. A sample of the data is below:
> 
> Visa Purchase                26DECThe Carss Park Super Carss Par
> <SNIP>
> Tfr Wdl Bpay Internet         03FEB19:0311174893           Integral Energy
> 
> Each line is a single field in the incoming .csv file.
> Any suggestions on how to parse it? In most but not all cases, the
> second 'column' starts at about the 29th element. Sometimes the date
> is given, sometimes date and time, sometimes nothing.
> 
> --
> MrTrick
> ----------
> 
Seems like a problem for regular expressions.
For instance:

constant p_item = "(.+?)\\s{2,}"      -- everything up to the wide gap
constant p_date = "([0-3][0-9](?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC))"
constant p_time = "(\\d\\d:\\d\\d)?"  -- the time is optional
constant p_tail = "(.*)"
RGXscan(input_line, p_item & p_date & p_time & p_tail)

then:
RGXsubstring(1)  -- returns the part before the date
RGXsubstring(2)  -- returns the date
RGXsubstring(3)  -- returns the time, if any
RGXsubstring(4)  -- returns the rest of the line

N.B.
  The functions above come from my EU-PCRE -- other regex implementations
  will be similar.
  The code is not tested and is probably wrong in some details.
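
For comparison, a Python sketch of the same pattern using Python's `re` module rather than EU-PCRE; `RECORD` and `parse_record` are hypothetical names. It uses named groups, makes the time optional, and treats a line with no date after a wide gap, such as `Non Stg/Bsa Atm Wdl Fee`, as a bare description.

```python
import re

RECORD = re.compile(
    r"(?P<item>.+?)\s{2,}"  # description, up to the wide gap before the date
    r"(?P<date>[0-3][0-9](?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC))"
    r"(?P<time>\d\d:\d\d)?"  # optional HH:MM
    r"(?P<tail>.*)"  # merchant, reference number, etc.
)


def parse_record(line: str) -> dict:
    """Split one record into item, date, time and tail (None when absent)."""
    text = line.strip()
    m = RECORD.match(text)
    if m is None:
        # no date after a wide gap: keep the whole line as the item
        return {"item": text, "date": None, "time": None, "tail": ""}
    return {
        "item": m.group("item"),
        "date": m.group("date"),
        "time": m.group("time"),
        "tail": m.group("tail").strip(),
    }
```

The lazy `.+?` stops at the first wide gap that is actually followed by a date, so double spaces later in the line (as in `Cba Atm  Uts B'Way Op`) end up in the tail.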


6. Re: Parsing problem

Hi Patrick,
First thing to do,
complain to your bank.

Regards,
Jacques Deschênes


7. Re: Parsing problem

Thank you for all your suggestions; I'll try implementing them in a parser.
(And those gaps are spaces, not tabs.)

--
MrTrick
----------
Catapultum habeo. Nisi pecuniam omnem mihi dabis, ad
caput tuum saxum immane mittam
(I have a catapult. Unless you give me all your money, I will hurl an
enormous rock at your head.)


8. Re: Parsing problem

On 16 Aug 2005, at 22:26, Patrick Barnes wrote:

> 
> I have some finance data that my bank insists on lumping all together
> into one field. A sample of the data is below:
> 
> Visa Purchase                26DECThe Carss Park Super Carss Par
> <SNIP>

Could you send me the file without email line wraps? Or post it to EuForum as
a .zip, however that's done? Or something else?

> Each line is a single field in the incoming .csv file.
> Any suggestions on how to parse it? In most but not all cases, the
> second 'column' starts at about the 29th element. Sometimes the date
> is given, sometimes date and time, sometimes nothing.

I have an idea that's easy with parses() from strtok v3 (not uploaded yet, and
not finished either), if I understand what's there. Can you provide any more
info on what's on each line?

Kat


9. Re: Parsing problem

On 8/16/05, Kat <gertie at visionsix.com> wrote:
> Could you send me the file without email linewraps? Or however it's done to
> post a .zip to the euforum? Or something else?

No. This is my bank transaction history.

> I have an idea that's easy with strtok v3 (not uploaded yet, not finished either)
> parses(), if i understand what's there. Can you provide any more info on
> what's on each line?

Thanks, but that's fine now. I've used the suggestions in the forum:
an 'explode', 'trim', and a func to split at 2-or-more-space
boundaries... and it works nicely.

--
MrTrick
----------
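
A Python sketch of the approach that finally worked (split at 2-or-more-space boundaries, then trim each piece), assuming the gaps are plain spaces as confirmed earlier in the thread; `split_fields` is a hypothetical name, and the original Euphoria 'explode' is not reproduced here. The date and time stay glued to the front of the second field, so that field still needs the date/time split discussed earlier.

```python
import re


def split_fields(line: str) -> list:
    """Split at every run of two or more spaces and trim each piece."""
    return [part.strip() for part in re.split(r" {2,}", line.strip()) if part.strip()]
```

Single spaces inside a field, as in `N R M A Mcc`, are left alone, which is why a 2-space threshold works better here than splitting on every space.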


10. Re: Parsing problem

On 17 Aug 2005, at 21:25, Patrick Barnes wrote:

> 
> On 8/16/05, Kat <gertie at visionsix.com> wrote:
> > Could you send me the file without email linewraps? Or however it's done to
> > post a .zip to the euforum? Or something else?
> 
> No. This is my bank transaction history.

I meant only what you posted to euforum anyhow, nothing additional.

> > I have an idea that's easy with strtok v3 (not uploaded yet, not finished
> > either) parses(), if i understand what's there. Can you provide any more
> > info on what's on each line?
> 
> Thanks, but that's fine now, I've used the suggestions in the forum,
> an 'explode', 'trim', and a func to split at 2-or-more-space
> boundaries... and it works nicely.

Ok, no pressing need for strtok v3 noted.

Kat
