Program word counter and data analysis

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
I need my program to work as follows:
There's 3 sentences and within the 3 sentences, they have words/numbers.

The sentences have structured syntax, meaning they are just repeating words with different names/numbers etc
There can be some variation of the words. For example, instead of "for $300", it can be "for $300 to $500"

The program needs to plug out the numbers and store them in csv/excel friendly format to be transferred to an excel file for data manipulation. The transfer part can be manually done

So there are multiple of 3 sentences stacked together and the program need to be able to scale. i.e. scan through hundreds of 3 sentences, plug out the data, present in csv format for data manipulation in excel

What programming languages is best suited for this? how would you implemented this?

P.S.

I'm a newbie and the major consideration is limited time/speed and ease of implementation (won't have additional features added).
I don't do programming for a living but need them to analyse stuff quick
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
I need my program to work as follows:
There's 3 sentences and within the 3 sentences, they have words/numbers.

The sentences have structured syntax, meaning they are just repeating words with different names/numbers etc
There can be some variation of the words. For example, instead of "for $300", it can be "for $300 to $500"

The program needs to plug out the numbers and store them in csv/excel friendly format to be transferred to an excel file for data manipulation. The transfer part can be manually done

So there are multiple of 3 sentences stacked together and the program need to be able to scale. i.e. scan through hundreds of 3 sentences, plug out the data, present in csv format for data manipulation in excel

What programming languages is best suited for this? how would you implemented this?

P.S.

I'm a newbie and the major consideration is limited time/speed and ease of implementation (won't have additional features added).
I don't do programming for a living but need them to analyse stuff quick

Without touching into fuzzy logic or any form of machine learning algorithms with or without NLP, you could have chosen any programming language that is equipped with regular expression feature to easily select fragments of text that you know and can identify the patterns you are going to extract.

Personally I would have chosen Perl for this job for it is the one of the best regex implementation and very easy to write.

Scaling the technique could be as simple as using an embarrassingly parallel technique where you just need to break up your 3 sentences into chunks and send them off to be analysed in parallel by running multiple threads/processes of the same extraction function. Each processing unit will return the results which you can later consolidate them together.

:)

Perl:
"for $300", it can be "for $300 to $500"


$paragraph = 'I would like to sell this item for $300';
($for,$to) = $paragraph =~ /for\s+\$(\d+)(?:\s+to\s+\$(\d+))?/i;
print "1) FOR=$for,TO=$to\n";
$paragraph = 'I would like to sell this item for $300 to $500';
($for,$to) = $paragraph =~ /for\s+\$(\d+)(?:\s+to\s+\$(\d+))?/i;
print "2) FOR=$for,TO=$to\n";

# "for $300
# $for -> 300
# $to -> undef

# "for $300 to $500
# $for -> 300
# $to -> 500

Bash:
$perl main.pl
1) FOR=300,TO=
2) FOR=300,TO=500

If you want to learn automation, then some form of programming skill set is required. That is the advantage of knowing how to program. It is not just creating website.

:)
 
Last edited:

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
Without touching into fuzzy logic or any form of machine learning algorithms with or without NLP, you could have chosen any programming language that is equipped with regular expression feature to easily select fragments of text that you know and can identify the patterns you are going to extract.

Personally I would have chosen Perl for this job for it is the one of the best regex implementation and very easy to write.

Scaling the technique could be as simple as using an embarrassingly parallel technique where you just need to break up your 3 sentences into chunks and send them off to be analysed in parallel by running multiple threads/processes of the same extraction function. Each processing unit will return the results which you can later consolidate them together.

:)

Perl:
"for $300", it can be "for $300 to $500"


$paragraph = 'I would like to sell this item for $300';
($for,$to) = $paragraph =~ /for\s+\$(\d+)(?:\s+to\s+\$(\d+))?/i;
print "1) FOR=$for,TO=$to\n";
$paragraph = 'I would like to sell this item for $300 to $500';
($for,$to) = $paragraph =~ /for\s+\$(\d+)(?:\s+to\s+\$(\d+))?/i;
print "2) FOR=$for,TO=$to\n";

# "for $300
# $for -> 300
# $to -> undef

# "for $300 to $500
# $for -> 300
# $to -> 500

Bash:
$perl main.pl
1) FOR=300,TO=
2) FOR=300,TO=500

If you want to learn automation, then some form of programming skill set is required. That is the advantage of knowing how to program. It is not just creating website.

:)
If the input paragraph is inside a .txt file, how do i import it for the program to read instead of hardcoding it?

Similarly, how do I plug out the values and export it out in csv format?

having trouble with these file input and output stuff as these are not taught in regular tutorial videos
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
If the input paragraph is inside a .txt file, how do i import it for the program to read instead of hardcoding it?

Similarly, how do I plug out the values and export it out in csv format?

having trouble with these file input and output stuff as these are not taught in regular tutorial videos

If your file is not that large and can fit into the memory.
This will be the simplest approach.

There are numerous other ways, inclusive of reading from files, but there is no need for you to hardcode your filenames
to the script. Keep it flexible and let your shell do the redirecting.

Code:
$ cat main.txt
I would like to sell this item for $300
I would like to sell this item for $300 to $500
I would like to sell this item for $200
I would like to sell this item for $200 to $400

Perl:
$/ = undef;
$_ = <>;

print "FOR,TO\n"; # header
while (/for\s+\$(\d+)(?:\s+to\s+\$(\d+))?/sig) {
  print "$1,$2\n"; # data
}

Bash:
$ perl ./main.pl < main.txt > main.csv

Code:
$ cat main.csv
FOR,TO
300,
300,500
200,
200,400

:)
 
Last edited:

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
Since the final destination is Excel, why not do it Excel VBA?
A sample of the input text file will be much helpful in conceptualizing the algorithm.
1. Read text file
2. Extract the numbers
3. Populate the extracted numbers in spreadsheet
 

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
Since the final destination is Excel, why not do it Excel VBA?
A sample of the input text file will be much helpful in conceptualizing the algorithm.
1. Read text file
2. Extract the numbers
3. Populate the extracted numbers in spreadsheet
Can you do regex in Excel? Yes, I learnt Excel VBA before.

I have included the sample input text below. I spent the week learning about regex but perl is taking up alot of time.

As you can see, what I want is the ticker (UEC), quantity (2,000), and premium (if its a range, for eg, "for 0.30 and 0.35", then pick the lower bound i.e 0.3 only if its a range)

Optional good-to-have is to include the date, time, period (December), strike (5) and [calls or puts], and stock price

I need to export them to an excel or csv file for data manipulation. Also, stuck at how to transfer stuff found from regex to individual variables/string for transfer to csv. This is the part I'm still stuck at.

UEC Bullish Call Buying
October 20, 2021 12:40 pm
2,000 December 5 calls bought in two prints at the same second for 0.30 and 0.35 above open interest of 25 contracts. Stock 3.96. Third bullish entry on Log since 9/10.

MAR Bullish Call Buying
October 20, 2021 12:36 pm
6,600 05November 155 calls bought for 3.45 to 4.50 above open interest of 1,771 contracts. Stock 152.85-154.96. EPS 11/03 before open.

HLT Bullish Call Buying
October 20, 2021 12:32 pm
5,000 November 140 calls bought for 4.50 to 4.95 above open interest of 2,212 contracts. Stock 139.90-140.46. EPS 10/27 before open.

GWH Bearish Put Buying
October 20, 2021 12:28 pm
3,000 November 12.50 puts bought for 0.95 to 1.05 above open interest of 1,476 contracts. Stock 15.84.

FTCH Bullish Call Buying
October 20, 2021 12:08 pm
12,250 November 45 calls bought in mostly one 10,000 print for 0.98 to 1.36 above open interest of 4,668 contracts. Stock 39.04-39.90. EPS estimated 11/11.

M Bearish Put Spread
October 20, 2021 1:34 pm
4,000 29October 26 puts bought for 0.57 above open interest of 304 contracts; 4,000 29October 23.50 puts sold for 0.06 above open interest of 376 contracts. Stock 26.73.

WWD Bullish Call Roll
October 20, 2021 1:25 pm
3,500 December 125 calls bought for 2.55 above open interest of 183 contracts; 4,000 November 125 calls sold for 1.35 below open interest of 4,017 contracts. Stock 115.03.

TJX Bullish Call Roll
October 20, 2021 1:20 pm
5,000 29October 65 calls bought for 0.75 above open interest of 620 contracts; 5,000 22October 64 calls sold for 0.90 below open interest of 11,231 contracts. Stock 64.69.

WEN Bullish Call Buying
October 20, 2021 1:00 pm
2,000 November 23 calls bought for 0.42 to 0.72 above open interest of 1,178 contracts. Stock 22.23-22.73. EPS 11/10 before open.

MTTR Bearish Put Buying
October 20, 2021 12:59 pm
5,000 February 12.50 puts bought at the same time for 0.50 and 0.55 above open interest of 48 contracts. Stock 19.28. EPS 11/3 after close.

GE Bullish Call Buying
October 20, 2021 12:57 pm
2,200 22October 106 calls mostly bought for 0.51 to 0.68 above open interest of 1,218 contracts. Stock 105.21-105.40.

PLBY Bullish Call Buying
October 20, 2021 12:47 pm
3,600 November 35 calls mostly bought for 0.63 to 1.55 above open interest of 2,542 contracts. Stock 27.03-29.40. EPS estimated 11/11 after close.

SPY Bearish Put Spread
October 20, 2021 12:46 pm
4,800 01November 445 puts bought for 1.36 above open interest of 1,440 contracts; 4,800 01November 440 puts sold for 0.85 above open interest of 3,722 contracts. Stock 452.53.

INDA Bearish Put Roll-up
October 20, 2021 12:41 pm
15,000 November 50 puts bought for 1.05-1.10 above open interest of 47 contracts; 13,000 December 48 puts sold for 0.75 below open interest of 23,414 contracts. Stock 50.15.
[4:31 AM]
CSCO Bullish Call Roll
October 20, 2021 2:41 pm
2,270 05November 55 calls bought for 1.50 above open interest of 703 contracts, 2,270 22October 55 calls sold for 1.19 below open interest of 3,423. Stock 56.15. EPS 11/17 after close. Follows 05November 56 call buying on Log 10/7.

BX Bullish Call Buying
October 20, 2021 2:41 pm
2,500 29October 135 calls bought in mostly two prints for 0.55 to 1.05 above open interest of 353 contracts. Stock 127.19-128.13. EPS tomorrow 10/21 before open.

MNST Bearish Put Spread
October 20, 2021 2:34 pm
2,500 November 80 puts bought for 0.89 to 0.98 above open interest of 831 contracts, 2,500 November 70 puts sold for 0.20 to 0.27 above open interest of 5. Stock 85.36. EPS estimated 11/4 after close.

IMAX Bullish Call Buying
October 20, 2021 2:22 pm
10,000 November 22 calls bought mostly in two prints (5,400 and 3,600) for 0.59 to 0.85 above open interest of 815 contracts. Stock 20.33-20.71. EPS 10/28 before open.

RIG Bullish Call Buying
October 20, 2021 2:17 pm
4,200 12November 4.50 calls bought for 0.17 to 0.19 above open interest of 780 contracts. Stock 4.05. EPS 11/1 after close.

LYV Bullish Call Roll-Down
October 20, 2021 2:15 pm
8,500 November 100 calls bought for 4.30 to 4.90 mostly in one print of 5,950 above open interest of 3,239 contracts; 6,000 November 105 calls sold for 2.45 below open interest of 6,549 contracts. Appears to roll-down trade on Log…

SHPW Bullish Call Buying
October 20, 2021 1:55 pm
7,500 November 10 calls mostly bought for 0.35 to 1.50 against no open interest. Stock 7.69-9.85. Follows November 7.50 calls on Log this morning.
[4:31 AM]
OXY Bullish Call Buying
October 20, 2021 3:43 pm
4,900 29October 35 calls mostly bought for 0.39 to 0.43 above open interest of 3,039 contracts. Separately 6,500 29October 36 calls bought for 0.22-0.27 above open interest of 3,379 contracts. Fifth bullish entry on Log since 9/27. Stock 32.88-33.38. EPS…

CAR Bearish Put Spread
October 20, 2021 3:43 pm
3,000 November 140 puts bought mostly for 5.05 above open interest of 405 contracts, 3,000 November 120 puts sold mostly for 1.80 above open interest of 2,703. Stock 161.20. EPS 11/1 after close.

FDX Bullish Call Buying
October 20, 2021 3:39 pm
7,500 29October 245 calls mostly bought for 0.42 to 0.70 above open interest of 245 contracts, 10,500 29October 247.50 calls mostly bought for 0.31 to 0.62 above open interest of 17. Stock 230.97-231.40.

FTFT Bullish Call Buying
October 20, 2021 3:31 pm
3,500 November 2.50 calls mostly bought for 0.18 to 0.40 above open interest of 2,906 contracts. Stock 2.18-2.43. EPS estimated 11/5 after close.

SANW Bullish Call Buying
October 20, 2021 3:21 pm
2,600 February 2.50 calls mostly bought itm for 0.65 to 1.35 above open interest of 82 contracts. Stock 2.83-3.75.

ETWO Bullish Call Buying
October 20, 2021 3:18 pm
2,500 April 12.50 calls bought in two prints for 1.35 above open interest of 439 contracts. Stock 11.88.

DNMR Bullish Call Buying
October 20, 2021 2:53 pm
4,000 November 17.50 calls bought for 0.25 to 1.20 above open interest of 1,031 contracts. Stock 13.72-16.13.

LOW Bullish Call Buying
October 20, 2021 2:49 pm
12,700 22October 230 calls mostly bought for 0.20 to 0.35 above open interest of 1,098 contracts. Stock 224.79-225.25.

LEN Bullish Call Buying
October 20, 2021 2:45 pm
3,000 November 109 calls bought in one print for 1.00 above open interest of 1 contracts. Stock 102.96. EPS estimated 12/15.

DIA Bearish Put Buying
October 20, 2021 4:00 pm
10,000 September 170 puts bought in one print for 1.05 above open interest of 173 contracts. Stock 355.65.

QRTEA Bullish Call Buying
October 20, 2021 3:59 pm
3,000 November 11 calls bought for 0.43 to 0.45 above open interest of 1,475 contracts. Stock 10.46.

ARKK Bearish Put Buying
October 20, 2021 3:55 pm
8,000 November 109 puts bought in two prints for 1.27-1.30 above open interest of 612 contracts. Stock 118.81-118.82.

QS Bullish Call Buying
October 20, 2021 3:49 pm
5,000 November 31 calls bought for 0.50 to 0.60 above open interest of 124 contracts. Stock 24.97-25.15. EPS 10/26 after close.
 
Last edited:

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
I give it a try based on what you have described. FOC of course. :D
Excel VBA regex is pretty powerful especially for web scraping e.g. Yahoo Finance
I am only a stock guy. Option is foreign to me, like a blind person touching an elephant.

1. Can I treat this "[4:30 AM]" as separator as well ?
2. Output in Excel by column : Ticker, Qty, Premium, Date, Time, Period, Strike, Option, Price
Ticker - 1st line, first character string
Qty - 3rd line, first number ??
Premium - 3rd line, 0.30 and 0.35(pick 0.30), What about 3.45 to 4.50? Pick which number ?
Date - 2nd line ?
Time - 2nd line ?
Period - 3rd line 2nd character string ?
Strike - 3rd numeric after period ?
Option - 3rd line after striker ?
Price - 3rd line right after "stock" ? e.g. 3.96, 152.85-154.96 ?
 
Last edited:

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
I give it a try based on what you have described. FOC of course. :D
Can I treat this "[4:30 AM]" as separator as well ?
ya, for this in particular, if program find it, can just ignore it

I'll usually scrub it manually before inputting it in but sometimes I just miss it

oo wow thanks Peter, looking forward to see how it's being implemented

edit: I have removed this [4.30am] away so that it don't get complicated :)
 
Last edited:

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
ya, for this in particular, if program find it, can just ignore it

I'll usually scrub it manually before inputting it in but sometimes I just miss it

oo wow thanks Peter, looking forward to see how it's being implemented

edit: I have removed this [4.30am] away so that it don't get complicated :)
Don't have to remove. When first character either " " or "[" means new record.
Please answer more questions above. :)
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
Can you do regex in Excel? Yes, I learnt Excel VBA before.

I have included the sample input text below. I spent the week learning about regex but perl is taking up alot of time. I'm willing to get some help for $$.

As you can see, what I want is the ticker (UEC), quantity (2,000), and premium (if its a range, for eg, "for 0.30 and 0.35", then pick the lower bound i.e 0.3 only if its a range)

Optional good-to-have is to include the date, time, period (December), strike (5) and [calls or puts], and stock price

I need to export them to an excel or csv file for data manipulation. Also, stuck at how to transfer stuff found from regex to individual variables/string for transfer to csv. This is the part I'm still stuck at.

Perl:
$ cat main.pl
#!/usr/bin/env perl

$/ = undef;
$_ = <>;

print "datetime,ticker,quantity,lower,upper\n";
while (/((\w+)\s+(?:Bullish|Bearish)\s+(?:(?:Call|Put)\s+\w+)\s+([^\r\n]+)\s+([\d,]+)\s+([^\n\r]+))/sig) {
  ($all, $ticker, $datetime, $amt, $rest) = ($1,$2,$3,$4,$5);
  $datetime =~ s/,//g;
  $amt      =~ s/,//g;
  ($lower, $upper) = $rest =~ /for\s+([\d\.]+)\s+(?:and|to)\s+([\d\.]+)/;
  print "$datetime,$ticker,$amt,$lower,$upper\n";
  #print "$1\n\n"; # data
}

This wouldn't do exactly what you want. In a hurry for something else, will get back to it when I'm free.
I haven't got time to read through all the variants of formatting.
But you can tell me what is missing.

Code:
datetime,ticker,quantity,lower,upper
October 20 2021 12:40 pm,UEC,2000,0.30,0.35
October 20 2021 12:36 pm,MAR,6600,3.45,4.50
October 20 2021 12:32 pm,HLT,5000,4.50,4.95
October 20 2021 12:28 pm,GWH,3000,0.95,1.05
October 20 2021 12:08 pm,FTCH,12250,0.98,1.36
October 20 2021 1:34 pm,M,4000,,
October 20 2021 1:25 pm,WWD,3500,,
October 20 2021 1:20 pm,TJX,5000,,
October 20 2021 1:00 pm,WEN,2000,0.42,0.72
October 20 2021 12:59 pm,MTTR,5000,0.50,0.55
October 20 2021 12:57 pm,GE,2200,0.51,0.68
October 20 2021 12:47 pm,PLBY,3600,0.63,1.55
October 20 2021 12:46 pm,SPY,4800,,
October 20 2021 2:41 pm,CSCO,2270,,
October 20 2021 2:41 pm,BX,2500,0.55,1.05
October 20 2021 2:34 pm,MNST,2500,0.89,0.98
October 20 2021 2:22 pm,IMAX,10000,0.59,0.85
October 20 2021 2:17 pm,RIG,4200,0.17,0.19
October 20 2021 1:55 pm,SHPW,7500,0.35,1.50
October 20 2021 3:43 pm,OXY,4900,0.39,0.43
October 20 2021 3:43 pm,CAR,3000,,
October 20 2021 3:39 pm,FDX,7500,0.42,0.70
October 20 2021 3:31 pm,FTFT,3500,0.18,0.40
October 20 2021 3:21 pm,SANW,2600,0.65,1.35
October 20 2021 3:18 pm,ETWO,2500,,
October 20 2021 2:53 pm,DNMR,4000,0.25,1.20
October 20 2021 2:49 pm,LOW,12700,0.20,0.35
October 20 2021 2:45 pm,LEN,3000,,
October 20 2021 4:00 pm,DIA,10000,,
October 20 2021 3:59 pm,QRTEA,3000,0.43,0.45
October 20 2021 3:55 pm,ARKK,8000,,
October 20 2021 3:49 pm,QS,5000,0.50,0.60
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
#2nd attempt

Realised there are cases where there are multiple transactions per entry.

Perl:
$ cat main.pl
#!/usr/bin/env perl

$/ = undef;
$_ = <>;

print "datetime,period,ticker,quantity,lower,upper,stock price\n";
while (/((\w+)\s+(?:Bullish|Bearish)\s+(?:(?:Call|Put)\s+\w+)\s+([^\r\n]+)\s+([^\n\r]+))/sig) {
  ($all, $ticker, $datetime, $rest) = ($1,$2,$3,$4);
  $datetime =~ s/,//g;
  $amt      =~ s/,//g;
  while ($rest =~ /([\d,]+)\s+(\w+)\s+(?:\d+).*?(?:calls|puts).*?for\s+([\d\.]+)\s+(?:and|to)\s+([\d\.]+)/sig) {
    ($quantity, $period, $lower, $upper) = ($1, $2, $3, $4);
    $period =~ s/(\d+)(\w+)/$1 $2/g;
    $quantity =~ s/,//g;
    ($stockprice) = $rest =~ /stock\s+([\d.-]+)/si;
    $stockprice =~ s/\.$//;
    print "$datetime,$period,$ticker,$quantity,$lower,$upper,$stockprice\n";
  }
}

Code:
$ ./main.pl < main.txt
datetime,period,ticker,quantity,lower,upper,stock price
October 20 2021 12:40 pm,December,UEC,2000,0.30,0.35,3.96
October 20 2021 12:36 pm,05 November,MAR,6600,3.45,4.50,152.85-154.96
October 20 2021 12:32 pm,November,HLT,5000,4.50,4.95,139.90-140.46
October 20 2021 12:28 pm,November,GWH,3000,0.95,1.05,15.84
October 20 2021 12:08 pm,November,FTCH,12250,0.98,1.36,39.04-39.90
October 20 2021 1:00 pm,November,WEN,2000,0.42,0.72,22.23-22.73
October 20 2021 12:59 pm,February,MTTR,5000,0.50,0.55,19.28
October 20 2021 12:57 pm,22 October,GE,2200,0.51,0.68,105.21-105.40
October 20 2021 12:47 pm,November,PLBY,3600,0.63,1.55,27.03-29.40
October 20 2021 2:41 pm,29 October,BX,2500,0.55,1.05,127.19-128.13
October 20 2021 2:34 pm,November,MNST,2500,0.89,0.98,85.36
October 20 2021 2:34 pm,November,MNST,2500,0.20,0.27,85.36
October 20 2021 2:22 pm,November,IMAX,10000,0.59,0.85,20.33-20.71
October 20 2021 2:17 pm,12 November,RIG,4200,0.17,0.19,4.05
October 20 2021 1:55 pm,November,SHPW,7500,0.35,1.50,7.69-9.85
October 20 2021 3:43 pm,29 October,OXY,4900,0.39,0.43,32.88-33.38
October 20 2021 3:39 pm,29 October,FDX,7500,0.42,0.70,230.97-231.40
October 20 2021 3:39 pm,29 October,FDX,10500,0.31,0.62,230.97-231.40
October 20 2021 3:31 pm,November,FTFT,3500,0.18,0.40,2.18-2.43
October 20 2021 3:21 pm,February,SANW,2600,0.65,1.35,2.83-3.75
October 20 2021 2:53 pm,November,DNMR,4000,0.25,1.20,13.72-16.13
October 20 2021 2:49 pm,22 October,LOW,12700,0.20,0.35,224.79-225.25
October 20 2021 3:59 pm,November,QRTEA,3000,0.43,0.45,10.46
October 20 2021 3:49 pm,November,QS,5000,0.50,0.60,24.97-25.15
 
Last edited:

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
I give it a try based on what you have described. FOC of course. :D
Excel VBA regex is pretty powerful especially for web scraping e.g. Yahoo Finance
I am only a stock guy. Option is foreign to me, like a blind person touching an elephant.

1. Can I treat this "[4:30 AM]" as separator as well ?
2. Output in Excel by column : Ticker, Qty, Premium, Date, Time, Period, Strike, Option, Price
Ticker - 1st line, first character string
Qty - 3rd line, first number ??
Premium - 3rd line, 0.30 and 0.35(pick 0.30), What about 3.45 to 4.50? Pick which number ?
Date - 2nd line ?
Time - 2nd line ?
Period - 3rd line 2nd character string ?
Strike - 3rd numeric after period ?
Option - 3rd line after striker ?
Price - 3rd line right after "stock" ? e.g. 3.96, 152.85-154.96 ?
Example:
FDX Bullish Call Buying
October 20, 2021 3:39 pm
7,500 29October 245 calls mostly bought for 0.42 to 0.70 above open interest of 245 contracts, 10,500 29October 247.50 calls mostly bought for 0.31 to 0.62 above open interest of 17. Stock 230.97-231.40.

Ticker - FDX
Qty - 7,500 (yes, 1st number)
Premium - 3rd line, 0.30 and 0.35(pick 0.30), What about 3.45 to 4.50? Pick which number ? (choose lower bound, "3.45")
Date - 2nd line ? (yes 2nd line, "October 20, 2021")
Time - 2nd line ? (yes 2nd line, "3:39 pm")
Period - 3rd line 2nd character string ? (yes, "29October") [Note that the period can be a month or weekly so it could be "29October" or just "October"]
Strike - 3rd numeric after period ? (yes, "245", its always after the month/period)
Option - 3rd line after striker ? (yes, word after the strike price on 3rd line, so "calls" in this case)
Price - 3rd line right after "stock" ? e.g. 3.96, 152.85-154.96 ? (Yes, the numeric after "stock", so "3.96";
If stock price has a range, then choose the lower bound so "152.85")

The structured syntax is as follows:
{Ticker} Bullish Call Buying
{Date} {Time}
{Quantity} {Period} {Strike} {Calls|Puts} mostly bought for {Premium|choose lower bound/1st num} above open interest of 245 contracts.10,500 29October 247.50 calls mostly bought for 0.31 to 0.62 above open interest of 17. Stock {Price|choose lower bound/1st num}.

For those words not in curly bracket i.e.{variable}, they are just words that can vary but are not important to note

Sometimes, there might be a repeated line so for eg "10,500 29October 247.50 calls mostly bought for 0.31 to 0.62 above open interest of 17. "

If you want, you could try and extract the data from this 2nd line but it's not necessary as I think it will complicate things since it will appear only for some. Usually I will take a deeper look after a first look analysis
 
Last edited:

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
Just to confirm. You and davidKTW have any deal going on? If there is then I don't want to be the spoiler.
BTW, I don't see any curly bracket in your text file. Something new ?
 

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
Just to confirm. You and davidKTW have any deal going on? If there is then I don't want to be the spoiler.
BTW, I don't see any curly bracket in your text file. Something new ?
no deal, i'm using activeperl with komodo on windows...he's using some unix OS i think. Heard that Perl is easier on Mac and unix...

anyway those curly brackets are syntax to see easily what are the variables that need to be printed in the csv

My tutorial video don't have this "$ cat main.pl" and it throw up error
"
Bareword found where operator expected at C:\Ch02\main.pl line 1, near "$ cat main"
(Missing operator before main?)
syntax error at C:\Ch02\main.pl line 1, near "$ cat main"
Execution of C:\Ch02\main.pl aborted due to compilation errors."

I have another main.txt which contain the "sample input text" aka raw data in same directory
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
no deal, i'm using activeperl with komodo on windows...he's using some unix OS i think. Heard that Perl is easier on Mac and unix...

anyway those curly brackets are syntax to see easily what are the variables that need to be printed in the csv

My tutorial video don't have this "$ cat main.pl" and it throw up error
"
Bareword found where operator expected at C:\Ch02\main.pl line 1, near "$ cat main"
(Missing operator before main?)
syntax error at C:\Ch02\main.pl line 1, near "$ cat main"
Execution of C:\Ch02\main.pl aborted due to compilation errors."
cat is short for concatenate. It is a unix tool to print out contents of a file (or more). Do adapt to your windows environment :)
 

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
cat is short for concatenate. It is a unix tool to print out contents of a file (or more). Do adapt to your windows environment :)
thanks david, I think what you printed is what I needed. Will try and figure out the rest on my windows environment
 

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
I am also using Activeperl version 5.20 for all the heavy lifting. Here is the Excel VBA script.
Just create a new excel spreadsheet and cut&paste the script into Excel VBA editor. Save it to a folder and put the input text file in the same folder. Remember to change your input file name. If your input is more than 32000 row, change i from integer to long. Excel 2010 cannot exceed 1M rows. This is where perl comes in and do heavy lifting.

Screen shot of the output to Excel spreadsheet.
screenshot.jpg


Code:
Sub Read_text()

Dim input_file As String
Dim my_path As String
Dim fso As Object
Dim input_obj As Object
Dim data_str As String
Dim data_array As Variant, data_array2 As Variant
Dim i As Integer, j As Integer

input_file = "input_text.txt" 'Your input text file
my_path = ThisWorkbook.Path & "\"
ThisWorkbook.Sheets("Sheet1").Cells.Clear
ThisWorkbook.Sheets("Sheet1").Cells(1, 1) = "Ticker"
ThisWorkbook.Sheets("Sheet1").Cells(1, 2) = "Qty"
ThisWorkbook.Sheets("Sheet1").Cells(1, 3) = "Premium"
ThisWorkbook.Sheets("Sheet1").Cells(1, 4) = "Date"
ThisWorkbook.Sheets("Sheet1").Cells(1, 5) = "Time"
ThisWorkbook.Sheets("Sheet1").Cells(1, 6) = "Period"
ThisWorkbook.Sheets("Sheet1").Cells(1, 7) = "Strike"
ThisWorkbook.Sheets("Sheet1").Cells(1, 8) = "Option"
ThisWorkbook.Sheets("Sheet1").Cells(1, 9) = "Price"

'Open same file for reading
Set fso = CreateObject("Scripting.FileSystemObject")
Set input_obj = fso.opentextfile(my_path & input_file, 1)

i = 2
data_str = input_obj.readline
While Not input_obj.AtEndOfStream
    If data_str = "" Then 'Check for empty line
        data_str = input_obj.readline
    ElseIf data_str <> "" Then
        If Mid(data_str, 1, 1) = "[" Then 'Check for "["
            data_str = input_obj.readline
        End If
    End If
    ' 1st line - ticker
    data_array = Split(data_str, " ")
    ThisWorkbook.Sheets("Sheet1").Cells(i, 1) = data_array(0)
    
    ' 2nd line - Date and time
    data_str = input_obj.readline
    data_array = Split(data_str, " ")
    ThisWorkbook.Sheets("Sheet1").Cells(i, 4) = data_array(0) & " " & data_array(1) & " " & data_array(2)
    ThisWorkbook.Sheets("Sheet1").Cells(i, 5) = data_array(3) & " " & data_array(4)
    
    ' 3rd line - Qty, Premium
    data_str = input_obj.readline
    data_array = Split(data_str, " ")
    ' Qty - first data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 2) = Replace(data_array(0), ",", "")
    ' Premium - numeric after "for"
    For j = LBound(data_array) To UBound(data_array)
        If data_array(j) = "for" Then
            If InStr(data_array(j + 1), "-") = 0 Then
                ThisWorkbook.Sheets("Sheet1").Cells(i, 3) = data_array(j + 1)
            Else
                data_array2 = Split(data_array(j + 1), "-")
                ThisWorkbook.Sheets("Sheet1").Cells(i, 3) = data_array2(0)
            End If
            Exit For
        End If
    Next j
    ' Period - 2nd data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 6).NumberFormat = "@"
    ThisWorkbook.Sheets("Sheet1").Cells(i, 6) = ""
    For k = 1 To Len(data_array(1))
        If InStr("0123456789", Mid(data_array(1), k, 1)) = 0 Then
            ThisWorkbook.Sheets("Sheet1").Cells(i, 6) = ThisWorkbook.Sheets("Sheet1").Cells(i, 6) & Mid(data_array(1), k, 1)
        End If
    Next k
    ' Strike - 3rd data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 7) = data_array(2)
    ' Option - 4th data (put/call)
    ThisWorkbook.Sheets("Sheet1").Cells(i, 8) = data_array(3)
    ' Price - numeric after "stock"
    For j = LBound(data_array) To UBound(data_array)
        If data_array(j) = "Stock" Then
            If InStr(data_array(j + 1), "-") = 0 Then
                ThisWorkbook.Sheets("Sheet1").Cells(i, 9) = Mid(data_array(j + 1), 1, Len(data_array(j + 1)) - 1)
            Else
                data_array2 = Split(data_array(j + 1), "-")
                ThisWorkbook.Sheets("Sheet1").Cells(i, 9) = data_array2(0)
            End If
            Exit For
        End If
    Next j
    ' Read next record
    If Not input_obj.AtEndOfStream Then
        data_str = input_obj.readline
    End If
    i = i + 1
Wend

input_obj.Close
Set input_obj = Nothing
Set fso = Nothing

End Sub
 

jinsatkilife

Senior Member
Joined
May 4, 2019
Messages
2,115
Reaction score
2,103
I am also using Activeperl version 5.20 for all the heavy lifting. Here is the Excel VBA script.
Just create a new excel spreadsheet and cut&paste the script into Excel VBA editor. Save it to a folder and put the input text file in the same folder. Remember to change your input file name. If your input is more than 32000 row, change i from integer to long. Excel 2010 cannot exceed 1M rows. This is where perl comes in and do heavy lifting.

Screen shot of the output to Excel spreadsheet.
screenshot.jpg


Code:
Sub Read_text()

Dim input_file As String
Dim my_path As String
Dim fso As Object
Dim input_obj As Object
Dim data_str As String
Dim data_array As Variant, data_array2 As Variant
Dim i As Integer, j As Integer

input_file = "input_text.txt" 'Your input text file
my_path = ThisWorkbook.Path & "\"
ThisWorkbook.Sheets("Sheet1").Cells.Clear
ThisWorkbook.Sheets("Sheet1").Cells(1, 1) = "Ticker"
ThisWorkbook.Sheets("Sheet1").Cells(1, 2) = "Qty"
ThisWorkbook.Sheets("Sheet1").Cells(1, 3) = "Premium"
ThisWorkbook.Sheets("Sheet1").Cells(1, 4) = "Date"
ThisWorkbook.Sheets("Sheet1").Cells(1, 5) = "Time"
ThisWorkbook.Sheets("Sheet1").Cells(1, 6) = "Period"
ThisWorkbook.Sheets("Sheet1").Cells(1, 7) = "Strike"
ThisWorkbook.Sheets("Sheet1").Cells(1, 8) = "Option"
ThisWorkbook.Sheets("Sheet1").Cells(1, 9) = "Price"

'Open same file for reading
Set fso = CreateObject("Scripting.FileSystemObject")
Set input_obj = fso.opentextfile(my_path & input_file, 1)

i = 2
data_str = input_obj.readline
While Not input_obj.AtEndOfStream
    If data_str = "" Then 'Check for empty line
        data_str = input_obj.readline
    ElseIf data_str <> "" Then
        If Mid(data_str, 1, 1) = "[" Then 'Check for "["
            data_str = input_obj.readline
        End If
    End If
    ' 1st line - ticker
    data_array = Split(data_str, " ")
    ThisWorkbook.Sheets("Sheet1").Cells(i, 1) = data_array(0)
  
    ' 2nd line - Date and time
    data_str = input_obj.readline
    data_array = Split(data_str, " ")
    ThisWorkbook.Sheets("Sheet1").Cells(i, 4) = data_array(0) & " " & data_array(1) & " " & data_array(2)
    ThisWorkbook.Sheets("Sheet1").Cells(i, 5) = data_array(3) & " " & data_array(4)
  
    ' 3rd line - Qty, Premium
    data_str = input_obj.readline
    data_array = Split(data_str, " ")
    ' Qty - first data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 2) = Replace(data_array(0), ",", "")
    ' Premium - numeric after "for"
    For j = LBound(data_array) To UBound(data_array)
        If data_array(j) = "for" Then
            If InStr(data_array(j + 1), "-") = 0 Then
                ThisWorkbook.Sheets("Sheet1").Cells(i, 3) = data_array(j + 1)
            Else
                data_array2 = Split(data_array(j + 1), "-")
                ThisWorkbook.Sheets("Sheet1").Cells(i, 3) = data_array2(0)
            End If
            Exit For
        End If
    Next j
    ' Period - 2nd data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 6).NumberFormat = "@"
    ThisWorkbook.Sheets("Sheet1").Cells(i, 6) = ""
    For k = 1 To Len(data_array(1))
        If InStr("0123456789", Mid(data_array(1), k, 1)) = 0 Then
            ThisWorkbook.Sheets("Sheet1").Cells(i, 6) = ThisWorkbook.Sheets("Sheet1").Cells(i, 6) & Mid(data_array(1), k, 1)
        End If
    Next k
    ' Strike - 3rd data
    ThisWorkbook.Sheets("Sheet1").Cells(i, 7) = data_array(2)
    ' Option - 4th data (put/call)
    ThisWorkbook.Sheets("Sheet1").Cells(i, 8) = data_array(3)
    ' Price - numeric after "stock"
    For j = LBound(data_array) To UBound(data_array)
        If data_array(j) = "Stock" Then
            If InStr(data_array(j + 1), "-") = 0 Then
                ThisWorkbook.Sheets("Sheet1").Cells(i, 9) = Mid(data_array(j + 1), 1, Len(data_array(j + 1)) - 1)
            Else
                data_array2 = Split(data_array(j + 1), "-")
                ThisWorkbook.Sheets("Sheet1").Cells(i, 9) = data_array2(0)
            End If
            Exit For
        End If
    Next j
    ' Read next record
    If Not input_obj.AtEndOfStream Then
        data_str = input_obj.readline
    End If
    i = i + 1
Wend

input_obj.Close
Set input_obj = Nothing
Set fso = Nothing

End Sub
Thanks Peter, it worked like magic!

ngl, I was looking through the code and I couldn't have done it in such a short amount of time myself as I don't usually deal with splitting data. It was an eye opener as to how you managed to implement it elegantly

The 32000 row input limit is not an issue at all; it's more than sufficient for my use case :)
 
Important Forum Advisory Note
This forum is moderated by volunteer moderators who will react only to members' feedback on posts. Moderators are not employees or representatives of HWZ Forums. Forum members and moderators are responsible for their own posts. Please refer to our Community Guidelines and Standards and Terms and Conditions for more information.
Top