How do I find repeated sequences of digits in a long string that is numbers only?

Hi. I have this problem: how can I write code that searches for a sequence of digits in a string that is numbers only? For instance, for the string 123124567 I want to search for n-digit sequences and, if one repeats one or more times, show it.

Here is an example of what I want to achieve. For 123124567, show all 2-digit sequences (consecutive digits), so the output should be 12 23 31 12 24 45 56 67. From those, show the ones that repeat one or more times, in this case 12.

I'm a beginner, but I would like to teach myself to do this, as I am working on a bigger project where I need to show every number sequence in a very long string. I tried to use regular expressions for it, but then I got results like 12 31 24 56, and it wasn't checking every possible two-digit sequence like 12 23 31 12 24 45 56 67.

var textfield;
var output;
var submit;

function setup() {
  noCanvas();
  textfield = select("#input");
  output = select('#output');
  submit = select("#submit");
  submit.mousePressed(newText);
}

function newText() {
  var s = textfield.value();

  var r = /\d\d/g; // here I would need to adjust it to a given digit length
  var matches = s.match(r);

  for (var i = 0; i < matches.length; i++) {
    createP(matches[i]);
    // at this point the results are wrong: not all possible sequences are shown
  }
}
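
A note on why the output skips sequences: a global regex like /\d\d/g consumes the two digits it matches, so the next match starts after them. One possible workaround, sketched here under the assumption that plain JavaScript with String.prototype.matchAll is available, is to wrap the pattern in a lookahead so nothing is consumed. The helper name overlappingWindows is made up for this illustration and is not part of p5.js.

```javascript
// Hypothetical helper: returns every overlapping n-digit window of s.
function overlappingWindows(s, n) {
  // A lookahead (?=...) matches at a position without consuming characters,
  // so the next match attempt starts just one character later.
  const re = new RegExp("(?=(\\d{" + n + "}))", "g");
  // Each match is empty; the digits we want are in capture group 1.
  return Array.from(s.matchAll(re), (m) => m[1]);
}

console.log(overlappingWindows("123124567", 2).join(" "));
// 12 23 31 12 24 45 56 67
```

Because n is passed in, the same function should cover the "given digit length" case mentioned in the comment above.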

Answers

  • You can do this easily if you use a histogram concept. In a histogram, your x axis is a set of bins. For example, let's assume you are looking at 2-digit values. Two digits give you numbers from 00 to 99. Now if you run your data, the string is analyzed two digits at a time and, based on each value, the count in the matching bin is increased by one. Let me illustrate the concept:

    Data:
    12 13 15 18 12 11 21

    Bins and its contents:
    11:1
    12:2
    13:1
    14:0
    15:1
    16:0
    17:0
    18:1
    19:0
    20:0
    21:1

    Working with a higher number of digits requires a larger number of bins. I guess there is some n-digit length that would put stress on your system. Instead of creating every possible digit combination, you could just keep track of the combinations you actually find during the process, for example in a list kept in sorted order. This only works if you are working with sparse data. If your data is not sparse, this approach will do as badly as defining all the bins for the n-digit sequence.

    How big is your data set and how many digits are you aiming to count?

    One more detail about my example above. The data I provided ends with 12 11 21. How many 12's do you see there? If we break the sequence every two digits in a 2-digit analysis, the answer would be one 12. However, we can see two in that set. Would that still be part of the challenge?

    Kf
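
The "only keep the bins you actually find" idea can be sketched roughly as follows. This is a minimal illustration rather than Kf's own code: it assumes a JavaScript Map as the bin store, slides the window one digit at a time (so overlapping sequences are counted), and the names countWindows and repeatedWindows are invented for the example.

```javascript
// Count every overlapping n-digit window, creating a bin only when a
// window is actually seen (a sparse histogram).
function countWindows(s, n) {
  const bins = new Map();
  for (let i = 0; i + n <= s.length; i++) {
    const key = s.slice(i, i + n); // keys stay strings, so long windows are fine
    bins.set(key, (bins.get(key) || 0) + 1);
  }
  return bins;
}

// Keep only the windows that occur more than once.
function repeatedWindows(bins) {
  return [...bins].filter(([, count]) => count > 1).map(([key]) => key);
}

console.log(repeatedWindows(countWindows("123124567", 2))); // [ '12' ]
```

A string of length L has at most L - n + 1 windows, so the Map never holds more entries than that, however large n gets.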

  • Yes, I think that is where the problem lies, because my string is 11120 digits long: http://pastebin.com/Da7b6kCR My goal with this string is to find as many digit sequences as possible, starting from 2 digits and going up to around 155 digits or more, because if you Ctrl+F "27585765197278894388721512889521961800311457278572611857642197096805796366125275705845217652197278304648765159564611414519889975112161518005953724348562510" you will see it repeats. It is really hard to do manually, so I wanted to write a program that would do it for me; that's why I started to learn programming.

    What I realize now is that I could have a problem with specifying the "n", because 155 digits is already huge and there are probably longer repeats than that. I'm also not sure yet how I want to present the output, because what I'm aiming for is to split the string into its most probable parts. What I mean is that this string is a cipher in a game, which I'm trying to decipher, and the first step would be to recognise the repeated digit patterns and split (color) the string so you can see the similarities in it. So I'm not sure how I want to "bite" this :P

  • I don't know any algorithms myself to do this task. I will let other forum goers add their comments. The challenge with 155 digits is that you cannot represent the value as a number anymore; you will need to work with strings. However, I can see there is some redundancy in the task at hand. For example, if a sequence 155 digits long repeats, then every 8-digit piece of it must repeat as well. The set of all repeated 8-digit sequences (for example) therefore covers every repeated n-digit sequence for n larger than 8, so repeats for higher n values can be identified from the set already found for a lower number of digits.

    In Linux, you could probably write a script that uses grep to search for repeated sequences. That could be an alternative approach.

    Kf
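
The redundancy described above also works in the other direction: if no sequence of length n repeats, no longer one can repeat either, so a search can simply grow n until the repeats run out. Here is a rough sketch of that idea in plain JavaScript. The function names are hypothetical, and occurrences that overlap each other (such as "11" inside "111") are counted as repeats in this version.

```javascript
// Return the first n-digit window that has been seen before, or null.
function firstRepeat(s, n) {
  const seen = new Set();
  for (let i = 0; i + n <= s.length; i++) {
    const w = s.slice(i, i + n);
    if (seen.has(w)) return w;
    seen.add(w);
  }
  return null;
}

// Grow n until no window of that length repeats any more.
function longestRepeat(s) {
  let best = "";
  for (let n = 1; n < s.length; n++) {
    const w = firstRepeat(s, n);
    if (w === null) break; // no repeat of length n means none of length n + 1
    best = w;
  }
  return best;
}

console.log(longestRepeat("12341256")); // 12
```

This only reports one of the longest repeats. For much longer inputs a binary search over n, or a suffix-array based approach, would likely be faster, but that is beyond this sketch.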

  • edited December 2016

    I decided to solve it this way. I'll search for all possible n-digit sequences and then compare them against each other to find the duplicates and output them.

    function setup(){
    
    var str1 = "12341256"; // let's work on a small one
    var n = 2; // later I'll add a slider or something, but for now I'm searching for 2-digit numbers
    var a = 0; // start of the current window
    var b = n; // end of the current window (exclusive)
    while (b <= str1.length) {
      var res = str1.substring(a, b);
      a = a + 1;
      b = b + 1;
      createP(res);
    }
    }
    

    The output I get is 12 23 34 41 12 25 56, so my target is 12. But how do I now search for the duplicated (repeated) 2-digit numbers?

  • Looks like a fun (possibly homework) problem :P

    Seeing as you've posted your attempt here's a simple approach that works for searches for 2 characters:

    var str = "12112341256";
    var needle = "";
    var results = {};
    
    for(var i=0; i<str.length-1; i++) {
      needle = str.substr(i, 2);
    
      if (!results[needle]) {
        var re = new RegExp(needle,"g");
        var found = str.match(re);
        results[needle] = found.length;
    
      }
    
    }
    console.log(results);
    

    Note that there's no check to ensure needle is a specified length - which may be a problem when you reach the end of the string. If you change the length of the needle you'll need to adjust the limit of the loop appropriately. Also note how results for each needle are stored - by using the needle as the key of an object - to ensure a search isn't repeated ;)

  • Hmm, well, I managed to do it this way, and it's working:

    var n = 2; // currently searching for all 2-digit numbers, but in future it should be adjustable to n digits
    
    function newText() {
      var str1 = textfield.value();
      // keep the state local so every call starts from scratch
      var a = 0;
      var b = n;
      var arr = [];
      while (b <= str1.length) {
        arr[a] = str1.substring(a, b); // results are stored in the array
        a = a + 1;
        b = b + 1;
      }
      // sort a copy so that equal sequences end up next to each other
      var potnij = arr.slice().sort();
      var takisam = []; // collects the duplicates
    
      for (var i = 0; i < potnij.length - 1; i++) {
        if (potnij[i + 1] == potnij[i]) {
          takisam.push(potnij[i]);
          //dziel = splitTokens(str1, takisam);
        }
      }
      console.log(takisam);
    }
    

    I didn't post this problem because of homework (I study civil engineering); I wanted to tackle a cipher problem. In a game, there are plenty of books written in a cipher that is numbers only. http://pastebin.com/Da7b6kCR < this string is all the books together; it is 11120 digits long!

    What I noticed about this specific cipher is that there are a lot of digit sequences that repeat many times. With my code I was able to quickly find the longest digit sequence (I think) that repeats, and it is 304 digits long! :D

    I assume that in the future I will somehow be able to decipher the digits into letters and then into whole words and sentences. But the first step is finding the digits that repeat, which is why I had the idea to write this code. Those that repeat must be words or whole sentences.

    But if you think about it, I can also determine the words or sentences that haven't repeated. How? Simple. Take 111156781111. If I find 1111, a sequence (or rather a word) that is used somewhere else (it's duplicated), then after I slice the string into 1111 5678 1111 I can assume that 5678 is another word: a sequence of digits that wasn't repeated but is still a word.

    The method will work, but currently I don't know how I want to present the data. For now I can only search for n-digit sequences that repeat, and only when I supply "n". I need to automate it so it shows all repeats, starting from 2 digits and ending at 304 digits.

    The problem lies here, because I don't know yet how to determine the positions of the splits. splitTokens() gives me an array of the digits that have been split, but I don't know at what positions they were split.

    I want to be able to put " " (a space) into the input string to show where the duplicates were. :P
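
One way to get both the split positions and the spaced-out string is to walk the input with String.prototype.indexOf instead of splitTokens(). The sketch below is only an illustration in plain JavaScript: findPositions and spaceOut are made-up names, the repeated sequence is assumed to be known already, and occurrences are taken left to right without overlapping.

```javascript
// Return the start index of every non-overlapping occurrence of seq in str.
function findPositions(str, seq) {
  const positions = [];
  if (seq === "") return positions; // guard against an endless loop
  let idx = str.indexOf(seq);
  while (idx !== -1) {
    positions.push(idx);
    idx = str.indexOf(seq, idx + seq.length); // continue after this occurrence
  }
  return positions;
}

// Rebuild the string with a space on each side of every occurrence.
function spaceOut(str, seq) {
  const parts = [];
  let cursor = 0;
  for (const pos of findPositions(str, seq)) {
    if (pos > cursor) parts.push(str.slice(cursor, pos)); // the unrepeated piece
    parts.push(seq);
    cursor = pos + seq.length;
  }
  if (cursor < str.length) parts.push(str.slice(cursor));
  return parts.join(" ");
}

console.log(findPositions("111156781111", "1111")); // [ 0, 8 ]
console.log(spaceOut("111156781111", "1111"));      // 1111 5678 1111
```

The positions array could equally be used to color the matching ranges instead of inserting spaces.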
